Collection of material for FITS Technical Group

Assembled by Lucio Chiappetti (INAF IASF Milano, IAUFWG chair, TG member)
All material should be available on public sources

The Thomas et al. "critical" papers
ADASS poster, source
Longer paper (draft?), source
[Th1] ADASS poster (local copy, circulated by Bill)
[Th2] Longer paper draft (local copy, circulated by Bill)
Longer paper latest draft
(also linked on astropy)
Criticisms raised on astropy
[Py1] "the trouble with FITS" (contains a further link to the document)
[Py2] statement on binary tables
[Py3] "a replacement for FITS" (contains a further link to the document)
(the requirements in bullets #1-2 and #4-5 are already satisfied by current FITS (!) and are not discussed any further)
[Py4] again on binary tables (see also the entire related thread)
Issues raised on FITSBITS
[MT] by Mark Taylor (Topcat author) on binary tables
[EB] by Erik Bray on NaNs
Links about evolution of Planetary Data System (courtesy of Bob Hanisch)
A.Raugh's wiki
E.Shaya's page
Last but not least ...
[BP5P] Bill Pence's original Five Proposals
[LC] a very rough and preliminary draft of mine for a comprehensive way out (local copy)
[LC2] some further elaborations on metadata, Unicode et al.

List of perceived shortcomings

An attempt should be made to extract this list in a homogeneous format from the documents listed above, and to assign the priority with which items should be handled.
Topics are listed in no order of importance: they follow the order of occurrence in the Thomas et al. ADASS poster, then in the other sources in the order given above, with similar topics grouped together.

For a detailed explanation of each item, see the original wording in the references given above. Each reference is marked by a mnemonic [Xx], followed by a precise pointer to the section (s.n.m), paragraph (p.n), line (l.n) or item (#n).

Priorities and comments are marked by the initials of the proposer (within braces). Comments covering several items are indicated by a simple catch phrase and linked at the end.

Priorities (from top priority 1 to lowest 999) are colour coded as
1 red for items to be taken seriously as soon as possible
10 orange for items to be taken seriously or partially seriously but of lower importance
50 yellow for items intermediate between the orange and green classes
90 green for items to be ignored since they are beyond the scope of FITS
500 gray for items which are neutral and could be deferred sine die
999 black for items rejected

PS: my original inclination was to assign "green" priority ("beyond the scope of FITS") to many items but, recognizing some merit in the arguments in favour of them, I moved them to "yellow".

The last column tries to group items by a coarse classification by topic:
HDR header and header keyword related
DAT (binary) data representation related
CON specific conventions
SEM semantics

Item	Short id	Ref.	Description	Priority	Opinions	Class
0	Slowness	[Th1]s.1 p.2 [Py1]	FITS standard evolved slowly	{LC}900	{LC} true, but did VO converge faster? (despite the amount of effort and money thrown at it)	???
0.0	Obsolescence	[Py1]#1	Current s/w out of date (... e.g X11) (sic!)	{LC}999	{LC} nobody forbids anybody to write "newer" FITS s/w	???
1	Data models	[Th1]s.2.2 p.1 [Th2]s.2.1. s.2.2	No standardized data model association	{LC}90	{LC} VO? discipline? data model!	SEM
1A	Errors & data quality	[Th1]s.2.2 p.2 [Th2]s.2.2.1 [Th2]s.2.2.4	No standard models for errors and data quality	{LC}80	{LC} discipline? Moreover proposals like those in [Th2] are semantics and do not concern the format (both errors and data quality). Assignment of a standard pre-retrieval data quality is an ambitious task, for the archivers (VO?). Argument is serious but difficult.	SEM
1B	Provenance	[Th1]s.2.2 p.2 [Th2]s.2.2.3	no machine-readable general HISTORY	{LC}70	{LC} VO? discipline? Personally I am not convinced that documentation kwds should not be primarily human-readable, and that a rigid machine-readable syntax (blocking data analysis on irrelevant data) would be a bonus (more a nuisance!) Requires either another convention or new standard? propose one!	SEM
1B1	Data provenance	[Th2]s.2.2.3 p.3	no traceability of which data files contributed	{LC}20	{LC} a PARENT kwd? or family thereof? store command line? (if any!) All approaches seen in mission-specific contexts	HDR
1C	WCS	[Th1]s.2.2 p.3 [Th2]s.2.2.2	WCS complex, incomplete, inflexible	{LC}19	{LC} WCS is open: please supply details and proposals. Linked with 4A: limitations of short kwd names on complex WCS conventions	HDR
1D	Units	[Th2]s.2.2.5	Standardization of (new) data units insufficient	{LC}100	VO task? but they think even IVOA is insufficient! However VOunits work looks pretty sensible, and FITS should align more than diverge!	CON
2	Network	[Th1]s.2.4 end [Th2]s.2.4.1	Streaming indeterminate size unsupported	{LC}30	{LC} is it really a problem? it wasn't for tapes! it won't be for URLs with a Content-Length! Otherwise just use staging files!	DAT
3	Large distributed datasets	[Th1]2.2.4 beg [Th2]s.2.4.2	TB datasets across multiple file systems. Grouping convention insufficient	{LC}81	{LC} Requests in [Th2] are overloading FITS with something that lies outside it! See data organizers or use of external databases (see 3B). Would delegating part of the HDU(s) to an external URI be wise?	DAT
3B	FITS and database tables	LC motu proprio	Devise a convention to map FITS tables from/to database tables	{LC}25		CON
3C	FITS tables with >999 columns	[MT]	Devise a way to handle broad tables (joins from database tables)	{LC}24	Allowing TFIELDS>999 will require long kwd names (see 4A) and may require a WCS-II with long KWD names	DAT
3D	Mapping FITS tables to VOTables	[Py4]#2-3	points raised in [Py4] are astropy specific	{LC}35	However they make reference to a topcat convention to add XML VOTable info onto FITS files, to be examined (but it is not registered, see this astropy comment discipline? ) requirement for a standard "column description kwd" in this astropy message	CON
3E	variety of internal representations	[Py3]#3	must support a variety of different byte-level data types ... of all byte sizes	{LC}100	{LC} FITS already has some, and more or less enough (see 5B though), and an established mechanism to handle them in images and bintables. As a convinced Ockhamist, I think data types shall not be multiplied (and used) beyond necessity (e.g. unsigned are not really necessary); compare Java "primitive data types" vs objects	DAT
3F	(efficient) random access	[Py3]#6	should support reading and writing to specific subsets of the data without requiring the entire file to be read into memory	{LC}110	{LC} Essentially this requires random access, which we have. Contrasts with item 2; compare also item 4I.	DAT
3G	preview thumbnails	[Py3]#8	should support thumbnail-style lower resolution data (... associated with the main data) for quick view purposes	{LC}150	{LC}Not a priority, viewers are usually fast enough, anyhow could be handled by a (foreign ?) extension	CON
4A1	8-char kwd name	[Py1]#3 [Th1]s.2.3 l.2 [Th2]s.2.3.1 p.3-4 [BP5P]#1	8-char kwd name too short	{LC}1	work in progress	HDR
4A2	No namespaces	[Th1]s.2.3 l.6	Lack of namespaces	{LC}31	{LC} invent a convention!	CON
4A3	68-char kwd limit	[Py1]#3 [Th1]s.2.3 l.2-3 [Th2]s.2.3.1 p.5 [BP5P]#2 [Py4]#1	Kwd values too short, HIERARCH, CONTINUE insufficient	{LC}2	work in progress	HDR
4B	2880-byte blocks	[Th1]s.2.3 l.8 [Th2]s.2.3.1 p.6	2880-byte blocks are an excessive overhead for tiny datasets	{LC}200	{LC} either live with it and do all-FITS (XMM CCF approach) or use a tiny self-documenting ASCII file for tiny datasets, and don't tell me that XML is more efficient!	DAT
4C	Real time writing	[Th1]s.2.3 l.9	2880-byte blocks limit real time writing	{LC}31	See also 2	DAT
4D	Unextendable header	[BP5P]#5 [LC]	Header located at the front unextendable without extensive rewriting	{LC}6	linked with other "kwd" items	HDR
4E	No array kwds	[Th2]s.2.3.1 p.2 [LC]	Better convention for lists, sets, arrays of kwds	{LC}5	{LC} true!	HDR
4F	Data association	[Th2]s.2.3.2	awkward data association among HDUs in a MEF	{LC}40	{LC}This is a task for a data organizer; devise convention for index files?	CON
4G	Data endianness	[Th2]s.2.3.3	No support for little-endian byte order	{LC}999	{LC disguised as Ockham} Absolutely not! Look at Java! Or import and work in native format	DAT
4H	Variable length rows	[Py2] [Py1]#2 [Py3]#10	Request to disallow variable length rows in BINTABLEs	{LC}999	{LC} Nobody is obliged to use them. I agree to use sparingly. A normalizing utility could be provided.	DAT
4I	Table storage inefficient	[Py3]#10	Request to store tables by columns in consecutive bytes	{LC}90	{LC} we could live with storage by row as we did so far	DAT
4I	Kwd typing	[LC]	Header kwd not strongly typed	{LC}21		HDR
4J	NaN in kwd values	[EB]	Allow IEEE NaN and Inf in kwd header values	{LC}26	discussed in the past on FITSBITS, could be encoded as strings like 'NAN', also because of 4I	HDR
4K	Metadata support	[Py3]#9	metadata should be either stored in or easily exported to a more commonly used format (i.e. XML?)	{LC}80	{LC} I don't see great advantages in XML (see also a nice reading), but mapping could be handled by external utilities. BTW, what became of this ADASS 2001 proposal?	HDR
5A	No versioning	[Py4]#7 [Th1]s.2.1 p.1 [Th2] 2.1.1 [BP5P]#4	No way to tell which features or conventions are supported	{LC}18	{LC} para 4 of [Th2] is either B.S. or an illusion: legacy s/w shall deal with legacy data and ignore newer data. More serious arguments about a convention registry are in para 5 of [Th2]	CON
5A1	Informal variants	[Th2]s.2.1.4	Too many informal variants are in use	{LC}110	{LC} it is a lost cause (discipline). What they describe for "non-compliant VOTables" is not by chance! Build a system that even a fool can use and only a fool will want to use it! Do we want full portability, or are we happy with plain interoperability?	CON
5B	No Unicode	[Th1]s.2.1 p.2 [Th2]s.2.1.3 [BP5P]#3	7-bit ASCII excessively limited	{LC}15	[BP5P]#3 just adds dot, dollar and ASCII lowercase. {LC} worth tackling in conjunction with 4A, 4F	HDR
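
The overhead behind item 4B (2880-byte blocks for tiny datasets) can be made concrete with a little arithmetic. What follows is only a sketch: the function names are illustrative and do not come from any FITS library. The only facts relied upon are that a header card is 80 bytes long and that both the header and the data unit are padded to a multiple of 2880 bytes.

```python
BLOCK = 2880  # FITS block size in bytes
CARD = 80     # one header card (keyword record) in bytes


def padded(nbytes):
    """Round a byte count up to a whole number of 2880-byte blocks."""
    return -(-nbytes // BLOCK) * BLOCK


def hdu_size(n_cards, data_bytes):
    """Size on disk of one HDU: padded header plus padded data unit.

    n_cards counts every header card, including the mandatory END card.
    """
    return padded(n_cards * CARD) + padded(data_bytes)


def overhead_fraction(n_cards, data_bytes):
    """Fraction of the HDU that is not actual data (header and padding)."""
    total = hdu_size(n_cards, data_bytes)
    return (total - data_bytes) / total


if __name__ == "__main__":
    # Hypothetical tiny dataset: ten 4-byte values described by five cards
    # (SIMPLE, BITPIX, NAXIS, NAXIS1, END).
    print(hdu_size(5, 40), round(overhead_fraction(5, 40), 3))
```

A 40-byte dataset therefore occupies two full blocks (5760 bytes), which is the situation the criticism refers to; the same functions show the overhead becoming negligible once the data unit spans many blocks.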

More to be written for other data sources
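
As an illustration of the remark on item 4J (NaN in kwd values), the sketch below writes non-finite values as character-string keyword values and reads them back. It is not a registered convention: only the 'NAN' token is suggested in the table above, while the 'INF' and '-INF' tokens, the function names and the decision to ignore card comments are assumptions made for this example.

```python
import math

# Hypothetical string tokens for non-finite values; only 'NAN' is suggested
# in the table above, the other two are assumptions for illustration.
TOKENS = {"NAN": math.nan, "INF": math.inf, "-INF": -math.inf}


def format_card(keyword, value):
    """Build an 80-character card, writing non-finite floats as strings."""
    value = float(value)
    if math.isnan(value):
        field = "'" + "NAN".ljust(8) + "'"
    elif math.isinf(value):
        field = "'" + ("INF" if value > 0 else "-INF").ljust(8) + "'"
    else:
        # Fixed format: numeric value right-justified to column 30.
        field = repr(value).upper().rjust(20)
    return (keyword.upper().ljust(8) + "= " + field).ljust(80)


def parse_value(card):
    """Recover a float from a card produced by format_card (no comments)."""
    field = card[10:80].strip()
    if field.startswith("'"):
        # Quoted string: map the token back to a non-finite float.
        return TOKENS[field[1:-1].strip()]
    return float(field)
```

A reader unaware of such a convention would simply see an ordinary character-string keyword, which is why the question is tied to keyword typing: the keyword is numeric in meaning but string in form.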

Detailed (personal) comments

{LC} VO: I have the impression that several of the requests in the Thomas et al. papers are overloading FITS with what does not (or does not any more) belong to FITS. I am thinking of items that are not relevant to a data format (data layout) but rather to semantics. I would have expected these items to be sorted out by the VO initiative (e.g. UCDs, VO units, "data models" ...) and I am frankly disappointed by the time taken to deal with them.
{LC} discipline: Can we honestly expect FITS to deal with the standardization of the whole of astronomy? And, to be realistic, can we really expect people and institutions to be disciplined? The statement on non-compliant VOTables in [Th2] 2.1.4 is indicative of a real situation. If a standard is rigid, complex or just comes late, people will deviate. If people do not understand it, people will deviate. I have seen even large institutions delivering catalogues as tab-separated or comma-separated ASCII files instead of FITS files. This is definitely true for individuals or small data producers, but might also be the case for big data producers which outsource to non-astronomers. ADD APOLOGUES
{LC} data model: I am constantly puzzled by the emphasis on a (metaphysical) Data Model (capitalized boldface blinking), even if each one of us may have one's own data model (lower case), or project or mission or bandwidth data models. As a convinced Ockhamist I believe that data models shall not be multiplied beyond necessity. In the past, for instance, I thought that X-ray data could be cast just as images, spectra, generic histograms, light curves and photon lists (with response matrices mapped to a pair of an image and a histogram). But while an image is (somewhat) clear, a spectrum is already a different thing for different people (not just X-ray people vs optical people): I have seen many different incarnations of spectra as 1-d images, plain binary tables, and others. This consideration overlaps with the two previous ones on VO and discipline.

sax.iasf-milano.inaf.it/~lucio/FITS/NewTG/ :: original creation 2016 Sep 30 15:48:09 CEST :: last edit 2016 Sep 30 15:48:09 CEST