Understand the need for exchanging tabular data in the Virtual Observatory context, benefits of using VOTable with associated metadata, and how it improves data interpretation and interoperability.
VOTable: Tabular Data for Virtual Observatory
François Ochsenbein, Roy Williams
Clive Davenhall, Daniel Durand, Pierre Fernique, Robert Hanisch, David Giaretta, Tom McGlynn, Alex Szalay, Andreas Wicenec
The Context
Need to exchange data in tabular form:
• Data come from a wide variety of data servers and archives (VO context)
• Tables must include the associated metadata in order to be interpretable by applications
• Tables must cope with potentially millions of records
• Existence of FITS
VOTable History
• Astrores at CDS/ESO (June 1999)
• XSIL at Caltech (June 2000)
• October 2001: first discussions
• December 2001: VOTable 0.1
• January 2002: Interoperability meeting, Strasbourg
• 15 April 2002: VOTable 1.0 http://cdsweb.u-strasbg.fr/doc/VOTable/
VOTable archives & discussion groups: http://archives.us-vo.org/VOTable/
Why XML?
• includes in a single document the data and their associated metadata (descriptive data)
• has been in common use for about 3 years
• can be interpreted by readily available parsers and tools
• can be visualized (XSL)
• can be encapsulated in messages
A “classical” XML Document
<?xml version="1.0"?>
<!DOCTYPE VOTABLE SYSTEM "http://us-vo.org/xml/…...dtd">
<RESOURCE name="myResource">
  <OBSERVER>William Herschel</OBSERVER>
  <SOURCE id="mySource">
    <STAR-NAME>Procyon</STAR-NAME>
    <POSITION equinox="J2000" epoch="J2000">
      <RA unit="deg">114.827</RA>
      <Dec unit="deg">+05.227</Dec>
    </POSITION>
    <COUNTS>
      <COUNT>4</COUNT>
      <COUNT>5</COUNT>
      <COUNT>3</COUNT>
    </COUNTS>
  </SOURCE>
  …..
</RESOURCE>
Problems of “classical” XML Documents
Each data element is <tagged>, meaning:
• Huge overheads in terms of volume, required resources, and processing time, so the approach does not scale to multi-million-row tables
• Need to introduce new elements (tags) for each new parameter, or to cross-match a potentially large set of namespaces
The VOTable way
VOTables follow the classical tabular presentation, where each column is assumed to be homogeneous in terms of its associated metadata. A VOTable document contains:
• The metadata part (data description), essentially a set of <FIELD> and <PARAM> specifications
• The data part (serialisation), which may be in XML, FITS or binary.
<?xml version="1.0"?>
<!DOCTYPE VOTABLE SYSTEM "http://us-vo.org/xml/VOTable.dtd">
<VOTABLE version="1.0">
  <DEFINITIONS>
    <COOSYS ID="myJ2000" equinox="2000." epoch="2000." system="eq_FK5"/>
  </DEFINITIONS>
  <RESOURCE>
    <PARAM name="Observer" datatype="char" arraysize="*" value="William Herschel">
      <DESCRIPTION>This parameter is designed to store the observer's name</DESCRIPTION>
    </PARAM>
    <TABLE name="Stars">
      <DESCRIPTION>Some bright stars</DESCRIPTION>
      <FIELD name="Star-Name" ucd="ID_MAIN" datatype="char" arraysize="10"/>
      <FIELD name="RA" ucd="POS_EQ_RA" ref="myJ2000" unit="deg" datatype="float" precision="F3" width="7"/>
      <FIELD name="Dec" ucd="POS_EQ_DEC" ref="myJ2000" unit="deg" datatype="float" precision="F3" width="7"/>
      <FIELD name="Counts" ucd="NUMBER" datatype="int" arraysize="2x3x*"/>
      <DATA>
        <TABLEDATA>
          <TR>
            <TD>Procyon</TD><TD>114.827</TD><TD> 5.227</TD>
            <TD>4 5 3 4 3 2 1 2 3 3 5 6</TD>
          </TR>
          <TR>
            <TD>Vega</TD><TD>279.234</TD><TD>38.782</TD>
            <TD>8 7 8 6 8 6</TD>
          </TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
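Because the metadata and the data travel in the same document, a generic XML parser is enough to read a table like the one above. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module and a trimmed copy of the example (the DOCTYPE, DEFINITIONS, PARAM and the Counts column are left out for brevity). The helper name `read_table` is hypothetical and is not part of any VOTable library.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example document (assumption: no DTD validation is needed).
VOTABLE = """<VOTABLE version="1.0">
 <RESOURCE>
  <TABLE name="Stars">
   <FIELD name="Star-Name" ucd="ID_MAIN" datatype="char" arraysize="10"/>
   <FIELD name="RA" ucd="POS_EQ_RA" unit="deg" datatype="float"/>
   <FIELD name="Dec" ucd="POS_EQ_DEC" unit="deg" datatype="float"/>
   <DATA><TABLEDATA>
    <TR><TD>Procyon</TD><TD>114.827</TD><TD> 5.227</TD></TR>
    <TR><TD>Vega</TD><TD>279.234</TD><TD>38.782</TD></TR>
   </TABLEDATA></DATA>
  </TABLE>
 </RESOURCE>
</VOTABLE>"""


def read_table(xml_text):
    """Return (fields, rows): FIELD attribute dicts and one dict per TR."""
    root = ET.fromstring(xml_text)
    table = root.find("./RESOURCE/TABLE")
    fields = [dict(f.attrib) for f in table.findall("FIELD")]
    # Use the declared datatype to pick a converter for each column.
    converters = [
        float if f["datatype"] in ("float", "double") else str for f in fields
    ]
    rows = []
    for tr in table.findall("./DATA/TABLEDATA/TR"):
        cells = [(td.text or "").strip() for td in tr.findall("TD")]
        rows.append(
            {f["name"]: conv(c) for f, conv, c in zip(fields, converters, cells)}
        )
    return fields, rows


fields, rows = read_table(VOTABLE)
for row in rows:
    print(row)
```

The column names, units and types all come from the FIELD elements, so the reading code needs no knowledge of what the table contains.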
<RESOURCE>
  <PARAM …/> …
  <TABLE>
    <FIELD…/>…
    <DATA>
      <TABLEDATA> <TR> <TD>… </TR> … </TABLEDATA>
      <BINARY> <STREAM …> </BINARY>
      <FITS extnum="n"> <STREAM …> </FITS>
    </DATA>
  </TABLE>
</RESOURCE>
The <FIELD> and <PARAM>
Describe the metadata attached to columns (<FIELD>) or to the resource (<PARAM>)
name        column label
unit        standardized unit
datatype    computer type
width       character representation
precision   character representation
arraysize   repetition factor
ucd         standardized parameter category
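The repetition factor can be illustrated with the `arraysize="2x3x*"` column of the Stars example. Below is a minimal sketch that splits a cell into blocks of the fixed leading dimensions (here 2 × 3 = 6 values), with the trailing `*` meaning that the number of blocks may vary from row to row. The helper names `parse_arraysize` and `group_cell` are hypothetical, and the sketch only handles integer cells separated by whitespace.

```python
def parse_arraysize(arraysize):
    """Split e.g. '2x3x*' into ([2, 3], True): fixed dims and a variable flag."""
    dims = arraysize.split("x")
    variable = dims[-1].endswith("*")
    if variable:
        dims = dims[:-1]  # the last dimension has no fixed length
    return [int(d) for d in dims], variable


def group_cell(text, arraysize):
    """Group the whitespace-separated integers of one TD into fixed-size blocks."""
    fixed, _variable = parse_arraysize(arraysize)
    values = [int(v) for v in text.split()]
    block = 1
    for d in fixed:
        block *= d
    if len(values) % block != 0:
        raise ValueError("cell length is not a multiple of the fixed dimensions")
    return [values[i:i + block] for i in range(0, len(values), block)]


# Procyon has two 2x3 blocks, Vega has one.
print(group_cell("4 5 3 4 3 2 1 2 3 3 5 6", "2x3x*"))
print(group_cell("8 7 8 6 8 6", "2x3x*"))
```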
The UCDs
Unified Content Descriptor
Categorisation of the parameters listed in the table
• Interpretation of the table contents
• Decide whether values can be compared
• Data mining
S. Derrière's talk on Friday
datatype          Meaning             FITS   Bytes
"boolean"         Logical             L      1
"bit"             Bit                 X      *
"unsignedByte"    Byte (0 to 255)     B      1
"short"           Short Integer       I      2
"int"             Integer             J      4
"long"            Long integer        K      8
"char"            ASCII Character     A      1
"unicodeChar"     Unicode Character   -      2
"float"           Floating point      E      4
"double"          Double              D      8
"floatComplex"    Float Complex       C      8
"doubleComplex"   Double Complex      M      16
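The byte widths in the table above are enough to estimate the size of a fixed-length row. A minimal sketch follows, with the widths copied from the table. The helper names `field_bytes` and `row_bytes` are hypothetical, and variable-length fields (`bit`, or any `arraysize` containing `*`) are rejected rather than estimated.

```python
from math import prod

# Bytes per item, copied from the datatype table ("bit" is variable and omitted).
BYTES_PER_ITEM = {
    "boolean": 1, "unsignedByte": 1, "short": 2, "int": 4, "long": 8,
    "char": 1, "unicodeChar": 2, "float": 4, "double": 8,
    "floatComplex": 8, "doubleComplex": 16,
}


def field_bytes(datatype, arraysize="1"):
    """Size in bytes of one fixed-length field value."""
    if datatype == "bit" or "*" in arraysize:
        raise ValueError("variable-length field has no fixed size")
    count = prod(int(d) for d in arraysize.split("x"))
    return BYTES_PER_ITEM[datatype] * count


def row_bytes(fields):
    """Sum of the field sizes for a list of FIELD attribute dicts."""
    return sum(field_bytes(f["datatype"], f.get("arraysize", "1")) for f in fields)


# The three fixed-length columns of the Stars example: 10 + 4 + 4 bytes.
stars = [
    {"name": "Star-Name", "datatype": "char", "arraysize": "10"},
    {"name": "RA", "datatype": "float"},
    {"name": "Dec", "datatype": "float"},
]
print(row_bytes(stars))
```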
FITS Compatibility
VOTable was designed to be compatible with existing FITS data tables:
• Compatible data types
• FITS keywords are represented in <FIELD>, e.g. width, precision, arraysize
• Arrays and variable-length arrays
• <DATA> may link to existing FITS data sets
Data Serialization FITS or BINARY data may be embedded in the document, or remote; compression/encoding may be applied.
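As an illustration of an embedded, encoded binary stream, here is a minimal round-trip sketch. It rests on several assumptions that the slide does not state: the stream is base64-encoded, each row is two big-endian doubles (RA, Dec), and no compression is applied. The names `ROW`, `encode_rows` and `decode_rows` are hypothetical.

```python
import base64
import struct

# Assumed row layout: two big-endian 8-byte doubles (RA, Dec).
ROW = struct.Struct(">dd")


def encode_rows(rows):
    """Pack rows into bytes and return them as base64 text."""
    raw = b"".join(ROW.pack(*row) for row in rows)
    return base64.b64encode(raw).decode("ascii")


def decode_rows(text):
    """Inverse of encode_rows: base64 text back to a list of (RA, Dec) tuples."""
    raw = base64.b64decode(text)
    return [ROW.unpack_from(raw, off) for off in range(0, len(raw), ROW.size)]


stream_text = encode_rows([(114.827, 5.227), (279.234, 38.782)])
print(stream_text)
print(decode_rows(stream_text))
```

Each row takes 16 bytes before encoding, whatever the number of digits, which is the main attraction of a binary serialization for very large tables.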
Existing tools and Servers
• Several databases are delivering VOTables: HEASARC, IPAC, NOAO, NRAO, VizieR, SIMBAD (cone search >50 services)
• VOTable parsers in Perl, Java, C (different types of parsers for different applications)
• VOTable validators
• XSLT basic XML/HTML translators
DTD or XML-Schema
• The VOTable rules exist both as a DTD (Document Type Definition) and in the XML-Schema language (heavily used in developing Web Services applications)
VOTable appendices
Astrores had two features not implemented in VOTables:
1. The LINK conventions, describing how to get the correlated data (explanations, images, spectra…) based on substitution of the column contents
  …
  <FIELD name="FileName" datatype="char"…/>
  …
  <LINK href="http://server/getFile?${FileName}" …/>
  …
  <TR> … <TD>photo/procyon.dat</TD>… </TR>
  <TR> … <TD>photo/vega.dat</TD>… </TR>
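The substitution of column contents into the LINK template can be sketched as follows. This is only an illustration of the `${FileName}` convention shown above: the helper name `expand_link` is hypothetical, and whether the substituted value should be URL-encoded is not stated, so the sketch inserts it unchanged.

```python
import re

# Matches ${ColumnName} placeholders in an href template.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def expand_link(href, row):
    """Replace each ${name} in href with the value of that column in row."""
    return PLACEHOLDER.sub(lambda m: str(row[m.group(1)]), href)


template = "http://server/getFile?${FileName}"
for row in ({"FileName": "photo/procyon.dat"}, {"FileName": "photo/vega.dat"}):
    print(expand_link(template, row))
```

A single LINK declaration therefore yields one URL per row, without repeating the server address in every cell.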
VOTable appendices (2)
2. The Query Mechanism, using conventions similar to the HTML <FORM> for retrieving the data from user-supplied constraints
  <PARAM name="Observer" datatype="char" arraysize="*" />
  <TABLE name="Stars">
    <DESCRIPTION>Some bright stars</DESCRIPTION>
    <FIELD name="Star-Name" ucd="ID_MAIN" datatype="char" arraysize="10"/>
    <FIELD name="RA" ucd="POS_EQ_RA" ref="myJ2000" unit="deg" datatype="float" precision="F3" width="7"/>
    <FIELD name="Dec" ucd="POS_EQ_DEC" ref="myJ2000" unit="deg" datatype="float" precision="F3" width="7"/>
    <FIELD name="Counts" ucd="NUMBER" datatype="int" arraysize="2x3x*"/>
    <LINK type="query" action="http://server-node/getResult?" />
  </TABLE>
Toward more generic WSDL-like solutions?
Conclusions
• Just version 1.0 … more to come
• Comments? Proposals? Join the discussion group VOTable@us-vo.org