VO Standards and Protocols: XML, VOTable, UCD, ConeSearch. Roy Williams, California Institute of Technology, NVO co-director
XML: Structured Information <From>Antonio Stradivarius</From> <To>Domenico Scarlatti</To> <Date> <Day>13</Day> <Month>4</Month> <Year>1723</Year> </Date> <Body> Io bisogno una appartamento acoglienti a Cremona … </Body> Separation of structure from presentation: the same date can be displayed as 4/13/23, April 13, 1723, or 13.iv.1723. The computer can read the document and answer queries like this: “Find all memos from April 1723”
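A minimal sketch of the "Find all memos from April 1723" query, using Python's standard `xml.etree.ElementTree`. The `<Memos>` and `<Memo>` wrapper elements, the second memo, and the function name `memos_from` are assumptions made for illustration; the slide shows only the child elements of one memo.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper elements <Memos>/<Memo>; the slide shows only the children.
MEMOS = """<Memos>
  <Memo>
    <From>Antonio Stradivarius</From>
    <To>Domenico Scarlatti</To>
    <Date><Day>13</Day><Month>4</Month><Year>1723</Year></Date>
    <Body>...</Body>
  </Memo>
  <Memo>
    <From>Domenico Scarlatti</From>
    <To>Antonio Stradivarius</To>
    <Date><Day>2</Day><Month>5</Month><Year>1723</Year></Date>
    <Body>...</Body>
  </Memo>
</Memos>"""


def memos_from(xml_text, year, month):
    """Return the <Memo> elements whose <Date> has the given year and month."""
    root = ET.fromstring(xml_text)
    hits = []
    for memo in root.iter("Memo"):
        date = memo.find("Date")
        if date is None:
            continue
        # The date is structured, so no string parsing of "4/13/23" is needed.
        if int(date.findtext("Year")) == year and int(date.findtext("Month")) == month:
            hits.append(memo)
    return hits


for memo in memos_from(MEMOS, 1723, 4):
    print(memo.findtext("From"), "->", memo.findtext("To"))
```

Because the day, month, and year are separate elements, the query does not depend on how the date happens to be displayed.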
XML • Documents and data • Human readable, editable, mailable • Schema constrains structure • -- can encode data models • Can be transformed (XSLT) • -- other xml • -- html/pdf/excel etc • Tools • Parsers in Java, C, C++, Perl, Python, ... • Browsers and editors • XML databases • Binding to make API • For serialization, mediation, brokers
XML for science • XML is a comfortable vehicle for our metadata and data models • But the real challenge is to define NVO-specific data objects and how they are used • We need consensus more than either software or hardware • VOTable • VOResource • services -- WSDL
XML example (no schema) <?xml version="1.0"?> <BookCatalogue> <Book> <Title>The Cambridge Star Atlas</Title> <Author>Wil Tirion</Author> <ISBN>0-52156-098-5</ISBN> <Publisher>Cambridge UP</Publisher> </Book> <Book> <Title>Parallel Computing Works!</Title> <Author>Geoffrey C. Fox</Author> <Author>Roy D. Williams</Author> <Author>Paul C. Messina</Author> <ISBN>1-55860-253-4</ISBN> <Publisher>Morgan Kaufmann</Publisher> </Book> </BookCatalogue>
XML Parsing • SAX: Event-Based • Handler functions for StartElement, Text, EndElement, etc. • Found element BookCatalogue • Found element Book • Found element Title • Found text The Cambridge Star Atlas • Found end element Title • …
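The event stream described above can be reproduced with Python's standard `xml.sax` module. This is only a sketch: the class name `EventLogger` and the helper `sax_events` are made up for illustration, and the input is a shortened copy of the book catalogue.

```python
import xml.sax

CATALOGUE = b"""<?xml version="1.0"?>
<BookCatalogue>
  <Book>
    <Title>The Cambridge Star Atlas</Title>
    <Author>Wil Tirion</Author>
  </Book>
</BookCatalogue>"""


class EventLogger(xml.sax.ContentHandler):
    """Records each SAX callback as a (kind, value) pair."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        # Skip whitespace between tags; a parser may also deliver
        # long text runs in several pieces.
        text = content.strip()
        if text:
            self.events.append(("text", text))

    def endElement(self, name):
        self.events.append(("end", name))


def sax_events(data):
    """Parse XML bytes and return the list of recorded events."""
    handler = EventLogger()
    xml.sax.parseString(data, handler)
    return handler.events


for kind, value in sax_events(CATALOGUE)[:5]:
    print("Found", kind, value)
```

No tree is built; the handler sees each element as it streams past, which is why SAX suits very large documents.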
Parsing • DOM: Document Object Model • Returns a tree-like Document object with data attached • Figure: tree with root BookCatalogue and two Book children; the first Book has Title (The Cambridge Star Atlas), Author (Wil Tirion), and ISBN; the second Book has Title (Parallel Computing Works!)
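For comparison, a sketch of the DOM style using Python's standard `xml.dom.minidom`. The helper names `text_of` and `summarise` are assumptions for illustration; the input is a shortened copy of the book catalogue.

```python
from xml.dom import minidom

CATALOGUE = """<BookCatalogue>
  <Book>
    <Title>The Cambridge Star Atlas</Title>
    <Author>Wil Tirion</Author>
  </Book>
  <Book>
    <Title>Parallel Computing Works!</Title>
    <Author>Geoffrey C. Fox</Author>
    <Author>Roy D. Williams</Author>
    <Author>Paul C. Messina</Author>
  </Book>
</BookCatalogue>"""


def text_of(node):
    """Concatenate the text nodes directly under an element."""
    return "".join(
        child.data for child in node.childNodes if child.nodeType == child.TEXT_NODE
    ).strip()


def summarise(xml_text):
    """Return [(title, [authors...]), ...] by walking the Document tree."""
    doc = minidom.parseString(xml_text)
    result = []
    for book in doc.getElementsByTagName("Book"):
        title = text_of(book.getElementsByTagName("Title")[0])
        authors = [text_of(a) for a in book.getElementsByTagName("Author")]
        result.append((title, authors))
    return result


for title, authors in summarise(CATALOGUE):
    print(title, "by", ", ".join(authors))
```

The whole document is held in memory as a tree, so nodes can be revisited in any order, at the cost of memory for large files.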
XML Schema <?xml version="1.0"?> <schema xmlns="http://www.w3.org/2000/10/XMLSchema" xmlns:cat="uri://BookCatalogue"> <element name="BookCatalogue"> <complexType> <sequence> <element ref="cat:Book" minOccurs="0" maxOccurs="unbounded"/> </sequence> </complexType> </element> <element name="Book"> <complexType> <sequence> <element ref="cat:Title" minOccurs="1" maxOccurs="1"/> <element ref="cat:Author" minOccurs="1" maxOccurs="unbounded"/> <element ref="cat:Date" minOccurs="0" maxOccurs="1"/> <element ref="cat:ISBN" minOccurs="1" maxOccurs="1"/> <element ref="cat:Publisher" minOccurs="1" maxOccurs="1"/> </sequence> </complexType> </element> <element name="Title" type="string"/> <element name="Author" type="string"/> <element name="Date" type="string"/> <element name="ISBN" type="string"/> <element name="Publisher" type="string"/> </schema> Book.xsd = Xml-Schema Definition
XSchema <?xml version="1.0"?> <schema xmlns="http://www.w3.org/2000/10/XMLSchema" xmlns:cat="uri://BookCatalogue"> <element name="BookCatalogue"> <complexType> <sequence> <element ref="cat:Book" minOccurs="0" maxOccurs="unbounded"/> </sequence> </complexType> </element> <element name="Book"> <complexType> <sequence> <element ref="cat:Title" minOccurs="1" maxOccurs="1"/> <element ref="cat:Author" minOccurs="1" maxOccurs="unbounded"/> <element ref="cat:Date" minOccurs="0" maxOccurs="1"/> <element ref="cat:ISBN" minOccurs="1" maxOccurs="1"/> <element ref="cat:Publisher" minOccurs="1" maxOccurs="1"/> </sequence> </complexType> </element> <element name="Title" type="string"/> <element name="Author" type="string"/> <element name="Date" type="string"/> <element name="ISBN" type="string"/> <element name="Publisher" type="string"/> </schema> All XML schemas have "schema" as the root element. Book.xsd = Xml-Schema Definition
XSchema <?xml version="1.0"?> <schema xmlns="http://www.w3.org/2000/10/XMLSchema" xmlns:cat="uri://BookCatalogue"> <element name="BookCatalogue"> <complexType> <annotation> <documentation>Catalog is a sequence of books</documentation> </annotation> <sequence> <element ref="cat:Book" minOccurs="0" maxOccurs="unbounded"/> </sequence> </complexType> </element> <element name="Book"> <complexType> <sequence> <element ref="cat:Title" minOccurs="1" maxOccurs="1"/> <element ref="cat:Author" minOccurs="1" maxOccurs="unbounded"/> <element ref="cat:Date" minOccurs="0" maxOccurs="1"/> <element ref="cat:ISBN" minOccurs="1" maxOccurs="1"/> <element ref="cat:Publisher" minOccurs="1" maxOccurs="1"/> </sequence> </complexType> </element> <element name="Title" type="string"/> <element name="Author" type="string"/> <element name="Date" type="string"/> <element name="ISBN" type="string"/> <element name="Publisher" type="string"/> </schema> Default namespace declaration: all these (schema, element, complexType, sequence, annotation) come from this standard namespace
XSchema <?xml version="1.0"?> <schema xmlns="http://www.w3.org/2000/10/XMLSchema" xmlns:cat="uri://BookCatalogue"> <element name="BookCatalogue"> <complexType> <sequence> <element ref="cat:Book" minOccurs="0" maxOccurs="unbounded"/> </sequence> </complexType> </element> <element name="Book"> <complexType> <sequence> <element ref="cat:Title" minOccurs="1" maxOccurs="1"/> <element ref="cat:Author" minOccurs="1" maxOccurs="unbounded"/> <element ref="cat:Date" minOccurs="0" maxOccurs="1"/> <element ref="cat:ISBN" minOccurs="1" maxOccurs="1"/> <element ref="cat:Publisher" minOccurs="1" maxOccurs="1"/> </sequence> </complexType> </element> <element name="Title" type="string"/> <element name="Author" type="string"/> <element name="Date" type="string"/> <element name="ISBN" type="string"/> <element name="Publisher" type="string"/> </schema> This namespace (uri://BookCatalogue) is defined here and abbreviated as "cat". The element referenced as cat:Book comes from the namespace called "cat". The Book element is defined here. Book.xsd = Xml-Schema Definition
Namespace Content • Here, uri://BookCatalogue can be abbreviated as "cat" • The "cat" namespace contains: BookCatalogue, Book, Title, Author, ISBN, Date, Publisher
XML example (with schema) <?xml version="1.0"?> <BookCatalogue xmlns="uri://BookCatalogue" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="uri://BookCatalogue http://www.mydomain.com/schemas/bookcatalog.xsd"> <Book> <Title>The Cambridge Star Atlas</Title> <Author>Wil Tirion</Author> <ISBN>0-52156-098-5</ISBN> <Publisher>Cambridge UP</Publisher> </Book> <Book> <Title>Parallel Computing Works!</Title> <Author>Geoffrey C. Fox</Author> <Author>Roy D. Williams</Author> <Author>Paul C. Messina</Author> <ISBN>1-55860-253-4</ISBN> <Publisher>Morgan Kaufmann</Publisher> </Book> </BookCatalogue> Callouts: xmlns is the namespace that we are using in this document; xmlns:xsi says the document is an instance of a W3C schema; xsi:schemaLocation gives the URL of its schema
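Once a document declares a default namespace, every element name is qualified by it, and a parser has to be asked for the qualified names. A small sketch with Python's standard `xml.etree.ElementTree` follows; the prefix map `NS` and the function `titles` are illustrative names, and ElementTree only parses the document here, it does not fetch or validate against the schema at the `xsi:schemaLocation` URL.

```python
import xml.etree.ElementTree as ET

# The prefix "cat" is local to this script; only the URI has to match the document.
NS = {"cat": "uri://BookCatalogue"}

DOC = """<BookCatalogue xmlns="uri://BookCatalogue"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="uri://BookCatalogue http://www.mydomain.com/schemas/bookcatalog.xsd">
  <Book>
    <Title>The Cambridge Star Atlas</Title>
    <Author>Wil Tirion</Author>
  </Book>
  <Book>
    <Title>Parallel Computing Works!</Title>
    <Author>Geoffrey C. Fox</Author>
  </Book>
</BookCatalogue>"""


def titles(xml_text):
    """Return all Book titles, looking elements up by namespace-qualified name."""
    root = ET.fromstring(xml_text)
    return [
        book.findtext("cat:Title", namespaces=NS).strip()
        for book in root.findall("cat:Book", NS)
    ]


root = ET.fromstring(DOC)
print(root.tag)      # the tag carries the namespace URI in braces
print(titles(DOC))
print(root.findall("Book"))  # unqualified lookup finds nothing
```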
VOTable • Full metadata representation • Hierarchy of RESOURCEs • containing PARAMs and TABLEs • UCD (unified content descriptor) • a has unit meter • a has UCD ORBIT_SIZE_SMAJ (semi-major axis of the orbit) • Can reference remote and/or binary streams • Table can be • Pure XML • "Simple Binary" • FITS Binary Table
Sample VOTable <?xml version="1.0"?> <!DOCTYPE VOTABLE SYSTEM "http://us-vo.org/xml/VOTable.dtd"> <VOTABLE version="1.0"> <DEFINITIONS> <COOSYS ID="myJ2000" equinox="2000." epoch="2000." system="eq_FK5"/> </DEFINITIONS> <RESOURCE> <PARAM name="Observer" datatype="char" arraysize="*" value="William Herschel"> <DESCRIPTION>This parameter is designed to store the observer's name </DESCRIPTION> </PARAM> <TABLE name="Stars"> <DESCRIPTION>Some bright stars</DESCRIPTION> <FIELD name="Star-Name" ucd="ID_MAIN" datatype="char" arraysize="10"/> <FIELD name="RA" ucd="POS_EQ_RA" ref="myJ2000" unit="deg" datatype="float" precision="F3" width="7"/> <FIELD name="Dec" ucd="POS_EQ_DEC" ref="myJ2000" unit="deg" datatype="float" precision="F3" width="7"/> <FIELD name="Counts" ucd="NUMBER" datatype="int" arraysize="2x3x*"/> <DATA> <TABLEDATA> <TR> <TD>Procyon</TD><TD>114.827</TD><TD>5.227</TD> <TD>4 5 3 4 3 2 1 2 3 3 5 6</TD> </TR> <TR> <TD>Vega</TD><TD>279.234</TD> <TD>38.782</TD><TD>8 7 8 6 8 6</TD> </TR> </TABLEDATA> </DATA> </TABLE> </RESOURCE> </VOTABLE> Alternative: instead of inline TABLEDATA, the DATA element can reference a remote FITS file: <DATA> <FITS> <STREAM href="ftp://server.com/mydata.fits" expires="2002-02-22" actuate="onRequest"/> </FITS> </DATA>
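A rough sketch of how a client might read the sample table with Python's standard `xml.etree.ElementTree`: the FIELD elements are read first as column metadata, then each TR is converted cell by cell. The names `read_table` and `CONVERTERS` are assumptions for illustration, the input is a trimmed copy of the sample (no DOCTYPE, COOSYS, or PARAM), and only a few datatypes are handled.

```python
import xml.etree.ElementTree as ET

VOTABLE = """<VOTABLE version="1.0">
  <RESOURCE>
    <TABLE name="Stars">
      <FIELD name="Star-Name" ucd="ID_MAIN" datatype="char" arraysize="10"/>
      <FIELD name="RA" ucd="POS_EQ_RA" unit="deg" datatype="float"/>
      <FIELD name="Dec" ucd="POS_EQ_DEC" unit="deg" datatype="float"/>
      <FIELD name="Counts" ucd="NUMBER" datatype="int" arraysize="2x3x*"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>Procyon</TD><TD>114.827</TD><TD>5.227</TD>
              <TD>4 5 3 4 3 2 1 2 3 3 5 6</TD></TR>
          <TR><TD>Vega</TD><TD>279.234</TD><TD>38.782</TD>
              <TD>8 7 8 6 8 6</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""

# Only a subset of the VOTable primitives; everything else is kept as text.
CONVERTERS = {"float": float, "double": float, "short": int, "int": int, "long": int}


def read_table(xml_text):
    """Return (fields, rows) for the first TABLE in a TABLEDATA-style VOTable."""
    table = ET.fromstring(xml_text).find(".//TABLE")
    fields = [dict(f.attrib) for f in table.findall("FIELD")]
    rows = []
    for tr in table.findall("DATA/TABLEDATA/TR"):
        row = []
        for field, td in zip(fields, tr.findall("TD")):
            text = (td.text or "").strip()
            convert = CONVERTERS.get(field["datatype"])
            if convert is None:
                row.append(text)  # char and unhandled types stay as strings
            elif "arraysize" in field:
                row.append([convert(v) for v in text.split()])  # array cell
            else:
                row.append(convert(text) if text else None)  # scalar cell
        rows.append(row)
    return fields, rows


fields, rows = read_table(VOTABLE)
for row in rows:
    print(dict(zip((f["name"] for f in fields), row)))
```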
Table Cell • follows FITS binary table • does NOT follow XML schema • Primitives: boolean, bit, unsignedByte, short, int, long, char, unicodeChar, float, double, floatComplex, doubleComplex • A cell can be a scalar, an array, or a variable-length array
VOTable is Flexy • eg Table of images • UCD="meta.code.mime; image.jpeg" datatype="unsignedByte" arraysize="*" • eg Table of URL links • UCD="meta.ref.url" datatype="char" arraysize="*"
Table Data Model • Metadata • Class definition for Row • FIELD • data type • semantic type • Data • Each Row is a list of Cells • Each Cell is an array of Primitives • may be variable length
Table Data Layout • All metadata first • small, complex, XML • Class definition for table record • + params, description, etc etc • Then data • (may be) large, remote • XML | binary | FITS • Instantiations of table record • All records MUST have same format • binary data allows streaming, parallelism
Param Data Model • Param is “Table with one cell” • Like a FIELD value • But with a “value” attribute
Primitives • All have fixed binary length • Same as FITS primitives • Except Unicode
Multidimensional Array Cell • A table cell can have lots of Primitives • Example: WCS parameters are arrays • <FIELD name="CRVAL" datatype="double" arraysize="2"/> • Example: up to 10 images, each 64x64 • <FIELD name="thumbs" datatype="unsignedByte" arraysize="64x64x10*"/>
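To make the arraysize notation concrete, here is a small sketch that splits a value such as "64x64x10*" into its dimensions and works out how many fixed-size planes a cell holds. The function names `parse_arraysize` and `plane_count` are made up for illustration, and the sketch assumes that only the last dimension may carry the "*" marker.

```python
from math import prod


def parse_arraysize(spec):
    """Split an arraysize string into (dims, variable).

    "64x64x10*" -> ([64, 64, 10], True)   last dimension variable, at most 10
    "2x3x*"     -> ([2, 3, None], True)   last dimension variable, unbounded
    "2"         -> ([2], False)           fixed length
    """
    parts = spec.split("x")
    dims, variable = [], False
    for i, part in enumerate(parts):
        if part.endswith("*"):
            if i != len(parts) - 1:
                raise ValueError("only the last dimension may be variable")
            variable = True
            bound = part[:-1]
            dims.append(int(bound) if bound else None)
        else:
            dims.append(int(part))
    return dims, variable


def plane_count(spec, n_values):
    """Return the size of the last dimension for a cell holding n_values primitives."""
    dims, variable = parse_arraysize(spec)
    if not variable:
        if n_values != prod(dims):
            raise ValueError("cell does not match fixed arraysize")
        return dims[-1]
    fixed = prod(dims[:-1])  # prod of an empty list is 1
    if n_values % fixed:
        raise ValueError("cell is not a whole number of planes")
    count = n_values // fixed
    if dims[-1] is not None and count > dims[-1]:
        raise ValueError("cell exceeds the declared upper bound")
    return count


print(parse_arraysize("64x64x10*"))
print(plane_count("2x3x*", 12))  # a 12-value cell in a 2x3x* column
```

With the "Counts" column of the sample table (arraysize "2x3x*"), a 12-value cell holds two 2x3 planes and a 6-value cell holds one.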
Hierarchy • A VOTable contains RESOURCES • RESOURCE can contain: • TABLE • RESOURCE • etc etc • Usage example • Many observations in the file, • each is a RESOURCE • Each observation is • Parameters • Calibration table • Raw data table
Hierarchy • New feature: GROUP <TABLE name="Nutation and Aberration"> <GROUP name="Nutation"> <FIELD name="Longitude"/> <FIELD name="Obliquity"/> </GROUP> <GROUP name="Aberration"> <GROUP name="Equinox 1950.0"> <FIELD name="C"/> <FIELD name="D"/> </GROUP> <GROUP name="Equinox 1955.0"> <FIELD name="C"/> <FIELD name="D"/> </GROUP> </GROUP> </TABLE>
Astronomical Data • Image • Standard file format: FITS • Standardized c.1980 • Keyword-value dictionary + binary block • Catalog • Derived from image • Connected set of bright pixels • “Table of stars” • Standard format: VOTable • Standardized 2002 • XML with remote binary • Spectrum
XSLT Example <VOTABLE version="1.0"> <DESCRIPTION>Output from the messier catalog at VirtualSky.org</DESCRIPTION> <RESOURCE type="results"> <PARAM ID="RA" datatype="E" value="200.0" /> <PARAM ID="DE" datatype="E" value="40.0" /> <PARAM ID="SR" datatype="E" value="30.0" /> <PARAM ID="PositionalError" datatype="E" value="0.1" /> <PARAM ID="Credit" datatype="A" arraysize="*" value="Charles Messier, Richard Gelderman" /> <TABLE> <DESCRIPTION>Output from messier Catalog Server</DESCRIPTION> <FIELD ID="I" name="Messier Number" datatype="char" arraysize="*" ucd="ID_MAIN"> <DESCRIPTION>Messier Number</DESCRIPTION> </FIELD> <FIELD ID="RA" name="Right Ascension" datatype="float" unit="degrees" ucd="POS_EQ_RA_MAIN"> <DESCRIPTION>Right Ascension J2000</DESCRIPTION> </FIELD> .... <DATA> <TABLEDATA> <TR> <TD>3</TD> <TD>205.5</TD> <TD>28.402</TD> <TD /> <TD>16.2'</TD> <TD>6.4004</TD> <TD>Globular Cluster</TD> <TD>Canes Venatici</TD> <TD>M3 is one of more heavily studied globular clusters due to its position in the galaxy, putting it far from interstellar absorbtion. More than 200 variable stars have been observed out of a total of near 50,000. Being one of the brightest clusters, M3 is</TD> </TR>
XSLT Result: this table is the result of a cone search
XSLT Program <h2>Data</h2> <table border="1"> <tr> <xsl:for-each select="FIELD"> <td><b><xsl:value-of select="@name" /> </b></td> </xsl:for-each> </tr> <xsl:for-each select="DATA"> <xsl:for-each select="TABLEDATA"> <xsl:for-each select="TR"> <tr> <xsl:for-each select="TD"> <td width="100"><xsl:value-of select="." /></td> </xsl:for-each> </tr> </xsl:for-each> </xsl:for-each> </xsl:for-each> </table>
Binding to make a Parser • From the schema, an API and library are generated • JAXB • Breeze • Castor • This is JAVOT (Caltech): for(int i=0; i<table.getFieldCount(); i++){ Field field = (Field)table.getFieldAt(i); String u = field.getUcd(); if(u != null && u.equals("POS_EQ_RA_MAIN")) System.out.println("Field " + i + " is for RA"); }
Unified Content Descriptor • UCD is a “semantic type” • phot.mag;em.opt.B Integrated total blue magnitude • src.orbital.eccentricity Orbital eccentricity • stat.median Statistics Median Value • Base + Specifiers • eg error in default right ascension • stat.error; pos.eq.ra; meta.main • First word is "type" • "what kind of thing is this?" • How do we add a stat.error to another?
Unified Content Descriptor • UCD has services • Natural Language Description • Find best UCD • Search in NLD • Matching functions • if I want pos.eq.ra, is stat.error;pos.eq.ra correct? • What about Ontology???
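One possible matching rule for the question above, sketched in Python. It relies only on the statement that the first word of a UCD is its "type"; the function names are hypothetical and this is not presented as the matching function defined by any UCD service.

```python
def ucd_words(ucd):
    """Split a UCD such as 'stat.error; pos.eq.ra; meta.main' into its words."""
    return [word.strip().lower() for word in ucd.split(";") if word.strip()]


def primary_is(ucd, wanted):
    """True if the first word (the 'type' of the quantity) equals wanted."""
    words = ucd_words(ucd)
    return bool(words) and words[0] == wanted.lower()


def mentions(ucd, wanted):
    """True if wanted appears anywhere in the UCD, as type or as specifier."""
    return wanted.lower() in ucd_words(ucd)


column = "stat.error;pos.eq.ra"
print(primary_is(column, "pos.eq.ra"))   # the column is an error, not a right ascension
print(mentions(column, "pos.eq.ra"))     # but it does relate to right ascension
print(primary_is("pos.eq.ra; meta.main", "pos.eq.ra"))
```

Under this rule a client looking for a right ascension column would reject `stat.error;pos.eq.ra`, because its first word says the column holds an error, while a client looking for anything related to right ascension would accept it.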
Some UCD
S  stat                     Statistical parameters
Q  stat.Fourier             Fourier coefficient
Q  stat.Fourier.amplitude   Amplitude Fourier coefficient
P  stat.covariance          Covariance between two parameters
P  stat.error               Statistical error
P  stat.error.sys           Systematic error
Q  stat.fit                 Fit
Q  stat.fit.chi2            Chi2
Q  stat.fit.dof             Degrees of freedom
Q  stat.fit.goodness        Goodness or significance of fit
Q  stat.fit.omc             Observed minus computed
Q  stat.fit.param           Parameter of fit
Q  stat.fit.residual        Residual fit
Q  stat.likelihood          Likelihood
S  stat.max                 Maximum or upper limit
S  stat.mean                Mean, average value
S  stat.median              Median value
S  stat.min                 Minimum or lowest limit
Some UCD
S  phot                     Photometry
Q  phot.calib               Photometric calibration
Q  phot.color               Color index or magnitude difference
Q  phot.color.Cous          Color index in Cousins system
Q  phot.color.Gen           Color index in Geneva system
Q  phot.color.Gunn          Color index in Gunn system
Q  phot.color.JHN           Color index in Johnson 65+ system
S  meta                     Metadata
P  meta.bib                 Bibliographic reference
P  meta.bib.author          Author name
P  meta.bib.bibcode         Bibcode
P  meta.bib.ivo             IVOA identifier ivo://
P  meta.bib.fig             Figure in a paper
P  meta.bib.journal         Journal name
P  meta.bib.page            Page number
P  meta.bib.volume          Volume number
P  meta.code                Code or flag
P  meta.code.class          Classification code
Cone Search • First VO standard service • Input: RA, DEC, SR must be present • decimal degrees J2000 • Output: VOTable of sky-located data records • must have columns with UCDs: POS_EQ_RA_MAIN, POS_EQ_DEC_MAIN, ID_MAIN • Figure: request (RA=300 DEC=25 SR=0.1) and response (table with columns ID, RA, DEC, x, y, z)
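A sketch of the two halves of a cone search in Python: building the request from RA, DEC, and SR, and the geometric test a service would apply to each record. The base URL `http://example.org/cone` and the function names are placeholders, not a real service or library API; the separation is computed with the haversine formula, which is one common choice.

```python
from math import asin, cos, degrees, radians, sin, sqrt
from urllib.parse import urlencode


def cone_search_url(base_url, ra, dec, sr):
    """Append RA, DEC, SR (decimal degrees, J2000) to a service base URL."""
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode({"RA": ra, "DEC": dec, "SR": sr})


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions (haversine)."""
    ra1, dec1, ra2, dec2 = map(radians, (ra1, dec1, ra2, dec2))
    a = sin((dec2 - dec1) / 2) ** 2 + cos(dec1) * cos(dec2) * sin((ra2 - ra1) / 2) ** 2
    return degrees(2 * asin(min(1.0, sqrt(a))))


def in_cone(ra, dec, centre_ra, centre_dec, sr):
    """True if the record at (ra, dec) lies within sr degrees of the cone centre."""
    return angular_separation(ra, dec, centre_ra, centre_dec) <= sr


# Hypothetical base URL, with the request values shown on the slide.
print(cone_search_url("http://example.org/cone", 300, 25, 0.1))
print(in_cone(300.0, 25.05, 300, 25, 0.1))
print(in_cone(300.0, 25.20, 300, 25, 0.1))
```

The response to such a request would be a VOTable whose rows all pass the `in_cone` test for the requested centre and radius.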
Result of Cone Search (figure: result table with columns RA, Dec, ID)
Cone Search + Density Probe • Federation of Multiple Services • Density Probe inputs: baseURL, Spacing, Search radius • Density Probe calls Cone Search • interoperating NVO-compliant services!