XML for Science Data Access. R. Suresh (NASA/MTECH) (suresh@mayurtech.com), Kenneth McDonald (NASA/GSFC) (Mcdonald@rattler.gsfc.nasa.gov). CEOS Joint Sub-Group Meeting, Frascati, Italy.
Introduction
• Earth Science data is exploding in
  • resolution
  • complexity
  • heterogeneity
  • volume
• Access to a data collection is more than a mere website
• Data access needs to provide data services across the user community
• XML-related technologies can provide building blocks to improve data access
XML Technologies
• XML is really a set of closely related technologies, including
  • XML: generalized markup
  • XLink and URI: interobject reference and linking
  • XML-Schema: document model definition
  • XSL: transformation and presentation
  • RDF: metadata and inference
  • XQuery: retrieval from XML documents
  • SOAP: remote procedure calling
• Key commonalities:
  • draft standards from the World Wide Web Consortium (W3C)
  • text-based
  • extensible/portable
XML Technologies
• Suitable for metadata and "light data"
• Structured
• Hierarchical
• Limited graph-like relationships (e.g. IDs)
• Portable across
  • languages
  • operating systems
• Becoming ubiquitous
  • standard parser APIs (DOM, SAX)
  • parsers available in all major languages and platforms
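The two parser APIs named above differ in how they hand a document to the application. The following is a minimal sketch using Python's standard library; the `<granule>` record and its attribute names are hypothetical and only serve to show the contrast between a tree-building (DOM) and an event-driven (SAX) parser.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical metadata record; element and attribute names are illustrative only.
DOC = """<granule id="g1">
  <parameter name="sst" units="K"/>
  <parameter name="chlorophyll" units="mg/m3"/>
</granule>"""


def names_via_dom(text):
    """DOM: build the whole tree in memory, then query it."""
    dom = xml.dom.minidom.parseString(text)
    return [el.getAttribute("name") for el in dom.getElementsByTagName("parameter")]


class ParameterHandler(xml.sax.ContentHandler):
    """SAX: receive start-element events while the parser streams through the text."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        if name == "parameter":
            self.names.append(attrs.getValue("name"))


def names_via_sax(text):
    handler = ParameterHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.names


if __name__ == "__main__":
    print(names_via_dom(DOC))
    print(names_via_sax(DOC))
```

DOM is convenient for small metadata records that fit in memory; SAX keeps memory use flat for larger documents because nothing is retained unless the handler stores it.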
XML Issues
• No semantics associated with markup
• No random access
• No non-textual content
• Document Type Definition
  • Not itself encoded in XML
  • No constraints on element content
  • Context-free
    • Syntax of element contents independent of element’s position in document tree
  • No cardinality constraints
XML for Scientific Data Access
• Good because it supports more than one data collection across:
  • discipline or sub-discipline (ocean, atmosphere, land)
  • multiple data types (e.g. satellite swath, grid, point, vector, raster)
  • access modality (e.g. browsing, search, visualization, simulation)
• Requires the generation of use scenarios
  • input from scientific community
• Develop ontologies
• Identify requirements
How to Use XML for Scientific Data Access (cont.)
• Develop data and metadata models to enable the scenarios
  • identify community-wide data semantics
  • formal, incremental process
  • ongoing review and documentation
  • target key semantics for scenarios
  • use extensible data modeling technologies (e.g. XML, RDF, HDF) to implement data models
• Link scenarios to build network of data services
• Other concerns
  • security
  • intellectual property
  • data preservation
Building Blocks
• XML
• Translators
• Description Languages
• Applications
• Advantages
  • Fosters evolution
  • Preserves interoperability
  • Internationalized text (Unicode)
  • Structured text
XML-based data format for interoperability
[Diagram: XML shown together with the data formats FITS, CEOS, netCDF, CDF, HDF, BUFR, GRIB, and SDTS]
Extensible Data Format (XDF)
• What is XDF?
  • XDF is developed at NASA GSFC
  • XML-based language for encapsulating scientific data. XDF aims to be the (mathematical) kernel of other fully featured, discipline-oriented scientific formats written in XML.
• Key features:
  • Hierarchical data structures
  • Arrays of any dimension merged with coordinate information
  • High-dimensional tables merged with field information, variable resolution
  • Easy wrapping of existing data
  • User-specified coordinate systems
  • Searchable ASCII metadata
  • Extensibility to new features
XDF Features
• Structures, arrays, parameters, axes
• Clear coordinate information
• Unrestrictive binary and ASCII formats
• Examples: EOS, astronomy, biology, etc.
• OO Perl and Java application interfaces
• FITSML - adopt FITS keywords and an XML kernel
• Converters between FITS, FITSML, HDF, and CDF
• XDF home page: http://tarantella.gsfc.nasa.gov/xml/XDF_home.html
A simplified structure with an image
<XDF>
  <structure>
    <array>
      <axis name="X-axis">
        <values> a list of values along one dimension</values>
      </axis>
      <axis name="Y-axis">
        <values> a list of values along other dimension</values>
      </axis>
      <read>
        info on the ordering of the data values and record format.
        <recordFormat>...</recordFormat>
      </read>
      <data> The Data goes here </data>
    </array>
    <array> Some other array of data... </array>
  </structure>
</XDF>
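A structure like the one above can be read with any generic XML parser. The sketch below is in Python and reuses only the element names shown on the slide (`XDF`, `structure`, `array`, `axis`, `values`, `data`). It is not a validated XDF document or the XDF API. The sample values are invented, and the data ordering (first axis varies fastest) is an assumption that a real file would state in its `<read>` element.

```python
import xml.etree.ElementTree as ET

# Simplified structure modelled on the slide; not a complete or validated XDF document.
XDF_TEXT = """<XDF>
  <structure>
    <array>
      <axis name="X-axis"><values>0 1 2</values></axis>
      <axis name="Y-axis"><values>0 1</values></axis>
      <data>10 11 12 20 21 22</data>
    </array>
  </structure>
</XDF>"""


def read_arrays(text):
    """Return one dict per <array>, holding its axis values and its data as rows."""
    root = ET.fromstring(text)
    arrays = []
    for array in root.iter("array"):
        # Axis name -> list of coordinate values, in document order.
        axes = {
            ax.get("name"): [float(v) for v in ax.findtext("values").split()]
            for ax in array.findall("axis")
        }
        flat = [float(v) for v in array.findtext("data").split()]
        # Assumption: the first axis varies fastest, so each row spans that axis.
        row_len = len(next(iter(axes.values())))
        rows = [flat[i:i + row_len] for i in range(0, len(flat), row_len)]
        arrays.append({"axes": axes, "rows": rows})
    return arrays


if __name__ == "__main__":
    for arr in read_arrays(XDF_TEXT):
        print(arr["axes"])
        print(arr["rows"])
```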
Advantages of XML-based translators
• Universal acceptance
• Separation of information and presentation
• Automatic validation
• File inclusion (internal and external entities)
• Hierarchical
• Parsers
• Stylesheet languages
• Field-specific languages
• Extensible namespace
Earth Science Markup Language (ESML)
• ESML is currently being developed at the University of Alabama in Huntsville under a NASA grant
• Specialized markup language for Earth Science metadata, based on XML
• Machine readable and interpretable
• Representation of the structure and content of any data file, regardless of data format
• Human readable
• External metadata files that can be generated by either data producer or consumer (at collection, data set, or granule level)
• Supports data/service interoperability
ESML
• Users can describe and publish files using ESML
• Users can describe ASCII and binary data
• ESML will facilitate data discovery
• Metadata can be indexed and searched by web search engines
• Allows users to utilize internet search engines to locate data
• Web site: http://esml.itsc.uah.edu
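The idea of an external description file that lets generic software read a binary file can be sketched as follows. This Python example does not use ESML's actual schema: the `<binaryFile>` and `<field>` elements, their attributes, and the type names are all hypothetical and were invented for illustration. It only shows the principle of describing a record layout in XML and decoding the bytes from that description.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical descriptor: element and attribute names are invented for this sketch
# and are NOT ESML's real schema.
DESCRIPTOR = """<binaryFile byteOrder="big">
  <field name="station_id" type="int32"/>
  <field name="latitude" type="float32"/>
  <field name="longitude" type="float32"/>
</binaryFile>"""

# Map the descriptor's (assumed) type names to struct format codes.
TYPE_CODES = {"int16": "h", "int32": "i", "float32": "f", "float64": "d"}


def build_layout(descriptor_xml):
    """Turn the XML description into a struct layout plus the field names."""
    root = ET.fromstring(descriptor_xml)
    order = ">" if root.get("byteOrder") == "big" else "<"
    fields = root.findall("field")
    fmt = order + "".join(TYPE_CODES[f.get("type")] for f in fields)
    return struct.Struct(fmt), [f.get("name") for f in fields]


def read_records(raw, descriptor_xml):
    """Decode fixed-length records from raw bytes using only the external description."""
    layout, names = build_layout(descriptor_xml)
    return [dict(zip(names, values)) for values in layout.iter_unpack(raw)]


if __name__ == "__main__":
    raw = struct.pack(">iff", 7, 34.5, -86.25) + struct.pack(">iff", 8, 0.5, 1.5)
    for record in read_records(raw, DESCRIPTOR):
        print(record)
```

Because the layout lives outside the data file, either the data producer or a consumer could write the description without touching the data itself.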
ODL – XML Translator
• A stand-alone Java program
• Extracts ODL metadata from an HDF file
• Displays metadata using a style sheet
• This program will be useful for building a metadata catalog system in XML
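The core of such a translation can be sketched in a few lines. The Python example below is not the Java program described above. It assumes the ODL text has already been extracted from the HDF file and that it follows the common `GROUP`/`OBJECT` ... `END_GROUP`/`END_OBJECT` nesting with one `KEY = VALUE` pair per line. The sample text, the `metadata` root element, and the lower-cased element names are choices made for this illustration. Multi-line values and parenthesized value lists are not handled.

```python
import xml.etree.ElementTree as ET

# Illustrative ODL fragment; the product name is a placeholder.
SAMPLE_ODL = '''GROUP = INVENTORYMETADATA
  GROUPTYPE = MASTERGROUP
  OBJECT = SHORTNAME
    NUM_VAL = 1
    VALUE = "EXAMPLE_PRODUCT"
  END_OBJECT = SHORTNAME
END_GROUP = INVENTORYMETADATA
END'''


def odl_to_xml(odl_text):
    """Convert simple one-pair-per-line ODL into an ElementTree element."""
    root = ET.Element("metadata")
    stack = [root]  # the innermost open GROUP/OBJECT is on top
    for line in odl_text.splitlines():
        line = line.strip()
        if not line or line == "END":
            continue
        key, _, value = (part.strip() for part in line.partition("="))
        value = value.strip('"')
        if key in ("GROUP", "OBJECT"):
            # Open a nested element and remember it as the current parent.
            stack.append(ET.SubElement(stack[-1], key.lower(), name=value))
        elif key in ("END_GROUP", "END_OBJECT"):
            stack.pop()
        else:
            # Plain KEY = VALUE becomes a child element with text content.
            ET.SubElement(stack[-1], key.lower()).text = value
    return root


if __name__ == "__main__":
    print(ET.tostring(odl_to_xml(SAMPLE_ODL), encoding="unicode"))
```

Once the metadata is in XML, a style sheet can be applied for display, as the slide describes.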
HDF-EOS Metadata
• Each HDF file contains three metadata elements: inventory, archive, and structural
• HDF-EOS has three file types or objects: Grid, Point, and Swath
• Each file type will contain all three metadata elements
[Diagram: HDF-EOS Grid, HDF-EOS Point, and HDF-EOS Swath shown together with XML]
Metadata Tools & Systems – XML
• Global Change Master Directory (GCMD) - NASA
• The Earth Science Markup Language (ESML, University of Alabama in Huntsville)
• The DIstributed MEtadata System (DIMES, George Mason University)
• The aggregation data catalog that is part of the Distributed Oceanographic Data System (DODS, University of Rhode Island)
• GDLIP, General Digital Library Interchange Protocol (Alexandria Digital Library)
• Digital Library for Earth System Education (DLESE)
• Web Mapping Testbed (OGC, Digital Earth)
Tools and Systems
• VISAD infrastructure from SSEC http://www.ssec.wisc.edu/~billh/visad.html
• Live Access Server – PMEL http://www.ferret.noaa.gov/nopp/main.pl?
• WXWise applets, University of Wisconsin-Madison http://itg1.meteor.wisc.edu/wxwise/
• The Virtual Exploratorium http://www.unidata.ucar.edu/workshops/ShapingFuture/Presentations/Mohan_files/frame.htm
• EDMI (Earth Data Multimedia Instrument, Bruce Caron, New Media Studio)
• WorldWatcher from Northwestern University University of Northern Colorado http://www.worldwatcher.nwu.edu/
Unified Access to Metadata
[Diagram, layers in the order listed: three User/system boxes; XML layer (database, access tool); Conceptual/physical layer; Various Schemas (describing various “types” of metadata); four Meta/Data System boxes]
New Technologies: The Semantic Web
• Multiple metadata objects (RDF documents) linked together
• Ontologies
• Taxonomies
• Inference rules
• Promise: agents can synthesize information from multiple documents
• Like a world-wide ORDBMS
T. Berners-Lee et al., Scientific American, May 2001
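The combination of linked statements, a taxonomy, and inference rules can be illustrated with a toy example. The Python sketch below uses plain tuples in place of RDF documents and no RDF library. The identifiers, the two predicates (`subClassOf`, `measures`), and both rules are hypothetical and chosen only to show how an agent might derive a fact that appears in neither source on its own.

```python
# Facts as (subject, predicate, object) triples, in the spirit of RDF statements.
# All identifiers are hypothetical.
TAXONOMY = {
    ("SeaSurfaceTemperature", "subClassOf", "OceanParameter"),
    ("OceanParameter", "subClassOf", "EarthScienceParameter"),
}
CATALOG = {
    ("granule42", "measures", "SeaSurfaceTemperature"),
}


def infer(triples):
    """Apply two rules until no new triples appear.

    1. subClassOf is transitive.
    2. If X measures C and C subClassOf D, then X measures D.
    """
    facts = set(triples)
    while True:
        new = set()
        for (s, p, o) in facts:
            if p not in ("subClassOf", "measures"):
                continue
            for (s2, p2, o2) in facts:
                # Chain any fact ending in class o with "o subClassOf o2".
                if p2 == "subClassOf" and s2 == o:
                    new.add((s, p, o2))
        if new <= facts:
            return facts
        facts |= new


if __name__ == "__main__":
    merged = infer(TAXONOMY | CATALOG)
    # Derived from both sources together, stated in neither alone.
    print(("granule42", "measures", "EarthScienceParameter") in merged)
```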
Semantic Web
• "The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." -- Tim Berners-Lee, James Hendler, Ora Lassila, The Semantic Web
• In the Semantic Web we will need:
  • Machines talking to machines – semantics need to be unambiguously declared
  • Joined-up data – enabling complex tasks based on information from various sources
  • Wide scope – from, say, home to government to commerce
  • Trust – both in data and who is saying it
• This is not going to be easily achieved
Conclusion
• XML usage has increased in scientific data applications
• Usage is not common across the systems
• Web Services and Data Services
• The Semantic Web for scientific applications is in its infancy