Manage Scientific Metadata Using XML

R. Yang, M. Kafatos, and X. Wang, “Managing Scientific Metadata Using XML,” IEEE Internet Computing, vol. 6, no. 4, pp. 52-59, July-Aug. 2002.

Outline • Abstract • Introduction • Metadata • XML • DIMES • Conclusion

Abstract • With the explosive growth in the volume of remote sensing, model, and other Earth science data, and with the popularity of the Internet, scientists now face the challenge of publishing and finding relevant data sets effectively and efficiently.

Introduction • The Earth Observing System (EOS) satellite Terra alone adds more than half a terabyte of data each day. • Metadata have been recognized as a key technology for easing the search and retrieval of Earth science data.

Metadata • Data about data • Structured data about data • A description used to define and identify electronic resources and to facilitate access to them (from the International Federation of Library Associations)

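EXAMPLE (a news item and its metadata)
"The Terracotta Warriors from mainland China are on display at the National Museum of Natural Science in Taichung from March 22, ROC year 90 (2001), through May 10." (Liberty Times, Reporter A)
• Subject: Terracotta Warriors exhibition
• Organizer: National Museum of Natural Science
• Location 1a: Taichung City
• Location 1b: National Museum of Natural Science
• Time 1a: 90/03/22
• Time 1b: 90/05/20
• Source: Liberty Times
• Author: Reporter A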

Metadata • Early use: catalog cards in libraries • Today: data exchange, full-text retrieval, and similar applications • Example call number: BOOK/T58.6/H859

Metadata • Metadata come in very diverse formats, since different data providers and data users usually define their own metadata schemas.

Example (from the Academia Sinica Metadata Working Group)

Example

Metadata • Handling such diverse metadata therefore becomes a challenge for the designers and developers of distributed information systems.

XML-Based Distributed Metadata Server (DIMES) • In this paper, we discuss the Distributed MEtadata Server (DIMES) prototype system. • Designed to be flexible yet simple, DIMES uses XML to represent, store, retrieve, and interoperate metadata in a distributed environment.

XML & Metadata • The Extensible Markup Language (XML) is ideal for describing ASCII-based data because both human users and computers can understand XML-encoded data. • Most Earth science metadata are in ASCII format, and can therefore easily be migrated to XML.
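To illustrate why ASCII metadata migrates easily, here is a minimal sketch that converts hypothetical `KEY = value` lines into a flat XML element with Python's standard `xml.etree.ElementTree`. The input format, the `ascii_to_xml` name, and the `Metadata` root tag are assumptions for illustration, not the format any particular Earth science archive uses.

```python
import xml.etree.ElementTree as ET

def ascii_to_xml(text: str, root_tag: str = "Metadata") -> ET.Element:
    """Convert hypothetical 'KEY = value' lines into one flat XML element."""
    root = ET.Element(root_tag)
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, value = (part.strip() for part in line.split("=", 1))
        # XML tag names cannot contain spaces, so replace them with underscores.
        child = ET.SubElement(root, key.replace(" ", "_"))
        child.text = value
    return root

# Hypothetical ASCII record.
record = """
ShortName = ExampleDataSet
Platform = Terra
StartDate = 2001-03-22
"""

print(ET.tostring(ascii_to_xml(record), encoding="unicode"))
```

Keys that start with a digit or contain other illegal characters would still need escaping; the sketch ignores that case.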

DIMES • Currently, most work on XML-based metadata focuses on defining XML structure (tags and relations) for specific scientific disciplines. • Our XML-based software solution, on the other hand, supports a wide variety of metadata.

DIMES • We have developed such software, based on the XML4J package, with document-type definitions (DTD).

DIMES • Metadata model • XML query engine • Web-based prototype interface

Metadata Model • A common weakness of many existing Earth science distributed information systems is the lack of metadata interoperability support. • A naive way to integrate metadata from heterogeneous sources is to represent the metadata from all sources in XML format.

Metadata Model • There are two kinds of elements: • Node: an element with an ID attribute. • Nonnode: an element without an ID attribute. • A node is uniquely identified by the value of its ID attribute.

Metadata Model • A node, together with all its nonnode elements, forms a basic information block for describing objects (data or metadata), and is identified by the ID value. • We assume the metadata provided is an XML document in XML nugget form; that is, a separate XML document describes each data object.
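The node/nonnode distinction can be made concrete with a short sketch: any element carrying an `ID` attribute is a node, and a node's information block is its nonnode descendants, stopping at nested nodes (which form their own blocks). The sample nugget and the helper names `is_node`, `block_of`, and `index_blocks` are hypothetical, and this is only one plausible reading of the model.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML nugget describing one data set.
NUGGET = """
<DataSet ID="ds1">
  <Title>Example data set</Title>
  <Platform>Terra</Platform>
</DataSet>
"""

def is_node(elem: ET.Element) -> bool:
    """A node is any element that carries an ID attribute."""
    return "ID" in elem.attrib

def block_of(node: ET.Element) -> list:
    """Collect (tag, text) pairs of the nonnode elements in a node's block."""
    fields = []
    for child in node:
        if is_node(child):
            continue  # nested nodes form their own information blocks
        fields.append((child.tag, (child.text or "").strip()))
        fields.extend(block_of(child))
    return fields

def index_blocks(root: ET.Element) -> dict:
    """Map every node's ID value to its information block."""
    return {e.get("ID"): block_of(e) for e in root.iter() if is_node(e)}

print(index_blocks(ET.fromstring(NUGGET)))
```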

[Figure: an XML nugget describing one metadata object, a node (element with an ID attribute) containing nonnode elements]

USING DTD FOR • Object identification • Type information • Node relationships

WHY DTD • From an ease-of-use viewpoint, DTD is arguably the best of the six proposed schema languages: • XML DTD • XML Schema • XDR • SOX • Schematron • DSD • D. Lee and W.W. Chu, “Comparative Analysis of Six XML Schema Languages,” SIGMOD Record, vol. 29, no. 3, 2000.

Metadata Model: Object identification • Each XML nugget has a unique ID value, carried by an ID attribute on the nugget's root element.
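A DTD declares the ID attribute, but Python's standard `xml.etree.ElementTree` is a non-validating parser and will not enforce ID uniqueness across separate nugget documents. The following sketch therefore checks the rule by hand: every nugget root must carry an `ID`, and no two nuggets may share one. The function name `check_nugget_ids` is hypothetical.

```python
import xml.etree.ElementTree as ET

def check_nugget_ids(nuggets: list) -> dict:
    """Parse each nugget, require an ID on its root, and reject duplicate IDs."""
    by_id = {}
    for text in nuggets:
        root = ET.fromstring(text)
        nugget_id = root.get("ID")
        if nugget_id is None:
            raise ValueError(f"nugget root <{root.tag}> has no ID attribute")
        if nugget_id in by_id:
            raise ValueError(f"duplicate ID: {nugget_id}")
        by_id[nugget_id] = root
    return by_id

catalog = check_nugget_ids(['<DataSet ID="ds1"/>', '<Phenomenon ID="ph1"/>'])
print(sorted(catalog))
```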

Metadata Model: Type information • Since many XML nuggets can describe similar objects, we introduce a new XML element, a type node (which is assigned an ID attribute), for each object type, and make all XML nuggets that describe similar objects subelements of the type node. [Figure: a type node containing several XML nuggets, each a node with its nonnode elements]
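A minimal sketch of grouping nuggets under type nodes follows. It assumes, purely for illustration, that a nugget's object type is its root tag, and it invents an ID scheme (`type_<tag>`) and tag naming (`<tag>Type`) for the type nodes; the paper does not specify either.

```python
import xml.etree.ElementTree as ET

def group_under_type_nodes(nuggets: list) -> ET.Element:
    """Make each nugget a subelement of one type node per object type."""
    collection = ET.Element("Metadata")
    type_nodes = {}
    for nugget in nuggets:
        obj_type = nugget.tag  # assumption: root tag names the object type
        if obj_type not in type_nodes:
            # A type node is itself a node, so it gets an ID (hypothetical scheme).
            type_nodes[obj_type] = ET.SubElement(
                collection, obj_type + "Type", ID="type_" + obj_type)
        type_nodes[obj_type].append(nugget)
    return collection

nuggets = [ET.fromstring(s) for s in
           ('<DataSet ID="d1"/>', '<Phenomenon ID="p1"/>', '<DataSet ID="d2"/>')]
print(ET.tostring(group_under_type_nodes(nuggets), encoding="unicode"))
```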

Metadata Model: Node relationships • There are two ways to code node relationships in XML documents: • Subtrees • Pointers

Node relationships: Subtrees • When a node is a descendant of another node in the XML tree, the two nodes are related.

Subtrees: Type–instance relationship • The child–parent relationship between two nodes often reflects the type–instance relationship between concepts.
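Subtree relationships can be read directly off the element nesting. This sketch links each node to its nearest ancestor node (skipping nonnode elements in between), which yields the direct parent-child, and often type-instance, pairs; the more general "descendant" relation is the transitive closure of these pairs. The function name and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def subtree_relations(root: ET.Element) -> list:
    """Return (ancestor_id, node_id) pairs linking each node to its nearest ancestor node."""
    pairs = []

    def walk(elem, nearest_node_id):
        elem_id = elem.get("ID")
        if elem_id is not None:
            if nearest_node_id is not None:
                pairs.append((nearest_node_id, elem_id))
            nearest_node_id = elem_id  # this node is now the nearest ancestor node
        for child in elem:
            walk(child, nearest_node_id)

    walk(root, None)
    return pairs

doc = ET.fromstring('<T ID="type1"><I ID="a"><x/><J ID="b"/></I><I ID="c"/></T>')
print(subtree_relations(doc))
```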

Node relationships: Pointers • When a node points to another node in the XML tree through an IDREFS attribute, the two nodes are related. • IDREFS attributes are used for: • node_type • type_instances • refer_to • inline_types
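An IDREFS value is a whitespace-separated list of ID values, so pointer relationships can be extracted by splitting each of the four attributes named above and checking that every target exists. This is a hedged sketch; the function name and the choice to raise on dangling references are assumptions.

```python
import xml.etree.ElementTree as ET

POINTER_ATTRS = ("node_type", "type_instances", "refer_to", "inline_types")

def pointer_relations(root: ET.Element) -> list:
    """Return (source_id, attribute, target_id) triples for every IDREFS pointer."""
    nodes = {e.get("ID"): e for e in root.iter() if "ID" in e.attrib}
    triples = []
    for node_id, node in nodes.items():
        for attr in POINTER_ATTRS:
            for target in node.get(attr, "").split():  # IDREFS: space-separated IDs
                if target not in nodes:
                    raise ValueError(f"{node_id}.{attr} points to unknown ID {target}")
                triples.append((node_id, attr, target))
    return triples

doc = ET.fromstring('<M><D ID="d1" refer_to="p1 p2"/><P ID="p1"/><P ID="p2" refer_to="d1"/></M>')
print(pointer_relations(doc))
```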

Node relationships • There can be multiple types for a single instance, however, so it is desirable for a node to have multiple parents. [Figure: two TYPE nodes, both parents of one INSTANCE node]

Type information • Unfortunately, the basic XML model does not support multiple parents for a single element. • Hence, we introduce the attributes node_type to record a node’s additional parents, and type_instances to record the reverse relationship.

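Type information • [Figure: nodes ID=1, ID=2, ID=3, and ID=4; node ID=3 carries node_type=4 and node ID=4 carries type_instances=3, so node 4 is recorded as an additional parent (type) of node 3]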
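Because node_type and type_instances record the same link in opposite directions, they should be updated together. The sketch below adds an extra parent to an instance and keeps both IDREFS lists in sync. The IDs 1 to 4 follow the figure, but nesting nodes 2 and 3 under node 1 is an assumption, as are the helper names `add_idref` and `add_type`.

```python
import xml.etree.ElementTree as ET

def add_idref(elem: ET.Element, attr: str, value: str) -> None:
    """Append an ID to a whitespace-separated IDREFS attribute, avoiding duplicates."""
    ids = elem.get(attr, "").split()
    if value not in ids:
        ids.append(value)
    elem.set(attr, " ".join(ids))

def add_type(nodes: dict, instance_id: str, type_id: str) -> None:
    """Record type_id as an additional parent of instance_id, in both directions."""
    add_idref(nodes[instance_id], "node_type", type_id)
    add_idref(nodes[type_id], "type_instances", instance_id)

doc = ET.fromstring('<Metadata><Type ID="1"><Item ID="2"/><Item ID="3"/></Type><Type ID="4"/></Metadata>')
nodes = {e.get("ID"): e for e in doc.iter() if "ID" in e.attrib}
add_type(nodes, "3", "4")
print(ET.tostring(doc, encoding="unicode"))
```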

IDREFS attribute: refer_to • For simplicity, we assume that the refer_to relationship is symmetric; that is, if node A refers to node B, B also refers back to A.
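One way to uphold the symmetry assumption is to add the missing back-references after loading the metadata. This sketch snapshots every existing refer_to link and then appends the source ID to each target's list when it is absent. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

def symmetrize_refer_to(nodes: dict) -> None:
    """Make refer_to symmetric: if A refers to B, ensure B refers to A."""
    # Snapshot the current links first so that back-links added below are not re-scanned.
    links = [(src, dst) for src, e in nodes.items() for dst in e.get("refer_to", "").split()]
    for src, dst in links:
        ids = nodes[dst].get("refer_to", "").split()
        if src not in ids:
            ids.append(src)
            nodes[dst].set("refer_to", " ".join(ids))

doc = ET.fromstring('<M><D ID="a" refer_to="b c"/><D ID="b"/><D ID="c" refer_to="a"/></M>')
nodes = {e.get("ID"): e for e in doc.iter() if "ID" in e.attrib}
symmetrize_refer_to(nodes)
print({k: v.get("refer_to") for k, v in nodes.items()})
```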

IDREFS attribute: inline_types • Intuitively, a node represents a piece of identifiable metadata. • In practice, many nodes share information.

IDREFS attribute: inline_types • For example, many data sets have the same temporal coverage, so we represent temporal coverage as a node. • We can define the temporal-coverage node type as an inline node of dataset nodes by using the inline_types attribute.
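The following is a speculative sketch of how inline nodes might be expanded. It assumes (none of this is confirmed by the slides) that type nodes are the direct children of the root, that a dataset node reaches its shared temporal-coverage node through refer_to, and that inline_types on the dataset type node lists the type IDs whose instances should be merged into a dataset's description. The element names and helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring("""
<Metadata>
  <TypeNode ID="TemporalCoverage">
    <Coverage ID="tc1"><Start>2000-01-01</Start><End>2000-12-31</End></Coverage>
  </TypeNode>
  <TypeNode ID="DataSet" inline_types="TemporalCoverage">
    <DataSet ID="ds1" refer_to="tc1"><Title>A</Title></DataSet>
    <DataSet ID="ds2" refer_to="tc1"><Title>B</Title></DataSet>
  </TypeNode>
</Metadata>
""")

def fields(node: ET.Element) -> list:
    """Nonnode child elements of a node as (tag, text) pairs."""
    return [(c.tag, (c.text or "").strip()) for c in node if "ID" not in c.attrib]

def expand_inline(root: ET.Element, node_id: str) -> list:
    """A node's own fields plus those of referenced nodes whose type is inlined."""
    type_nodes = {t.get("ID"): t for t in root}  # assumption: root children are type nodes
    type_of, nodes = {}, {}
    for type_id, type_node in type_nodes.items():
        for inst in type_node:
            if "ID" in inst.attrib:
                type_of[inst.get("ID")] = type_id
                nodes[inst.get("ID")] = inst
    node = nodes[node_id]
    inline = set(type_nodes[type_of[node_id]].get("inline_types", "").split())
    result = fields(node)
    for ref in node.get("refer_to", "").split():
        if type_of.get(ref) in inline:
            result.extend(fields(nodes[ref]))  # shared node content, stored once
    return result

print(expand_inline(DOC, "ds1"))
```

The shared coverage is stored once but appears in the description of every data set that references it.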

Metadata Model • This model requires: • Well-formed XML • No element may use ID as an attribute name
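Both requirements can be checked mechanically before metadata enters the system. This sketch reads the second rule as applying to the provider's original metadata, which is an interpretation rather than something the slide states; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

def meets_model_requirements(text: str) -> bool:
    """True if the source metadata is well-formed XML and no element uses an ID attribute."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False  # not well-formed
    return all("ID" not in e.attrib for e in root.iter())

print(meets_model_requirements("<a><b x='1'/></a>"))
```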

DIMES Metadata Model Summary • Data providers can add new nodes, new node attributes, and new links to satisfy their metadata requirements. • Additionally, because the system is flexible, we can preserve much of the original metadata structure.

XML query engine

XML query engine • Basic query • Nearest-neighbor search • Tree-expand query

Basic queries • The simplest query is finding a node by its ID. • For queries that specify conditions, our XML-based search engine evaluates the conditions on each node, including inline nodes.
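A minimal sketch of the two basic query forms: lookup by ID, and a simple equality condition on a nonnode child. Since inline nodes are themselves nodes in the tree, iterating over every element visits them too. The condition format (tag equals value), the helper names, and the sample data are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def find_by_id(root: ET.Element, node_id: str):
    """Return the node whose ID attribute equals node_id, or None."""
    for e in root.iter():
        if e.get("ID") == node_id:
            return e
    return None

def find_where(root: ET.Element, tag: str, value: str) -> list:
    """IDs of nodes having a nonnode child <tag> whose text equals value."""
    return [
        e.get("ID")
        for e in root.iter()
        if "ID" in e.attrib
        and any(c.tag == tag and (c.text or "").strip() == value
                for c in e if "ID" not in c.attrib)
    ]

doc = ET.fromstring(
    '<Metadata>'
    '<DataSet ID="ds1"><Title>A</Title><Platform>Terra</Platform></DataSet>'
    '<DataSet ID="ds2"><Title>B</Title><Platform>Other</Platform></DataSet>'
    '</Metadata>')
print(find_by_id(doc, "ds2").findtext("Title"), find_where(doc, "Platform", "Terra"))
```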

Nearest-neighbor search • For a given node, its nearest-neighbor node from a given group is the one with the shortest distance. • Shortest distance between two nodes: • minimum number of relations (type–instance, parent–child, or refer_to) needed to connect the nodes.

EXAMPLE
<Query queryType="IDonly">
  <Source IDlist="Phenomenon1"/>
  <Target node_types="DataSet"/>
  <Constraints></Constraints>
</Query>
…
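Nearest-neighbor search over the "minimum number of relations" distance is a breadth-first search on an undirected graph whose edges are parent-child, type-instance, and refer_to links. This sketch builds that graph, runs BFS from each source ID, and answers a query shaped like the EXAMPLE above. It assumes, hypothetically, that the Target node_types value is matched against element tags; the helper names and sample document are also illustrative.

```python
import xml.etree.ElementTree as ET
from collections import deque

RELATION_ATTRS = ("node_type", "type_instances", "refer_to")

def build_graph(root):
    """Undirected graph over node IDs; each edge is one relation (distance 1)."""
    graph, tag_of = {}, {}

    def link(a, b):
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    def walk(elem, parent_id):
        eid = elem.get("ID")
        if eid is not None:
            tag_of[eid] = elem.tag
            graph.setdefault(eid, set())
            if parent_id is not None:
                link(parent_id, eid)  # parent-child (type-instance) relation
            for attr in RELATION_ATTRS:
                for target in elem.get(attr, "").split():
                    link(eid, target)  # pointer relation
            parent_id = eid
        for child in elem:
            walk(child, parent_id)

    walk(root, None)
    return graph, tag_of

def nearest_neighbors(graph, tag_of, source, target_tag):
    """Breadth-first search; return all target_tag nodes at the minimum distance."""
    dist = {source: 0}
    queue = deque([source])
    found, best = [], None
    while queue:
        node = queue.popleft()
        if best is not None and dist[node] > best:
            break  # BFS order: everything left is farther than the nearest match
        if node != source and tag_of.get(node) == target_tag:
            best = dist[node]
            found.append(node)
        for nxt in sorted(graph.get(node, ())):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return sorted(found)

def run_query(doc, query_xml):
    """Answer a <Query>; Target node_types is matched against element tags (assumption)."""
    query = ET.fromstring(query_xml)
    sources = query.find("Source").get("IDlist").split()
    target_tag = query.find("Target").get("node_types")
    graph, tag_of = build_graph(doc)
    return {s: nearest_neighbors(graph, tag_of, s, target_tag) for s in sources}

DOC = ET.fromstring(
    '<Metadata><Phenomenon ID="Phenomenon1" refer_to="ds1"/>'
    '<DataSet ID="ds1" refer_to="Phenomenon1 ds2"/><DataSet ID="ds2" refer_to="ds1"/></Metadata>')
QUERY = ('<Query queryType="IDonly"><Source IDlist="Phenomenon1"/>'
         '<Target node_types="DataSet"/><Constraints></Constraints></Query>')
print(run_query(DOC, QUERY))
```

BFS is the natural choice here because every relation counts as distance 1, so the first layer containing a target node is, by construction, at the shortest distance.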

Tree-expand query • If we choose one node as a root and all its nearest neighbors as the first-level branches, and so on, we will get a tree presentation. • In practice, we use the tree-expand query to present the metadata such that users can navigate it easily and understand its results quickly.
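The tree-expand presentation corresponds to a breadth-first spanning tree: the root's children are its distance-1 neighbors, their children are newly reached distance-2 nodes, and so on, so each node appears once at its shortest distance from the root. This sketch takes a plain adjacency dictionary (such as one built from the node relations) and a hypothetical `max_depth` limit.

```python
from collections import deque

def tree_expand(graph: dict, root: str, max_depth: int = 2) -> dict:
    """Build a BFS spanning tree: each node's children are its not-yet-seen neighbors."""
    tree = {root: {}}
    seen = {root}
    queue = deque([(root, tree[root], 0)])
    while queue:
        node, branch, depth = queue.popleft()
        if depth == max_depth:
            continue  # stop expanding below the depth limit
        for nxt in sorted(graph.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                branch[nxt] = {}
                queue.append((nxt, branch[nxt], depth + 1))
    return tree

# Hypothetical relation graph: a phenomenon, two data sets, one shared coverage node.
g = {"P1": {"ds1", "ds2"}, "ds1": {"P1", "tc1"}, "ds2": {"P1", "tc1"}, "tc1": {"ds1", "ds2"}}
print(tree_expand(g, "P1"))
```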

Prototype Web Browsers • A Web-based DIMES client usually includes a Web interface, an XML translator, and an XML-to-HTML mapper suite.

XML translator • When a Web user submits a query, the client passes the query to a specific XML translator, which automatically translates the query into one or more predefined types of queries in XML format, and then sends them to the XML query engine.
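A translator of this kind can be sketched as a function from submitted form fields to the IDonly query format shown in the EXAMPLE slide. The form field names (`source`, `target_type`) and the comma-separated input are hypothetical; a real client would map its own form layout.

```python
import xml.etree.ElementTree as ET

def translate(form: dict) -> str:
    """Translate hypothetical Web form fields into an IDonly <Query> document."""
    ids = [s.strip() for s in form["source"].split(",") if s.strip()]
    query = ET.Element("Query", queryType="IDonly")
    ET.SubElement(query, "Source", IDlist=" ".join(ids))  # IDREFS-style list
    ET.SubElement(query, "Target", node_types=form["target_type"])
    ET.SubElement(query, "Constraints")
    return ET.tostring(query, encoding="unicode")

print(translate({"source": "Phenomenon1, Phenomenon2", "target_type": "DataSet"}))
```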

XML-to-HTML mapper • An XML-to-HTML mapper converts the output from XML into an HTML page, and returns the result to the user. • We use Java servlets and XSL Transformations for the translator and mapper tools.
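The original mapper uses Java servlets and XSLT. Python's standard library has no XSLT processor, so this sketch swaps in a different technique: it walks the result XML with ElementTree and builds an HTML table, which also escapes text safely. The result format (nodes with an optional `Title` child) is an assumption.

```python
import xml.etree.ElementTree as ET

def to_html(result_xml: str) -> str:
    """Map a hypothetical XML result into an HTML table of node IDs and titles."""
    result = ET.fromstring(result_xml)
    html = ET.Element("html")
    table = ET.SubElement(ET.SubElement(html, "body"), "table")
    for node in result.iter():
        if "ID" not in node.attrib:
            continue  # only nodes become rows
        row = ET.SubElement(table, "tr")
        ET.SubElement(row, "td").text = node.get("ID")
        ET.SubElement(row, "td").text = node.findtext("Title", default="")
    return ET.tostring(html, encoding="unicode", method="html")

print(to_html('<Result><DataSet ID="ds1"><Title>A &amp; B</Title></DataSet></Result>'))
```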

Prototype Web Browsers • We have developed two Web-based prototypes for exploring DIMES capabilities. • Regular search • http://spring.scs.gmu.edu:8499/servlet/VASearchInterface • Metadata navigation • http://spring.scs.gmu.edu:8499/servlet/SiesipDataTree

DIMES Conclusion • Our work is closely related to mediators in federated databases, with the goal of accommodating various metadata sources into a unified framework. • Our long-term goal is to integrate software components with existing data servers to build the Scientific Data and Information Super Servers (SDISS) which are defined here as servers to support interactive access to metadata, data, and domain knowledge.

THE END THANK YOU
