XML for Scientific Applications. Marlon Pierce ERDC Tutorial August 16 2001. What is XML?. Standard rule set for defining custom tags. Make your (meta)data human-readable. Separate data content from presentation (XSL). Rules for a particular dialect defined in either DTD or Schema.
XML for Scientific Applications Marlon Pierce ERDC Tutorial August 16 2001
What is XML? • A standard rule set for defining custom tags. • Makes your (meta)data human-readable. • Separates data content from presentation (XSL). • Rules for a particular dialect are defined in either a DTD or a Schema. • W3C: the standards-making body. • The same organization that standardizes HTML. • See http://www.w3c.org
Ex: XML for Electricity and Magnetism
<?xml version="1.0"?>
<!DOCTYPE ProjectDesc SYSTEM "GridViewer.dtd">
<ProjectDesc>
  <GridData>
    <NumberOfMaterials>2</NumberOfMaterials>
    <GridDimensions> Tags omitted for brevity </GridDimensions>
    <DataFile>
      <FileName>balloon.dat</FileName>
      <FileType>ASCII</FileType>
      <FileFormat>P3D</FileFormat>
      <Compression>none</Compression>
    </DataFile>
  </GridData>
  …Tags omitted for brevity…
</ProjectDesc>
Ex: E&M DTD Fragment
<!ELEMENT ProjectDesc (GridData,MaterialList)>
<!ELEMENT GridData (GridDimensions,DataFile)>
<!ELEMENT GridDimensions (X,Y,Z)>
Cut for brevity.
<!ELEMENT DataFile (FileName,FileType,FileFormat,Compression)>
<!ELEMENT FileName (#PCDATA)>
<!ELEMENT FileType (#PCDATA)>
<!ELEMENT FileFormat (#PCDATA)>
<!ELEMENT Compression (#PCDATA)>
<!ELEMENT MaterialList (Material+)>
<!ELEMENT Material (Name,Color,Epsilon*,Mu*,Sigma*,Mag*)>
What the DTD Tells You • Which tags can be included • Parent/child relationships • How many tags of a particular type are allowed • Exactly 1 (no suffix), 0 or 1 (?), 0 or more (*), 1 or more (+). • Names of attributes • Whether an element contains parsed character data (#PCDATA)
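The occurrence rules can be read straight off a content model. Below is a minimal sketch, assuming only flat comma-separated sequences like the ones in the E&M DTD fragment. The function name `describe_children` is hypothetical, and nested groups or choices (|) are not handled.

```python
import re

# Occurrence suffixes used in DTD content models.
OCCURRENCE = {
    "": "exactly one",
    "?": "zero or one",
    "*": "zero or more",
    "+": "one or more",
}

def describe_children(content_model):
    """Map each child name in a flat sequence content model to its allowed count.

    Hypothetical helper for illustration: it handles only comma-separated
    sequences such as "(Name,Color,Epsilon*)".
    """
    inner = content_model.strip().strip("()")
    result = {}
    for item in inner.split(","):
        match = re.fullmatch(r"\s*([A-Za-z_][\w.\-]*)([?*+]?)\s*", item)
        if match is None:
            raise ValueError(f"unsupported content model item: {item!r}")
        name, suffix = match.groups()
        result[name] = OCCURRENCE[suffix]
    return result

# Content model of the Material element from the DTD fragment.
material = describe_children("(Name,Color,Epsilon*,Mu*,Sigma*,Mag*)")
for child, count in material.items():
    print(f"{child}: {count}")
```

A real validating parser does this bookkeeping for you. The sketch only shows what information the suffixes carry.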
Ex: E&M Schema Fragment
<schema>
  <element name="ProjectDesc" type="ProjectDescType"/>
  <complexType name="ProjectDescType">
    <element name="GridData" type="GridDataType"/>
    <element name="MaterialList" type="MatListType"/>
  </complexType>
  <complexType name="GridDataType">
    <element name="NumberOfMaterials" type="int"/>
    <element name="GridDimensions" type="GridDimType"/>
    <element name="DataFile" type="DataFileType"/>
  </complexType>
  ….
</schema>
Schema vs. DTD (a partial list) • Schemas are written in XML; DTDs are not. • Schemas have several simple types (integers, strings, floats, …); DTDs treat everything as character data. • Schema complex types support inheritance. • Ex: a Bee complex type can be extended by Drone, Queen, and Worker subtypes. • But DTDs have been around longer.
Now What? • Get a parser for your favorite language • Apache XML Project’s Xerces parser supports Java, C++, Perl • http://xml.apache.org • Write code using the parser: • Validates XML files. • Returns the DOM. • You can now navigate the XML document tree
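As a rough illustration of the "get a parser, get the DOM" step, here is a sketch using Python's standard-library `xml.dom.minidom` rather than Xerces. This is a swapped-in parser: it checks well-formedness but does not validate against a DTD, so the DOCTYPE line is left out of this abbreviated copy of the project description.

```python
from xml.dom import minidom

# Abbreviated copy of the E&M project description (DOCTYPE omitted because
# minidom is a non-validating parser).
PROJECT_XML = """<?xml version="1.0"?>
<ProjectDesc>
  <GridData>
    <NumberOfMaterials>2</NumberOfMaterials>
    <DataFile>
      <FileName>balloon.dat</FileName>
      <FileType>ASCII</FileType>
      <FileFormat>P3D</FileFormat>
      <Compression>none</Compression>
    </DataFile>
  </GridData>
</ProjectDesc>
"""

# Parsing returns a DOM Document; malformed XML would raise an exception here.
document = minidom.parseString(PROJECT_XML)
root = document.documentElement
print(root.tagName)

# Navigate from the root to the first FileName element and read its text.
file_name = root.getElementsByTagName("FileName")[0]
print(file_name.firstChild.data)
```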
Document Object Model • Defines general entities that make up the document. • Forms a tree • Objects include • Document • Node • Element • Attribute (Diagram: ProjectDesc at the root, with GridData and MaterialList as child nodes.)
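The object types listed above can be seen by walking a small tree. This is a sketch using Python's `xml.dom.minidom`. The `version` attribute is a hypothetical addition, included only so that an Attribute node shows up, and the `outline` helper is likewise illustrative.

```python
from xml.dom import minidom, Node

# Tiny document; the "version" attribute is hypothetical, added for illustration.
doc = minidom.parseString(
    '<ProjectDesc version="1"><GridData/><MaterialList/></ProjectDesc>'
)

def outline(node, depth=0, lines=None):
    """Collect one indented line per Document, Element, and Attribute node."""
    if lines is None:
        lines = []
    indent = "  " * depth
    if node.nodeType == Node.DOCUMENT_NODE:
        lines.append(indent + "Document")
    elif node.nodeType == Node.ELEMENT_NODE:
        lines.append(indent + "Element " + node.tagName)
        attrs = node.attributes
        for i in range(attrs.length):
            attr = attrs.item(i)
            lines.append(indent + "  " + f"Attribute {attr.name}={attr.value}")
    # Every DOM object is a Node, so the same recursion works at each level.
    for child in node.childNodes:
        outline(child, depth + 1, lines)
    return lines

tree_lines = outline(doc)
print("\n".join(tree_lines))
```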
Practical Drawbacks • The DOM classes are very general. They only provide you with the most general way of navigating the tree. • Typically, for every XML dialect you create, you will have to write new code to extract the information. • It would be nice if there were a better way to do this…
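To make the drawback concrete, here is a sketch of the hand-written extraction code that the generic DOM calls force on you for the DataFile element. The helper names `child_text` and `read_data_file` are hypothetical. Every tag name is hard-coded, so a different dialect would need a different function.

```python
from xml.dom import minidom

DATAFILE_XML = """<DataFile>
  <FileName>balloon.dat</FileName>
  <FileType>ASCII</FileType>
  <FileFormat>P3D</FileFormat>
  <Compression>none</Compression>
</DataFile>"""

def child_text(parent, tag):
    """Return the text of the first <tag> element found under parent."""
    element = parent.getElementsByTagName(tag)[0]
    return "".join(
        node.data for node in element.childNodes
        if node.nodeType == node.TEXT_NODE
    ).strip()

def read_data_file(element):
    # Dialect-specific: each tag name below is tied to the E&M DTD.
    return {
        "file_name": child_text(element, "FileName"),
        "file_type": child_text(element, "FileType"),
        "file_format": child_text(element, "FileFormat"),
        "compression": child_text(element, "Compression"),
    }

data_file = read_data_file(minidom.parseString(DATAFILE_XML).documentElement)
print(data_file)
```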
Automatic JavaBeans with Castor • XML trees map nicely into Java Bean components. Get/Set methods return the information. • Castor: automatically generates JavaBeans from XML and vice versa. • You just write the Bean classes (simple) and Castor handles the mapping to XML. • http://castor.exolabs.org
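Castor itself is a Java tool and its API is not shown here. As a loose analogue of the same idea, the sketch below maps the DataFile element onto a Python dataclass and back by introspecting the field names. The `from_xml` and `to_xml` helpers are hypothetical, and the sketch assumes each field corresponds to one child element holding plain text.

```python
from dataclasses import dataclass, fields
from xml.dom import minidom

@dataclass
class DataFile:
    # Field names deliberately match the element names in the E&M DTD.
    FileName: str
    FileType: str
    FileFormat: str
    Compression: str

def from_xml(cls, xml_text):
    """Build an instance of cls from child elements named after its fields."""
    root = minidom.parseString(xml_text).documentElement
    values = {}
    for field in fields(cls):
        node = root.getElementsByTagName(field.name)[0]
        values[field.name] = node.firstChild.data.strip()
    return cls(**values)

def to_xml(obj):
    """Serialize a dataclass instance to an element named after its class."""
    doc = minidom.Document()
    root = doc.createElement(type(obj).__name__)
    doc.appendChild(root)
    for field in fields(obj):
        child = doc.createElement(field.name)
        child.appendChild(doc.createTextNode(str(getattr(obj, field.name))))
        root.appendChild(child)
    return root.toxml()

SOURCE = (
    "<DataFile><FileName>balloon.dat</FileName><FileType>ASCII</FileType>"
    "<FileFormat>P3D</FileFormat><Compression>none</Compression></DataFile>"
)
data_file = from_xml(DataFile, SOURCE)
round_trip = to_xml(data_file)
print(data_file.FileName)
print(round_trip)
```

The mapping code is written once and reused for any class shaped this way, which is the convenience the slide attributes to Castor.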
Some Standard XML Dialects • Don’t reinvent what already exists. See http://www.w3c.org/TR • MathML: Mathematical Markup Language • CML: Chemical Markup Language • SVG: Scalable Vector Graphics • SOAP: Simple Object Access Protocol • RDF: Resource Description Framework
XML Namespaces • Namespaces allow you to mix different types of XML. • You can combine custom and standard tags • Ex: combine GEMML plus MathML
Namespace Example
<gem xmlns:gem="http://www.gem.org/gem"
     xmlns:m="http://www.w3c.org/TR/REC-MathML/">
  <gem:analysis>
    <m:math>
      <!-- MathML expressions -->
    </m:math>
    <!-- GEM analysis content -->
  </gem:analysis>
</gem>
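A namespace-aware parser lets code pick out elements by namespace URI instead of by prefix. The sketch below uses Python's `xml.dom.minidom` and reuses the two URIs from the slide as-is. The `<m:ci>x</m:ci>` content is a placeholder added for illustration.

```python
from xml.dom import minidom

# Namespace URIs copied from the slide example.
GEM_NS = "http://www.gem.org/gem"
MATHML_NS = "http://www.w3c.org/TR/REC-MathML/"

# The <m:ci> element is illustrative filler for the MathML expression.
MIXED_XML = f"""<gem xmlns:gem="{GEM_NS}" xmlns:m="{MATHML_NS}">
  <gem:analysis>
    <m:math><m:ci>x</m:ci></m:math>
  </gem:analysis>
</gem>"""

doc = minidom.parseString(MIXED_XML)

# Look elements up by (namespace URI, local name); the prefix is irrelevant.
math_nodes = doc.getElementsByTagNameNS(MATHML_NS, "math")
analysis = doc.getElementsByTagNameNS(GEM_NS, "analysis")[0]

print(len(math_nodes))
print(analysis.prefix, analysis.localName, analysis.namespaceURI)
```

Because the lookup goes by URI, a document that bound MathML to a different prefix would be handled by the same code.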
Additional References and Resources • Inside XML by Steven Holzner. New Riders (2001). • The W3C has a nice schema tutorial at www.w3.org/TR/xmlschema-0/ • The ARL ICE project mixes XML and HDF5: www.arl.hpc.mil/ice/XdmfUser.html • XSIL is a markup language for scientific data: www.cacr.caltech.edu/SDA/xsil