Using XML, XSLT, and CSS in a Digital Library
• Markup Transformations
• SGML to XML Conversions
• Metadata Schema & Generation
Robert Ferrer, r-ferrer@uiuc.edu
ASIS Annual Meeting 2000
SGML to XML Conversions - Modular
[Slide diagram not preserved in this transcript]
SGML to XML Conversions - Basic
• Empty tags: <empty> to <… />
• Processing instructions: <?Processing Instruction> to <?… ?>
• CDATA to CDATA sections: <![CDATA[ … ]]>
• Named entities remain unchanged: &alpha;
• <!DOCTYPE ...> refers to an XML DTD containing only character entity definitions that map to Unicode code points: <!ENTITY alpha "&#x3B1;">
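The basic rewrites listed on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's converter: the function names and the `EMPTY_ELEMENTS` set are hypothetical, and the regular expressions assume attribute values never contain `<` or `>`. The last function shows that a DTD holding only character entity definitions is enough for a standard parser to resolve `&alpha;`.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical list of elements the source SGML DTD declares EMPTY.
EMPTY_ELEMENTS = {"empty", "citeref"}


def close_empty_tags(sgml):
    """Rewrite <name ...> as <name .../> for elements known to be EMPTY."""
    def fix(match):
        name, attrs = match.group(1), match.group(2)
        if name.lower() in EMPTY_ELEMENTS:
            return "<" + name + attrs.rstrip() + "/>"
        return match.group(0)
    # Start tags only; tags that already end in "/>" are left alone.
    return re.sub(r"<([A-Za-z][\w.-]*)((?:\s[^<>]*?)?)(?<!/)>", fix, sgml)


def close_processing_instructions(sgml):
    """Rewrite SGML-style <?pi> as XML-style <?pi?>."""
    return re.sub(r"<\?([^<>]*?)(?<!\?)>", r"<?\1?>", sgml)


def parse_with_entities(xml_text):
    """Parse XML; entities declared in the internal DTD subset are expanded."""
    return ET.fromstring(xml_text)
```

A production converter would normally read the list of EMPTY elements from the SGML DTD instead of hard-coding it.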
SGML to XML Conversions - Linking
• Attributes to facilitate internal linking
    <CITEREF REFID="bib5" idli_occurrence="3" />
• External links represented as XLinks
    <FIG NAME="F1" xlink:type="simple" xlink:href="fig1.jpg" xlink:show="new" xlink:actuate="user" />
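As a rough sketch of how such XLink attributes might be attached during conversion, the Python below adds them to an element with the standard library. The helper name `add_simple_link` is hypothetical. The namespace URI is the W3C XLink namespace; the `actuate="user"` default mirrors the slide, whereas the later XLink 1.0 Recommendation uses `onRequest` for the same behavior.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK_NS)


def add_simple_link(element, href, show="new", actuate="user"):
    """Attach simple-XLink attributes to an existing element (hypothetical helper)."""
    element.set("{%s}type" % XLINK_NS, "simple")
    element.set("{%s}href" % XLINK_NS, href)
    element.set("{%s}show" % XLINK_NS, show)
    element.set("{%s}actuate" % XLINK_NS, actuate)
    return element


fig = add_simple_link(ET.Element("FIG", {"NAME": "F1"}), "fig1.jpg")
fig_xml = ET.tostring(fig, encoding="unicode")
```

The serialized `fig_xml` carries an `xmlns:xlink` declaration on the element itself, which the slide's fragment leaves implicit.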
SGML to XML Conversions - Math
• SGML Math converted to MathML
• Identify & translate mathematical character references
• Identify & tokenize mathematical content

ISO 12083 Math:
    <dformula> <g>a</g> <sup>-2</sup> <inf>i</inf> </dformula>

Presentation MathML:
    <math xmlns="http://www.w3.org/…">
      <msubsup>
        <mrow><mi>α</mi></mrow>
        <mrow><mi>i</mi></mrow>
        <mrow><mo>-</mo><mn>2</mn></mrow>
      </msubsup>
    </math>
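The two "identify" steps can be illustrated with a small Python sketch. It is only an approximation of what the slide describes: the `GREEK` table is a partial, assumed mapping for ISO 12083 `<g>` content, and the tokenizer recognizes just numbers, single letters, and single-character operators.

```python
import re
import xml.etree.ElementTree as ET

# Partial, assumed mapping from <g> content to Greek letters.
GREEK = {"a": "\u03b1", "b": "\u03b2", "g": "\u03b3"}

# One token per match: a number, a single letter, or any other visible character.
TOKEN = re.compile(r"\s*(?:(\d+(?:\.\d+)?)|([^\W\d_])|(\S))")


def translate_greek(letter):
    """Translate the content of a <g> element; unknown letters pass through."""
    return GREEK.get(letter, letter)


def tokenize_math(text):
    """Wrap each token in <mn>, <mi>, or <mo> inside a single <mrow>."""
    row = ET.Element("mrow")
    for number, letter, other in TOKEN.findall(text):
        if number:
            tag, value = "mn", number
        elif letter:
            tag, value = "mi", letter
        else:
            tag, value = "mo", other
        ET.SubElement(row, tag).text = value
    return row
```

For instance, `tokenize_math("-2")` yields an `<mrow>` holding `<mo>-</mo>` followed by `<mn>2</mn>`, matching the superscript in the slide's MathML.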
SGML to XML Conversions - Math
• Recognize & transform mathematical markup
    <xsl:template match="dformula">
      …
      <xsl:when test="sup or inf">
        <xsl:for-each select="child::node()">
          <xsl:choose>
            <xsl:when test="name(self::node())='sup' and name(following-sibling::node()[1])='inf'">
              <xsl:element name="msubsup" namespace="http://www.w3.org/…">
                <xsl:element name="mrow" namespace="http://www.w3.org/…">
                  <xsl:apply-templates select="preceding-sibling::node()[1]"/>
                </xsl:element>
                …
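The same restructuring idea can be sketched outside XSLT. The Python below is an assumed re-implementation, not the project's stylesheet: when a base element is followed by a `sup` and an `inf` (in either order), the three are regrouped into `msubsup` with the subscript before the superscript. The namespace constant is the W3C MathML namespace (the slide elides its URI), and the per-character tokenization is deliberately simplistic.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
GREEK = {"a": "\u03b1"}  # assumed, partial mapping for <g> content


def q(tag):
    """Qualify a tag name with the MathML namespace."""
    return "{%s}%s" % (MATHML_NS, tag)


def operand(source):
    """Turn one ISO 12083 element into an <mrow> of single-character tokens."""
    text = (source.text or "").strip()
    if source.tag == "g":
        text = GREEK.get(text, text)
    row = ET.Element(q("mrow"))
    for ch in text:
        if ch.isspace():
            continue
        tag = "mn" if ch.isdigit() else "mi" if ch.isalpha() else "mo"
        ET.SubElement(row, q(tag)).text = ch
    return row


def convert_dformula(dformula):
    """Regroup base + sup + inf children into <msubsup>; copy the rest as rows."""
    math = ET.Element(q("math"))
    children = list(dformula)
    i = 0
    while i < len(children):
        following = {c.tag: c for c in children[i + 1:i + 3]}
        is_base = children[i].tag not in ("sup", "inf")
        if is_base and set(following) == {"sup", "inf"}:
            msubsup = ET.SubElement(math, q("msubsup"))
            msubsup.append(operand(children[i]))      # base
            msubsup.append(operand(following["inf"]))  # subscript
            msubsup.append(operand(following["sup"]))  # superscript
            i += 3
        else:
            math.append(operand(children[i]))
            i += 1
    return math
```

Applied to the `<dformula>` example from the previous slide, this produces a single `msubsup` whose three rows hold α, i, and -2 in that order.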
SGML to XML Conversions - TeX
• TeX converted to GIF images
    <FORM NOTATION="TEX" HIDE="TRUE">$$ (j_0-a_2')\,{\rm mod}\,P $$</FORM>
    <uie name="uie1" xlink:type="simple" xlink:href="fig1.gif" xlink:show="new" xlink:actuate="user" />
• TeX converted into MathML
• IBM TechExplorer
    $$ (j_0-a_2')\,{\rm mod}\,P $$
    <math><mo>(</mo><msub><mrow><mi>j</mi></mrow><mrow><mn>0</mn></mrow></msub><mi>−</mi><msubsup><mrow><mi>a</mi></mrow><mrow><mn>2</mn>…..
SGML to XML Conversions - DTD
• XML DTD does not permit inclusions and exclusions
    SGML: <!ELEMENT Article - - (front, body) +(%i.float;)>
    XML:  <!ELEMENT Article (front | body | %i.float;)*>
• XML DTD does not permit the '&' connector
• XML DTD permits mixed content only in the form (#PCDATA | name | …)*, so an SGML model such as the following is not allowed:
    <!ELEMENT Other ((author, journal) | (#PCDATA))>
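The slides do not say how '&' groups were rewritten. One common workaround, sketched below as an assumption, expands an SGML group such as (a & b) into an explicit choice of orderings. The function name is hypothetical. The expansion is factored by first element so that no two alternatives begin with the same name, which keeps the resulting model deterministic.

```python
def expand_and_group(names):
    """Expand an SGML '&' group into an XML choice over every ordering."""
    if len(names) == 1:
        return names[0]
    alternatives = []
    for i, name in enumerate(names):
        rest = names[:i] + names[i + 1:]
        alternatives.append("(%s, %s)" % (name, expand_and_group(rest)))
    return "(" + " | ".join(alternatives) + ")"


two = expand_and_group(["author", "journal"])
# two == "((author, journal) | (journal, author))"
```

The output grows factorially with the number of names, so for larger groups a looser model such as (a | b | c)* is often preferred, at the cost of no longer requiring each element exactly once.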
Metadata - Usage
• Metadata Within the DLI Testbed
• Normalize key fields from different publisher DTDs to facilitate searching
• Provide common and easily displayable intermediate search results
• Add value in the form of links to cited or citing articles within the Testbed, external abstracts and indexes, etc.
Metadata - Schema
• Resource Description Framework (RDF) provides a standardized way to represent metadata using XML
• Encapsulates metadata elements
• Provides varying levels of granularity
• RDF container objects describe the relations between repeated metadata elements
Metadata - Schema
• Dublin Core (DC) model is used to encapsulate all searchable metadata
• Provides the semantic framework for describing each object in the collection

Content        Intellectual Property   Instantiation
Title          Creator                 Date
Subject        Publisher               Format
Description    Contributor             Identifier
Type           Rights                  Language
Source
Relation
Coverage
Metadata - Schema
• Extensive custom IDLI tags are included
• Offer a further level of granularity
    <DC:Description><idli:Abstract></DC:Description>
• Search clients familiar with the IDLI schema can achieve much greater precision
• A Dublin Core Qualifiers (DCQ) substructure is to replace many of the project-specific IDLI elements
    <DC:Description><DCQ:Abstract></DC:Description>
Metadata - Schema
    <rdf:Seq>
      <rdf:li>
        <dc:Creator>
          <idli:author_name>Giust, G. K.</idli:author_name>
          <idli:organization_name>Department of Electrical Engineering, Arizona State University</idli:organization_name>
        </dc:Creator>
      </rdf:li>
      <rdf:li>
        <dc:Creator>
          <idli:author_name>Sigmon, T.W.</idli:author_name>
          <idli:organization_name>Department of Computer Science, Illinois State University</idli:organization_name>
        </dc:Creator>
      </rdf:li>
    </rdf:Seq>
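A container like the one on this slide could be generated programmatically along the following lines. This is a sketch only: the slides do not give the namespace URIs, so the Dublin Core URI is an assumption and the IDLI URI is a placeholder, and `creators_seq` is a hypothetical helper. Only the RDF syntax namespace is the standard one.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"  # assumed; not stated in the slides
IDLI_NS = "http://example.org/idli#"        # placeholder for the project namespace

for prefix, uri in (("rdf", RDF_NS), ("dc", DC_NS), ("idli", IDLI_NS)):
    ET.register_namespace(prefix, uri)


def creators_seq(creators):
    """Build an ordered rdf:Seq of dc:Creator entries from (name, organization) pairs."""
    seq = ET.Element("{%s}Seq" % RDF_NS)
    for author_name, organization_name in creators:
        item = ET.SubElement(seq, "{%s}li" % RDF_NS)
        creator = ET.SubElement(item, "{%s}Creator" % DC_NS)
        ET.SubElement(creator, "{%s}author_name" % IDLI_NS).text = author_name
        ET.SubElement(creator, "{%s}organization_name" % IDLI_NS).text = organization_name
    return seq


seq = creators_seq([
    ("Giust, G. K.", "Department of Electrical Engineering, Arizona State University"),
    ("Sigmon, T.W.", "Department of Computer Science, Illinois State University"),
])
```

A sequence container is used because author order is significant; an unordered list of creators would lose it.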
Metadata - Extracting
• Metadata is extracted from the 'base' XML files
• Utilization of XML Header
• DTD is used to resolve entities
• XML-Stylesheet processing instruction
• Visual Basic application serves as parser
• Document Object Model (DOM)
• XSLT Style Sheets
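To illustrate how a DOM-based parser can pick up the xml-stylesheet processing instruction from the XML header, here is a minimal Python sketch using `xml.dom.minidom`. The project itself used a Visual Basic application; the function name and the sample file name `metadata.xsl` are assumptions for the example.

```python
import re
from xml.dom import minidom


def stylesheet_href(xml_text):
    """Return the href of the first xml-stylesheet processing instruction, or None."""
    doc = minidom.parseString(xml_text)
    for node in doc.childNodes:
        is_pi = node.nodeType == node.PROCESSING_INSTRUCTION_NODE
        if is_pi and node.target == "xml-stylesheet":
            # The PI body is pseudo-attribute text, so it is parsed by hand.
            match = re.search(r"""href\s*=\s*["']([^"']+)["']""", node.data)
            if match:
                return match.group(1)
    return None


sample = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="metadata.xsl"?>'
    "<Article/>"
)
href = stylesheet_href(sample)
```

The returned href tells the extraction step which XSLT style sheet to apply to the base file.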
Metadata - Extracting
• Utilization of XSLT Style Sheets
• XSLT transformative features to generate base metadata file and forward citation fragment
• XSLT scripting features to generate elements not directly expressed in the document
• XSLT instantiation of ActiveX objects to test for links
Metadata - Extracting
• Utilization of DOM
• Insert pseudo elements (e.g. bibliographic data)
• Search reference citations from the generated metadata object to insert forward references into other metadata files
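The forward-reference step can be pictured with the following Python sketch. It assumes each metadata record is an element whose `cites` children name the articles it references; the element names `cites` and `cited_by`, the `ref` attribute, and the in-memory dictionary of records are all hypothetical stand-ins for the project's metadata files.

```python
import xml.etree.ElementTree as ET


def add_forward_references(records):
    """For every citation A -> B found in A's record, note in B's record that A cites it.

    records maps an article identifier to its metadata element.
    """
    for source_id, record in records.items():
        for citation in record.findall("cites"):
            target = records.get(citation.get("ref"))
            if target is None:
                continue  # cited article is outside the collection
            known = any(e.get("ref") == source_id for e in target.findall("cited_by"))
            if not known:
                ET.SubElement(target, "cited_by", {"ref": source_id})


records = {
    "A": ET.fromstring('<metadata><cites ref="B"/><cites ref="X"/></metadata>'),
    "B": ET.fromstring("<metadata/>"),
}
add_forward_references(records)
```

After the call, the record for B holds a `cited_by` entry pointing at A, while the citation of X is skipped because X is not in the collection.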