Learn how to effectively use XML to manage and organize documents in PFS 200x, an XML-enabled database. Capture and convert SGML documents, handle links and format conversions.
"Microsoft is creating tons of software around XML”."If you cut open any Microsoft product today, it bleeds XML”. - Office 2000, SQL Server 2000, .NET XML - The Extensible Markup Language • (Some) Promises/Expectations and Drivers: • shift from data processing to information processing (“data in context” = meaning, which is also information plus meta-information processing) • interoperability (application integration; a neutral data interchange format)i) sharing data between applications (A2A), ii) exchanging heterogeneous, differently structured data and information (B2B, B2G, G2G) • drivers: e-commerce, MA, JV, “The next generation of Web applications”. • affected: databases; data warehousing; document, content, knowledge management; portals • XML: a real subset of SGML • Marking up documents (tags: < >; meta-information) with regard to documentarchitecture (in terms of “document elements”; their logical and layout structures). However, it is essentially about the logical document architecture. • Structures are hierarchical: data type is the “tree” (incl. nesting, repetitions). Differentiate this from the data type “table” (RDBMS). Hence, one has“XML databases” (PFS 200x/InterHost 2000) and “XML-enabled databases” • In XML (SGML) structures are decribed in terms of a Document Type Definition (DTD). • Meta-language for the development of other languages (Chemical Markup Language (CML), Vector Image Markup Language (VML), Biopolymer Markup Language (BioML), Electronic Business Markup Language (ebXML), etc. US PTO: "Red Book (DTD) will likely migrate to XML in the next few years” (SGML XML)
XML and Corollaries: The XML Family
[Diagram: the XML family of W3C specifications (Notes and Recommendations). Labels in the diagram: Interoperability, Portability; Code; Data/Information; Presentation, Transformation; Processing; Access/Retrieval; programmatic access; concrete syntax; abstract data model (UML-like); Linking, Combining; Semantics. Specifications shown: XML, XMLns, DTD, XML Schema, XML InfoSet, XSL, XSLT, SAX, DOM, XPath, XPointer, XML-QL, XLink, XInclude, RDF.]
From XML to PFS 200x
XML document: <title-of-report>How to use XML effectively. A Primer</title-of-report><author>Jack Daniels</author><title-of-author>Dr.</title-of-author>
Document with “hanging indent” paragraphs (widespread in corporations!)
• PFS 200x: an “XML database”, i.e. one that supports XML in its internal architecture.
• Have 1:1 mappings between XML document elements (“tags”) and PFS 200x database fields: “struct.xml”.
• Manage links (e.g. to graphics).
• Think about format conversions (e.g. dates, patent numbers).
Example: US patents (full documents with graphics)
• An SGML/XML DTD exists (WIPO ST.32).
• A PFS 200x database class with a sufficiently deep structure exists (LITPAT).
• A corresponding converter (“parser”) exists (CONX).
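A 1:1 mapping between XML elements and database fields, combined with a format conversion, can be sketched in Python as follows. This is not how PFS 200x or its “struct.xml” file actually works; the field codes (`TI`, `AU`, `AT`, `DT`), the wrapping `<report>` root element, the `<date>` element, and its YYYYMMDD input format are all assumptions introduced for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical 1:1 mapping from XML element names to database field codes.
FIELD_MAP = {
    "title-of-report": "TI",
    "author": "AU",
    "title-of-author": "AT",
    "date": "DT",
}


def convert_date(value):
    """Convert a YYYYMMDD date (assumed input format) to ISO 8601."""
    return datetime.strptime(value, "%Y%m%d").date().isoformat()


# Per-element format conversions; elements not listed are stored as-is.
CONVERTERS = {"date": convert_date}


def to_record(xml_text):
    """Map each child element of the root to one database field."""
    root = ET.fromstring(xml_text)
    record = {}
    for child in root:
        field = FIELD_MAP.get(child.tag)
        if field is None:
            continue  # element has no corresponding field; skip it
        value = (child.text or "").strip()
        convert = CONVERTERS.get(child.tag)
        record[field] = convert(value) if convert else value
    return record


# The <report> wrapper and <date> element are added for this sketch.
SAMPLE = """<report>
  <title-of-report>How to use XML effectively. A Primer</title-of-report>
  <author>Jack Daniels</author>
  <title-of-author>Dr.</title-of-author>
  <date>20010315</date>
</report>"""

record = to_record(SAMPLE)
print(record)
```

Keeping the mapping and the converters in plain tables, separate from the parsing loop, mirrors the idea of a declarative mapping file: supporting a new element means adding one entry rather than changing code.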
Capture of a US Patent in SGML Format into PFS 2000
[Screenshots: SGML source; PFS 2000 document with “hanging indents”; wordbook for XML output; XML output from the PFS 2000 database USPAT, using field codes as tags, some of them transformed by a wordbook.]
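The idea of writing XML with field codes as tags, while a wordbook renames some of them, might look like the Python sketch below. The field codes (`TI`, `PN`, `AB`), the wordbook entries, the `document` root tag, and the sample values are all hypothetical; the real USPAT field codes and wordbook format are not given in these slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical wordbook: field code -> more readable tag name.
WORDBOOK = {"TI": "title", "PN": "patent-number"}


def record_to_xml(record, wordbook, root_tag="document"):
    """Emit one element per field; use the wordbook tag if one exists,
    otherwise fall back to the field code itself."""
    root = ET.Element(root_tag)
    for code, value in record.items():
        tag = wordbook.get(code, code)
        ET.SubElement(root, tag).text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder record; the values are not real patent data.
record = {"TI": "Example title", "PN": "US 0000000", "AB": "Example abstract."}
xml_out = record_to_xml(record, WORDBOOK)
print(xml_out)
```

Here `TI` and `PN` are renamed by the wordbook, while `AB` has no entry and is written under its field code.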
Capture of a Full US Patent Document with Text and Graphics in SGML Format into PFS 2000
• The Links box allows scrolling sequentially through an ordered list of linked files (EMIPIC).
• Captured long file names were reduced to the 8.3 short form.
[Screenshot: FIG. 1]
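How the long file names were shortened is not described here. As a rough sketch only, a simple truncation to an 8.3-style name could be written in Python as below; the function name `to_8_3` and the sample file name are hypothetical, and the sketch does not handle collisions between names that share the same first eight characters.

```python
import os


def to_8_3(filename):
    """Reduce a long file name to an 8.3-style short name by truncation.

    Assumption: plain truncation, no collision handling (no "~1" suffixes).
    """
    stem, ext = os.path.splitext(filename)
    # Keep only letters, digits and underscores in the base name.
    stem = "".join(ch for ch in stem if ch.isalnum() or ch == "_")
    # Drop the leading dot and any non-alphanumeric characters.
    ext = "".join(ch for ch in ext[1:] if ch.isalnum())
    short = stem[:8].upper()
    if ext:
        short += "." + ext[:3].upper()
    return short


print(to_8_3("us-patent-figure-01.tiff"))
```

A real converter would also need to keep the link between the shortened name and the document that references it, since the link target changes when the file is renamed.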