Combined XML, SGML Issues
William J. ‘Bill’ McCalpin, MIT, LIT, CDIA, EDP
AIIM 2002 - March 6, 2002
MHE - the print2image2Internet consultants
About MHE
• MHE is the “print2image2Internet” consulting firm
• MHE’s principals have nearly 40 years of experience in electronic print streams, in taking electronic print streams to imaging systems, and now in taking legacy information to the Internet
• See http://www.mhe-consulting.com
About the Speaker
• William J. ‘Bill’ McCalpin is a principal at MHE
• Mr. McCalpin was the first - and for years the only - person in the world to have the MIT, LIT, CDIA, and EDP designations
• Mr. McCalpin serves on the AIIM Accreditation Committee and AIIM Conference Committee
About the Speaker (cont.)
• Mr. McCalpin is on the Xplor Board of Directors and is Treasurer
• Mr. McCalpin recently completed a two-year stint as Xploration Editor-in-Chief
• Mr. McCalpin is a frequent speaker at both AIIM and Xplor
What Do You Say When They Ask You, “When Are You Going To Support XML?”
But The Real Question Is, “Why Should I Support XML?”
Agenda
• What is XML?
• What do we do in “e-Business”?
• When do you want to use XML?
• The Right Way and the Wrong Way to use XML
• The Flow of Information
• The XML Bubble
• The answer to “when” and “why”
What is XML?
XML And SGML
• XML is eXtensible Markup Language
• XML is a simplified subset (an application profile) of SGML, Standard Generalized Markup Language, an ISO standard (ISO 8879)
• XML is “extensible” because people and enterprises with common interests get together to define the tags which describe their data
XML and HTML
• HTML is a tagged language, but the tags are 40 or 50 “grammatical” tags like <p> or <h1>
• XML is a tagged language, and the tags are (usually) created and agreed to by “domains” or vertical industry segments, e.g. <account_number> or <city>
The ‘Document’
• A document is “an organized collection of information in time”
• A document contains information which can be understood by human or machine, and has validity at some period in time
• The information in a document can be organized in many ways - as text, bitmaps, print streams, tagged languages, etc.
The New Document
• Per this definition, the document
  • does not depend on which organization of the information is used (so long as author and recipient agree)
  • does not depend on the medium (paper, film, optical, magnetic or even parchment are all fine)
  • does not have to have presentation information, because the recipient may be a machine
Three Parts of an XML ‘Document’
• Tagged Data (in XML)
• Tag Definitions (in DTD or Schema)
• Presentation (in XSL or CSS)
The XML Document
• Data - data values bounded by XML tags
• Presentation:
  • CSS - Cascading Style Sheets, like for HTML
  • XSL - format information in XML
• Tag Definitions:
  • DTD - Document Type Definitions - old SGML definition
  • Schema - definitions in XML
Data In the XML Document
• Data is the purpose of an XML document
• Each piece of data is specifically identified by a tag
• Data is organized because the tags match patterns in the DTD or Schema
• An example of data in XML:
Data Example in XML
<AUTHOR>
  <NAME>William J. "Bill" McCalpin, EDPP, CDIA, MIT, LIT</NAME>
  <JOBTITLE>Principal</JOBTITLE>
  <AFFILIATION>MHE</AFFILIATION>
  <ADDRESS>
    <STREET>1400 Cheyenne Dr.</STREET>
    <CITY>Richardson</CITY>
    <STATE>Texas</STATE>
    <ZIPCODE>75080</ZIPCODE>
    <EMAIL>mccalpin@mhe-consulting.com</EMAIL>
  </ADDRESS>
</AUTHOR>
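Because every value carries its own tag, a receiving program can ask for a field by name. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser; the function name read_author and the choice of fields are illustrative only and are not part of the original slides.

```python
import xml.etree.ElementTree as ET

# The data example from the slide, unchanged apart from indentation.
AUTHOR_XML = """
<AUTHOR>
  <NAME>William J. "Bill" McCalpin, EDPP, CDIA, MIT, LIT</NAME>
  <JOBTITLE>Principal</JOBTITLE>
  <AFFILIATION>MHE</AFFILIATION>
  <ADDRESS>
    <STREET>1400 Cheyenne Dr.</STREET>
    <CITY>Richardson</CITY>
    <STATE>Texas</STATE>
    <ZIPCODE>75080</ZIPCODE>
    <EMAIL>mccalpin@mhe-consulting.com</EMAIL>
  </ADDRESS>
</AUTHOR>
"""


def read_author(xml_text):
    """Pick out fields by tag name (hypothetical helper for illustration)."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.findtext("NAME"),
        "city": root.findtext("ADDRESS/CITY"),
        "zipcode": root.findtext("ADDRESS/ZIPCODE"),
    }


author = read_author(AUTHOR_XML)
print(author["city"], author["zipcode"])
```

No knowledge of column positions or page layout is needed; the tag names alone locate the data.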
Presentation in XML
• Tags in XML don’t have natural formatting (unlike HTML), so if presentation is needed, it must be explicitly defined
• CSS can be used for HTML and XML
• XSL is itself written in XML, so an XML parser can parse it; XSLT is the part of XSL that transforms XML documents
• XSL example:
Presentation Example
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <xsl:template match="author">
    <TABLE WIDTH="100%" BORDER="1" CELLSPACING="0"...
    <TR>
    <TD COLSPAN="2">
    <TABLE WIDTH="100%" BORDER="1" CELLSPACING="0"...
    <FONT COLOR="#000000"><xsl:value-of select="name"/></FONT>
    </TD>
    ...
  </xsl:template>
</xsl:stylesheet>
Why Two Style Sheet Languages?
DTD/Schema in XML
• The DTD is the “old” (SGML) way of defining not only what tags are valid, but their relative order, number, mandatory/optional attributes, and so on
• The Schema is a total rewrite - written in XML itself - which defines all of the above as well as possible legal values for a tag (e.g., integer, date, days of the week, etc.)
Schema Example
<?xml version="1.0"?>
<Schema name="sample_schema" ...>
  ...
  <!-- ********** Element Types ************ -->
  <!-- *** data *** -->
  <ElementType name="author">
    <element type="name" minOccurs="1" maxOccurs="1"/>
  </ElementType>
  ...
</Schema>
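To make the minOccurs/maxOccurs rule in the fragment above concrete, here is a rough sketch of the kind of check a validating parser performs. It is not a real schema validator: the RULES table is a hand-written, hypothetical stand-in for the schema, and check_occurrences is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table mirroring the schema fragment:
# parent element -> {child element: (minOccurs, maxOccurs)}
RULES = {"author": {"name": (1, 1)}}


def check_occurrences(xml_text, rules=RULES):
    """Return a list of occurrence violations found in the document."""
    root = ET.fromstring(xml_text)
    errors = []
    for element in root.iter():
        for child_tag, (lowest, highest) in rules.get(element.tag, {}).items():
            count = len(element.findall(child_tag))
            if not lowest <= count <= highest:
                errors.append(
                    f"<{element.tag}> has {count} <{child_tag}> children, "
                    f"expected {lowest}..{highest}"
                )
    return errors


print(check_occurrences("<author><name>Bill</name></author>"))
print(check_occurrences("<author></author>"))
```

A document with exactly one name passes; a document with none, or with two, is reported, which is the guarantee the recipient relies on before processing the data.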
What do we do in “e-Business”?
What is “e-Business”?
• Of course, e-Business is really just doing business using 100% electronic methods such as the Internet
• In e-Business, we do transactions or exchange information using electronic media rather than the usual paper media
• e-Business can be broken down into two parts:
  • B2C
  • B2B
B2C
• B2C is “Business to Consumer”
• Your business generates the information, and a consumer receives it
• The consumer is normally interested only in the data and its presentation
• Thus, in this scenario, the consumer needs only an XML document and CSS/XSL - which is more or less the same as HTML!
Important Fact #1
• When you are engaged in B2C, and the recipient is a consumer with a “thin” client, then HTML is usually sufficient
• Supplying the data in XML is usually a waste of time, because the recipient gets no additional value from the XML over HTML
• XHTML is just HTML which is XML compliant
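What “XML compliant” means in practice can be shown with a small sketch, assuming Python's standard XML parser. Both markup strings are made-up samples, and is_well_formed is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Made-up samples: the first closes every tag (XHTML style); the second
# leaves <br> and <p> unclosed, which browsers tolerate in legacy HTML.
XHTML_SAMPLE = "<html><body><p>Balance due<br/>$10.00</p></body></html>"
LEGACY_HTML_SAMPLE = "<html><body><p>Balance due<br>$10.00</body></html>"


def is_well_formed(markup):
    """True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(XHTML_SAMPLE))
print(is_well_formed(LEGACY_HTML_SAMPLE))
```

The XHTML sample parses and the legacy sample is rejected, so XHTML can be handled by the same tools as any other XML while still displaying in a browser.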
B2B
• B2B is “Business to Business”
• Your business generates the information, and another business receives it
• Frequently, the recipient is not a person, but a software process in the business
• Thus, in this scenario, the recipient often needs only the XML data and the reference to the DTD or Schema - no presentation may be needed!
Important Fact #2
• When you are engaged in B2B, and the recipient is a software process, then XML is often the most appropriate format
• Binary data formats may be smaller, but will require more work and more maintenance
• Don’t send presentation information unless the recipient actually wants your presentation information!
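The size-versus-maintenance trade-off can be illustrated with a small comparison. This is only a sketch: the record, the fixed binary layout, and the tag names are all assumptions chosen for the example.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical record to exchange.
account_number, city = 12345, "Richardson"

# Binary form: 4-byte unsigned integer followed by a 10-byte city field.
# Sender and receiver must both hard-code this layout and keep it in sync.
RECORD_LAYOUT = "<I10s"
binary_record = struct.pack(RECORD_LAYOUT, account_number, city.encode("ascii"))

# XML form: larger, but every value is labeled.
xml_record = (
    "<account>"
    f"<account_number>{account_number}</account_number>"
    f"<city>{city}</city>"
    "</account>"
)


def read_binary(data):
    number, raw_city = struct.unpack(RECORD_LAYOUT, data)
    return number, raw_city.rstrip(b"\x00").decode("ascii")


def read_xml(text):
    root = ET.fromstring(text)
    return int(root.findtext("account_number")), root.findtext("city")


print(len(binary_record), len(xml_record))
print(read_binary(binary_record), read_xml(xml_record))
```

The binary record is much smaller, but adding a field or lengthening the city means changing RECORD_LAYOUT on both sides at once, whereas read_xml looks fields up by name and is unaffected by extra elements it does not request.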
When do you want to use XML?
When Do I Use XML?
• As we have seen, XML is best suited for the preservation of the “author’s” content
• And (X)HTML is best suited for presentation of information to an end user
• And this leads us to...
Important Fact #3
• In today’s market:
  • XML is better utilized when communicating with a “thick” client - that is, most B2B in which a software process is the recipient
  • (X)HTML is better utilized when communicating with a “thin” client - that is, most B2C in which an Internet browser is the recipient
• And when is this not true?
Exceptions to Fact #3
• XML can be used in B2C when the browser is used with so much Java and other local applications that the overall process resembles a thick client
• (X)HTML can be used in B2B if the recipient is just a human being rather than a software process, e.g., when information is transmitted only to be viewed
The Right Way And The Wrong Way To Use XML
CML - Chemical Markup Language
• One of the early “vertical” implementations of XML
• The official site is http://www.xml-cml.org/
• A “better” site is http://www.ch.ic.ac.uk/chimeral/
• CML uses the trio of tagged data, Schema, and XSL
A CML XML Document
<molecule title="caffeine" id="mol_caffeine">
  <formula>C8 H10 N4 O2</formula>
  <string title="CAS">58-08-2</string>
  ...
</molecule>
The CML Schema
<?xml version="1.0"?>
<Schema name="cml_dev_karne" xmlns="urn:schemas-microsoft-com:xml-data" xmlns:dt="urn:schemas-microsoft-com:datatypes">
  ...
  <!-- ********** Element Types ************ -->
  <!-- *** data *** -->
  <ElementType name="molecule" content="eltOnly" model="open" order="many">
    <element type="formula" minOccurs="0" maxOccurs="*"/>
    ...
A CML Stylesheet
<xsl:template match="molecule">
  <TABLE WIDTH="100%" BORDER="1" CELLSPACING="0" CELLPADDING="3" BORDERCOLOR="#CCCCFF" BGCOLOR="#EEEEFF">
  <TR>
  <TD COLSPAN="2">
  <FONT COLOR="#0000AA">Formula
  <FONT COLOR="#000000"><xsl:value-of select="formula"/></FONT></TD><TD>
  ...
The CML Document
• Note that each data item is tagged
• Note that each tag matches the standard Schema
• Note that the data is used to create a complex image in the browser - but not the only possible image!
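The point that one tagged document supports more than one presentation can be sketched as follows. The molecule is the caffeine example from the earlier slide with its elided part left out; the two rendering functions are hypothetical stand-ins for stylesheets, not CML tooling.

```python
import xml.etree.ElementTree as ET

# The caffeine example, limited to the elements shown on the slide.
CML_SAMPLE = """<molecule title="caffeine" id="mol_caffeine">
  <formula>C8 H10 N4 O2</formula>
  <string title="CAS">58-08-2</string>
</molecule>"""


def as_html_row(molecule):
    """One presentation: a table row, loosely like the XSL template."""
    return f"<TR><TD>Formula</TD><TD>{molecule.findtext('formula')}</TD></TR>"


def as_plain_text(molecule):
    """Another presentation of the same data: a one-line summary."""
    cas = molecule.find("string[@title='CAS']")
    return f"{molecule.get('title')}: {molecule.findtext('formula')} (CAS {cas.text})"


molecule = ET.fromstring(CML_SAMPLE)
print(as_html_row(molecule))
print(as_plain_text(molecule))
```

Neither rendering changes the source document, so further presentations can be added later without touching the data.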
A Print to XML/HTML Conversion
• Print stream does not contain any metadata, only data and presentation information
• Tags cannot be meaningful unless they are reverse-engineered
• The result might be only the tagged data and the stylesheet
• Too often, the XML looks like:
Bad XML Example
/* text positioning information */
.ps0{position:absolute;top:533px;left:29px;width:40px;}
.ps1{position:absolute;top:533px;left:317px;width:38px;}
.ps2{position:absolute;top:533px;left:454px;width:90px;}
...
/* font properties information */
.ft1{font-weight:bold;font-size:22px;}
.ft2{font-size:17px;}
.ft3{font-size:11px;}
<!-- text starts here -->
<SPAN CLASS="ps0"><NOBR>Account Number</NOBR></SPAN>
<SPAN CLASS="ps1"><NOBR>12345</NOBR></SPAN>
<SPAN CLASS="ps2"><NOBR>Name</NOBR></SPAN>
...
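Why this output is “bad” becomes clear when a program tries to read the account number back out. The sketch below is illustrative only: the <page> wrapper and the tagged <invoice> counterpart are assumptions added so the fragments parse, and the "next span after the label" rule is a guess of the kind reverse-engineering forces on the recipient.

```python
import xml.etree.ElementTree as ET

# Two spans from the slide, wrapped in a hypothetical <page> root element.
CONVERTED = """<page>
<SPAN CLASS="ps0"><NOBR>Account Number</NOBR></SPAN>
<SPAN CLASS="ps1"><NOBR>12345</NOBR></SPAN>
</page>"""

# A hypothetical tagged equivalent of the same information.
TAGGED = "<invoice><account_number>12345</account_number></invoice>"


def account_from_converted(text):
    # Heuristic: assume the value is the span that follows the label span.
    # This breaks as soon as the layout or the label wording changes.
    spans = [span.findtext("NOBR") for span in ET.fromstring(text).iter("SPAN")]
    return spans[spans.index("Account Number") + 1]


def account_from_tagged(text):
    # The tag itself says what the value is; no layout knowledge is needed.
    return ET.fromstring(text).findtext("account_number")


print(account_from_converted(CONVERTED), account_from_tagged(TAGGED))
```

Both calls return the same value here, but only the second would survive a redesign of the printed page.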
An Image to XML Example
• Most information may not be tagged
<invoice>
  <account_no>12345</account_no>
  <name>Bill McCalpin</name>
  <data>70 02 02 02 02 FE A7 47 47 48 03 F9 A7 42 27 4A 74….</data>
</invoice>
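A short sketch shows how little of such a document a program can actually use. The sample keeps only the first eight bytes shown on the slide (the rest of the image data is elided in the original and is not reconstructed here); split_invoice is an illustrative name.

```python
import xml.etree.ElementTree as ET

# The invoice from the slide with the image data cut to its first 8 bytes.
INVOICE = """<invoice>
  <account_no>12345</account_no>
  <name>Bill McCalpin</name>
  <data>70 02 02 02 02 FE A7 47</data>
</invoice>"""


def split_invoice(text):
    """Separate the tagged index fields from the opaque image payload."""
    root = ET.fromstring(text)
    fields = {child.tag: child.text for child in root if child.tag != "data"}
    payload = bytes.fromhex(root.findtext("data"))
    return fields, payload


fields, payload = split_invoice(INVOICE)
print(fields)
print(len(payload), "bytes of untagged image data")
```

Only the two index fields are addressable by name; everything inside <data> is a bitmap that would need recognition software before any of its content could be tagged.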
The Flow of Information
The Flow of Information
• E-Business is about the flow of information between parties as well as within the enterprise
• Traditionally, as information moves through the business process, we lose as much information as we add
• Look at how we used to treat information:
As Information Flow Used to Be
As Information Flow Used To Be
[Diagram. Labels: Data; Data awareness (metadata); Presentation information; Composer; Toner on paper; Scan; X’010101’ (bits); Archive; Zap!]
As Information Flow Is Today
As Information Flow Is Today
[Diagram. Labels: Data; Data awareness (metadata); Presentation information; Composer; Transform; Text and graphics; PDF; Web page, emails, etc.; Zap!]
As Information Flow Should Be
As Information Flow Should Be
[Diagram. Labels: Data; Data awareness (metadata); Presentation information; Complete XML documents; email; WAP; Web page; archive; paper; User]
Or, As In The XML Bubble...
[Diagram. Labels: Data & metadata; Process; Add presentation; Web page; email; Cell phones; B2B applications; Archive]
Important Fact #4
• Use XML to delay the loss of important information
• Don’t throw away information until you commit the document to a final format which can’t support it
• In other words, keep the information in XML as long as possible
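One way to picture this advice is a processing step that enriches the XML rather than flattening it, with rendering left to the very end. This is a minimal sketch under assumed names: the invoice, the status element, and both functions are hypothetical.

```python
import xml.etree.ElementTree as ET


def add_status(invoice, status):
    """Processing step: add information to the document, keep the tags."""
    ET.SubElement(invoice, "status").text = status
    return invoice


def render_email(invoice):
    """Final step: only here are the tags given up for a fixed format."""
    return f"Account {invoice.findtext('account_no')}: {invoice.findtext('status')}"


# Hypothetical invoice moving through the business process.
invoice = ET.fromstring("<invoice><account_no>12345</account_no></invoice>")
invoice = add_status(invoice, "paid")

email_body = render_email(invoice)                     # lossy, for a person
archived_xml = ET.tostring(invoice, encoding="unicode")  # still fully tagged

print(email_body)
print(archived_xml)
```

The email text cannot be turned back into tagged data without guesswork, while the archived XML still supports any later presentation or B2B exchange.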