The Document In The 21st Century

Presentation Transcript


  1. The Document In The 21st Century. William J. “Bill” McCalpin, MIT, LIT, CDIA, EDP. Principal, MHE. MHE - Consultants for Document and Datament Technologies

  2. Who MHE Is... MHE is the consulting firm which specializes in the transition of information both within and between the electronic printing, imaging, and Internet environments.

  3. Introduction: The Hegelian Dialectic

  4. Thesis, Antithesis, Synthesis • In the philosophy of Hegel, these words describe the inevitable transition of thought, by contradiction and reconciliation, from an initial conviction to its opposite and then to a new, higher conception that involves but transcends both of them

  5. The Hegelian Dialectic • Thesis: Most businesses have well-established, productive legacy systems • Antithesis: XML is springing forth everywhere and will replace most legacy systems • Synthesis: XML will be integrated with legacy systems - enhancing some processes, changing many others, and eliminating some altogether • In short, XML will change - not destroy - what you do

  6. The Document In The 21st Century

  7. What Is A Document? • The American Heritage Dictionary defines a document as “information in writing placed on a medium such as paper, often used as a record.” • Documents have been placed on clay tablets, gold leaf, animal skins, all types of paper, microfilm, optical storage, and so on

  8. Information And Presentation • In every case, the document represents a fundamental union of information and presentation • But “presentation” presumes that the primary audience for the document is a human being • With the coming of the Internet, this is no longer the case

  9. The Curse Of Presentation • Composition products require that you specify a printer, even before you know where the document will print

  10. Why Are Print, Image, And Presentation Formats Incompatible?

  11. Printing And Imaging Formats • Many printing formats: AFP, Metacode, DJDE, XES (UDK), PostScript, PCL, etc. • All formats use external resources like fonts, forms, graphics, etc., although sometimes inconsistently • Most are escape-sequence based, some are formal data architectures, and some are almost programming languages

  12. Printing And Imaging Formats • Many imaging formats - while most use CCITT Group 4 for image compression, most also have proprietary data wrappers • Later systems adopted text-based formats such as PDF, although storing other print streams is not unknown • Systems which store text-based formats must wrestle with resource issues

  13. Different Print Formats • Why do printers have different formats? Because of physical constraints imposed by the hardware: • Resources reduce the amount of data sent through the pipeline to the printer • Pages must be imaged in a fraction of a second • Complex graphics can be developed on the printer, but this needs a special language

  14. Different Imaging Formats • Why do imaging systems have different formats? Because of physical constraints imposed by the hardware: • Mass storage was expensive • Indexing schemes were too close to the application • Text is sometimes avoided because of resource issues • Interoperability with other products was an issue

  15. Result • In each case, data architecture decisions were made in order to enhance some aspect of legibility of the stored objects. • If there were no requirement to present the information (to a human reader), then the requirement for custom data formats for each vendor would probably disappear!

  16. Information Exchanges • B2C - business to consumer • B2B - business to business • B2B2C - business to business to consumer • *2C requires presentation information • B2B requires no presentation information, if the recipient is a process, not a person

  17. Why B2B? • NYSE (New York Stock Exchange) • Formerly, 100 million trades in a day was considered very heavy • Now 1 billion trades a day is considered very heavy • The difference is automation; the same multiplier applies to B2B • #1 effect of XML is the separation of information from presentation

  18. The Nature Of XML

  19. XML And SGML • XML is eXtensible Markup Language • XML is a restricted form (subset) of SGML, Standard Generalized Markup Language, an ISO standard (ISO 8879) • XML is “extensible” because people and enterprises with common interests get together to define the tags which describe their data

  20. XML And Print Formats • In most print formats, something like an account number would be: • AMB 200 AMI 300 SCFL 01 STO 0, 90 TRN 12345-67890 • In XML, the same information is: <account_number>12345-67890</account_number>
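
The contrast on this slide can be tried out in a few lines of Python. This is only a sketch using the standard library's `xml.etree.ElementTree`; the two helper functions are hypothetical names made up for illustration, and the print-stream handling simply assumes that the value is whatever follows the `TRN` control, as in the slide's example.

```python
import xml.etree.ElementTree as ET

# The XML fragment from the slide: the tag names the data it carries.
xml_record = "<account_number>12345-67890</account_number>"

# The print-stream version from the slide: positioning and font controls,
# with the value following the TRN control.
print_record = "AMB 200 AMI 300 SCFL 01 STO 0, 90 TRN 12345-67890"


def account_from_xml(text: str) -> str:
    """Return the account number by asking for it by name."""
    element = ET.fromstring(text)
    if element.tag != "account_number":
        raise ValueError(f"unexpected element: {element.tag}")
    return element.text or ""


def account_from_print_stream(text: str) -> str:
    """Return whatever follows TRN; nothing says it is an account number."""
    _, _, payload = text.partition("TRN ")
    return payload.strip()


print(account_from_xml(xml_record))
print(account_from_print_stream(print_record))
```

Both functions return the same string, but only the XML version can confirm what the string means. The print-stream version has to assume that the text at that position is an account number.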

  21. XML and Image Formats • Raster-based image formats contain only bitmaps • To read the text data within the bitmap requires an OCR/ICR process, which can fail • Most usable data is extracted from the document and placed in the index

  22. XML And Electronic Formats • The nature of all electronic presentation formats is to be focused on the presentation of the information. • The nature of XML is focused on the “author’s content”, that is, information is described as what it is, not how it looks.

  23. Separating Information From Presentation • XML enables the total separation of information from presentation • Thus, some XML objects have only tagged information, while others have content and presentation information [Diagram labels: XML, XML, XSL]
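
One way to picture this separation is to keep a single content-only XML record and render it twice. The sketch below is an assumption-laden illustration: the `<statement>` wrapper element and the two render functions are invented here, and plain Python functions stand in for the role an XSL stylesheet would play, since Python's standard library has no XSLT processor.

```python
import xml.etree.ElementTree as ET
from html import escape

# Content only: no fonts, positions, or printer names.
# The <statement> wrapper is a hypothetical element added for this example.
record = ET.fromstring(
    "<statement>"
    "<account_number>12345-67890</account_number>"
    "<STATE>Texas</STATE>"
    "</statement>"
)


def render_text(doc: ET.Element) -> str:
    """One presentation: plain 'name: value' lines."""
    return "\n".join(f"{child.tag}: {child.text}" for child in doc)


def render_html(doc: ET.Element) -> str:
    """Another presentation of the same content: an HTML table."""
    rows = "".join(
        f"<tr><th>{escape(child.tag)}</th><td>{escape(child.text or '')}</td></tr>"
        for child in doc
    )
    return f"<table>{rows}</table>"


print(render_text(record))
print(render_html(record))
```

The record itself never changes. Only the function applied to it decides how it looks, which is the division of labor the slide assigns to XML and XSL.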

  24. How To Relate XML to Everyman • You might think that XML is too esoteric for most people to understand • But XML is based on the basic human need to exchange information • XML couples the communication skills we have used over the last several thousand years to modern Internet technology • So how can you understand it?

  25. Communication Difficulty #1 • In order for any communication to take place, both parties must share the same fundamental mechanism which carries information • For example, in writing, if a boy and girl don’t even share the same writing schemes, they can’t possibly understand...

  26. Chinese Characters vs Latin Alphabet “I Love You”

  27. Underlying Structure of XML • Text characters • Tags are delimited by “<” and “>”, e.g., <xml> • Ending tags have “/”, e.g., </xml> • Parameters (attributes) are name="value" pairs with the value in quotes, e.g., <PAPER track="Application"> • XML is a series of tags and data, e.g., <STATE>Texas</STATE>
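
These syntax rules are easy to check with a parser. The following sketch uses Python's standard `xml.etree.ElementTree`; nesting `<STATE>` inside `<PAPER>` is an assumption made here only to have one small document to parse, since the slide shows the two tags separately.

```python
import xml.etree.ElementTree as ET

# <PAPER track="Application"> and <STATE>Texas</STATE> are from the slide;
# nesting one inside the other is done here only for illustration.
source = '<PAPER track="Application"><STATE>Texas</STATE></PAPER>'

paper = ET.fromstring(source)
state = paper.find("STATE")

print(paper.tag)               # element name found between "<" and ">"
print(paper.attrib)            # attributes (the slide's "parameters")
print(state.tag, state.text)   # a tag and the data it wraps

# A start tag with no matching end tag is not well-formed,
# so the parser rejects the document.
try:
    ET.fromstring("<STATE>Texas")
    well_formed = True
except ET.ParseError:
    well_formed = False

print(well_formed)
```

The last check shows why the shared syntax matters: a parser refuses a document that breaks it, so both parties find out immediately.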

  28. Communication Difficulty #2 • Once both parties agree to the fundamental syntax, then both parties must next agree to the words to be used • In the case of XML, how do both parties know that <STATE> means a political subdivision and not one of {gas,liquid,solid}?

  29. A Date Gone Bad • One evening in the hotel lobby bar, two young Italian men spent a while talking to an attractive Venezuelan girl...and her aunt • They spoke Italian and she spoke Spanish, but they communicated passably

  30. A Date Still Going Bad • However, the aunt wanted to go up to her room with her niece • The Italians wanted to take the young lady out dancing... • So they asked her:

  31. What the boys said: “Vuoi andare con noi ‘sta sera?” What the young lady needed to hear: “Quisieras ir con nosotros esta tarde?” Oops

  32. Miscommunication • Even though Italian and Spanish use the same sounds, the same grammar, and have a common ancestry in Latin, some words are different • Unfortunately, the most common words in both languages are likely to be the most different

  33. The Cost Of Data Differences “NASA lost a $125 million Mars orbiter because one engineering team used metric units while another used English units for a key spacecraft operation...” CNN 9/30/99

  34. XML “Words” • HTML has a certain number of fixed tags - everyone knows what they are, but they can’t be augmented • In XML, everyone can make up their own tags to suit their needs - but how do we avoid a Tower of CyberBabel?

  35. Communication Difficulty #3 • Even when you agree to common tags, you still need to agree to a common understanding • In XML, the Schema (now replacing the DTD) defines what tags are allowed to describe a particular collection of data • For example, in the field of human relations, what is a “date”?

  36. One DTD For A “Date” • A woman thinks: • Invitation - formal • Dress-up - nicely • Eat out – dinner with wine at nice restaurant • Entertainment – see a movie • Private moment – good night kiss • <!DOCTYPE date [ <!ELEMENT date (invitation, dress, meal, entertainment+, intimacy)> <!ELEMENT invitation (#PCDATA)> <!ELEMENT dress (#PCDATA)> <!ELEMENT meal (#PCDATA)> <!ELEMENT entertainment (#PCDATA)> <!ELEMENT intimacy (#PCDATA)> ]>

  37. A Woman’s View Of A “Date” <date> <invitation>Telephone call</invitation> <dress>Long dress</dress> <meal>4-star restaurant</meal> <entertainment>the theatre</entertainment> <intimacy>A passionate, romantic kiss</intimacy> </date>

  38. Another DTD For A “Date” • A man thinks: • Eat out – six-pack of beer • Private moment – necking • <!DOCTYPE date [ <!ELEMENT date (meal, intimacy+)> <!ELEMENT meal (#PCDATA)> <!ELEMENT intimacy (#PCDATA)> ]>

  39. A Man’s View Of A “Date” <date> <meal>six-pack of beer</meal> <intimacy>necking</intimacy> </date>

  40. When Men And Women Agree <date> <invitation>Telephone call</invitation> <dress>Long dress</dress> <meal>4-star restaurant</meal> <entertainment>the theatre</entertainment> <intimacy>A passionate, romantic kiss</intimacy> </date> <date> <invitation>Honking</invitation> <dress>Not the shirt he changed the oil in</dress> <meal>food and beer</meal> <entertainment>rent a video</entertainment> <intimacy>A passionate, romantic kiss while necking</intimacy> </date>
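
The two “date” definitions can be compared mechanically. Python's standard library parser does not validate against a DTD, so the sketch below is a hand-written stand-in, not a real validator: the two content models are transcribed as plain lists (an assumption of this example, using the lowercase names found in the instance documents), and the hypothetical helpers `model_to_regex` and `conforms` only check the order and repetition of child elements.

```python
import re
import xml.etree.ElementTree as ET

# Content models transcribed from the two DTDs, using the lowercase names
# that appear in the instance documents. "+" means one or more.
WOMANS_MODEL = ["invitation", "dress", "meal", "entertainment+", "intimacy"]
MANS_MODEL = ["meal", "intimacy+"]


def model_to_regex(model: list) -> re.Pattern:
    """Turn a sequence content model into a regex over 'tag ' tokens."""
    parts = []
    for item in model:
        if item.endswith("+"):
            parts.append(f"(?:{re.escape(item[:-1])} )+")
        else:
            parts.append(f"{re.escape(item)} ")
    return re.compile("".join(parts))


def conforms(xml_text: str, model: list) -> bool:
    """True if the root's children appear in the order the model requires."""
    root = ET.fromstring(xml_text)
    children = "".join(f"{child.tag} " for child in root)
    return model_to_regex(model).fullmatch(children) is not None


womans_date = (
    "<date><invitation>Telephone call</invitation><dress>Long dress</dress>"
    "<meal>4-star restaurant</meal><entertainment>the theatre</entertainment>"
    "<intimacy>A passionate, romantic kiss</intimacy></date>"
)
mans_date = "<date><meal>six-pack of beer</meal><intimacy>necking</intimacy></date>"

print(conforms(womans_date, WOMANS_MODEL), conforms(womans_date, MANS_MODEL))
print(conforms(mans_date, MANS_MODEL), conforms(mans_date, WOMANS_MODEL))
```

Each document satisfies its own author's definition and fails the other one. Both use the tag `<date>` but mean different things by it, which is why the two parties have to settle on one definition before they exchange data.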

  41. The Four Stages Of XML Evolution

  42. The Evolution Of Technology • Creation of basic technology • Growth of technical tools • Conversion of technology into business applications - the penetration into verticals • Reduction to commodity

  43. #1 Creation Of The Basic Technology Of XML

  44. Creation Of Basic Technology • In 1998, the World Wide Web Consortium declared XML to be a “recommendation”, that is, a world-wide standard • This phase began in 1990 with the creation of the Web and browsers, and is now substantially complete

  45. #2 The Growth Of Technical Tools

  46. Growth Of Technical Tools • Once the underlying technology has been created, tools and utilities are built to use this technology • These tools are often somewhat primitive and are not focused on the business problem • This phase has been going furiously since 1998

  47. The World Wide Web Consortium and XML

  48. World Wide Web Consortium • The World Wide Web Consortium was created in October 1994 to develop common protocols that promote the Web’s evolution and ensure its interoperability • The W3C has more than 500 Member organizations from around the world • The W3C has many roles

  49. The Roles of the W3C • Standards Body (XML and others) • Software and Services • Working Groups • Initiatives • Activities with other standards bodies

  50. W3C and Standards • XML • XSL • CSS1 & CSS2 • DOM • HTML • MathML • PICS • PNG • RDF • SMIL • SVG • XHTML • XPath, XPointer, XML Base, XLink • XML Schema
