Traditional Electronic Printing On The Internet
William J. “Bill” McCalpin EDPP, CDIA, MIT, LIT
Principal, MHE
MHE - Consultants for Document and Datament Technologies
Xplor 21st Global Conference and Exhibit
Miami Beach, Florida
October 30, 2000

Printing Versus The Internet
Printing Versus The Internet
• Electronic printing is a $125 billion (US) industry worldwide (www.xplor.org)
• There are now an estimated 98,685,000 host computers on the Internet (www.mids.org)
• Xplor International estimates that the production of both paper and electronic documents is still increasing
• So, for a while yet, we’re living in a hybrid world
Printing Versus The Internet
• Customer service needs an identical look and feel in paper and electronic documents
• Regulatory agencies continue to have an interest in document presentation
• Customers need to be re-educated as documents change media
• Hence, there are good reasons in the short run to be concerned about presentation

The Nature Of Print Streams
EBCDIC Versus ASCII
• BCD - Binary Coded Decimal
• BCDIC - Binary Coded Decimal Interchange Code
• EBCDIC - IBM Extended Binary Coded Decimal Interchange Code
• ASCII - American Standard Code for Information Interchange
EBCDIC Line Data
• EBCDIC encoded - 8 bit
• Record-oriented, because IBM operating systems store files as records
• Carriage controls
  • Machine carriage controls
  • ANSI carriage controls
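To make the ANSI carriage controls concrete, here is a minimal Python sketch that turns EBCDIC records carrying an ANSI control character in column 1 into ASCII text that uses print controls instead of records. It assumes EBCDIC code page 037 (Python's `cp037` codec), treats any unrecognized control character as single spacing, and does not handle machine carriage controls, which are command bytes rather than characters; `ansi_records_to_ascii` is a hypothetical helper name.

```python
# Sketch only: EBCDIC line data with ANSI carriage control -> ASCII text.
# Assumptions: code page 037 (US/Canada EBCDIC); unknown control
# characters are treated as single spacing.

ANSI_CC = {
    " ": "\r\n",      # single space: advance one line, then print
    "0": "\r\n" * 2,  # double space
    "-": "\r\n" * 3,  # triple space
    "+": "\r",        # suppress spacing: overprint the previous line
    "1": "\f",        # skip to channel 1 (top of the next page)
}


def ansi_records_to_ascii(records: list[bytes], codec: str = "cp037") -> bytes:
    parts = []
    for record in records:
        line = record.decode(codec)             # EBCDIC bytes -> Python str
        cc, text = line[:1], line[1:].rstrip()  # split off the control character
        parts.append(ANSI_CC.get(cc, "\r\n"))   # ANSI spacing happens before printing
        parts.append(text)
    return "".join(parts).encode("ascii")


records = [s.encode("cp037") for s in ("1STATEMENT", " Balance 5,000.00", "0Thank you")]
print(ansi_records_to_ascii(records))
# b'\x0cSTATEMENT\r\nBalance 5,000.00\r\n\r\nThank you'
```

The record boundaries disappear in the output: everything the EBCDIC file expressed as "one record per line" is now carried by the CR, LF, and FF bytes in the stream.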
ASCII Line Data
• ASCII encoded - 7 bit
• ‘Record’ orientation is not intrinsic to the operating system
• Text files use print controls to delimit records
• Common print controls
  • x’0d’ carriage return
  • x’0a’ line feed
  • x’0c’ form feed
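Because the “records” in an ASCII file exist only as print controls, a program has to rebuild pages and lines from the byte stream itself. A minimal Python sketch, assuming that a form feed ends a page, a line feed ends a line, and a carriage return immediately before a line feed is simply dropped (`split_print_file` is a hypothetical name):

```python
def split_print_file(data: bytes) -> list[list[str]]:
    """Return pages, each a list of lines, from an ASCII print file."""
    pages = []
    for page in data.decode("ascii").split("\f"):  # x'0c' form feed ends a page
        lines = []
        for line in page.split("\n"):              # x'0a' line feed ends a line
            if line.endswith("\r"):                # drop the x'0d' of a CR LF pair
                line = line[:-1]
            lines.append(line)
        pages.append(lines)
    return pages


print(split_print_file(b"Page one\r\nline two\fPage two"))
# [['Page one', 'line two'], ['Page two']]
```

A fuller parser would also treat a bare carriage return as overprinting and would decide how to handle the empty line produced by a trailing CR LF.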
The EBCDIC Family Tree
• EBCDIC text
• 1403 data - EBCDIC records with a carriage control
• LCDS - ‘Line Conditioned’ Data Stream
• 3800 Mod I
• 3211 data with Xerox DJDEs
• Others
• AFP, MO:DCA, and IPDS
The ASCII Family Tree
• ASCII text
• ASCII text with print controls
• ASCII text with escape sequences
  • Epson MX-80
  • Xerox UDK (XES)
  • QMS QUIC
  • IBM PPDS
  • HP PCL
  • Xerox Metacode
• Print programming languages using ASCII
  • Interpress
  • PostScript
Line Data And Conditioned Line Data
• 1403, 3211, and other EBCDIC line data streams, including Xerox DJDE
• 3800 Mod I and other IBM data streams
• ASCII text files of all sorts
EBCDIC record with ANSI carriage control ‘1’ (character, zone, and digit rows):
    1     This is text
    F44444E88A48A4A8AA
    100000389209203573
ASCII text with print controls (. = control character: form feed first, carriage return and line feed last):
    .     This is text..
    02222256672672767700
    C00000489309304584DA
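The two-row hex listings above put each byte's high-order half (the “zone”) on one row and its low-order half (the “digit”) directly beneath it. A short Python sketch that reproduces those rows, assuming code page 037 for the EBCDIC record (`vertical_hex` is a hypothetical name):

```python
def vertical_hex(data: bytes) -> tuple[str, str]:
    """Return the zone (high half-byte) and digit (low half-byte) rows."""
    zone = "".join(f"{b >> 4:X}" for b in data)
    digit = "".join(f"{b & 0x0F:X}" for b in data)
    return zone, digit


ebcdic = ("1" + " " * 5 + "This is text").encode("cp037")  # ANSI CC '1', then text
ascii_ = b"\x0c" + b" " * 5 + b"This is text\r\n"           # FF, text, CR LF

for data in (ebcdic, ascii_):
    zone, digit = vertical_hex(data)
    print(zone)
    print(digit)
```

Reading the columns pairwise gives the byte values: x’F1’ is the EBCDIC ‘1’ and x’40’ the EBCDIC space, while the ASCII line starts with x’0C’ and ends with x’0D’ x’0A’.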
Print Data With Escape Sequences
• Epson and many other impact printers
• Xerox UDK (XES)
• QMS QUIC
• IBM PPDS
• HP PCL
• Xerox Metacode
• AFP, MO:DCA, and IPDS
Samples:
    X’01060001040002000154686973206973207465787401’
    AMB 100 AMI 300 STO 0,90 SCFL 3 SVI 14 TRN “This is text”   (AFP PTOCA control sequences)
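As one illustration of an escape-sequence stream, here is a minimal Python sketch that builds an HP PCL job placing “This is text” at a given position. The commands used (ESC E for reset, ESC *p#X and ESC *p#Y for cursor position in PCL units, typically 1/300 inch) reflect common PCL 5 usage and should be checked against the printer's technical reference; `pcl_text_at` is a hypothetical helper.

```python
ESC = b"\x1b"


def pcl_text_at(text: str, x: int, y: int) -> bytes:
    """Place text at (x, y), given in PCL units (typically 1/300 inch)."""
    return (
        ESC + b"E"              # reset the printer
        + ESC + b"*p%dX" % x    # absolute horizontal cursor position
        + ESC + b"*p%dY" % y    # absolute vertical cursor position
        + text.encode("ascii")  # the text itself is ordinary ASCII
        + b"\x0c"               # form feed: eject the page
        + ESC + b"E"            # reset again at the end of the job
    )


print(pcl_text_at("This is text", 300, 100))
```

The contrast with the PTOCA sample is the point: the positioning instructions are interleaved with the text as in-band escape sequences rather than carried in separate structured fields.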
Print Programming Languages
• Interpress
• PostScript (and PDF)
    %!PS-Adobe-2.0
    %%Title: Blue Book Program 7, on page 157
    %%EndComments
    /Times-Roman findfont 18 scalefont setfont
    72 500 moveto
    (This is text) show
    ...
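Since PostScript is a program rather than a data stream, a composition tool emits it as text. A minimal Python sketch that generates a one-page program like the sample, escaping backslashes and parentheses so arbitrary text is safe inside a PostScript `( )` string; `ps_string` and `ps_text_page` are hypothetical names, and the default font, size, and position simply mirror the sample.

```python
def ps_string(s: str) -> str:
    """Quote text as a PostScript literal string."""
    # Escape the backslash first, then parentheses, so the result is always safe.
    return "(" + s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)") + ")"


def ps_text_page(text: str, font: str = "Times-Roman", size: int = 18,
                 x: int = 72, y: int = 500) -> str:
    return "\n".join([
        "%!PS-Adobe-2.0",
        "%%EndComments",
        f"/{font} findfont {size} scalefont setfont",
        f"{x} {y} moveto",  # 72 units = 1 inch; origin at the lower-left corner
        f"{ps_string(text)} show",
        "showpage",         # paint the finished page
    ])


print(ps_text_page("This is text"))
```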
The Nature Of Internet Formats
Common Internet Formats
• The most commonly used data format on the Internet is HTML - HyperText Markup Language
• The next expected wave on the Internet is XML (eXtensible Markup Language) and its related standards, such as XSL and SVG
• As a secondary standard, PDF is widely used to present static documents
HTML
• HTML is an application of SGML
• HTML has a set of 40 to 50 tags, which are “grammar” based, that is, they name structural parts such as headings and paragraphs
• HTML tags have default presentation characteristics, but these can be overridden with CSS (Cascading Style Sheets)
Sample HTML
    <!doctype html public "-//w3c//dtd html 4.0 transitional//en">
    <html>
    <h1>Poison Ivy Vineyards</h1>
    <p>Poison Ivy Vineyards is an experiment in growing wine-quality grapes in a backyard in a residential neighborhood in Richardson, Texas. This website serves as a running diary of the steps I took to create the vineyard and - eventually - to make wine.</p>
    </html>
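Because HTML tags name structure rather than appearance, a program can recover the document's parts without knowing how any browser will render them. A minimal sketch using Python's standard `html.parser` module; the `StructureParser` class and its set of block tags are illustrative assumptions.

```python
from html.parser import HTMLParser


class StructureParser(HTMLParser):
    """Collect [tag, text] pairs for headings and paragraphs."""

    BLOCKS = {"h1", "h2", "h3", "p"}

    def __init__(self):
        super().__init__()
        self.open_tag = None
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCKS:  # HTMLParser reports tag names in lower case
            self.open_tag = tag
            self.blocks.append([tag, ""])

    def handle_endtag(self, tag):
        if tag == self.open_tag:
            # Collapse runs of whitespace once the block is complete.
            self.blocks[-1][1] = " ".join(self.blocks[-1][1].split())
            self.open_tag = None

    def handle_data(self, data):
        if self.open_tag:
            self.blocks[-1][1] += data


sample = """<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<html>
<h1>Poison Ivy Vineyards</h1>
<p>Poison Ivy Vineyards is an experiment in growing wine-quality grapes
in a backyard in a residential neighborhood in Richardson, Texas.</p>
</html>"""

parser = StructureParser()
parser.feed(sample)
for tag, text in parser.blocks:
    print(tag, "->", text)
```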
XML
• XML is eXtensible Markup Language, which means that you can define your own tags
• Since a browser can’t know how to format made-up tags, by default it displays the document in outline form
• Normally, you would use XSL (or CSS) to describe how each tag is to be formatted
Sample XML
    <NAME>William J. "Bill" McCalpin, EDPP, CDIA, MIT, LIT</NAME>
    <JOBTITLE>Principal</JOBTITLE>
    <AFFILIATION>MHE</AFFILIATION>
    <ADDRESS>
      <STREET>1400 Cheyenne Dr.</STREET>
      <CITY>Richardson</CITY>
      <STATE>Texas</STATE>
      <ZIPCODE>75080</ZIPCODE>
      <EMAIL>mccalpin@mhe-consulting.com</EMAIL>
    </ADDRESS>
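Made-up tags still form a tree that software can navigate. A minimal sketch using Python's standard `xml.etree.ElementTree`, taking part of the sample above; because the fragment has several top-level elements, it is wrapped in a hypothetical `<CONTACT>` root to make it a well-formed document. The `outline` function prints the tree roughly the way a browser shows unstyled XML.

```python
import xml.etree.ElementTree as ET

# Part of the sample fragment, wrapped in a hypothetical <CONTACT> root.
fragment = """<NAME>William J. "Bill" McCalpin</NAME>
<JOBTITLE>Principal</JOBTITLE>
<ADDRESS>
  <CITY>Richardson</CITY>
  <STATE>Texas</STATE>
</ADDRESS>"""

root = ET.fromstring(f"<CONTACT>{fragment}</CONTACT>")


def outline(elem: ET.Element, depth: int = 0) -> list[str]:
    """Render elements as an indented outline: tag name, then its text."""
    text = (elem.text or "").strip()
    lines = [("  " * depth + f"{elem.tag} {text}").rstrip()]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(root)))
print(root.findtext("ADDRESS/CITY"))  # Richardson
```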
Sample XSL
    This is an <emph>important</emph> point.

    <xsl:template match="emph">
      <fo:sequence font-weight="bold">
        <xsl:process-children/>
      </fo:sequence>
    </xsl:template>
PDF
• PDF is Adobe’s Portable Document Format
• PDF is a print stream, not an SGML instance
• PDF is similar to PostScript, but more portable, because it carries its own resources (such as embedded fonts)
• PDF provides good fidelity, at a price: files tend to be large
Sample PDF
    %PDF-1.1
    ...
    2 0 obj
    <<
    /CreationDate (D:19960809191047)
    /Producer (Acrobat Distiller 2.1 for Windows)
    /Creator (Adobe PageMaker 6.0)
    /Author (Doc)
    /Keywords ()
    /Title (bills)
    /Subject ()
    >>
    endobj
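The object in the sample is a document information dictionary: `/Key (value)` pairs written as PDF literal strings. A minimal Python sketch that reads it and decodes the `D:YYYYMMDDHHmmSS` creation date; it assumes the strings contain no nested or escaped parentheses and that the date carries all fourteen digits, which a real PDF parser could not assume.

```python
import re
from datetime import datetime

info = """2 0 obj
<< /CreationDate (D:19960809191047) /Producer (Acrobat Distiller 2.1 for Windows)
/Creator (Adobe PageMaker 6.0) /Author (Doc) /Keywords () /Title (bills) /Subject () >>
endobj"""

# Simplification: a literal string here is "(" + anything but ")" + ")".
entries = dict(re.findall(r"/(\w+)\s*\(([^)]*)\)", info))


def pdf_date(value: str) -> datetime:
    """Parse D:YYYYMMDDHHmmSS, ignoring any trailing time-zone part."""
    return datetime.strptime(value[2:16], "%Y%m%d%H%M%S")


print(entries["Producer"])                # Acrobat Distiller 2.1 for Windows
print(pdf_date(entries["CreationDate"]))  # 1996-08-09 19:10:47
```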
Limits Of Browsers

A Normal HTML Page

Default Font Increased

Using Ghouly Solid

Adjusting The Fonts

Methods Of Moving Traditional Electronic Print To The Internet
Five Methods
• Conversion to PDF
• Rasterization to GIF or JPEG
• Recomposition into HTML/XML
• “Conversion” to normal HTML/XML
• Translation to highly formatted HTML/XML
Conversion to PDF
• This is a print-stream-to-print-stream conversion
• The PDF output usually looks very similar to the original printed document
• Many tools that create the PDF also add value to the PDF document, such as hypertext links and bookmarks
Pros And Cons Of PDF
• Pros:
  • High fidelity to the original document
  • Reader is widespread and free
  • Reasonably portable
  • Widely used in some circles (e.g., the IRS)
• Cons:
  • PDF files tend to be large
  • PDF documents are built around paper-sized pages
  • Browser requires a “plug-in”*
Sources For * To PDF
• Composition Tools - create new PDF documents from source code
• Transforms - translate existing formatted print streams into PDF
• Larger Systems - composition or translation capabilities inserted transparently into document systems
• See the Xplor Products and Services Reference Guide
Rasterization to GIF or JPEG
• The print stream is “rasterized”, that is, converted to a bitmap format
• GIF (Graphics Interchange Format) - invented by CompuServe for graphics; supports at most 256 colors (8 bits per pixel)
• JPEG (Joint Photographic Experts Group) - designed for images with more than 256 colors, with better compression, but it is “lossy”
• Excellent discussion of each at http://www.efuse.com/Design/web_graphics_basics.html
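The 256-color limit suggests a simple rule for choosing the output format of each rasterized page. A hypothetical Python heuristic, assuming the page is available as a sequence of (r, g, b) tuples: keep GIF while the page fits in a 256-color palette (GIF compression is lossless, which keeps text edges crisp), and fall back to JPEG otherwise.

```python
def suggest_raster_format(pixels) -> str:
    """pixels: iterable of (r, g, b) tuples for one rasterized page."""
    distinct = set()
    for p in pixels:
        distinct.add(p)
        if len(distinct) > 256:  # too many colors for a GIF palette
            return "jpeg"
    return "gif"


text_page = [(0, 0, 0), (255, 255, 255)] * 100  # black text on white
print(suggest_raster_format(text_page))         # gif
```

Text-heavy statement pages usually land on the GIF side; pages with photographs or smooth color gradients are the case JPEG was designed for.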
Pros And Cons Of Rasterization
• Pros:
  • Image is an exact visual copy of the original document
  • Image can be viewed in any browser that displays GIFs and JPEGs
• Cons:
  • Resolution is fixed at one size
  • There’s no text to search
  • Downloads take longer
  • No correspondence between printed pages and “HTML” pages
Sample Rasterization
• This page was originally created in PDF, then rasterized and converted to a JPEG
Recomposition into HTML/XML
• Data is extracted from a print stream
• Templates have been created in advance
• The extracted data is merged into the templates
• There may be fewer or more output pages in HTML than there were in the print stream
• Templates are built to be as effective as possible in the browser window
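The extract-then-merge steps above can be sketched in a few lines of Python. This assumes a hypothetical fixed-column print line (the `LAYOUT` table) and a template prepared in advance with the standard `string.Template`; a production system would locate fields by record type, page position, or font rather than by fixed columns.

```python
from html import escape
from string import Template

# Hypothetical layout of one print line: field name -> (start, end) columns.
LAYOUT = {"name": (0, 20), "account": (20, 30), "amount": (30, 42)}

# Template built in advance for the browser window, not for the paper page.
PAGE = Template("<h1>Statement for $name</h1>\n<p>Account $account: amount due $amount</p>")


def extract(line: str) -> dict[str, str]:
    """Pull each field out of its fixed columns in the print line."""
    return {field: line[start:end].strip() for field, (start, end) in LAYOUT.items()}


def recompose(line: str) -> str:
    fields = {k: escape(v) for k, v in extract(line).items()}  # HTML-escape the data
    return PAGE.substitute(fields)


print_line = f"{'Jane Q. Customer':<20}{'1234567':<10}{'5,000.00':>12}"
print(recompose(print_line))
```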
Pros And Cons Of Recomposition
• Pros:
  • HTML/XML pages are well-suited to the browser
  • HTML/XML is considered by some to be simpler than PDF
• Cons:
  • HTML/XML pages don’t necessarily match the printed pages
  • All pages (templates) must be pre-composed
Sample Recomposition
• This document is a sample telephone bill that has been divided into 11 HTML pages
• Note how the HTML pages are divided by subject, not by page overflow
“Conversion” to normal HTML/XML
• Both data and formatting information are extracted from the print file
• Some formats correspond easily to an HTML tag, e.g., a heading to <h1>
• More complex formatting can be approximated with table tags
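A minimal Python sketch of this mapping, assuming the print-stream lines have already been classified (for example, from their font or position) into hypothetical `"heading"` lines and columnar `"row"` lines: headings become `<h1>`, and consecutive columnar lines are approximated with a table.

```python
from html import escape


def to_html(lines) -> str:
    """lines: (kind, columns) pairs, where kind is "heading" or "row"."""
    out, in_table = [], False
    for kind, cols in lines:
        if kind == "heading":
            if in_table:  # close any open table before a heading
                out.append("</table>")
                in_table = False
            out.append(f"<h1>{escape(' '.join(cols))}</h1>")
        else:
            if not in_table:
                out.append("<table>")
                in_table = True
            out.append("<tr>" + "".join(f"<td>{escape(c)}</td>" for c in cols) + "</tr>")
    if in_table:
        out.append("</table>")
    return "\n".join(out)


print(to_html([("heading", ["Account Summary"]),
               ("row", ["Balance", "Forward", "5,000.00"])]))
```

The table only approximates the printed columns: the browser decides the column widths, which is one reason fidelity is approximate with this method.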
Pros And Cons Of “Conversion”
• Pros:
  • HTML/XML pages look similar to printed pages
  • Pages are in HTML/XML, not PDF or raster
• Cons:
  • Fidelity is approximate
  • Reader can substantially alter the presentation
  • Graphics may not be supported

Sample “Conversion”
Translation to highly formatted HTML/XML
• This method uses particular CSS properties to do “exact” placement of text in the browser window
• This is as close as XML gets (today) to being a print stream
• Fonts are still subject to user override
Pros And Cons Of Translation
• Pros:
  • Author has very good control over the presentation of text
• Cons:
  • Much of the value of a tagged language is lost
  • Portrait print pages still don’t fit in landscape browser windows
  • May not work with all browsers
  • Fonts can still be overridden
Sample Translation
    <HTML>
    <HEAD>
    <STYLE TYPE="text/css">
    .ps9{position:absolute;top:676px;left:454px;width:65px;}
    .ps10{position:absolute;top:676px;left:535px;width:66px;}
    .ps11{position:absolute;top:676px;left:1102px;width:70px;}
    </STYLE>
    </HEAD>
    <BODY>
    <SPAN CLASS="ps9"><NOBR>Balance</NOBR></SPAN>
    <SPAN CLASS="ps10"><NOBR>Forward</NOBR></SPAN>
    <SPAN CLASS="ps11"><NOBR>5,000.00</NOBR></SPAN>
    </BODY>
    </HTML>
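A translator of this kind emits one absolutely positioned CSS class per text fragment found in the print stream. A minimal Python sketch, assuming each fragment arrives as a hypothetical `(text, top, left, width)` tuple already converted to pixels; the class names and markup follow the pattern of the sample.

```python
from html import escape


def translate(items) -> str:
    """items: (text, top_px, left_px, width_px) tuples from the print stream."""
    rules, spans = [], []
    for i, (text, top, left, width) in enumerate(items, start=1):
        cls = f"ps{i}"
        rules.append(f".{cls}{{position:absolute;top:{top}px;left:{left}px;width:{width}px;}}")
        spans.append(f'<SPAN CLASS="{cls}"><NOBR>{escape(text)}</NOBR></SPAN>')
    return ("<HTML>\n<HEAD>\n<STYLE TYPE=\"text/css\">\n" + "\n".join(rules)
            + "\n</STYLE>\n</HEAD>\n<BODY>\n" + "\n".join(spans)
            + "\n</BODY>\n</HTML>")


print(translate([("Balance", 676, 454, 65),
                 ("Forward", 676, 535, 66),
                 ("5,000.00", 676, 1102, 70)]))
```

Every fragment carries its own coordinates, so the markup says nothing about what “Balance Forward” means; this is the loss of tagged-language value noted in the cons.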
William J. “Bill” McCalpin EDPP, CDIA, MIT, LIT
Principal, MHE
1400 Cheyenne Dr.
Richardson, Texas 75080-3921
972-231-3660 (v)
972-690-4521 (f)
mccalpin@mhe-consulting.com