LaTeX as an Archiving Format: Benefits and Problems
Experiences from the MathDiss International Project and the EMANI project
Thomas Fischer, State and University Library Göttingen
ETD 2003, Berlin
Overview
• Basic Considerations on File Types
  • Types of file formats
  • Purposes of file formats
• Criteria for File Formats for Archiving
• File formats in Mathematics
• Experiences from MathDiss International
• Experiences from the EMANI project
• Conclusions
Types of file formats
• Binary formats
  Examples: PostScript, PDF, DVI, Word documents
• Mark-up formats
  Examples:
  • SGML family: HTML, XML, MathML
  • Microsoft’s Rich Text Format (RTF)
  • Some versions of WordPerfect files
  • TeX family: TeX, LaTeX, AMSTeX
Purposes of file formats I
• On-screen rendering
  Display data on screens with different sizes and resolutions, preferably zoomable
• Printing
  Prepare data for printing on different devices, such as laser or bubble-jet printers
• Data exchange
  Transport data from one repository to another; some sort of error checking is necessary
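The error checking mentioned under data exchange can be sketched as a checksum comparison. This is a minimal illustration only, not a method described in the talk: the choice of SHA-256 and the function names `sha256_digest` and `verify_transfer` are assumptions made for this example.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 checksum of a file's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_transfer(received: bytes, expected_digest: str) -> bool:
    """True if the received bytes match the checksum published by the sender."""
    return sha256_digest(received) == expected_digest


# Hypothetical document; the sending repository publishes the digest with it.
original = b"\\documentclass{article}\n\\begin{document}\nHello\n\\end{document}\n"
digest = sha256_digest(original)

print(verify_transfer(original, digest))          # unchanged copy
print(verify_transfer(original + b" ", digest))   # copy altered in transit
```

The receiving repository recomputes the digest and rejects the file when the two values differ.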
Purposes of file formats II
• Discovery and retrieval
  • Support search functions internally (using the dedicated programme)
  • Allow external text extraction (e.g. for indexing)
• Archiving
  • Preserve the intellectual contents and the “look and feel” of the file (essentially “eternally”)
  • Make it available to the scientific community
Criteria for File Formats for Archiving I
Deterioration of the storage media is not considered here: it is essentially manageable.
• Error tolerance
  • Does the change of a single byte make the file unreadable?
• Long-term stability
  • Is the file format changing constantly?
  • Is the dedicated programme “backwards compatible”, i.e. able to read older files?
• Full open specification
  • Is a full and complete specification of the file format publicly available (at no cost)?
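The error-tolerance question can be tried out on a toy example. The sketch below is an assumption-laden illustration, not a test from the talk: it uses a zlib-compressed stream as a stand-in for a compressed binary format and a short, made-up LaTeX source as the mark-up file, then flips a single byte in each.

```python
import zlib


def flip_byte(data: bytes, index: int) -> bytes:
    """Return a copy of data with every bit of one byte inverted."""
    damaged = bytearray(data)
    damaged[index] ^= 0xFF
    return bytes(damaged)


def read_markup(data: bytes) -> str:
    """Read plain-text mark-up; an undecodable byte becomes U+FFFD."""
    return data.decode("utf-8", errors="replace")


def read_compressed(data: bytes):
    """Read a zlib-compressed text stream; return None if it cannot be read."""
    try:
        return zlib.decompress(data).decode("utf-8")
    except (zlib.error, UnicodeDecodeError):
        return None


# Hypothetical miniature document used only for this illustration.
source = (
    r"\documentclass{article}\begin{document}"
    r"$e^{i\pi} + 1 = 0$\end{document}"
).encode("ascii")
packed = zlib.compress(source)

# Damage the last byte of the compressed stream (part of its checksum).
print(read_compressed(flip_byte(packed, -1)))
# Damage one byte of the mark-up source.
print(read_markup(flip_byte(source, 10)))
```

In this toy case the compressed stream is rejected as a whole, while the damaged mark-up still reads with one character lost. Real formats differ in how much of a damaged file remains recoverable.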
Criteria for File Formats for Archiving II
• System independence
  • Does the file require a specific hardware platform or operating system?
• Ease of handling
  • Does the format transform easily to other formats (for delivery)?
  • Do documents consist of several individual files?
• Independence of commercial interests and influence
  • Is the file format owned by a commercial company?
  • Are the programmes for creating and rendering the format only available from a commercial company? (The same one?)
• Minor considerations
  • Bulkiness
  • Options for navigation
  • Copy and paste
File formats in Mathematics
• TeX
  Basic starting format for writing mathematics
• DVI
  Result of compiling TeX, readable with a DVI viewer in a dedicated environment
• PostScript
  Result of transformation from DVI
• PDF
  Result of transformation from DVI or PostScript
Experiences from MathDiss International I: PostScript
Largest single document:
• 159 pages
• 40,099 KB
Compressed:
• Zip: 950 KB
• StuffIt: 613 KB
Portable but bulky format; requires compression for delivery
Experiences from MathDiss International II: PDF I
Largest PDF file: 196 pages, 30,740 KB
Compressed:
• Zip: 5,013 KB
• StuffIt: 3,375 KB
• StuffItX: 2,514 KB
Error message encountered: “There was an error when opening this page. The error occurred when analysing an image.”
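The size figures reported for the largest PostScript and PDF files can be turned into compression ratios. The sketch below assumes the figures use a thousands separator (so the PostScript file is about 40 MB and the PDF about 30 MB); the function name `compression_ratio` and the `reported` table are illustrative.

```python
def compression_ratio(original_kb: int, compressed_kb: int) -> float:
    """How many times smaller the compressed file is than the original."""
    return original_kb / compressed_kb


# Sizes in KB as reported on the slides, read with a thousands separator.
reported = {
    "PostScript": (40099, {"Zip": 950, "StuffIt": 613}),
    "PDF": (30740, {"Zip": 5013, "StuffIt": 3375, "StuffItX": 2514}),
}

ratios = {
    fmt: {tool: round(compression_ratio(size, packed), 1)
          for tool, packed in tools.items()}
    for fmt, (size, tools) in reported.items()
}

for fmt, by_tool in ratios.items():
    for tool, ratio in by_tool.items():
        print(f"{fmt:10s} {tool:9s} {ratio:5.1f}x smaller")
```

Under that reading, the PostScript file shrinks by a factor of roughly 42 to 65, the PDF only by roughly 6 to 12.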
Experiences from MathDiss International III: PDF II
Rendering: [screenshot of the rendered page not reproduced]
Copying:
Â__r¿.á_Å_ÂM_ÉÏ__ßíÓr_LÇvÁiÂÒá ¿._r__Å.¸äßíÅÒË+@ÃÓrÀjå¡ÂÒÀ_¸L<Lá ¿ Å_¸º___RàlÂÄÀA_.Ór_uÂÄÀ_¸.ß_âµãrÂÒ_Rñ.À_Ç.ÂÄÁL_r¸.ß_ß_Â1àlÂÄÀ ôyÂÒâµãL__ÓL_rÇ àrÂÒß#_Ü_.é_ÂÄ_LàlÓL_rÇ_ß__rÀ__.Ár».ÂÄ_Dß_é_ÂÄÀµàlÂÒ_
Experiences from MathDiss International IV: TeX
• Multiple files for a single document
• Dedicated environment necessary
• Required macro (.sty) or other files are often missing
• Correct files compile and display nicely
Most complex bundle: 74 files, namely eight files with no extension (including three makefiles), one .aux, one .bak, three .bc, one .fot, eight .hex, one .jpeg, two .log, three .make, three .md, one .meta, four .mi, 30 .pre, seven .psi, one .tx. The actual dissertation is hidden in the .bak file.
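An inventory like the one given for the 74-file bundle can be produced automatically. The following is a minimal sketch; the function names, the list of recognised source extensions, and the sample file names are assumptions made for this example.

```python
from collections import Counter
from pathlib import PurePath


def inventory(filenames):
    """Count the files in a submission bundle by file extension."""
    counts = Counter()
    for name in filenames:
        suffix = PurePath(name).suffix.lower()
        counts[suffix or "(no extension)"] += 1
    return counts


def likely_sources(filenames, source_suffixes=(".tex", ".ltx")):
    """Return the files whose extension suggests TeX source."""
    return [n for n in filenames
            if PurePath(n).suffix.lower() in source_suffixes]


# Hypothetical bundle, loosely modelled on the case described above.
bundle = ["Makefile", "thesis.bak", "thesis.aux",
          "fig.jpeg", "run.log", "ch1.pre"]

print(inventory(bundle))
print(likely_sources(bundle))  # empty: no file looks like TeX source
```

An empty result from `likely_sources` is a signal that the bundle needs manual inspection, as when the dissertation is stored under an unexpected extension.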
Experiences from the EMANI project
Analysis of TeX files from Springer-Verlag for different journals.
Example: Numerische Mathematik
Additional files needed:
• svjour.cls: a general class file definition for all Springer journals
• svnummat.clo: a special class option file for “Numerische Mathematik”
• TOTAL00.NUM: a somewhat obscure file that “redefines the things for journals to produce totally camera ready output”
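Dependencies such as the class file can be detected by scanning the TeX source. The sketch below is a rough approximation under stated assumptions: it only recognises `\documentclass`, `\usepackage`, `\input` and `\include` with braced arguments, does not strip comments, and cannot find files loaded indirectly by a class (such as a .clo option file). The function names and the sample source are hypothetical.

```python
import re

CLASS_RE = re.compile(r"\\documentclass(?:\[[^\]]*\])?\{([^}]+)\}")
PACKAGE_RE = re.compile(r"\\usepackage(?:\[[^\]]*\])?\{([^}]+)\}")
INPUT_RE = re.compile(r"\\(?:input|include)\{([^}]+)\}")


def required_files(tex_source: str) -> set:
    """Collect file names the source asks for directly."""
    needed = set()
    for cls in CLASS_RE.findall(tex_source):
        needed.add(cls.strip() + ".cls")
    for group in PACKAGE_RE.findall(tex_source):
        for pkg in group.split(","):
            needed.add(pkg.strip() + ".sty")
    for name in INPUT_RE.findall(tex_source):
        name = name.strip()
        # Simplification: add .tex only when no extension is given.
        needed.add(name if "." in name else name + ".tex")
    return needed


def missing_files(tex_source, bundle_names, provided_by_tex_system=frozenset()):
    """Files required by the source but neither bundled nor installed."""
    return (required_files(tex_source)
            - set(bundle_names) - set(provided_by_tex_system))


# Hypothetical article source that uses the Springer journal class.
sample = r"""
\documentclass{svjour}
\usepackage{amsmath, graphicx}
\input{chapter1}
\begin{document}
\end{document}
"""

print(sorted(missing_files(sample, ["chapter1.tex"],
                           {"amsmath.sty", "graphicx.sty"})))
```

Here the check reports `svjour.cls` as missing, because the bundle contains only the article source and the assumed TeX installation does not ship that class.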
Conclusions I: PDF
Advantages:
• Easy to handle
• Convenient reader
Disadvantages:
• Files can become very large
• Error tolerance is limited
• Acrobat system is owned by Adobe and is not open source
Conclusions II: TeX
Advantages:
• Original source
• Fairly small
• Open source
Disadvantages:
• Needs environment with special files
• Multiple files, possibility of missing files
Conclusions III: General
• The archive should provide guidance for the creation of files (e.g. the MathDiss start file)
• The archive needs an “ingest” system that checks
  • completeness of files and successful compilation (TeX)
  • possibly crippling adjustments of security settings and rendering quality (PDF)
• The archive needs management for related complex files and versioning
• For general acceptance, TeX needs a combined format that can be read using a single reader (e.g. IBM techexplorer)
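A first pass of the PDF side of such an ingest check could look like the sketch below. It is a crude heuristic under stated assumptions, not a PDF parser: it only looks for the `%PDF-` header, the `%%EOF` end marker, and the `/Encrypt` key that an encrypted PDF carries in its trailer. The function name `pdf_ingest_warnings` and the sample byte strings are made up for this example.

```python
def pdf_ingest_warnings(data: bytes) -> list:
    """Return human-readable warnings for a candidate PDF file."""
    warnings = []
    if not data.startswith(b"%PDF-"):
        warnings.append("missing %PDF- header")
    if b"%%EOF" not in data[-1024:]:
        warnings.append("missing %%EOF marker near end of file")
    if b"/Encrypt" in data:
        # Heuristic: a byte match may also occur inside page content.
        warnings.append("/Encrypt found: security settings may restrict use")
    return warnings


# Synthetic stand-ins for real files, used only to exercise the checks.
plain = b"%PDF-1.4\n1 0 obj\n<< >>\nendobj\ntrailer\n<< /Root 1 0 R >>\n%%EOF\n"
locked = plain.replace(b"/Root 1 0 R", b"/Root 1 0 R /Encrypt 2 0 R")

print(pdf_ingest_warnings(plain))
print(pdf_ingest_warnings(locked))
print(pdf_ingest_warnings(b"not a pdf"))
```

A file that passes these checks may still fail to render, so a real ingest system would follow up with a full parse or a test rendering.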
Thank you! Questions, remarks?
Thomas Fischer, SUB Göttingen, fischer@mail.sub.uni-goettingen.de