LaTeX as an Archiving Format: Benefits and Problems

Experiences from the MathDiss International project and the EMANI project. Thomas Fischer, State and University Library Göttingen.

Presentation Transcript


  1. LaTeX as an Archiving Format: Benefits and Problems
     Experiences from the MathDiss International Project and the EMANI Project
     Thomas Fischer, State and University Library Göttingen
     ETD 2003, Berlin

  2. Overview
     • Basic considerations on file types
     • Types of file formats
     • Purposes of file formats
     • Criteria for file formats for archiving
     • File formats in mathematics
     • Experiences from MathDiss International
     • Experiences from the EMANI project
     • Conclusions

  3. Types of file formats
     • Binary formats. Examples: PostScript, PDF, DVI, Word documents
     • Mark-up formats. Examples:
         • SGML family: HTML, XML, MathML
         • Microsoft's Rich Text Format (RTF)
         • Some versions of WordPerfect files
         • TeX family: TeX, LaTeX, AMSTeX
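At ingest time, an archive has to tell these formats apart without trusting file extensions. A minimal sketch, assuming identification by well-known leading signatures ("magic bytes"); the function names and the TeX heuristic are hypothetical and were not part of the MathDiss workflow:

```python
from pathlib import Path

# Well-known leading signatures; the table is deliberately short.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"%!PS", "PostScript"),
    (b"\xf7\x02", "DVI"),  # DVI 'pre' opcode (247) followed by format id 2
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. legacy Word .doc)"),
    (b"PK\x03\x04", "ZIP container (e.g. .docx)"),
    (b"{\\rtf", "RTF"),
]

# Hypothetical heuristic: control sequences typical of (La)TeX sources.
TEX_MARKERS = ("\\documentclass", "\\documentstyle", "\\begin{", "\\input")

def sniff_format(data: bytes) -> str:
    """Guess a file format from its first bytes."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    if b"\x00" in data:
        return "unknown binary"          # NUL bytes rarely occur in text files
    text = data.decode("latin-1")        # latin-1 never fails to decode
    if any(marker in text for marker in TEX_MARKERS):
        return "TeX/LaTeX source"
    return "unknown text"

def sniff_file(path: Path, n: int = 4096) -> str:
    """Read the first n bytes of a file and guess its format."""
    with path.open("rb") as fh:
        return sniff_format(fh.read(n))
```

Signature checks only say what a file claims to be; whether it is complete and renders correctly is a separate question.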

  4. Purposes of file formats I
     • On-screen rendering: display data on screens of different sizes and resolutions, preferably zoomable
     • Printing: prepare data for printing on different devices such as laser or bubble-jet printers
     • Data exchange: transport data from one repository to another; some form of error checking is necessary
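One common way to provide that error checking is to send a cryptographic checksum alongside each file and recompute it on arrival. A minimal sketch using SHA-256 from Python's standard library; the choice of SHA-256 and the function names are assumptions, not something the talk prescribes:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_transfer(path: Path, expected_hex: str) -> bool:
    """True if the received file matches the checksum sent by the source repository."""
    return sha256_of(path) == expected_hex.lower()
```

The sending repository would publish `sha256_of(file)`; the receiving one calls `verify_transfer` before accepting the file.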

  5. Purposes of file formats II
     • Discovery and retrieval: support search functions internally (using the dedicated programme) and allow external text extraction (e.g. for indexing)
     • Archiving: preserve the intellectual content and the "look and feel" of the file (essentially "eternally") and make it available to the scientific community
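Because TeX is plain text, external text extraction for indexing can be approximated without a TeX installation. The sketch below is deliberately naive and hypothetical: it strips comments, the preamble, math, and control sequences with regular expressions. That is enough for keyword indexing but will not cope with arbitrary user macros or escaped dollar signs.

```python
import re

def tex_to_index_text(src: str) -> str:
    """Crude plain-text extraction from LaTeX source, good enough for keyword indexing."""
    text = re.sub(r"(?<!\\)%.*", "", src)                        # drop % comments
    body = re.search(r"\\begin\{document\}(.*?)\\end\{document\}", text, flags=re.S)
    if body:                                                      # ignore the preamble
        text = body.group(1)
    # Drop display and inline math: $$...$$, $...$, \[...\], \(...\)
    text = re.sub(r"\$\$.*?\$\$|\$.*?\$|\\\[.*?\\\]|\\\(.*?\\\)", " ", text, flags=re.S)
    text = re.sub(r"\\(?:begin|end)\{[^}]*\}", " ", text)        # environment markers
    text = re.sub(r"\\[A-Za-z@]+\*?", " ", text)                  # control words, e.g. \section*
    text = re.sub(r"\\.", " ", text, flags=re.S)                  # control symbols, e.g. \& or \\
    text = re.sub(r"[{}]", "", text)                              # keep argument text, drop braces
    return " ".join(text.split())                                 # collapse whitespace
```

For example, a body containing `\section{Introduction}` and `\emph{archiving}` yields the words "Introduction" and "archiving", while formulas and markup disappear.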

  6. Criteria for file formats for archiving I
     Deterioration of the storage media is not considered here, since it is essentially manageable.
     • Error tolerance
         • Does the change of a single byte make the file unreadable?
     • Long-term stability
         • Is the file format changing constantly?
         • Is the dedicated programme "backwards compatible", i.e. able to read older files?
     • Full open specification
         • Is a full and complete specification of the file format publicly available (at no cost)?
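The single-byte question can be made concrete with a small experiment. It is illustrative only and was not run in the projects. The sketch flips one byte in plain text (like a TeX source) and one byte in a zlib-compressed copy of the same text. The link to PDF is an assumption worth stating: PDF commonly stores page content in Flate-compressed streams, which use the same deflate algorithm.

```python
import zlib

def flip_byte(data: bytes, index: int) -> bytes:
    """Copy of data with every bit of one byte inverted."""
    damaged = bytearray(data)
    damaged[index] ^= 0xFF
    return bytes(damaged)

def bytes_changed(original: bytes, index: int) -> int:
    """Uncompressed text: how many bytes differ after one byte is corrupted."""
    damaged = flip_byte(original, index)
    return sum(a != b for a, b in zip(original, damaged))

def still_decompresses(stream: bytes, index: int) -> bool:
    """zlib/deflate stream: does it still decompress after one corrupted byte?"""
    try:
        zlib.decompress(flip_byte(stream, index))
    except zlib.error:
        return False  # corrupt deflate data or failed Adler-32 integrity check
    return True

if __name__ == "__main__":
    text = b"\\section{Introduction} We study the unit circle. " * 200
    packed = zlib.compress(text)
    print("plain text, bytes damaged:", bytes_changed(text, len(text) // 2))
    print("compressed, still readable:", still_decompresses(packed, len(packed) // 2))
```

In plain text the damage stays local to one character. In the compressed copy, the whole stream is rejected. In a real PDF, the effect typically depends on which object the damaged byte falls in.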

  7. Criteria for file formats for archiving II
     • System independence
         • Does the file require a specific hardware platform or operating system?
     • Ease of handling
         • Does the format transform easily into other formats (for delivery)?
         • Does a document consist of several individual files?
     • Independence from commercial interests and influence
         • Is the file format owned by a commercial company?
         • Are the programmes for creating and rendering the format available only from a commercial company? (The same one?)
     • Minor considerations
         • Bulkiness
         • Options for navigation
         • Copy and paste

  8. File formats in mathematics
     • TeX: the basic starting format for writing mathematics
     • DVI: the result of compiling TeX, readable with a DVI viewer in a dedicated environment
     • PostScript: the result of transformation from DVI
     • PDF: the result of transformation from DVI or PostScript
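This chain corresponds to a classic command-line toolchain. The sketch below builds the commands and, optionally, runs them. It assumes a TeX distribution providing `latex` and `dvips`, plus Ghostscript's `ps2pdf`, are on the PATH. The job name `thesis` is hypothetical.

```python
import subprocess

def tex_pipeline(job: str) -> list[list[str]]:
    """Commands for the chain TeX -> DVI -> PostScript -> PDF."""
    return [
        ["latex", "-interaction=nonstopmode", f"{job}.tex"],  # TeX -> DVI
        ["dvips", "-o", f"{job}.ps", f"{job}.dvi"],           # DVI -> PostScript
        ["ps2pdf", f"{job}.ps", f"{job}.pdf"],                # PostScript -> PDF
    ]

def run_pipeline(job: str, dry_run: bool = True) -> None:
    """Print each step; when dry_run is False, execute it and stop at the first failure."""
    for cmd in tex_pipeline(job):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

In practice `latex` is usually run two or three times so that cross-references settle; the sketch runs it once. Tools such as `dvipdfmx` cover the direct DVI-to-PDF branch.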

  9. Experiences from MathDiss International I: PostScript
     Largest single document:
     • 159 pages
     • 40,099 KB
     Compressed:
     • Zip: 950 KB
     • StuffIt: 613 KB
     Portable but bulky format; requires compression for delivery

  10. Experiences from MathDiss International II: PDF I
     Largest PDF file: 196 pages, 30,740 KB
     Compressed: Zip: 5,013 KB; StuffIt: 3,375 KB; StuffItX: 2,514 KB
     Error reported when viewing: "There was an error when opening this page. The error occurred when analysing an image."
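Placing the PostScript and PDF figures side by side shows how differently the two formats respond to compression. A small calculation using only the sizes reported on these two slides (in KB):

```python
# Sizes in KB as reported for the largest MathDiss files.
sizes = {
    "PostScript (159 pp.)": {"original": 40099, "Zip": 950, "StuffIt": 613},
    "PDF (196 pp.)": {"original": 30740, "Zip": 5013, "StuffIt": 3375, "StuffItX": 2514},
}

def ratios(entry: dict[str, int]) -> dict[str, float]:
    """Compression ratio (original size / compressed size) per archiver, to one decimal."""
    original = entry["original"]
    return {name: round(original / size, 1) for name, size in entry.items() if name != "original"}

for label, entry in sizes.items():
    print(label, ratios(entry))
```

PostScript shrinks by a factor of about 42 (Zip) to 65 (StuffIt), PDF only by about 6 to 12. A plausible reading, not stated in the talk, is that PDF content is usually compressed internally already, while PostScript is largely uncompressed text.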

  11. Experiences from MathDiss International III: PDF II
     Rendering: [image of the rendered page not preserved in this transcript]
     Copying: text copied from the same page comes out garbled:
     Â__r¿.á_Å_ÂM_ÉÏ__ßíÓr_LÇvÁiÂÒá ¿._r__Å.¸äßíÅÒË+@ÃÓrÀjå¡ÂÒÀ_¸L<Lá ¿ Å_¸º___RàlÂÄÀA_.Ór_uÂÄÀ_¸.ß_âµãrÂÒ_Rñ.À_Ç.ÂÄÁL_r¸.ß_ß_Â1àlÂÄÀ ôyÂÒâµãL__ÓL_rÇ àrÂÒß#_Ü_.é_ÂÄ_LàlÓL_rÇ_ß__rÀ__.Ár».ÂÄ_Dß_é_ÂÄÀµàlÂÒ_
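An ingest system could at least flag this kind of copy-and-paste failure automatically. The following is a hypothetical heuristic: extract the text of a page, then measure what share of its characters look like ordinary prose. The character set and the threshold of 0.7 are arbitrary assumptions that presume mostly Latin-script text such as English or German.

```python
import string

# Characters expected to dominate ordinary extracted prose in English or German.
PLAUSIBLE = set(string.ascii_letters + string.digits + string.whitespace + ".,;:!?()'\"-")

def plausible_ratio(text: str) -> float:
    """Share of characters that look like ordinary Latin-script prose."""
    if not text:
        return 0.0
    return sum(ch in PLAUSIBLE for ch in text) / len(text)

def looks_garbled(text: str, threshold: float = 0.7) -> bool:
    """Hypothetical ingest check: flag extracted text dominated by unexpected symbols."""
    return plausible_ratio(text) < threshold
```

German umlauts count as implausible here, but they are rare enough in real text not to push a page below the threshold.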

  12. Experiences from MathDiss International IV: TeX
     • Multiple files for a single document
     • Dedicated environment necessary
     • Macro (.sty) or other required files are often missing
     • Correct files compile and display nicely
     Most complex bundle: 74 files: eight files with no extension (including three makefiles), one .aux, one .bak, three .bc, one .fot, eight .hex, one .jpeg, two .log, three .make, three .md, one .meta, four .mi, 30 .pre, seven .psi, one .tx. The actual dissertation is hidden in the .bak file.
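Part of the missing-file problem can be caught before compilation. The idea is to scan the main file for the classes, packages and inputs it requests, then report those found neither in the bundle nor in a given list of installation files. This is a hypothetical sketch. It sees only literal file names in the main file, not files loaded by other files or generated by macros.

```python
import re
from pathlib import Path
from typing import AbstractSet

# File-naming commands and the extension TeX appends when none is given.
PATTERNS = {
    r"\\documentclass(?:\[[^\]]*\])?\{([^}]*)\}": ".cls",
    r"\\usepackage(?:\[[^\]]*\])?\{([^}]*)\}": ".sty",
    r"\\(?:input|include)\{([^}]*)\}": ".tex",
}

def required_files(src: str) -> set[str]:
    """File names a LaTeX source explicitly requests (comments are ignored)."""
    src = re.sub(r"(?<!\\)%.*", "", src)            # strip unescaped % comments
    names: set[str] = set()
    for pattern, ext in PATTERNS.items():
        for match in re.finditer(pattern, src):
            for name in match.group(1).split(","):  # \usepackage{a,b} names two packages
                name = name.strip()
                if name:
                    names.add(name if Path(name).suffix else name + ext)
    return names

def missing_files(main_tex: Path, system_files: AbstractSet[str] = frozenset()) -> set[str]:
    """Requested files found neither beside main_tex nor in system_files."""
    src = main_tex.read_text(encoding="utf-8", errors="replace")
    return {
        name for name in required_files(src)
        if name not in system_files and not (main_tex.parent / name).exists()
    }
```

Here `system_files` stands in for whatever the archive's reference TeX installation provides. How that list is built (for example via `kpsewhich` lookups) is left open.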

  13. Experiences from the EMANI project
     Analysis of TeX files from Springer Verlag for different journals. Example: Numerische Mathematik
     Additional files needed:
     • svjour.cls: a general class file for all Springer journals
     • svnummat.clo: a special class option file for "Numerische Mathematik"
     • TOTAL00.NUM: a somewhat obscure file that "redefines the things for journals to produce totally camera ready output"

  14. Conclusions I: PDF
     Advantages:
     • Easy to handle
     • Convenient reader
     Disadvantages:
     • Files can become very large
     • Error tolerance is limited
     • The Acrobat system is owned by Adobe and is not open source

  15. Conclusions II: TeX
     Advantages:
     • Original source
     • Fairly small
     • Open source
     Disadvantages:
     • Needs an environment with special files
     • Multiple files, with the possibility of missing files

  16. Conclusions III: General
     • The archive should provide guidance for the creation of files (e.g. the MathDiss start file)
     • The archive needs an "ingest" system that checks
         • completeness of files and successful compilation (TeX)
         • possibly crippling security settings and rendering quality (PDF)
     • The archive needs management of related complex files and of versioning
     • For general acceptance, TeX needs a combined format that can be read with a single reader (e.g. IBM techexplorer)
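The TeX half of such an ingest check could compile each submission non-interactively and then inspect the log. A sketch, assuming `latex` (or `pdflatex`) is installed; the function names are hypothetical. The log patterns cover only two common LaTeX messages, a missing file and an undefined reference, not every possible failure. TeX also wraps log lines at about 79 characters, which this sketch ignores.

```python
import re
import subprocess
from pathlib import Path

MISSING_FILE = re.compile(r"! LaTeX Error: File `([^']+)' not found\.")
UNDEFINED_REF = re.compile(r"LaTeX Warning: Reference `([^']+)' on page \d+ undefined")

def compile_tex(main_tex: Path, engine: str = "latex") -> tuple[int, str]:
    """Run the engine once without stopping for user input; return exit code and log text."""
    proc = subprocess.run(
        [engine, "-interaction=nonstopmode", "-halt-on-error", main_tex.name],
        cwd=main_tex.parent, capture_output=True,
    )
    log_path = main_tex.with_suffix(".log")
    log = log_path.read_text(encoding="latin-1") if log_path.exists() else ""
    return proc.returncode, log

def ingest_report(returncode: int, log: str) -> dict[str, object]:
    """Summarise one compilation for an archive ingest decision."""
    missing = MISSING_FILE.findall(log)
    return {
        "compiled": returncode == 0 and not missing,
        "missing_files": missing,
        "undefined_references": UNDEFINED_REF.findall(log),
    }
```

Typical use is `ingest_report(*compile_tex(Path("thesis.tex")))`. A submission would be accepted only when `compiled` is true and both lists are empty.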

  17. Thank you! Questions, remarks? Thomas Fischer, SUB Göttingen, fischer@mail.sub.uni-goettingen.de