Technology Choices for the JSTOR Online Archive

This article discusses the technology choices and implementation strategies for the JSTOR Online Archive, a digital library of scholarly materials. It covers aspects such as storage, searchability, delivery, and server infrastructure.

  1. Technology Choices for the JSTOR Online Archive Presented by Chang Feng Department of Computer Engineering and Computer Science, University of Missouri-Columbia, Columbia, MO 65211

  2. Reference • Technology Choices for the JSTOR Online Archive, S. W. Thomas, K. Alexander, and K. Guthrie, Computer (February 1999), 60-65.

  3. JSTOR Overview • Goals: To increase access to older scholarly materials by converting them to digital media and providing a full-text search capability. • Benefits: Preserving the original documents and conserving library shelf space. • Development phases: • Phase I (scheduled for completion by the end of 1999): a minimum of 100 journal titles, primarily in the humanities and social sciences. • As of December 1998: 67 journal titles, totaling 450,000 articles and 2.7 million pages.

  4. Implementing JSTOR • Principles • Let the mission guide technical choices. • Put users first. • Issues to be addressed when building the digital library • Formats (e.g., image vs. formatted text) • Storage, display, and distribution technologies (e.g., CD-ROM vs. Internet)

  5. Implementing JSTOR • Mission: A reliable and faithful electronic archive • Choice of technology: Scanned-in image at 600 dpi for each page. • Mission: Searchable • Choice of technology: Use OCR software to create text files that would let the user search journals’ full text. • Mission: Reduce long-term library costs • Choice of technology: Database storage centralized, with distribution over the Internet.

  6. Delivering JSTOR Pages • Pages are delivered in GIF format: ~30 Kbytes/page. • The system converts each page to screen resolution as needed. • The system caches converted pages for 3-4 days. • Pages are delivered one at a time, with the next page pre-loaded. • Printing an entire article (at 600 or 150 dpi resolution): • JPrint as a separate application (faster) • Adobe Acrobat files • PostScript files
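The on-demand conversion, caching, and next-page pre-loading described on this slide can be sketched as follows. This is a minimal illustration, not JSTOR's implementation: the `PageCache` class, its `convert` callback, and the 3-day expiry (the lower end of the stated 3-4 day window) are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, List, Tuple

# Assumption: use the lower end of the 3-4 day window from the slide.
CACHE_TTL_SECONDS = 3 * 24 * 60 * 60


class PageCache:
    """Hypothetical cache of converted (screen-resolution) page images."""

    def __init__(self, convert: Callable[[str], bytes],
                 ttl: float = CACHE_TTL_SECONDS,
                 clock: Callable[[], float] = time.time) -> None:
        self._convert = convert          # stands in for the TIFF-to-GIF conversion
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}
        self.conversions = 0             # counts how often conversion actually ran

    def get(self, page_id: str) -> bytes:
        """Return the converted page, converting only on a miss or after expiry."""
        now = self._clock()
        entry = self._entries.get(page_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        image = self._convert(page_id)
        self.conversions += 1
        self._entries[page_id] = (now, image)
        return image

    def get_with_preload(self, page_ids: List[str], index: int) -> bytes:
        """Serve one page and warm the cache for the following page, if any."""
        current = self.get(page_ids[index])
        if index + 1 < len(page_ids):
            self.get(page_ids[index + 1])
        return current
```

With this sketch, a reader who opens page 1 triggers two conversions (pages 1 and 2); turning to page 2 is then served from the cache without further work until the entry expires.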

  7. Searching JSTOR • Graphical search interface. • The full text is stored in one file per page. • Each article also has a citation file. • Text files have embedded tags that specify which parts of the text belong to which article. • There is a separate index for each journal title. • Articles are indexed using Full-Text Lexicographer (U. of Michigan), which: • Allows dynamic updating (no index downtime). • Periodically optimizes the index with no downtime.
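To make the indexing layout concrete, here is a small sketch of a per-journal index built from per-page text whose segments are tagged with the article they belong to. It is a plain in-memory inverted index, not Full-Text Lexicographer; the `JournalIndex` class, the `(article id, text)` segment representation, and the tokenization rule are assumptions chosen for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Posting = Tuple[str, int]  # (article id, page number)


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class JournalIndex:
    """Hypothetical inverted index for a single journal title."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[Posting]] = defaultdict(set)

    def add_page(self, page_no: int, segments: List[Tuple[str, str]]) -> None:
        """Index one page. Each segment is (article id, text), mirroring the
        embedded tags that say which part of a page belongs to which article.
        Pages can be added at any time, so searches never wait for a rebuild."""
        for article_id, text in segments:
            for term in tokenize(text):
                self._postings[term].add((article_id, page_no))

    def search(self, query: str) -> List[Posting]:
        """Return sorted (article, page) pairs that contain every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            hits &= self._postings.get(term, set())
        return sorted(hits)


# One index per journal title, keyed by a hypothetical journal identifier.
indexes: Dict[str, JournalIndex] = defaultdict(JournalIndex)
```

Keeping one index per journal means a new issue touches only that journal's index, which is one plausible reason for the layout on the slide.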

  8. Browser Interoperability • Major issue: Backward compatibility. • Supports the HTML 3.2 standard. • The JSTOR interface uses frames, but can adjust itself automatically to an unframed interface. • New technology is used to enhance functionality, but not to provide basic functionality. • Plug-ins are not encouraged.

  9. JSTOR Server Infrastructure • Storage: • Online: 600 dpi TIFF page images compressed with Cartesian Perceptual Compression (1:4, CPI Inc.). • Offline: multiple copies of the original TIFF images for archival purposes. • Performance: • Replacing CGI programs with FastCGI or Java servlets. • Server mirroring

  10. Issues of Server Mirroring • Mirror server load balancing: Currently using a round-robin method. • Mirror server synchronization: Currently, new releases (> 1 GB/month) are shipped overnight on magnetic tape to mirror sites. • User state synchronization: Currently, • The current server regenerates the data if possible, or • The current server requests the information from the server that originally created it and caches that copy for future use.
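The round-robin balancing and the two-step user state lookup on this slide can be sketched as below. The `MirrorServer` class, its `regenerate` callback, and the in-process `peers` table are hypothetical stand-ins; a real deployment would make a network request to the originating mirror rather than a direct method call.

```python
from itertools import cycle
from typing import Callable, Dict, Iterator, List, Optional


def round_robin(mirrors: List[str]) -> Iterator[str]:
    """Yield mirror names in a fixed rotation, one per incoming request."""
    return cycle(mirrors)


class MirrorServer:
    """Hypothetical mirror that can serve user state created on another mirror."""

    def __init__(self, name: str,
                 regenerate: Optional[Callable[[str], Optional[dict]]] = None) -> None:
        self.name = name
        self._regenerate = regenerate    # returns None when data cannot be rebuilt
        self._state: Dict[str, dict] = {}
        self.peers: Dict[str, "MirrorServer"] = {}
        self.remote_fetches = 0

    def create_state(self, key: str, data: dict) -> None:
        self._state[key] = data

    def get_state(self, key: str, origin: str) -> dict:
        """Local copy first, then local regeneration, then the origin server."""
        if key in self._state:
            return self._state[key]
        data = self._regenerate(key) if self._regenerate else None
        if data is None:
            # Ask the server that originally created the state.
            data = self.peers[origin].get_state(key, origin)
            self.remote_fetches += 1
        self._state[key] = data          # cache the copy for future requests
        return data
```

For example, if a search result set is created on mirror `a` and the user's next request is routed to mirror `b`, `b.get_state(key, origin="a")` fetches it once and answers later requests from its own cache.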

  11. Authentication • Cross-organization access management • JSTOR currently relies on participating institutions to supply authenticated IP addresses. • Under evaluation: • Digital certificates issued by the participating institutions. • Password-based access control.
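IP-based access control of the kind described here amounts to checking a client address against the ranges each institution has registered. The sketch below uses Python's standard `ipaddress` module; the institution names and address ranges are invented for the example (the ranges come from blocks reserved for documentation), and the function names are hypothetical.

```python
import ipaddress
from typing import Dict, List, Optional, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def build_registry(ranges: Dict[str, List[str]]) -> Dict[str, List[Network]]:
    """Parse the CIDR ranges supplied by each participating institution."""
    return {
        institution: [ipaddress.ip_network(cidr) for cidr in cidrs]
        for institution, cidrs in ranges.items()
    }


def institution_for(client_ip: str,
                    registry: Dict[str, List[Network]]) -> Optional[str]:
    """Return the institution whose range contains the client, or None."""
    address = ipaddress.ip_address(client_ip)
    for institution, networks in registry.items():
        if any(address in network for network in networks):
            return institution
    return None


# Hypothetical registrations; a real list would come from the institutions.
registry = build_registry({
    "Example University": ["192.0.2.0/24"],
    "Example College": ["198.51.100.0/25"],
})
```

A request from `192.0.2.17` would be attributed to "Example University", while an address outside every registered range returns `None` and would be refused or sent to another authentication method.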

  12. Conclusions • The choice of technology is based on the mission of the project and user feedback. • Must remain flexible.
