
Steps in archiving of web-publications in NL of Latvia

Steps in archiving of web-publications in NL of Latvia. Ivars Indāns. Tallinn, 24.11.2005. Chronology (2003). First attempt: 2003: Nedlib: very small budget (~500 EUR); technical problems. Project was cancelled. Chronology (2005). Second attempt: 2005: Heritrix:


Presentation Transcript


  1. Steps in archiving of web-publications in NL of Latvia. Ivars Indāns. Tallinn, 24.11.2005

  2. Chronology (2003) • First attempt, 2003: Nedlib: • very small budget (~500 EUR); • technical problems. • The project was cancelled.

  3. Chronology (2005) • Second attempt, 2005: Heritrix: • information gathered before starting; • agreement to share experience with the Royal Library of Sweden; • financing from the State program for the “Digital Library”. • The project started successfully in September 2005.

  4. Technical infrastructure. Server. • No dedicated server for harvesting; • Dell PowerEdge 1600C server: 2.4 GHz CPU, 512 MB RAM, 2 x 72 GB HDD. • Fedora Linux.

  5. Technical infrastructure. Networking. • Optical line connection to the “backbone” of Latvia. • The same optical line serves the main workstation cluster of NLLa as well as all servers. • Data flow from NLLa increased dramatically in 2005, when the Digital Library service was launched.

  6. 6 MB/s data speed • What does it mean in real life? • Will 600 MB (about one CD-ROM, or one small web site) be harvested in 100 seconds? NO!
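The 100-second figure on this slide is simply the site size divided by the line speed. A minimal sketch of that arithmetic follows; the function name is made up for illustration, and the calculation assumes the whole 6 MB/s is available to the harvester, which the next slide says is not the case in practice.

```python
def ideal_transfer_seconds(size_mb: float, link_mb_per_s: float) -> float:
    """Best-case time to move size_mb megabytes over a link that
    sustains link_mb_per_s megabytes per second (no overhead at all)."""
    if link_mb_per_s <= 0:
        raise ValueError("link speed must be positive")
    return size_mb / link_mb_per_s


# Figures from the slide: 600 MB site, 6 MB/s line.
print(ideal_transfer_seconds(600, 6))  # 100.0
```

This is only an upper bound on speed: it ignores politeness delays between requests, the response time of the harvested server, and the fact that the line is shared with other NLLa traffic.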

  7. Back to reality :( • The real speed of the harvester is slow. • Harvesting a medium-size web site takes ~8 hours. • The amount of archived information varies widely from site to site and is unpredictable. • “Full scale” harvesting of sites may fill up the server's disks.
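To put the 8-hour figure next to the 6 MB/s line speed, the sketch below computes an effective harvesting rate and a rough count of sites that fit on disk. The slides do not say how large a "medium size" site is, so the 600 MB used here is an assumption borrowed from the previous slide, as is the idea that both 72 GB disks are fully available for the archive. The function names are hypothetical.

```python
def effective_rate_mb_per_s(size_mb: float, hours: float) -> float:
    """Average throughput actually achieved while harvesting one site."""
    return size_mb / (hours * 3600)


def sites_that_fit(disk_gb: float, site_mb: float) -> int:
    """How many sites of site_mb each fit on disk_gb (1 GB = 1000 MB)."""
    return int(disk_gb * 1000 // site_mb)


# Assumption: a 600 MB site harvested in 8 hours.
print(round(effective_rate_mb_per_s(600, 8), 3))  # 0.021 (MB/s)

# Assumption: all of 2 x 72 GB is free for harvested data.
print(sites_that_fit(2 * 72, 600))  # 240
```

Under these assumptions the harvester uses well under 1% of the nominal line speed, which suggests the line itself is not the main bottleneck.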

  8. How to improve the situation? • A commercial company uses 12 optical lines and a “cluster of servers”; no further details are available (business is business). • What about NLLa? Possible solutions: • improving the hardware; • restricting the harvesting rules: limiting the “depth” of harvesting and restricting the file types collected. I don’t know the best solution. Your opinions?
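The two restrictions named on this slide, a depth limit and a file-type limit, can be sketched as a single scope check applied to every discovered link. This is an illustration in plain Python, not Heritrix configuration; the depth limit, the extension list, and the `in_scope` name are all assumptions chosen for the example.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical policy values, not taken from the slides.
MAX_DEPTH = 3
ALLOWED_EXTENSIONS = {"", ".html", ".htm", ".css", ".gif", ".jpg", ".png", ".pdf"}


def in_scope(url: str, depth: int,
             max_depth: int = MAX_DEPTH,
             allowed: frozenset = frozenset(ALLOWED_EXTENSIONS)) -> bool:
    """Return True if a link found `depth` hops from the seed page
    should be harvested under the depth and file-type restrictions."""
    if depth > max_depth:
        return False  # too far from the seed page
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1].lower()
    return extension in allowed  # "" covers directory-style URLs


print(in_scope("http://example.lv/index.html", 1))      # True
print(in_scope("http://example.lv/video/clip.avi", 1))  # False
print(in_scope("http://example.lv/a/b/c/d.html", 4))    # False
```

Both limits trade completeness for predictability: a shallow depth bounds the number of pages per site, and excluding large media types bounds the bytes per page, which addresses the disk-space concern raised on the previous slide.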

  9. Thank you for your patience. Ivars Indāns, ivars.indans@lnb.lv
