Steps in archiving of web publications in the NL of Latvia • Ivars Indāns • Tallinn, 24.11.2005
Chronology (2003) • First attempt: 2003: Nedlib: • very small budget (~500 EUR); • technical problems. • The project was cancelled.
Chronology (2005) • Second attempt: 2005: Heritrix: • information gathered before starting; • agreement to share experience with the Royal Library of Sweden; • financing from the State program for “Digital Library”. • The project started successfully in September 2005.
Technical infrastructure. Server. • No dedicated server for harvesting; • Dell PowerEdge 1600C Server: - 2.4 GHz - 512 MB RAM - 2x72 GB HDD. • Fedora Linux.
Technical infrastructure. Networking. • Optical line connection to the “backbone” of Latvia. • The optical line serves the main workstation cluster of NLLa as well as all servers. • Data flow from NLLa increased dramatically in 2005 with the start of the Digital Library service.
6 MB/s data speed • What does it mean in real life? • Will 600 MB (~one CD-ROM, one small web site) be harvested in 100 seconds? NO!
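The arithmetic behind the question can be sketched as follows. The 600 MB and 6 MB/s figures come from the slide; the file count and the per-request delay are purely hypothetical values, chosen only to illustrate how a crawler that pauses between requests to the same host can fall far behind the raw line speed.

```python
def ideal_transfer_seconds(size_mb: float, link_mb_per_s: float) -> float:
    """Time to move size_mb over the link if bandwidth were the only limit."""
    return size_mb / link_mb_per_s


def estimated_harvest_seconds(size_mb: float, link_mb_per_s: float,
                              file_count: int, delay_per_request_s: float) -> float:
    """Rough model: raw transfer time plus a fixed wait for every request.

    file_count and delay_per_request_s are illustrative assumptions,
    not values measured at NLLa.
    """
    transfer = ideal_transfer_seconds(size_mb, link_mb_per_s)
    return transfer + file_count * delay_per_request_s


if __name__ == "__main__":
    ideal = ideal_transfer_seconds(600, 6)
    # Hypothetical site: 20,000 files, 1 second of waiting per request.
    modelled = estimated_harvest_seconds(600, 6, 20_000, 1.0)
    print(f"ideal: {ideal:.0f} s, modelled: {modelled / 3600:.1f} h")
```

Under these assumed numbers the waiting time, not the line speed, dominates the total.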
Back to reality ☹ • The real speed of the harvester is slow. • Harvesting a medium-size web site takes ~8 hours. • The amount of archived information varies widely and is unpredictable. • “Full scale” harvesting of sites may overfill the server.
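One possible guard against overfilling the server is to check free disk space before each harvest. This is a minimal sketch, not part of Heritrix: the function names, the 10% reserve and the example path are assumptions made for illustration.

```python
import shutil


def fits(free_bytes: int, total_bytes: int, needed_bytes: int,
         reserve_fraction: float = 0.10) -> bool:
    """True if needed_bytes can be stored while keeping a reserve free.

    reserve_fraction is an assumed safety margin, not a recommended value.
    """
    reserve = int(total_bytes * reserve_fraction)
    return free_bytes - needed_bytes >= reserve


def has_room(path: str, needed_bytes: int, reserve_fraction: float = 0.10) -> bool:
    """Apply the same check to the file system that contains path."""
    usage = shutil.disk_usage(path)
    return fits(usage.free, usage.total, needed_bytes, reserve_fraction)


if __name__ == "__main__":
    # Hypothetical check before harvesting a site expected to need ~600 MB.
    print(has_room(".", 600 * 1024 * 1024))
```

Because the size of a site is unpredictable, such a check would have to be repeated during the harvest, not only before it.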
How to improve the situation? • A commercial company uses 12 optical lines and a “cluster of servers”; no additional info: business is business. • What about NLLa? Solutions: - Improving the hardware. - Restrictions in harvesting rules: • Limits on the “depth” of harvesting; • Restrictions on file types. I don’t know the best solution. Your opinions?
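The two restrictions named above, a depth limit and a file-type filter, can be sketched as a single scope check. This is a generic illustration and not Heritrix configuration; the depth limit of 3 and the list of blocked extensions are assumed values.

```python
import posixpath
from urllib.parse import urlparse

# Assumed values for illustration only.
MAX_DEPTH = 3
BLOCKED_EXTENSIONS = frozenset({".iso", ".zip", ".exe", ".avi", ".mpg"})


def in_scope(url: str, link_depth: int,
             max_depth: int = MAX_DEPTH,
             blocked: frozenset = BLOCKED_EXTENSIONS) -> bool:
    """Decide whether a discovered URL should be harvested.

    link_depth is the number of links followed from the start page.
    """
    if link_depth > max_depth:
        return False
    # Compare the extension case-insensitively, ignoring any query string.
    extension = posixpath.splitext(urlparse(url).path)[1].lower()
    return extension not in blocked


if __name__ == "__main__":
    print(in_scope("http://example.org/news/index.html", 2))
    print(in_scope("http://example.org/files/video.AVI", 1))
    print(in_scope("http://example.org/deep/page.html", 4))
```

Both limits trade completeness of the archive for harvesting time and disk space, so the values would need tuning per site.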
Thank you for your patience. Ivars Indāns, ivars.indans@lnb.lv