Australian web domain harvests 2005, 2006 & 2007
Igor Ranitovic, Internet Archive engineer, with the Petabox rack used for the Australian domain harvest
PANDORA: Domain Harvesting
• Australian domain harvest
  • .au domain, plus sites located on Australian servers
  • Crawled by the Internet Archive
• 1st harvest, June/July 2005
  • 4 weeks, 185m files, 6.69 TB
• 2nd harvest, Aug/Sept 2006
  • 5 weeks, 596m files, 19.04 TB
• 3rd harvest, Aug/Sept 2007
  • 4 weeks, 516m files, 18.47 TB
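The harvest totals give only raw counts. A quick back-of-envelope comparison (mean file size and crawl throughput per harvest) can be sketched in Python. This sketch assumes that "m" means millions of files and that "TB" means decimal terabytes (10^12 bytes); the slide does not say which unit was used. `HARVESTS` and `summarize` are hypothetical names for illustration, not part of any PANDORA or Internet Archive tooling.

```python
# Harvest totals as given on the slide.
# Assumptions: "m" = million files; "TB" = decimal terabytes (10**12 bytes).
HARVESTS = [
    # (year, weeks, files, terabytes)
    (2005, 4, 185_000_000, 6.69),
    (2006, 5, 596_000_000, 19.04),
    (2007, 4, 516_000_000, 18.47),
]

BYTES_PER_TB = 10**12  # assumption: decimal TB, not TiB (2**40 bytes)


def summarize(harvests):
    """Hypothetical helper: return (year, mean file size in KB, million files per week)."""
    rows = []
    for year, weeks, files, tb in harvests:
        mean_kb = tb * BYTES_PER_TB / files / 1000  # average bytes per file, in KB
        m_files_per_week = files / weeks / 1e6      # crawl throughput, in millions per week
        rows.append((year, mean_kb, m_files_per_week))
    return rows


if __name__ == "__main__":
    for year, mean_kb, m_per_week in summarize(HARVESTS):
        print(f"{year}: ~{mean_kb:.1f} KB per file, ~{m_per_week:.1f}m files per week")
```

Under these assumptions the mean file size stays in the same range (roughly 32 to 36 KB) across all three harvests. Most of the growth in total volume from 2005 to 2006 therefore comes from the larger number of files collected, not from larger files.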
Comparative statistics: PANDORA domain harvests
PANDORA: Domain Harvesting
• Some pros:
  • Retains linkages and context
  • Large scale: more bytes for the buck
  • Less selective discrimination
• Some cons:
  • High dependence on the crawler technology
  • Domain and geo-location bias (.au, geoIP)
  • Limitations in timeliness, quality assurance, scoping, site complexity and the deep web
  • Legal and access issues still to be resolved
PANDORA: Australia’s Web Archive
• Enormous growth and volume of material
• Anyone can be a creator and publisher
• Virtually instantaneous publication
• Dynamic content and format
• Multiplicity of formats
• Technology dependent
• Hyperlinked and interconnected
• Highly accessible but hard to identify
• Ephemeral
• Interactivity, re-use, personalisation (Web 2.0)