Copy snapshots of the Web from the Internet Archive. Transport the data to Cornell on a regular basis ... the Internet Archive to capture and preserve the content of the Web ...
Slide 1: The Web Laboratory
Slide 2: The Internet Archive
Slide 3: The Internet Archive Web Collection
Slide 4: The Internet Archive Web Collection
Slide 5: Motivation: Social Science Research
Slide 6: The Petabyte Data Store
A project of the Cornell CS database group and the Theory Center to support research projects that manage large data sets
Physical Gantry
Measure light-scattering properties of objects
Create accurate physical models for graphical rendering
Each dataset is 14TB
Arecibo Telescope
Perform surveys of parts of the sky
Analyze the data to find high red-shift pulsars
1TB/day
The Web Laboratory
Slide 7: Year One System
2 16-Processor Unisys ES7000 Servers
64 GByte RAM
8 GByte/sec aggregate I/O bandwidth
2 50 TByte RAID Online Storage
ADIC Scalar 10K robotic tape library for archive
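To put the Year One figures in perspective, the following is a rough back-of-envelope sketch, not a measurement. It assumes decimal units, a single 50 TByte volume, and that the full 8 GByte/sec aggregate bandwidth is sustained, none of which the slides confirm. The function names are hypothetical.

```python
# Back-of-envelope sizing sketch. All unit conventions are assumptions:
# decimal terabytes/gigabytes, and peak bandwidth sustained for the whole scan.
TB = 10**12
GB = 10**9


def scan_seconds(volume_bytes, bandwidth_bytes_per_sec):
    """Time to read a volume once at a given sustained bandwidth."""
    return volume_bytes / bandwidth_bytes_per_sec


def days_to_fill(capacity_bytes, ingest_bytes_per_day):
    """Days until a store of the given capacity is full at a constant ingest rate."""
    return capacity_bytes / ingest_bytes_per_day


if __name__ == "__main__":
    # One full pass over 50 TByte at 8 GByte/sec.
    seconds = scan_seconds(50 * TB, 8 * GB)
    print(f"Full scan: {seconds:.0f} s ({seconds / 3600:.2f} h)")

    # A 1 TB/day source (the Arecibo rate quoted on Slide 6) filling 50 TByte.
    print(f"Days to fill: {days_to_fill(50 * TB, 1 * TB):.0f}")
```

Under these assumptions, one pass over 50 TByte takes a little under two hours, and a 1 TB/day source would fill that volume in 50 days.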
Slide 8: Unisys Server ES7000/430
Slide 9: RAID Storage System
Slide 10: Web Laboratory
Slide 11: Research Using Web Data
Slide 12: In Memory Web Graph
Slide 13: Research Using Web Data
Slide 14: Storing the Web Data
Slide 16: Benchmarking: the Synthetic Web
Slide 17: Social Science Research
Slide 18: Work Flow System
Slide 19: Current Status
Slide 20: The Cornell Team
Slide 21: Thanks