From Photons to Petabytes: Astronomy in the Era of Large Scale Surveys and Virtual Observatories
R. Chris Smith, NOAO/CTIO, LSST
eScience, May 2007
Challenges for the Operational VO
• Providing Content
  • capturing and archiving data from diverse instruments, AND capturing the metadata (system & science) needed to make those data useful
• Providing Access
  • implementing the VO standards and services, plus the network infrastructure, needed for wide access to the content (see the sketch below)
  • ensuring not only access, but long-term support and documentation of datasets & metadata (curation)
• Providing User Interfaces and Tools
  • developing and operating user interfaces which enable effective scientific use of ALL of the distributed resources of the VO
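The "Providing Access" item rests on IVOA protocols such as Simple Cone Search (SCS), where a positional query is just an HTTP GET with RA, DEC, and SR parameters returning a VOTable. A minimal client sketch, assuming a hypothetical service URL (no real NOAO endpoint implied):

```python
# Minimal sketch of a client for a VO Simple Cone Search (SCS) service.
# The base URL is a placeholder, not a real NOAO endpoint; the RA/DEC/SR
# query parameters are the ones defined by the IVOA SCS standard.
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://example.org/scs"  # hypothetical SCS endpoint

def cone_search(ra_deg: float, dec_deg: float, radius_deg: float) -> bytes:
    """Return the raw VOTable (XML) for sources within radius of (ra, dec)."""
    query = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    with urlopen(f"{BASE_URL}?{query}") as resp:
        return resp.read()

# Example: a 0.1-degree cone around the LMC.
votable_xml = cone_search(80.894, -69.756, 0.1)
print(votable_xml[:200])
```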
A Case Study: NOAO Data Management
• Management of data from all NOAO and some affiliated facilities = CONTENT
  • 3 mountaintops (Cerro Tololo, Cerro Pachon, Kitt Peak)
  • 11 telescopes
  • More than 30 instruments
• Virtual Observatory "back end" = ACCESS
  • Provide effective access to large volumes (TBs to PBs) of archived ground-based optical & infrared data and data products through VO standard interfaces and networks
• Virtual Observatory "front end" = UI and TOOLS
  • Enable science by developing VO user interfaces, tools, and services that work with distributed data sources and large volumes of data
Data Management: Serving the VO
[Diagram: NOAO data management feeding VO Content, UI & Tools]
BIG Question
• How does this model SCALE?
  • Capturing, moving, & processing the data
  • Making the data AVAILABLE through VO interfaces
  • Making the data USEFUL for scientific analysis
• Why do we worry about scaling?
Turning Photons into Petabytes
• Today
  • MOSAIC, WFI, IMACS: 64 Mpix cameras
  • ~10 to 20 GB/night
• Builds up quickly!
  • in only 3 years of two MOSAIC cameras:
  • ~20 TB raw data
  • ~40-60 TB processed
[Image: IMACS image, Las Campanas Observatory (Danny Steeghs, Jan '04)]
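A quick back-of-envelope check of these numbers; the frame counts and usable nights below are assumptions chosen to be plausible, not figures from the talk:

```python
# Sanity check: a 64 Mpix camera at 2 bytes/pixel gives ~128 MB per raw
# frame, so tens of frames per night lands in the quoted 10-20 GB/night
# range, and two such cameras reach ~20 TB raw in 3 years.
mpix          = 64e6     # pixels per MOSAIC frame
bytes_per_pix = 2        # 16-bit raw CCD data
frames        = 85       # assumed frames per camera per night
cameras       = 2
nights        = 300 * 3  # assumed usable nights over 3 years

gb_night = mpix * bytes_per_pix * frames / 1e9            # per camera
tb_total = gb_night * cameras * nights / 1e3
print(f"{gb_night:.0f} GB/night/camera -> {tb_total:.0f} TB raw in 3 years")
# ~11 GB/night/camera -> ~20 TB raw; processing then multiplies
# that by a further 2-3x, giving the quoted 40-60 TB.
```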
Coming Soon: Dark Energy Camera
• Focal Plane:
  • 64 2K x 4K detectors
  • Plus guiding and WFS
• 530 Mpix camera
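The quoted pixel count follows directly from the detector grid; a one-line check:

```python
# 64 detectors of 2K x 4K pixels each:
detectors = 64
pix_per_detector = 2048 * 4096
total_mpix = detectors * pix_per_detector / 1e6
print(f"{total_mpix:.0f} Mpix")   # -> 537 Mpix, the ~530 Mpix quoted above
```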
The Data: Dark Energy Survey
• Each image = 1 GB
• 350 GB of raw data / night
• Data must be moved to a supercomputer center (NCSA) before the next night begins (<24 hours)
  • Need >36 Mbps internationally (see the check below)
• Data must be processed within ~24 hours
  • Needed to inform the next night's observing
• Total raw data after 5 yrs: ~0.2 PB
• TOTAL dataset: 1 to 5 PB
  • Reprocessing planned using TeraGrid resources
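The >36 Mbps requirement follows from the nightly volume and the 24-hour window; the margin over the raw minimum plausibly covers protocol overhead and schedule slack:

```python
# Moving 350 GB within 24 hours sets the floor on sustained throughput.
raw_bytes = 350e9
window_s  = 24 * 3600
mbps = raw_bytes * 8 / window_s / 1e6
print(f"{mbps:.0f} Mbps sustained minimum")   # -> ~32 Mbps, hence ">36 Mbps"
```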
LSST: The Large Synoptic Survey Telescope
Survey the entire sky every 3 to 5 nights, to simultaneously detect and study:
• Dark Matter, via weak gravitational lensing
• Dark Energy, via thousands of SNe per year
• Potentially hazardous near-Earth asteroids
• Tracers of the formation of the solar system
• Fireworks in the heavens: GRBs, quasars, ...
• Periodic and transient phenomena
• ... the unknown
Massively PARALLEL Astronomy
LSST: The Instrument
• 8.2m telescope
• Optimized for a WIDE field of view
  • 3.5 degree FOV
• 3.5 GIGApixel camera
• Deep images in 15 s
• Able to scan the whole sky every 3 to 5 nights (see the estimate below)
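A rough cadence estimate shows why a 3.5-degree field makes the 3-to-5-night full-sky pass plausible; the overheads and sky area below are illustrative assumptions, not official LSST numbers:

```python
import math

fov_area = math.pi * (3.5 / 2) ** 2   # deg^2 per pointing (3.5 deg FOV)
sky_area = 20_000                     # assumed accessible sky, deg^2
visit_s  = 2 * 15 + 10                # two 15 s exposures + assumed slew/readout
night_s  = 10 * 3600                  # assumed ~10-hour observing night

fields_per_night = night_s / visit_s
nights_to_cover  = sky_area / (fields_per_night * fov_area)
print(f"{nights_to_cover:.1f} nights per full-sky pass")   # ~2-3 nights
```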
LSST: Deep, Wide, Fast
[Figure: field-of-view comparison: the 10 m Keck Telescope covers 0.2 degrees; LSST covers 3.5 degrees]
LSST Site: Cerro Pachon, Chile
[Figure: site plan showing LSST and its ~1.5m calibration telescope on El Penon, with Gemini (South) and SOAR nearby]
LSST: Distributed Data Management
• Mountain Site: data acquisition, temporary storage
• Base Facility: real-time processing
• Long-Haul Communications: data transport & distribution
• Archive/Data Access Centers: data processing, long-term storage, & public access
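One way to read these tiers is as a simple staged pipeline; the sketch below is purely illustrative (the names and roles paraphrase the slide, not actual LSST software):

```python
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    role: str

# The slide's four tiers, in data-flow order.
PIPELINE = [
    Tier("Mountain Site",       "data acquisition, temporary storage"),
    Tier("Base Facility",       "real-time processing, alert generation"),
    Tier("Long-Haul Network",   "bulk transport to the archive"),
    Tier("Archive/Data Access", "reprocessing, long-term storage, public access"),
]

for upstream, downstream in zip(PIPELINE, PIPELINE[1:]):
    print(f"{upstream.name} -> {downstream.name}")
```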
LSST: The Data Flow (ALL data public, ALL alerts public)
• Each image roughly 6.5 GB
• Cadence: ~1 image every 15 s
• 15 to 18 TB per night
• ALL data must be transferred to the U.S. "data center"
  • Mountain-to-base within the image timescale (15 s): ~10-20 Gbps
  • Internationally within <24 hours: >2-10 Gbps
  • (See the bandwidth check below)
• REAL-TIME reduction, analysis, & alerts
  • Send out alerts of transient sources within minutes
  • Provide automatic data quality evaluation; alert to problems
• Processed data grows to >100 TB per night!
• Just the catalogs = Petabytes per year!
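The quoted link requirements can be sanity-checked from the per-image numbers (a 10-hour night is assumed here):

```python
img_gb  = 6.5
cadence = 15           # seconds between images
night_s = 10 * 3600    # assumed ~10-hour night

nightly_tb = img_gb * (night_s / cadence) / 1e3
mtn_gbps   = img_gb * 8 / cadence                    # ship each image before the next
intl_gbps  = nightly_tb * 1e12 * 8 / (24 * 3600) / 1e9

print(f"{nightly_tb:.1f} TB/night")            # ~15.6 TB -> "15 to 18 TB"
print(f"{mtn_gbps:.1f} Gbps mountain-base")    # ~3.5 Gbps sustained minimum
print(f"{intl_gbps:.1f} Gbps international")   # ~1.4 Gbps sustained minimum
# The quoted ~10-20 Gbps and >2-10 Gbps figures add headroom for bursts,
# protocol overhead, and catching up after outages.
```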
LSST Needs
[Diagram: data flow from the Base facility to the Archive Center and on to the Data Access Center]
Turning Photons into Petabytes: Summary
• Today, ~10 to 20 GB/night
  • MOSAIC, WFI, IMACS: 64 Mpix cameras
• Soon, ~300 to 500 GB/night
  • VISTA: 67 Mpix camera
  • VST: 256 Mpix camera
  • DECam/DES: 520 Mpix camera
• On the horizon, ~15 TB/night
  • LSST Project: 3 Gpix camera
And these are just the survey instruments in Chile!
DES, LSST, ... the REST of the Science?
• Ongoing (MOSAIC, WFI, IMACS) and future (DES, LSST, etc.) projects will provide PETABYTES of archived data
• Only a small fraction of the science potential will be realized by the planned investigations
• How do we maximize the investment in these datasets and provide for their future scientific use?
VO Challenges: Provider Perspective
• How do we effectively capture, transport, and manage Petabytes of data?
  • Need advanced IT infrastructure
• How do we provide effective access to Petabytes of data?
  • Need advanced data-mining interfaces
• These are fundamentally IT challenges, in support of the astronomical community
VO Challenges: Scientific Perspective
• Data Discovery
  • From those Petabytes, what data exist that might help address my scientific query? (see the sketch below)
• Data Understanding
  • Which data are best suited for my analysis?
• Data Movement
  • How do I get the data from where they are to where they are most useful?
• Data Analysis
  • How do I extract the information I need from the data?
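As an aside on the Data Discovery step: today this kind of registry query can be scripted with the pyvo package, which postdates this 2007 talk; the NVO portal of the era exposed the same registry standards through web forms. A minimal sketch, assuming pyvo is installed:

```python
# Discover VO image services whose registry records mention supernovae,
# then print the first few titles. Illustrative only; not the 2007 tooling.
import pyvo

services = pyvo.registry.search(keywords=["supernova"], servicetype="sia")
for svc in list(services)[:5]:
    print(svc.res_title)
```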
NVO Portal @ NOAO
• Focus on the scientific USER
• 4 keys: Data Discovery, Data Understanding, Data Access, Data Analysis
• First focus: supporting data DISCOVERY
  • Discovery in spatial coordinates: NOAO Sky
  • Discovery in temporal coordinates: Timeline
• NOAO NVO portals:
  • http://nvo.noao.edu
  • and, for South America, http://nvo.ctio.noao.edu
  • Foundation for exploring partnerships with South American communities
Summary: VO Challenges
• In Infrastructure
  • Collect and maintain petabytes of content
  • Provide for effective access, including networks, hardware, and software
• In User Interaction
  • Provide effective user interfaces
  • Support distributed analysis
    • large queries across distributed DBs
    • statistical analysis and processing across distributed resources (Grid processing & storage)
• TOOLS & SERVICES to enable SCIENCE
How? Strategic Partnerships
• In Local Systems
  • Vendors: local storage, processing, servers
• In Remote Systems
  • Distributed computer centers to provide bulk storage and large-scale processing
  • Linked together for Grid processing and Grid storage
• In Connectivity
  • High-speed national and international bandwidth
• Scientific
  • VO partners to develop standards and provide tools (IVOA)
  • Developing tools and services optimized for scientific analysis over large datasets (e.g., statistical methods)