Science Archives @ ESAC Cluster Final Archive Pedro Osuna Head of the Science Archives and VO Team Science Operations Department CAA-CFA Review Meeting ESTEC, 18-19 of May 2011
European Space Astronomy Centre
• ESAC is the default location for:
  • Science operations
    • Long history with astronomical missions
    • Now also solar system missions
  • Science archives
    • Astronomy, Planetary and Solar Systems
    • Long term preservation of ESA Science data
  • ESA Virtual Observatory activities
    • ESAC is the European VO node for space-based astronomy.
http://www.esa.int/SPECIALS/ESAC/
Located near Madrid, Spain
ESAC Science Archives- Cluster Final Archive| P. Osuna | CAA-CFA Review Meeting 18-19 May 2011 | Pag. 2

Astronomy Archives at ESAC (current)
[Figure: astronomy mission archives with year labels 2011, 2009, 2014+, 1998, 2002, 2009, 2009, 2005]

Solar Systems Archives at ESAC
[Figure: solar system mission archives with year labels 2004, 2010, 2009, 2005, 2009, 2006, 2011, 2010]

Science Archives Team @ ESAC
• More than 13 years of experience over many different missions
  • Astronomy / planetary / solar systems, observatory / survey / PI missions
  • Missions in development, in operations, in post-operations, in archive phases
  • Raw data, calibrated processed data, high-level data products
  • Data processed at ESAC by SOC or by PI teams
  • Standard processing, bulk reprocessing, on-the-fly reprocessing
• Financed by project funds (~11 FTE) as part of their SOC activities
• Some SRE-O department funding (~3 FTE) for core archive activities (e.g. “re-engineering”), maintaining archives long term (i.e. ISO, Exosat) and small VO activities
• ESAC Science Archives Team highly involved in all VO activities within ESA

Science Archives @ ESAC: main characteristics
• Complete mission scientific data all stored on hard disks
  • Images, spectra, measurements (photon counts, temperatures, wind speeds, …), catalogues or maps of astronomical objects, …
  • Around 50 TB of science data, increasing quickly to 100-200 TB, then ~1 PB with Gaia
  • New data ingested regularly
• All archive data management systems are automatic
  • All data is distributed through Internet/FTP
  • No archive operator
  • Development, maintenance, operations, monitoring done by the Science Archives Team
  • Exception for some complex data ingestion
    • e.g. planetary missions requiring technical validation
    • one-off ingestion of a point source catalogue
• Data is made available to the scientific community through the Internet
  • Through a standard Internet browser (Internet Explorer, Firefox, …)
  • Search, preview, select and download
  • Public access after a proprietary period

Archive and Data Management
• OAIS: Reference Model for an Open Archival Information System
  • Recommendation for Space Data System Standards
• An archive is more than a “JBOD” (Just a Bunch Of Disks)
  • Dozens of TB, millions of files, complex queries
• Proper software engineering is required to ensure the various OAIS functions
  • Powerful and user-friendly access interfaces
  • Complex Data and Metadata Management
  • Well Modeled Databases
  • Flexible Data Distribution
  • Interoperability
  • Added Value Services
  • Logging, Statistics
  • Long Term Preservation

ESAC Archives Common Architecture

Archives Building System Infrastructure (ABSI)
• All new ESAC Science Archives are based on ABSI
• 1st generation ESAC Archives are being re-engineered into the ABSI framework
• ABSI provides a set of Building Blocks, modular enough to be reused for Scientific Archives with different purposes (cf. SOHO, EXOSAT, Planck)
• These Building Blocks are divided into:
  • Interfaces:
    • Object-oriented type of interface
  • Modules:
    • A software package that provides self-contained functionality. It must be accompanied by an API or similar that explains how to consume it.
  • Component sample:
    • Wraps up sample code implementing certain functionality. It may contain Modules and/or GlueCode.

ABSI Elements

Handling large amounts of data
• We are dealing with more than a million observations (granularity in SOHO is different from the astronomical cases). Database table indexing gets overly complicated and joins perform poorly.
• To know how to apply the joins to the different attributes requested ("where" part of the query), we implement the Dijkstra algorithm (shortest path algorithm, graph theory).
• Dijkstra's algorithm, conceived by Dutch computer scientist Edsger Dijkstra in 1956 and published in 1959, is a graph search algorithm that solves the single-source shortest path problem for a graph with non-negative edge costs, outputting a shortest path tree.
• This algorithm is often used in routing.
• We have applied it to our database tables and relationships.
• On-line examples:
  • http://www.carto.net/papers/svg/dijkstra_shortest_path_demo/

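As a rough illustration of the idea on this slide, the sketch below treats database tables as graph nodes and table relationships as weighted edges, then uses Dijkstra's algorithm to find the cheapest chain of joins between two tables. It is only a minimal sketch under stated assumptions: the table names, the edge costs and the function name `shortest_join_path` are hypothetical and are not taken from the ESAC archives.

```python
import heapq


def shortest_join_path(schema, start, goal):
    """Return the cheapest list of tables linking start to goal, or None.

    schema maps a table name to {neighbouring table: non-negative join cost}.
    This is plain Dijkstra, stopped as soon as the goal table is settled.
    """
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    settled = set()
    while heap:
        d, table = heapq.heappop(heap)
        if table in settled:
            continue  # stale heap entry, a cheaper route was already settled
        settled.add(table)
        if table == goal:
            break
        for neighbour, cost in schema.get(table, {}).items():
            candidate = d + cost
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                prev[neighbour] = table
                heapq.heappush(heap, (candidate, neighbour))
    if goal not in dist:
        return None  # the two tables are not connected by any relationship
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical schema: every relationship costs 1 (one join).
SCHEMA = {
    "observation": {"instrument": 1, "file": 1, "proposal": 1},
    "instrument": {"observation": 1, "detector": 1},
    "detector": {"instrument": 1},
    "file": {"observation": 1},
    "proposal": {"observation": 1, "investigator": 1},
    "investigator": {"proposal": 1},
}

join_path = shortest_join_path(SCHEMA, "file", "detector")
print(" JOIN ".join(join_path))
```

A query constraining attributes of `file` and `detector` would then join the tables along the returned path. Giving higher costs to expensive relationships (for example, very large link tables) is one possible way to steer the search away from slow joins.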
Indexing spherical data in databases
• Indexing spherical data for searches in a DB is a long-standing problem
• Coordinate searches were done “a cappella” (without dedicated spherical support) in our archives -> plain SQL searches with indices on RA, DEC.
• This gets complicated when coordinate operations, transformations, functions, etc. have to be executed -> poor performance, low flexibility
• PgSphere is a module that implements spherical types (database types) in PostgreSQL (an open source database system).
• Provides:
  • input and output of data
  • containing, overlapping, and other operators
  • various input and converting functions and operators
  • circumference and area of an object
  • spherical transformation
  • indexing of spherical data types
  • several input and output formats
• Implemented in Exosat for the first time. Being re-used for Planck, Herschel.

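To illustrate why plain RA/DEC range searches become awkward, the sketch below performs a cone search using the great-circle (haversine) separation. It is not PgSphere and not the archive code: the source list and the function names `angular_separation_deg` and `cone_search` are hypothetical, and a real archive would push this work into the database with a spherical index rather than scanning every row.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two (RA, DEC) points in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula: numerically stable for small separations.
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cone_search(sources, ra, dec, radius_deg):
    """Return names of sources within radius_deg of (ra, dec); linear scan."""
    return [name for name, s_ra, s_dec in sources
            if angular_separation_deg(ra, dec, s_ra, s_dec) <= radius_deg]


# Hypothetical catalogue rows: (name, RA in degrees, DEC in degrees).
SOURCES = [
    ("A", 359.9, 0.0),   # just "before" RA = 0, across the wrap-around
    ("B", 0.1, 0.0),
    ("C", 180.0, 89.9),  # close to the north pole
    ("D", 10.0, 0.0),
]

print(cone_search(SOURCES, 0.0, 0.0, 0.5))
print(cone_search(SOURCES, 0.0, 89.9, 0.5))
```

A naive box query such as `ra BETWEEN -0.5 AND 0.5` would miss source A because of the wrap-around at RA = 360, and near the pole source C lies only 0.2 degrees from the search centre even though its RA differs by 180 degrees. Cases like these are what dedicated spherical types and operators are meant to handle.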
New ESAC Science Archive Creation Process
• Bottom-up approach: first build the general UML for the overall project, then start building from there.
• UML -> DB design -> Repository design -> DAO (Data Access Objects) design -> User Interface design
• Good UML design for the project is extremely important.
• Proper knowledge of the data by the SAT is crucial in order to build good data repository and data distribution systems (hundreds of e-mails exchanged between the SOHO Archive Scientist and the SAT on data issues)

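The DAO step in the chain above can be pictured with a small sketch: a Data Access Object hides the SQL behind a few methods, so the user interface layer never touches the database directly. This is only an illustration under stated assumptions. It uses Python with an in-memory SQLite database as a stand-in for the Java, Hibernate and PostgreSQL stack named in these slides, and the `Observation` fields, table name and `ObservationDao` class are hypothetical.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical domain object derived from the UML model."""
    obs_id: int
    instrument: str
    start_time: str


class ObservationDao:
    """Data Access Object: the only place that knows the SQL for observations."""

    def __init__(self, conn):
        self._conn = conn

    def create_table(self):
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS observation ("
            "obs_id INTEGER PRIMARY KEY, instrument TEXT, start_time TEXT)"
        )

    def add(self, obs):
        # Parameterised query: values are never interpolated into the SQL text.
        self._conn.execute(
            "INSERT INTO observation VALUES (?, ?, ?)",
            (obs.obs_id, obs.instrument, obs.start_time),
        )

    def find_by_instrument(self, instrument):
        rows = self._conn.execute(
            "SELECT obs_id, instrument, start_time FROM observation "
            "WHERE instrument = ? ORDER BY obs_id",
            (instrument,),
        )
        return [Observation(*row) for row in rows]


conn = sqlite3.connect(":memory:")
dao = ObservationDao(conn)
dao.create_table()
dao.add(Observation(1, "EIT", "2011-05-18T00:00:00"))
dao.add(Observation(2, "LASCO", "2011-05-18T01:00:00"))
dao.add(Observation(3, "EIT", "2011-05-19T00:00:00"))
print(dao.find_by_instrument("EIT"))
```

Because callers depend only on `ObservationDao`, the storage layout can change without touching the interface code, which is one common reason for placing the DAO design between the repository and the user interface.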
ESAC Archives Common Architecture: Technologies used
[Diagram labels: Java Rich Client, Webstart, InfoNode, JGoodies, Swing; PostgreSQL, PgSphere; NAS, NFS, NetApp Filer; Web Services, Java, Tomcat, Spring; Hibernate, Java]

ESAC Archives Interfaces
• Complementary interfaces to serve various types of users
  • Scientific Community (public access), PI teams and observers (controlled access), Science Operations Team (privileged access)
• Powerful web-based Java GUI interface
  • Standard access to the archives
  • Simple to use, powerful search / results facilities
  • Handling of proprietary and public data
  • Direct download, shopping basket
  • On-the-fly reprocessing (for some archives)
  • Link (back and forth) to the science literature
  • Mars Map Browser
• Scriptable Interface
  • Machine interface, mainly used by Science Operations Teams and mirror sites
• Interoperability with other external archives and tools through Virtual Observatory protocols
http://archives.esac.esa.int/