e-Infrastructure across Photon and Neutron Sources Brian Matthews brian.matthews@stfc.ac.uk Scientific Computing Department STFC Rutherford Appleton Laboratory
Formed by Royal Charter in 2007, the Science and Technology Facilities Council is one of Europe's largest multidisciplinary research organisations supporting scientists and engineers world-wide. The Council operates world-class, large-scale research facilities and provides strategic advice to the UK government on their development. The Council funds university research in particle physics, nuclear physics, astronomy and space.
UK ATC James Clerk Maxwell Telescope Hawaii RAL: Diamond Daresbury Lab CERN: LHC Rutherford Appleton Lab ILL and ESRF ESA: Top Sat Chilbolton Lab ESO: Alma Array
The scientific data centre at RAL • RAL: base of Scientific Computing • UK LHC Tier 1 - ~20 PB of data • EMERALD - Largest production GPU facility in the UK: 372 Nvidia Tesla M2090 GPUs • JASMIN/CEMS – 4 PB of climate modelling data • SCARF – >2200-core cluster for simulation
Facilities and Resources of The Hartree Centre • Scientific Computing Department at Daresbury Laboratory • Projects and codes developed on state of the art systems: • BlueGene/Q – Fastest UK machine and world’s largest software development platform • Over 5 PB disc and 15 PB tape stores • iDataplex cluster • Data Intensive systems • Visualisation System
The science we do - Structure of materials • ~30,000 user visitors each year in Europe: • physics, chemistry, biology, medicine, • energy, environmental, materials, culture • pharmaceuticals, petrochemicals, microelectronics • Billions of € of investment • c. £400M for DLS • + running costs • Over 5.000 high impact publications per year in Europe • But so far no integrated data repositories • Lacking sustainability & traceability Experiment workflow: Visit facility on research campus • Place sample in beam • Diffraction pattern from sample • Fitting experimental data to model Example studies: Magnetic moments in electronic storage • Hydrogen storage for zero emission vehicles • Bioactive glass for bone growth • Longitudinal strain in aircraft wing • Structure of cholesterol in crude oil
Now … Data Synchronisation Network monitoring Data monitoring Data Cataloguing Data archive
ICAT Tool Suite and Clients TopCAT (Web Interface to ICATs) Desktop app ICAT Data Explorer (Eclipse Plugin in DAWN) ICAT Job Portal Clusters/HPC Disk ICAT + Mantid (desktop client) IDS (ICAT Data Service) Tape ICAT APIs http://www.mantidproject.org/ http://www.dawnsci.org/ https://code.google.com/p/icat-job-portal/
Facility Data Lifecycle • Proposal • Approval • Scheduling • Experiment • Data reduction • Data analysis • Publication • Metadata Catalogue Traditionally, these steps are decoupled from facilities. However, they are key to deriving useful insights. http://www.icatproject.org
Managing Data Processing Pipelines • Issues: • Valuable data amongst noise • Software version • Data provenance • Distributed analysis • Complex and dynamic workflows • Usability of tools Raw data Derived data Resultant data Credits: Martin Dove, Erica Yang (Nov. 2009) Credit: Phil Withers, Andy Alderson, Sam McDonald
Detector Rates • Dectris Pilatus 6M • 2463 x 2527 pixels • 7 MB images • 25 frames per sec. • 175 MB/s • High duty cycles mean that 10 TB / day is quite possible "The Pilatus detector has completely transformed the way X-ray photons are being detected today at synchrotron radiation sources, such as Diamond. This is something we could only have dreamt of in the early days of synchrotron sciences." Prof. Gerhard Materlik CBE, CEO of Diamond Light Source, June 18th, 2012
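The arithmetic behind these detector figures can be checked with a short sketch. The 7 MB frame size and 25 frames per second come from the slide; decimal units (1 TB = 1,000,000 MB) and the idea of expressing the daily volume as a duty cycle are assumptions made here for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_mb_s(frame_mb: float, frames_per_s: float) -> float:
    """Data rate in MB/s while the detector is acquiring."""
    return frame_mb * frames_per_s


def daily_volume_tb(rate_mb_s: float, duty_cycle: float) -> float:
    """Daily volume in TB for a given fraction of the day spent acquiring."""
    return rate_mb_s * SECONDS_PER_DAY * duty_cycle / 1_000_000


rate = sustained_rate_mb_s(7, 25)          # figures from the slide
full_day = daily_volume_tb(rate, 1.0)      # hypothetical 100% duty cycle
duty_for_10tb = 10 / full_day              # duty cycle that yields 10 TB/day

print(f"rate: {rate} MB/s")
print(f"continuous acquisition: {full_day:.2f} TB/day")
print(f"duty cycle for 10 TB/day: {duty_for_10tb:.0%}")
```

Under these assumptions, continuous acquisition would give about 15 TB/day, so 10 TB/day corresponds to a duty cycle of roughly two thirds.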
Infrastructure for managing data flows Segment + Quantify 3D mesh + Image based Modelling Predict + Compare Reconstruct Scan Data Catalogue Petabyte Data storage Parallel File system HPC CPU+GPU Visualisation Infrastructure + Software + Expertise! • Tomography: Dealing with high data volumes – 200 GB/scan, ~5 TB/day (one experiment) • MX: high data volumes, smaller files, but a lot more experiments • Hard to move the data – needs to be handled at the facility? Some image credit: Avizo, Visualization Sciences Group (VSG)
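A rough sketch of why such volumes are hard to move: it takes the per-scan and per-day tomography figures from the slide (reading the per-scan figure as 200 gigabytes) and estimates the scan count and the time to ship one day of data. The 1 Gbit/s link speed is a hypothetical value chosen for illustration, not a figure from the talk, and protocol overhead is ignored.

```python
def scans_per_day(daily_tb: float, scan_gb: float) -> float:
    """Number of scans implied by a daily volume (decimal units)."""
    return daily_tb * 1000 / scan_gb


def transfer_hours(volume_tb: float, link_gbit_s: float) -> float:
    """Hours needed to move a volume over a fully utilised link."""
    bits = volume_tb * 1e12 * 8
    seconds = bits / (link_gbit_s * 1e9)
    return seconds / 3600


n_scans = scans_per_day(5, 200)          # slide figures: ~5 TB/day, 200 GB/scan
hours = transfer_hours(5, 1.0)           # assumed 1 Gbit/s link

print(f"{n_scans:.0f} scans/day")
print(f"{hours:.1f} h to move one day of data at 1 Gbit/s")
```

With these assumptions a single day of tomography data would occupy the link for about eleven hours, which is consistent with the slide's suggestion that the data may need to be handled at the facility.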
Photon and Neutron Data Infrastructure • Established in 2007 with 4 facilities • Now standing at 13 • With “friends” around the world • Combined number of unique users more than 35000 in 2011 • Combines scientific and IT staff from the collaborating facilities • European Framework 7 Projects • PaNdata-Europe: SA, 2009-11 • PaNdata-Open Data Infrastructure, IP, 2011-14 • Guesstimates • Investment > €4.000.000.000* • Running costs > €500.000.000/yr* • Publications > 10.000/yr* • RCosts/Publication ~ €50.000*% • Data volume >> 10PB/yr* PaNdata
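The running cost per publication quoted above follows directly from the two guesstimates. The sketch below only reproduces that division; both inputs are the slide's lower bounds, so the result is an order-of-magnitude figure rather than a measured value.

```python
running_costs_eur_per_year = 500_000_000   # slide: > €500.000.000/yr
publications_per_year = 10_000             # slide: > 10.000/yr

cost_per_publication = running_costs_eur_per_year / publications_per_year
print(f"~€{cost_per_publication:,.0f} running cost per publication")
```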
Counting Users http://pan-data.eu/Users2012-Results
Shared Data Policy Framework PaN-Data Integration Federated User Authentication NeXus Common Data Format Federated Data Catalogue
Topic Publication Keyword Core Metadata Shared Model and Terms Authorisation Investigation Investigator Dataset Sample Sample Parameter Datafile Dataset Parameter Parameter Related Datafile Datafile Parameter
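One way to read the entities on this slide is as a nested record structure. The sketch below is an illustrative assumption, not the ICAT schema: the class names come from the slide, while the field names, types and the direction of each relationship are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Parameter:
    name: str
    value: str
    units: str = ""


@dataclass
class Datafile:
    name: str
    location: str
    parameters: List[Parameter] = field(default_factory=list)
    related: List["Datafile"] = field(default_factory=list)


@dataclass
class Sample:
    name: str
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    sample: Optional[Sample] = None
    datafiles: List[Datafile] = field(default_factory=list)
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    investigators: List[str] = field(default_factory=list)
    topics: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    publications: List[str] = field(default_factory=list)
    datasets: List[Dataset] = field(default_factory=list)


def count_datafiles(investigation: Investigation) -> int:
    """Total number of datafiles across all datasets of an investigation."""
    return sum(len(ds.datafiles) for ds in investigation.datasets)


# Hypothetical example record.
example = Investigation(
    title="Example diffraction study",
    investigators=["A. Scientist"],
    datasets=[
        Dataset(
            name="raw",
            sample=Sample("sample-1", [Parameter("temperature", "300", "K")]),
            datafiles=[Datafile("run1.nxs", "/archive/run1.nxs"),
                       Datafile("run2.nxs", "/archive/run2.nxs")],
        )
    ],
)
print(count_datafiles(example))
```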
Provenance • Integrating context, analysis and publication into the record • Preservation • Long-term need for archiving and curating data • Persistent identifiers, integrity, context • Costs and benefits of data preservation • Scalability • Managing high data rates and volumes • Parallel file stores Towards the Future
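The integrity requirement under preservation is commonly met with fixity checks: a checksum is recorded at ingest and recomputed later to detect silent corruption. The slide does not name a method, so the sketch below is an assumption throughout; it uses SHA-256 from the Python standard library, and the function names are hypothetical.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: str, recorded_digest: str) -> bool:
    """Return True if the file still matches the digest recorded at ingest."""
    return sha256_of(path) == recorded_digest
```

Reading in chunks keeps memory use flat, which matters for the multi-gigabyte files discussed earlier in the talk.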
Credit: Brian Matthews DOI Data Access Process Paper DataCite STFC Page TopCAT
Record Publication Proposal Generate DOI Landing page from RO Raw Data :hasDataset Facilities Data Lifecycle Approval :investigator Construct Scheduling Investigation #n DOI:STFC.xxx Data storage Subsequent publication registered with facility Experiment Derived Data Data analysis :hasRelatedDataset Scientist submits application for beamtime :instrument :hasPublication Tools for processing made available Publications Raw data filtered, and stored Facility committee approves application Scientist visits, facility runs experiment Publish :hasPublication Facility registers, trains, and schedules scientist’s visit Investigation as a Research Object
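The relations named on this slide (:investigator, :instrument, :hasDataset, :hasRelatedDataset, :hasPublication) can be pictured as subject-predicate-object triples attached to one investigation. The sketch below uses plain Python tuples rather than an RDF library; every identifier on the subject and object side is a hypothetical placeholder, and the grouping function is only one possible way to assemble landing-page content.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

INVESTIGATION = "investigation:n"   # placeholder for "Investigation #n"

triples: List[Triple] = [
    (INVESTIGATION, ":investigator", "person:scientist"),
    (INVESTIGATION, ":instrument", "instrument:beamline"),
    (INVESTIGATION, ":hasDataset", "dataset:raw"),
    (INVESTIGATION, ":hasRelatedDataset", "dataset:derived"),
    (INVESTIGATION, ":hasPublication", "publication:paper"),
]


def objects_of(graph: List[Triple], subject: str, predicate: str) -> List[str]:
    """All objects linked to a subject by a given predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def landing_page(graph: List[Triple], subject: str) -> Dict[str, List[str]]:
    """Group everything known about a subject by predicate."""
    page: Dict[str, List[str]] = defaultdict(list)
    for s, p, o in graph:
        if s == subject:
            page[p].append(o)
    return dict(page)


print(landing_page(triples, INVESTIGATION))
```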
E-INFRASTRUCTURE Requirements • Data rates begin to require dedicated central IT infrastructure, way beyond previous requirements. • Data sets become too numerous to keep track of • Data management, common metadata • Data sets become too large to take home. • Data policies • Archive at facility and “cloud” data access • Analysis requires high level of computational power • Archive at facility and “cloud” access to HPC • Integrating data analysis processes into data management processes • Integrating workflow into large-scale data management processes • Variety of scientific areas leads to a variety of data formats and analysis software. • Common data formats and APIs • Large number of users moving between labs. • Federated authentication and data catalogues • The rise of data intensive experiments and computation • Real time data processing for live experiments • Streaming data processing
Integrating Data Neutron diffraction X-ray diffraction Developments that will influence how the data is managed • Facilities offer complementary experimental techniques for a single beamline • (e.g. tomography+diffraction) • Users increasingly use multiple facilities leading to the need for multi-stream data fusion and processing • Including remote access to HPC and data storage resource • Using provenance information effectively • Data Publication • Data tracing • Data publication in context • Reproducibility High-quality structure refinement
RDA Interest Group • Proposing a New RDA Interest Group • Photon and Neutron Science (PaNSIG) • PaNData Partners • + US and Aus partners • Plan to hold first workshop in Dublin, March 2013
Thank You Brian Matthews brian.matthews@stfc.ac.uk Prioriphora schroederhohenwarthi X-Ray Imaging at ESRF Solorzano et al, 2011, Systematic Entomology (2011)