
e-Infrastructure across Photon and Neutron Sources

Brian Matthews, brian.matthews@stfc.ac.uk, Scientific Computing Department, STFC Rutherford Appleton Laboratory



  1. e-Infrastructure across Photon and Neutron Sources Brian Matthews brian.matthews@stfc.ac.uk Scientific Computing Department STFC Rutherford Appleton Laboratory

  2. STFC

  3. Formed by Royal Charter in 2007, the Science and Technology Facilities Council is one of Europe's largest multidisciplinary research organisations, supporting scientists and engineers worldwide. The Council operates world-class, large-scale research facilities and provides strategic advice to the UK government on their development. The Council funds university research in particle physics, nuclear physics, astronomy and space.

  4. UK ATC James Clerk Maxwell Telescope Hawaii RAL: Diamond Daresbury Lab CERN: LHC Rutherford Appleton Lab ILL and ESRF ESA: Top Sat Chilbolton Lab ESO: Alma Array

  5. The scientific data centre at RAL • RAL: base of Scientific Computing • UK LHC Tier 1 - ~20 PB data • EMERALD - Largest production GPU facility in UK: 372 Nvidia Tesla M2090 GPUs • Jasmin/CEMS – 4 PB of climate modelling data • SCARF – >2200 core cluster for simulation

  6. Facilities and Resources of The Hartree Centre • Scientific Computing Department at Daresbury Laboratory • Projects and codes developed on state-of-the-art systems: • BlueGene/Q – Fastest UK machine and world’s largest software development platform • Over 5 PB disc and 15 PB tape stores • iDataplex cluster • Data-intensive systems • Visualisation system

  7. Data Infrastructure for Large-Scale Facilities

  8. STFC Rutherford Appleton Laboratory

  9. The science we do - Structure of materials • ~30,000 user visitors each year in Europe: • physics, chemistry, biology, medicine, • energy, environmental, materials, culture • pharmaceuticals, petrochemicals, microelectronics • Visit facility on research campus • Place sample in beam • Diffraction pattern from sample • Fitting experimental data to model • Billions of € of investment • c. £400M for DLS • + running costs • Over 5,000 high-impact publications per year in Europe • But so far no integrated data repositories • Lacking sustainability & traceability • Examples: magnetic moments in electronic storage; hydrogen storage for zero emission vehicles; bioactive glass for bone growth; longitudinal strain in aircraft wing; structure of cholesterol in crude oil

  10. Now … Data Synchronisation Network monitoring Data monitoring Data Cataloguing Data archive

  11. ICAT Tool Suite and Clients TopCAT (Web Interface to ICATs) Desktop app ICAT Data Explorer (Eclipse Plugin in DAWN) ICAT Job Portal Clusters/HPC Disk ICAT + Mantid (desktop client) IDS (ICAT Data Service) Tape ICAT APIs http://www.mantidproject.org/ http://www.dawnsci.org/ https://code.google.com/p/icat-job-portal/

  12. Scaling

  13. Facility Data Lifecycle Metadata Catalogue Publication Proposal Approval Data analysis Scheduling Data reduction Experiment Traditionally, these steps are decoupled from facilities. However, they are key to deriving useful insights. http://www.icatproject.org

  14. Managing Data Processing Pipelines • Issues: • Valuable data amongst noise • Software version • Data provenance • Distributed analysis • Complex and dynamic workflows • Usability of tools Raw data Derived data Resultant data Credits: Martin Dove, Erica Yang (Nov. 2009) Credit: Phil Withers, Andy Alderson, Sam McDonald

  15. Detector Rates • Dectris Pilatus 6M • 2463 x 2527 pixels • 7 MB images • 25 frames per sec. • 175 MB/s • High duty cycles mean that 10 TB / day is quite possible "The Pilatus detector has completely transformed the way X-ray photons are being detected today at synchrotron radiation sources, such as Diamond. This is something we could only have dreamt of in the early days of synchrotron sciences." Prof. Gerhard Materlik CBE, CEO of Diamond Light Source, June 18th, 2012
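
The detector figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above (7 MB per image, 25 frames per second); the duty cycle is an assumed free parameter, and the function names are illustrative rather than taken from any facility software.

```python
# Figures quoted on the slide for the Dectris Pilatus 6M.
IMAGE_SIZE_MB = 7        # one image, in megabytes
FRAMES_PER_SECOND = 25   # frame rate
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_mb_per_s(image_mb: float, fps: float) -> float:
    """Data rate while the detector is continuously acquiring."""
    return image_mb * fps


def daily_volume_tb(rate_mb_per_s: float, duty_cycle: float) -> float:
    """Volume per day in decimal terabytes (1 TB = 1,000,000 MB).

    duty_cycle is the assumed fraction of the day spent acquiring (0..1).
    """
    return rate_mb_per_s * SECONDS_PER_DAY * duty_cycle / 1_000_000


rate = sustained_rate_mb_per_s(IMAGE_SIZE_MB, FRAMES_PER_SECOND)
ceiling_tb = daily_volume_tb(rate, duty_cycle=1.0)
duty_for_10_tb = 10 / ceiling_tb

print(f"sustained rate: {rate} MB/s")
print(f"ceiling at 100% duty cycle: {ceiling_tb:.2f} TB/day")
print(f"duty cycle needed for 10 TB/day: {duty_for_10_tb:.0%}")
```

Under these assumptions the ceiling is about 15 TB/day, so 10 TB/day corresponds to acquiring for roughly two thirds of the time, which is consistent with the slide's "high duty cycles" remark.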

  16. Infrastructure for managing data flows Segment + Quantify 3D mesh + Image based Modelling Predict + Compare Reconstruct Scan Data Catalogue Petabyte Data storage Parallel File system HPC CPU+GPU Visualisation Infrastructure + Software + Expertise! • Tomography: Dealing with high data volumes – 200 GB/scan, ~5 TB/day (one experiment) • MX: high data volumes, smaller files, but a lot more experiments • Hard to move the data – needs to be handled at the facility? Some image credit: Avizo, Visualization Sciences Group (VSG)
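
To illustrate why the data is "hard to move", here is a rough estimate of how long one day of tomography output would take to transfer. It takes the per-scan figure as 200 gigabytes and the daily figure as 5 TB; the link speeds are purely hypothetical examples and protocol overhead is ignored, so real transfers would be slower.

```python
SCAN_GB = 200   # per-scan volume, read as gigabytes
DAILY_TB = 5    # daily volume for one experiment


def transfer_hours(volume_tb: float, link_gbit_per_s: float) -> float:
    """Ideal transfer time in hours over a link of the given speed.

    Uses decimal units (1 TB = 1e12 bytes, 1 Gbit = 1e9 bits) and assumes
    the link is fully and exclusively used, which is optimistic.
    """
    bits = volume_tb * 1e12 * 8
    return bits / (link_gbit_per_s * 1e9) / 3600


scans_per_day = DAILY_TB * 1000 / SCAN_GB
print(f"scans per day: {scans_per_day:.0f}")

# Hypothetical link speeds in Gbit/s, chosen only for illustration.
for link in (0.1, 1.0, 10.0):
    hours = transfer_hours(DAILY_TB, link)
    print(f"{link:>5} Gbit/s: {hours:6.1f} h to move {DAILY_TB} TB")
```

On a 1 Gbit/s link, one day of data needs about 11 hours of ideal transfer time, which supports handling the data at the facility rather than shipping it to the user's home institution.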

  17. Sharing

  18. Photon and Neutron Data Infrastructure • Established in 2007 with 4 facilities • now standing at 13 • With “friends” around the world • Combined number of unique users more than 35,000 in 2011 • Combines scientific and IT staff from the collaborating facilities • European Framework 7 Projects • PaNdata-Europe: SA, 2009-11 • PaNdata-Open Data Infrastructure, IP, 2011-14 • Guesstimates • Investment > €4,000,000,000* • Running costs > €500,000,000/yr* • Publications > 10,000/yr* • RCosts/Publication ~ €50,000*% • Data volume >> 10 PB/yr* PaNdata
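
The running-cost-per-publication figure follows directly from the two guesstimates before it. A minimal check, treating the slide's lower bounds as point values (an assumption, since both are quoted as "greater than"):

```python
# Guesstimates from the slide, taken at their stated lower bounds.
RUNNING_COSTS_EUR_PER_YEAR = 500_000_000
PUBLICATIONS_PER_YEAR = 10_000

cost_per_publication = RUNNING_COSTS_EUR_PER_YEAR / PUBLICATIONS_PER_YEAR
print(f"running costs per publication: ~EUR {cost_per_publication:,.0f}")
```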

  19. Counting Users http://pan-data.eu/Users2012-Results

  20. Shared Data Policy Framework PaN-Data Integration Federated User Authentication NeXus Common Data Format Federated Data Catalogue

  21. Topic Publication Keyword Core Metadata Shared Model and Terms Authorisation Investigation Investigator Dataset Sample Sample Parameter Datafile Dataset Parameter Parameter Related Datafile Datafile Parameter
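
The entities named on this slide suggest a hierarchy in which an investigation owns samples and datasets, datasets own datafiles, and each level can carry parameters. The sketch below is one possible reading of that diagram; the class and field names are illustrative assumptions and do not reproduce the actual ICAT schema or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Parameter:
    """A name/value pair attached to a sample, dataset or datafile."""
    name: str
    value: str
    units: str = ""


@dataclass
class Datafile:
    name: str
    location: str
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    datafiles: List[Datafile] = field(default_factory=list)
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Sample:
    name: str
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    investigators: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    publications: List[str] = field(default_factory=list)
    samples: List[Sample] = field(default_factory=list)
    datasets: List[Dataset] = field(default_factory=list)

    def datafile_count(self) -> int:
        """Total number of datafiles across all datasets."""
        return sum(len(ds.datafiles) for ds in self.datasets)


# Hypothetical example values, for illustration only.
raw = Dataset(
    name="raw",
    datafiles=[Datafile("scan_0001.nxs", "/archive/scan_0001.nxs")],
    parameters=[Parameter("temperature", "295", "K")],
)
inv = Investigation(
    title="Example diffraction study",
    investigators=["A. Scientist"],
    keywords=["diffraction"],
    samples=[Sample("sample-1")],
    datasets=[raw],
)
print(inv.title, "has", inv.datafile_count(), "datafile(s)")
```

Keeping parameters as generic name/value/units triples, rather than fixed columns, is one way a shared model can stay common across facilities whose instruments record different quantities.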

  22. Towards the Future • Provenance • Integrating context, analysis and publication into the record • Preservation • Long-term need for archiving and curating data • Persistent identifiers, integrity, context • Costs and benefits of data preservation • Scalability • Managing high data rates and volumes • Parallel file stores

  23. Publishing

  24. Credit: Brian Matthews DOI Data Access Process Paper DataCite STFC Page TopCAT

  25. Record Publication Proposal Generate DOI Landing page from RO Raw Data :hasDataset Facilities Data Lifecycle Approval :investigator Construct Scheduling Investigation #n DOI:STFC.xxx Data storage Subsequent publication registered with facility Experiment Derived Data Data analysis :hasRelatedDataset Scientist submits application for beamtime :instrument :hasPublication Tools for processing made available Publications Raw data filtered, and stored Facility committee approves application Scientist visits, facility runs experiment Publish :hasPublication Facility registers, trains, and schedules scientist’s visit Investigation as a Research Object
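
The relations shown on this slide (`:investigator`, `:instrument`, `:hasDataset`, `:hasRelatedDataset`, `:hasPublication`) can be read as triples hanging off an investigation identified by its DOI. The sketch below builds such triples and renders a plain-text landing page from them. The function names and example values are hypothetical, the DOI is the slide's own placeholder, and no real DataCite or ICAT call is made.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def build_research_object(
    doi: str,
    investigators: List[str],
    instrument: str,
    datasets: List[str],
    related_datasets: List[str],
    publications: List[str],
) -> List[Triple]:
    """Express an investigation as (subject, predicate, object) triples."""
    subject = f"doi:{doi}"
    triples: List[Triple] = [(subject, ":instrument", instrument)]
    triples += [(subject, ":investigator", p) for p in investigators]
    triples += [(subject, ":hasDataset", d) for d in datasets]
    triples += [(subject, ":hasRelatedDataset", d) for d in related_datasets]
    triples += [(subject, ":hasPublication", p) for p in publications]
    return triples


def landing_page(doi: str, triples: List[Triple]) -> str:
    """Render a minimal plain-text landing page from the triples."""
    lines = [f"Investigation {doi}"]
    lines += [f"  {predicate} {obj}" for _, predicate, obj in triples]
    return "\n".join(lines)


# Placeholder DOI as written on the slide; other values are made up.
ro = build_research_object(
    doi="STFC.xxx",
    investigators=["A. Scientist"],
    instrument="example-instrument",
    datasets=["raw-data"],
    related_datasets=["derived-data"],
    publications=[],  # publications may be registered with the facility later
)
print(landing_page("STFC.xxx", ro))
```

Because the publication list starts empty and can be extended later, the same record can follow the investigation from raw data through to subsequent publications registered with the facility.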

  26. Futures

  27. E-INFRASTRUCTURE Requirements • Data rates begin to require dedicated central IT infrastructure, • way beyond previous requirements. • Data sets become too numerous to keep track of • Data management, common metadata • Data sets become too large to take home. • Data policies • Archive at facility and “cloud” data access • Analysis requires high level of computational power • Archive at facility and “cloud” access to HPC • Integrating data analysis processes into data management processes • Integrating workflow into large-scale data management processes • Variety of scientific areas leads to a variety of data formats and analysis software. • Common data formats and APIs • Large number of users moving between labs. • Federated authentication and data catalogues • The rise of data-intensive experiments and computation • Real-time data processing for live experiments • Streaming data processing

  28. Integrating Data Neutron diffraction X-ray diffraction Developments that will influence how the data is managed • Facilities offer complementary experimental techniques for a single beamline • (e.g. tomography+diffraction) • Users increasingly use multiple facilities leading to the need for multi-stream data fusion and processing • Including remote access to HPC and data storage resource • Using provenance information effectively • Data Publication • Data tracing • Data publication in context • Reproducibility High-quality structure refinement

  29. RDA Interest Group • Proposing a New RDA Interest Group • Photon and Neutron Science (PaNSIG) • PaNData Partners • + US and Aus partners • Plan to hold first workshop in Dublin, March 2013

  30. Thank You Brian Matthews brian.matthews@stfc.ac.uk Prioriphora schroederhohenwarthi X-Ray Imaging at ESRF Solorzano et al, 2011, Systematic Entomology (2011)
