210 likes | 411 Views
Data and Communications in Basic Energy Sciences Creating a Pathway for Scientific Discovery A Workshop Co-sponsored by Basic Energy Sciences and Advanced Scientific Computing Research. Bethesda North Marriott Hotel & Conference Center Bethesda, MD October 24-25, 2011.
E N D
Data and Communications in Basic Energy SciencesCreating a Pathway for Scientific DiscoveryA Workshop Co-sponsored by Basic Energy Sciences and Advanced Scientific Computing Research Bethesda North Marriott Hotel & Conference Center Bethesda, MD October 24-25, 2011 Mike Simonson, Co-Organizer Neutrons Sciences Directorate Oak Ridge National Laboratory
ASCR-BES Data Workshop Sponsors BES: Dr. Harriet Kung (Jim Davenport) ASCR: Dr. Dan Hitchcock (Walt Polansky, Ceren Susut-Bennett) Charge • Review status, successes, and shortcomings of current data and communication pathways for scientific discovery in the basic energy sciences; • Ascertain knowledge, methods and tools needed to mitigate present and projected data and communication shortcomings; • Consider opportunities and challenges related to data and communications with the combination of techniques in single experiments; • Identify research areas in data and communications needed to underpin advances in the basic energy sciences in the next ten years; • Create the foundation for information exchanges and collaborations among ASCR-BES research and facilities communities Co-Chairs • Peter Nugent, LBNL (NERSC) • J. Michael Simonson, ORNL (SNS)
BES Scientific User Facilities: Resources for Energy Research Electron Microscopy Center for Materials Research Center for Nanoscale Materials Center for Functional Nanomaterials Advanced Light Source Advanced Photon Source National Center for Electron Microscopy National Synchrotron Light Source Molecular Foundry National Synchrotron Light Source-II Stanford Synchrotron Radiation Lab Spallation Neutron Source Linac Coherent Light Source Center for Nanophase Materials Science Shared Research Equipment Program Los Alamos Neutron Science Center High-Flux Isotope Reactor Center for Integrated Nanotechnologies • 4 Synchrotron Radiation Light Sources • Linac Coherent Light Source • 3 Neutron Sources • 3 Electron Beam Microcharacterization Centers • 5 Nanoscale Science Research Centers
Broader Context Other Agencies: Data Driven Science, Storage, Analysis, Simulation NSF Task Force on Data and Visualization Other Programs in DOE: LHC, RHIC, Climate Research, Leadership Computing….. Interagency Working Group on Big Data Competes Act 2010: Working Group on Public Access Office of Science Working Group on Digital Data Data Play a Key Role in Materials Genome 4
Emerging Challenges in User Facilities Provided to workshop participants by BES • BES facilities have capability to produce TeraBytes of data per day from single beam lines • 20-50 beam lines per source • LCLS, SNS, Synchrotron Light Sources, e- Microscopes are excellent examples • Increasing use of time resolved & tomographic studies • Increased need for analysis ‘on the fly’ • Optimize use of valuable resources (beam time and researcher effort) • Broad spectrum of BES data needs requires: • New level of understanding as a result of sophisticated applied mathematics and computer science techniques • New science that extracts the most from our (i.e. BES) facilities 5
LCLS : Femtosecond Crystallography • Low resolution structure of Photosystem I determined from ~15,000 nanocrystals • Each crystal was illuminated sequentially and destroyed by the LCLS beam • Dose >30 times larger than classical damage limit • Unfortunately, low photon energy at AMO limited resolution and chance for new biology • Recent experiment~ 18 Terabytes collected in 8 hours! Synchroton LCLS 6
Data Processing: Imaging Requirements • Imaging reconstruction – 105 images/s • Orientation algorithm ~N6 flops • For new detectors with N ~ 103, real-time image analysis requires 1018 flops • Advances needed in compute power, theory and algorithms, and local processing 7
G(r) T r • High flux and efficient detectors enable advances in science using neutrons • Unprecedented capability for parametric studies • New understanding of structural changes (distortions, dislocations) with physical parameters (temperature, pressure) • High data rates and large data sets are generated • Data rates of ~1GByte/minute are sustained over runs of several hours (TByte data sets) • Estimate 4 TB/s from full operation at SNS Data from New Neutron Instruments 1 meter long detector banks (3He tubes) Temperature dependent structure of a mixed-valence manganite from neutron diffraction at SNS NOMAD
Expanding Data Collection Rate Before µs X-tomography • Time-resolved, two-dimensional imaging implies a large amount of raw data • Data rate: 25 MB/second, 100 GB/hour, 2.4 TeraBytes/day • A frame of high-resolution image: 1 GB (tiled) • 250 GB for a time sequence of 1 ms (250 intervals within 1 ms) • Time sequences as a function of injection pressure, fuel injected and many other parameters • Extensive imaging processing • With future improvements • APS-U and detector improvement) • We will have a factor of 10 to 100 faster data rate • An estimate is 50-100 TB/day • Manual data processing won’t be possible After Improved Distribution based on APS Results unpublished results, at full resolution: 100 MB) 9
ASCR-BES Data Workshop- Participants, Observers & Speakers - • Number of Participants- 80 • National Laboratories, Universities, NIST, NSF & International • Observers- ASCR, BES, BER, HEP & SC-2 • Plenary Speakers • Brent Fultz, CalTech- Workflow • Thomas Schulthess, ETH Swiss SC Center- Theory & Algorithms • Dave Pugmire, ORNL, Visualization and Analysis • QuinceyKoziol, HDF Group, Data Processing & Management • Luncheon Speaker • Adam Riess, 2011 Nobel Laureate in PhysicsProfessor, John Hopkins UniversityScientist, Space Telescope Science Institute
Break-out Sessions Identify immediate (1-3 year), near-term (3-5 year) and longer-term opportunities to link ASCR capabilities with BES needs • Workflow management: Experiment to Science Chairs: Mark Hagen, ORNL; Ross Miller, ORNL • Identifying and managing the data path from experiment to publication. • Theory and algorithms Chairs: Bobby Sumpter, ORNL; Chao Yang, LBNL • Recognizing the need for new tools for computation at scale, supporting large data sets and realistic theoretical models. • Visualization and Analysis Chairs: Dean Miller, ANL; James Belak, LLNL • Supporting near-real-time feedback for experiment optimization and new ways to extract and communicate critical information from large data sets. • Data Processing and Management Chairs: Amber Boehnlein, SLAC NAL; Eli Dart, LBNL • Outlining needs in computational and communication approaches and infrastructure needed to handle unprecedented data volume and information content.
Poster Session “High Performance Computing in Accelerator Science” “SciDAC’s Scientific Data Management Center” “ESnet: A Partner in Data Intensive Science” “Challenges and Opportunities in Data Systems at the Spallation Neutron Source” “Data Challenges at Current and Future Light Sources” “Linac Coherent Light Source” “Data Needs from BES Nanoscale Research Centers: Examples from the Center for Nanophase Materials Sciences”
Enabling Transformative Science Detector and source advancements will enable transformative science within BES facilities. Current systems are producing a tremendous amount of data. Future systems will overwhelm current analysis pipelines. • Integrate theory and analysis components seamlessly within experimental workflow. • Move analysis to closer to experiment. • Match data management access and capabilities with advancements in detectors & sources.
… working seamlessly within experimental workflow. Integrate theory and analysis… • Coupled simulation and experiment • Theory guidance for experimental design • Analysis feedback to steer experiment • Common data formats • Common community toolsets for analysis and workflow • Apply ASCR’s investment in visualization and analysis tools (invest in adaptation specific to experiments)
Move analysis closer to experiment • Real-time (in-situ), streaming analysis at beamline. • Local data reduction capabilities • Zero suppression • Hierarchical filtering • Baseline and background subtraction • Live visualization of experiment • Improve the efficiency of the experiment • Increase data-quality • Improve off-line analysis
Match data management access and capabilities… … optimizing advancements in detectors & sources. • Remove the bottlenecks • Apply existing data transport and mobility toolsets • Apply forefront mathematical techniques to more efficiently extract science from the experiment. • Interoperability across different facilities/beamlines • Combine multiple data sets • Expandable • Incorporate legacy data • Integrated teams of engineers, scientists and computer scientists to solve these problems
Recommendation Summary Focus on high-value-added areas • Integrate theory and analysis components seamlessly within experimental workflow. • Move analysis to closer to experiment. • Match data management access and capabilities with advancements in detectors & sources. • Continue dialog and collaboration between BES and ASCR on areas of interest and benefit • Maximize scientific impact from existing and enhanced user facilities
Start Data Acquisition Live DataView Decide whatto do next Reduce/View Data If somethingis wrong Changeconfiguration Workflow, Processing & Visualization Propose new experiment DATA STORAGE ANALYSIS Preliminary Findings
Collaboration: Theory & Algorithms • Inverse problems and solution algorithms (near term 1-3 years), but there is a need for long term R&D • Feature extraction, image analysis (near term 1-3 years), include model based constraint (longer term) • Combine multiple data sources and imaging techniques to provide more reliable solutions (mid-term 5 years) • Ab initio theory-guided experiments for data triage/reduction (long term 5-10 years) • Computational endstation that couples virtual and real experiments (long term 5-10 years)
ADARA Project at ORNL (SNS & OLCF) • ADARA – Accelerating Data Acquisition, Reduction and Analysis • Collaboration between Neutron Data Analysis and Visualization Group (NScD) and Technology Integration Group (OLCF) with support from Data Acquisition in RAD • Basic steps of ADARA project: • Stream data to analysis nodes • Modify data process code tostreaming data reduction • Streaming translation service creates • data files • Expand existing data reduction/analysis • resources (parallel file system, • clusters) to process large SNS data sets alongside ASCR/OLCF resources • Install network infrastructure between SNS and 5600 • Install user workstations with fast network connections (leveraging ASCR infrastructure) • Reduction/analysis resources (5600) installed in ~3 months time. Prototype SMS, STS, SRS by ~August 2012 Deployment on beam lines following • ADARA will give SNS users near “instant” access to data and will lay foundations for future live data analysis and computational steering.