The Data Management Requirements at SNS Shelly Ren & Steve Miller Scientific Computing Group, SNS-ORNL December 11, 2006
SNS Neutron Scattering User Facility SDM 12/11/2006
Neutron Scattering Science Areas
• Chemistry – microstructures
• Complex Fluids – fluid properties
• Crystalline Materials – molecular structure
• Disordered Materials – structure characterization
• Engineering – studying material stress/strain
• Magnetism & Superconductivity – material properties
• Polymers – studying “giant” molecules
• Structural Biology – proteins
SNS Instrument Commissioning Schedule
SNS Potential Data Volume
[Figure: two plots versus year, 2006–2011. "Production Data Rate" in GB/day (just instrument data here) and "Total Stored Data" in TB. Plotted series: raw, reduced, raw+reduced, old raw.]
Integrating Computation with Experimentation
[Diagram. Key: metadata, hardware, software. Labels: web browser; access and authorization control; control portal, data portal, analysis portal; HPC support and interactive feedback; decision support and intelligent control; high performance computing; automation; diagnostics and controls; acquisition, treatment, analysis; instrument simulation and materials simulation; visualization (Vis) of the instrument and of sample and environment; electronic notebook; database and repository holding raw, intermediate, and scientific data, sample and environment data, simulation data, publications, proposals, and documentation.]
Creating, Processing and Storing Data
• Event Histogramming
• Detector to Pixel mapping
• Instrument Geometry
• Metadata extraction
• Create NeXus file
• Catalog and Store
• Reduce Data
• All subsystems functional to some degree
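The event histogramming step can be illustrated with a minimal sketch. The slide does not describe the SNS event format, so this assumes each event is a (pixel ID, time-of-flight) pair and that the time-of-flight bins are uniform; the function name and parameters are hypothetical.

```python
from collections import defaultdict


def histogram_events(events, tof_min, tof_max, n_bins):
    """Bin (pixel_id, time_of_flight) events into one histogram per pixel.

    Assumption: uniform bins over [tof_min, tof_max); events outside
    that range are dropped.
    """
    if n_bins <= 0 or tof_max <= tof_min:
        raise ValueError("need n_bins > 0 and tof_max > tof_min")
    bin_width = (tof_max - tof_min) / n_bins
    histograms = defaultdict(lambda: [0] * n_bins)
    for pixel_id, tof in events:
        if not (tof_min <= tof < tof_max):
            continue  # outside the histogram range
        index = int((tof - tof_min) / bin_width)
        index = min(index, n_bins - 1)  # guard against float rounding
        histograms[pixel_id][index] += 1
    return dict(histograms)
```

Calling `histogram_events(events, 0.0, 100.0, 10)` returns a dictionary that maps each pixel ID seen in `events` to a list of ten counts.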
Current SNS Data Hierarchy
SNS data are stored on an NFS-mounted file system
• Direct Attached Storage (DAS), with storage resources grown incrementally based on need
• A data server for DAS with terabytes of internal hard drive storage
SNS metadata are stored in an Oracle database
Data Hierarchy
• /facility/instrument/proposalID/experimentId/runNumber
• /NeXus/NeXus files
• /preNeXus/metadata files
• /analysis
• e.g. /SNS/BSS/2006_1_2_SCI/1/100/NeXus/BSS_100.nxs
• /SNS/BSS/2006_1_2_SCI/1/100/preNeXus/cvinfo.xml
• /SNS/BSS/2006_1_2_SCI/1/100/preNeXus/cvbeam.xml
[Diagram labels: icat-search; user-workspace; ICAT metadata (Oracle DB); ICAT application server (JBoss); live-catalog; sns-checkin; data browser]
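As a sketch of how the hierarchy composes, the helpers below build a run directory and a NeXus file path from their components. The function names are hypothetical, and the file name pattern `<instrument>_<runNumber>.nxs` is inferred only from the single example BSS_100.nxs, so treat it as an assumption.

```python
from pathlib import PurePosixPath


def run_directory(facility, instrument, proposal_id, experiment_id, run_number):
    """Return /facility/instrument/proposalID/experimentId/runNumber."""
    return PurePosixPath(
        "/", facility, instrument, proposal_id, str(experiment_id), str(run_number)
    )


def nexus_file(facility, instrument, proposal_id, experiment_id, run_number):
    """Return the NeXus file path for a run.

    Assumption: files are named <instrument>_<runNumber>.nxs, as in the
    example BSS_100.nxs.
    """
    run_dir = run_directory(facility, instrument, proposal_id, experiment_id, run_number)
    return run_dir / "NeXus" / f"{instrument}_{run_number}.nxs"


def prenexus_file(facility, instrument, proposal_id, experiment_id, run_number, name):
    """Return the path of a preNeXus metadata file such as cvinfo.xml."""
    run_dir = run_directory(facility, instrument, proposal_id, experiment_id, run_number)
    return run_dir / "preNeXus" / name
```

For example, `nexus_file("SNS", "BSS", "2006_1_2_SCI", 1, 100)` reproduces the example path on the slide.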
SNS Data Access – Through Unix Shell
• Symbolic links are created in the user’s home directory to the proposal directories of which the user is a member
• Symbolic links are also created in the user’s home directory to the public directory where public data reside
• Disk quota may be allocated for users to perform analysis and simulation
User Workspace
• /facility/users/neutron_boy/workspace (write)
• /proposalID (read only)
• /proposalID (read only)
• /public (read only)
• /facility/users/public/proposalID
• /proposalID
Gray names are symbolic links to the data hierarchy
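The symbolic-link layout could be set up along the following lines. This is a sketch under assumptions, not the SNS provisioning tool: the function names are hypothetical, the link for public data is assumed to be called `public`, and read-only access is assumed to be enforced separately through file system permissions.

```python
from pathlib import Path


def _ensure_link(link: Path, target: Path) -> Path:
    """Create or refresh a symbolic link at `link` pointing to `target`."""
    if link.is_symlink():
        link.unlink()  # replace a stale link so reruns are safe
    link.symlink_to(target, target_is_directory=True)
    return link


def link_user_workspace(home: Path, proposal_dirs, public_dir: Path):
    """Link each proposal directory and the public directory into `home`.

    Assumption: permissions on the targets, not the links, decide
    whether the user may read or write.
    """
    home.mkdir(parents=True, exist_ok=True)
    links = [_ensure_link(home / target.name, target) for target in proposal_dirs]
    links.append(_ensure_link(home / "public", public_dir))
    return links
```

Running it twice for the same user leaves one link per proposal plus the `public` link, because existing links are replaced rather than duplicated.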
SNS Data Access – Through Portal
[Screenshot labels: First SNS Data; ISAW Plot; NeXus tags; NeXus files; metadata]
Search Your Data via the Web
• Enter search text
• Select search fields
• Select files of interest to browse or to download
[Screenshot callouts: Enter Text Search String; Select Optional Search Fields; Returns Files Found]
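The three search steps can be mimicked with a small in-memory sketch. The real portal queries the ICAT metadata catalog, whose interface is not shown here, so the record layout (dictionaries with a `path` key plus metadata fields) and the function name are purely illustrative assumptions.

```python
def search_catalog(records, text, fields=None):
    """Return the paths of records whose selected fields contain `text`.

    Assumptions: each record is a dict with a "path" key, matching is a
    case-insensitive substring test, and omitting `fields` searches
    every field of the record.
    """
    needle = text.lower()
    matches = []
    for record in records:
        selected = fields if fields else list(record.keys())
        if any(needle in str(record.get(name, "")).lower() for name in selected):
            matches.append(record["path"])
    return matches
```

A caller would pass the user's search string as `text`, the optionally selected search fields as `fields`, and then offer the returned paths for browsing or download.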
Monte Carlo Simulation via the Portal
SNS Data Management Requirements
• Archive, catalog, and maintain data produced by SNS instruments so that users can access them from anywhere at any time without worrying about data storage
• Grant authorized access to SNS data and metadata for both shell and portal users (ensure data are private to the experiment team)
• Provide services for efficient search, browsing, and download of SNS data and metadata
• Allow users to share datasets with their collaborators and to access datasets that have been made public, in a scalable fashion
• Provide data management services to HFIR, LUJAN, IPNS, and other interested neutron facilities
• Extend dataset storage to spinning disk, HPSS, and other archival systems
• Manage distributed dataset storage and perform data transport for end users
• Federate data storage with partner neutron facilities such as ISIS so that users can see all their experiment data by logging into one facility
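The access requirements above (private to the experiment team, shareable with collaborators, optionally public) amount to a simple read check, sketched below. The `Dataset` class, its fields, and `can_read` are hypothetical names for illustration and do not describe the actual SNS authorization service.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical access record for the datasets of one proposal."""
    proposal_id: str
    team: set = field(default_factory=set)         # experiment team members
    shared_with: set = field(default_factory=set)  # invited collaborators
    public: bool = False                           # released to everyone


def can_read(user: str, dataset: Dataset) -> bool:
    """Allow reads for public data, team members, and collaborators only."""
    return dataset.public or user in dataset.team or user in dataset.shared_with
```

By default a dataset is readable only by its team; sharing adds a collaborator to `shared_with`, and releasing the data sets `public` to true.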
SNS Long Term Data Management Needs
• Create a single file hierarchy for accessing data distributed across multiple storage systems and multiple facilities, extending even beyond neutron scattering facilities
• Support the management, collaboration, controlled sharing, replication, transfer, and preservation of distributed data
• Capture metadata for user-produced data
• Automate data transfer
• Improve data processing so that it is parallel and scalable
• Search large volumes of data for patterns so that users can find particular structures within their data (data mining)
• Establish a unified user authentication service across neutron facilities
• Provide users with an easy-to-use portal service to search, browse, download, and upload data, and to search, annotate, and update metadata
• Integrate experiment with simulation, and launch simulation jobs that need programmatic access to the distributed data resources
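One way to picture the single file hierarchy over multiple storage systems is a catalog that maps each logical path to its physical replicas. The sketch below is an assumption-laden illustration: the class name, the storage labels `disk` and `HPSS`, and the preference order are hypothetical, and it names no specific product.

```python
class ReplicaCatalog:
    """Map one logical path to replicas held on different storage systems."""

    def __init__(self):
        self._replicas = {}  # logical path -> list of (storage, physical path)

    def register(self, logical_path, storage, physical_path):
        """Record that `logical_path` has a copy on `storage`."""
        self._replicas.setdefault(logical_path, []).append((storage, physical_path))

    def resolve(self, logical_path, preferred=("disk", "HPSS")):
        """Return (storage, physical path) for the most preferred replica.

        Assumption: storage systems earlier in `preferred` are faster to
        reach; replicas on unlisted systems are a last resort.
        """
        replicas = self._replicas.get(logical_path, [])
        for storage in preferred:
            for name, physical_path in replicas:
                if name == storage:
                    return name, physical_path
        if replicas:
            return replicas[0]
        raise KeyError(logical_path)
```

Users and simulation jobs would refer only to the logical path, while the catalog decides which copy to read and whether a transfer from archive is needed.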
Summary
• As more instruments pass through commissioning and enter a new era of scientific discovery, we face the emerging challenge of managing scientific data that could grow to petabyte scale within a few years
• As a user facility, SNS will have a steady stream of users running experiments and generating raw and analysis data files, so we will need not only disk cache but also a long-term storage system such as HPSS
• We promise to search and retrieve SNS data and metadata for end users anywhere, at any time, in a timely fashion
• We aim to grow our data management resources and collaborate with the community
• We are looking for opportunities to work with and leverage resources beyond our facility
• We are eager to reach out to, learn from, and collaborate with data management experts in all domain areas
• We wish to understand and use new software applications to manage distributed data storage and to transport, search, and retrieve data more effectively and efficiently