Overview of NERSC Analytics Program Cecilia Aragon NERSC Analytics Team aragon@hpcrd.lbl.gov NERSC User Group Meeting September 17, 2007
NERSC Analytics Team Members • Wes Bethel, Team Lead • Cecilia Aragon • Janet Jacobsen • Peter Nugent • Kurt Stockinger • Gunther Weber (~3 FTEs with 1 FTE to be hired)
What is the Analytics Program at NERSC? • At NERSC, the Analytics Program is the confluence of several key technologies: • Data management: data storage, retrieval, sharing, and movement; data indexing and querying; format conversion. • Data analysis, exploration and visualization: feature detection/tracking; statistical analysis; subsetting, filtering, partitioning; comparison (models to models, models to data, etc.); interactive data exploration; visualization (visual analysis). • Workflow management: a systematic approach to “data processing pipelines,” especially those that use multiple distributed resources and automate scientific data processing activities.
Analytics Components (diagram). Labels: Workflow; Experiment; Simulation; Data; Analysis; Results; High Performance I/O Libraries, Data Models. • Data: raw data files, metadata, location transparency, single project, community repositories. • Analysis: filter, feature detection, search, subset, transform, visualize. • Results: images, movies, data, decisions, knowledge.
What we do for NERSC users Mission statement: Facilitate NERSC User knowledge discovery through use, adaptation, extension, creation, application and deployment of a diverse array of technologies spanning the domains of • data management • data analysis and exploration • visualization • workflow management
What we do for NERSC users • Generally, no off-the-shelf, general purpose solutions for Analytics exist. • The Analytics Program adapts, extends, integrates and sometimes creates technologies to meet user needs. • Consulting and collaborative projects with users in: visualization, data management, data exploration, data analysis, workflows. • Substantive impact on science comes through in-depth work with stakeholder/users. • Contact us at consult@nersc.gov
NERSC Analytics Web Site • http://www.nersc.gov/nusers/analytics/ • Completely redesigned in March 2007 • Response to 2006 User Survey (need for more web-based analytics documentation)
Resources of the Analytics Program: Personnel, Analytics System • Team of six (~3 FTEs with 1 FTE to be hired) with experience spanning all aspects of analytics, high performance computing, and many science domains. • DaVinci: SGI Altix with 32 processors, 192 GB RAM, 40 TB attached FC storage. • Architectural balance favors data intensive operations: large SMP memory, best I/O bandwidth on the floor at NERSC. • Procurement process underway for a new analytics machine (response to user predictions of a substantial increase in data size over the next two years).
Supported Science Areas
Analytics Customer/Technology Matrix. Technologies (columns): SDM, Workflow, Analysis, Visualization. Science areas (rows), with marks as listed: Accelerator X X X X • Astrophysics X X X X • Biology X X • Chemistry X • Climate X X X • Combustion X X • CS X X • Fusion X X X • Math X X X
Analytics Customers • Samples of our work: • Climate • Fusion • Spectrum Synthesis • Laser Wakefield Particle Acceleration • Astrophysics • General purpose
Analysis of Climate Modeling: Automatic Feature Extraction by Blind Source Separation • Extracted features can be used as templates for finding similar features. • Example: a tropical storm visible in sea level pressure simulations at multiple time steps. • In this case, the features were variations on rotating low-pressure systems. This was not assumed a priori. • Images of extracted features: the top ten independent components were extracted from the set of all 8x8 subimages.
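As a rough illustration of the input to such a separation (a sketch only, not the team's code), the snippet below gathers every 8x8 subimage of a 2D field and flattens each into a 64-element vector. The overlapping-window layout, the per-patch mean removal, the function names, and the synthetic field are all assumptions made for the example. The independent component analysis itself is not shown.

```python
def extract_subimages(field, size=8):
    """Return every size x size subimage of a 2D grid, flattened row by row.

    Windows overlap and slide one cell at a time (an assumption; the slide
    only says "all 8x8 subimages").
    """
    n_rows = len(field)
    n_cols = len(field[0]) if n_rows else 0
    patches = []
    for top in range(n_rows - size + 1):
        for left in range(n_cols - size + 1):
            patch = [field[top + r][left + c]
                     for r in range(size) for c in range(size)]
            patches.append(patch)
    return patches


def remove_patch_mean(patch):
    """Subtract the patch's own mean, a common (assumed) preprocessing step."""
    mean = sum(patch) / len(patch)
    return [value - mean for value in patch]


# Hypothetical 10 x 12 stand-in for one time step of a sea level pressure field.
field = [[float(r * 12 + c) for c in range(12)] for r in range(10)]

patches = extract_subimages(field)
centered = [remove_patch_mean(p) for p in patches]
print(len(patches), "patches of length", len(patches[0]))
```

The resulting list of centered vectors is the kind of matrix a blind source separation routine would take as input, one row per subimage.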
Fusion: Comparative Analysis • Science objective: compare experiment (SSPX) and simulation (NIMROD). • Problem: data formats are incompatible with each other and with visual and comparative analysis tools. • Solution #1: one-step conversion from NIMROD binary output to VisIt format (replaces a procedure consisting of about 10 steps). • Solution #2: VisIt reader for SSPX data. Implement basic comparative visual analysis capabilities in VisIt.
Spectrum Synthesis • NERSC Analytics contribution: visualization and analysis to test and confirm theory of a new type of Type Ia SN, one having “Super-Chandrasekhar” mass. • Top: brightness vs. velocity and amount of deceleration. • Middle: velocity vs. mass and unburned carbon using a 1.4 solar mass model. • Bottom: moving to a 2 solar mass model includes new observation.
PIC Simulation of Laser Wakefield Particle Acceleration This image uses volume rendering to show the plasma density field. (VisIt) This image shows a horizontal slice through the electric field; the electrons are colored by the magnitude of the momentum. (AVS/Express)
Accretion-Induced Collapse of White Dwarfs • Data from 2D radiation-hydrodynamics simulations. • Data include 68 scalar and 25 vector fields. • Future simulations will be 3D. • Image panels: Mach number, entropy, electron fraction.
Analytics with Cooperative Funding: SNfactory • New supernova data analysis and workflow visualization tools (Sunfall and SNwarehouse) have improved usability and situational awareness, and enabled faster and easier access to data for supernova scientists worldwide. • Advanced image processing (Fourier contour analysis) and machine learning techniques running on NERSC platforms have achieved a ~90% decrease in human workload in the nightly supernova search (2.75 FTE).
Scientific Data Management • Storage Resource Manager (SRM) for distributed data management: an integrated mechanism for transferring files from one location to another; uniform access to heterogeneous storage (disk, tape); fault tolerant. • FastBit for efficient indexing and querying. • HDF5 FastQuery combines bitmap indices with HDF5.
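To make the idea of a bitmap index concrete, here is a toy sketch. It is not FastBit's or HDF5 FastQuery's API, and it omits the bitmap compression that production indices rely on. Each distinct column value gets one bitmap (a Python int) whose bit i is set when row i holds that value, so a multi-column query reduces to bitwise OR and AND. The function names and sample data are hypothetical.

```python
def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    index = {}
    for row, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def rows_where(index, predicate):
    """OR together the bitmaps of every distinct value that satisfies predicate."""
    mask = 0
    for value, bitmap in index.items():
        if predicate(value):
            mask |= bitmap
    return mask


def to_row_ids(mask):
    """Expand a bitmap into the sorted list of row numbers whose bit is set."""
    rows = []
    row = 0
    while mask:
        if mask & 1:
            rows.append(row)
        mask >>= 1
        row += 1
    return rows


# Hypothetical particle table: a binned energy column and a species column.
energy_bin = [0, 3, 1, 3, 2, 0, 3]
species = ["e", "e", "p", "p", "e", "p", "e"]

energy_index = build_bitmap_index(energy_bin)
species_index = build_bitmap_index(species)

# Query: energy_bin >= 2 AND species == "e", answered with one bitwise AND.
hits = (rows_where(energy_index, lambda b: b >= 2)
        & rows_where(species_index, lambda s: s == "e"))
print(to_row_ids(hits))  # → [1, 4, 6]
```

The query never scans the raw columns. It touches only the per-value bitmaps, which is what makes this style of index attractive for read-mostly scientific data.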
Improving Remote Display Performance • Remote Analytics: improve performance of remote display through X11 protocol acceleration/proxies. • General purpose solution applicable to many different applications. • Addresses a major user concern. • Conducted performance tests to evaluate various protocol acceleration technologies. • Project scope and objectives are documented on the internal NERSC website; the next step is coordination with other groups within NERSC. • Intent is to deploy technology to accelerate performance of applications with remote display capability.
NERSC Remote Licensing • Objective: • Allow remote users to take advantage of (expensive) commercially licensed software. • Implementation: • Consolidate all license serving inside NERSC to a central location. • Set up facility whereby remote users can “check out licenses” for use on their desktop machines. • Software supported: IDL, AVS, AVS/Express, CEI/Ensight Gold.
Questions? http://www.nersc.gov/nusers/analytics/ • Wes Bethel, NERSC Analytics Team Lead, ewbethel@lbl.gov • Cecilia Aragon, aragon@hpcrd.lbl.gov • Analytics Team, consult@nersc.gov