SDM Integration Framework in the Hurricane of Data
SDM AHM, 10/07/2008
Scott A. Klasky, klasky@ornl.gov
ANL: Ross  CalTech: Cummings  GT: Abbasi, Lofstead, Schwan, Wolf, Zheng  LBNL: Shoshani, Sim, Wu  LLNL: Kamath  ORNL: Barreto, Hodson, Jin, Kora, Podhorszki  NCSU: Breimyer, Mouallem, Nagappan, Samatova, Vouk  NWU: Choudhary, Liao  NYU: Chang, Ku  PNNL: Critchlow  PPPL: Ethier, Fu, Samtaney, Stotler  Rutgers: Bennett, Docan, Parashar, Silver  SUN: Di  Utah: Kahn, Parker, Silva  UCI: Lin, Xiao  UCD: Ludaescher  UCSD: Altintas, Crawl
Outline • Current success stories and future customers of SEIF. • The problem statement. • Vision statement. • SEIF. • ADIOS. • Workflows. • Provenance. • Security. • Dashboard. • The movie. • Vision for the future.
Current success stories • Data Management/Workflow • "Finally, our project benefits greatly from the expertise of the computational science team at NCCS in extracting physics out of a large amount of simulation data, in particular, through collaborations with Dr. Klasky on visualization, data management, and workflow…" HPCwire, 2/2007. • Success in using workflows: used several times with the GTC and GTS groups (working toward everyday use). • ADIOS • "July 14 -- A team of researchers from the University of California-Irvine (UCI), in conjunction with staff at Oak Ridge National Laboratory's National Center for Computational Sciences (NCCS), has just completed what it says is the largest run in fusion simulation history. 'This huge amount of data needs fast and smooth file writing and reading,' said Xiao. 'With poor I/O, the file writing takes up precious computer time and the parallel file system on machines such as Jaguar can choke. With ADIOS, the I/O was vastly improved, consuming less than 3 percent of run time and allowing the researchers to write tens of terabytes of data smoothly without file system failure.'" HPCwire, 7/2008. • "Chimera code ran with 1000x faster I/O with ADIOS on Jaguar." Messer, SciDAC 2008. • S3D will include ADIOS. J. Chen. • ESMF team looking into ADIOS. C. DeLuca. • R. Harrison (ORNL) looking into ADIOS. • XGC1 code using ADIOS every day. • GTS code now working with ADIOS.
Current success stories • Workflow automation. • S3D data archiving workflow moved 10 TB of data from NERSC to ORNL. J. Chen. • GTC workflow automation saved valuable time during early simulations on Jaguar and during simulations on Seaborg. • "From a data management perspective, the CPES project is using state-of-the-art technology and driving development of the technology in new and interesting ways. The workflows developed under this project are extremely complex and are pushing the Kepler infrastructure into the HPC arena. The work on ADIOS is a novel and exciting approach to handling high-volume I/O and could be extremely useful for a variety of scientific applications. The current dashboard technology is using existing technology, and not pushing the state of the art in web-based interfaces. Nonetheless, it is providing the CPES scientists an important capability in a useful and intuitive manner." (CPES SciDAC review, reviewer #4). • "There are many approaches to creating user environments for suites of computational codes, data and visualization. This project made choices that worked out well. The effort to construct and deploy the dashboard and workflow framework has been exemplary. It seems clear that the dashboard has already become the tool of choice for interacting with and managing multi-phase edge simulations. The integration of visualization and diagnostics into the dashboard meets many needs of the physicists who use the codes. The effort to capture all the dimensions of provenance is also well conceived; however, the capabilities have not reached critical mass yet. Much remains to be done to enable scientists to get the most from the provenance database. This project is making effective use of the HPC resources at ORNL and NERSC. The Dashboard provides run-time diagnostics and visualization of intermediate results, which allows physicists to determine whether the computation is proceeding as expected. This capability has the potential to save considerable computer time by identifying runs that are going awry so they can be canceled before they would otherwise have finished." (CPES SciDAC review, reviewer #1).
Current success stories (user feedback) • "The connection to ewok.ccs.ornl.gov is unstable. I got an error message like 'ssh_exchange_identification: Connection closed by remote host' when I tried 'ssh ewok' from jaguar. This error doesn't always happen (about a 30% chance), but running the Kepler workflow is interrupted by it. Please check the system. Thank you, Seung-Hoe" (10/3/2008). • "Hi, I have a few things to ask you to consider. Currently the workflow copies all *.bp files, including restart.*.bp, and it seems that these restart*.bp files are converted into HDF5 and sent to AVS (not sure; tell me if I am wrong). This takes a really long time for large runs. Could you make the workflow skip these steps, or do them after the movies of the other variables are generated? Currently the dashboard shows movies only after the simulation ends, but generating the movies takes a certain amount of time after the simulation ends; if restart files exist, the time is very long. If I use the -t option for an already completed simulation, it takes about a few hours to generate all the AVS graphs. So a button that makes the dashboard think the simulation has not ended would be useful. The min/max routine doesn't seem to catch the real min/max for very tiny numbers: some of the data has values of ~1E-15 and the global plot is just zero. One example is 'ion__total_E_flux(avg)' of shot j38. When I ran the workflow for a restarted XGC run with the same shot number and a different job ID, the txt files were overwritten. When I run the workflow for a Franklin job, txt_file is ignored. Maybe I set something wrong; could you check that the txt_file copy works for Franklin? Maybe some of these are not problems with the ADIOS version of the workflow. If so, please let me know. Thanks, Seung-Hoe"
Current and Future customers • GTC. • Monitoring workflows. • Will want post processing/data analysis workflows. • GTS. • Similar to GTC, but will need to get data from experimental source (MDS+). • XGC1 • Monitoring workflows. • XGC0-M3D code coupling. • Code coupling. • XGC1 – GEM code coupling. • GEM • M3D-K / GKM • S3D • Chimera. • Climate (with ESMF team). Still working out details.
The problem • DOE and NSF open science have opened the floodgates of leadership-class computing. • ANL: 550 TF (this year). • ORNL: 1.x PF (by end of year). • NERSC: >200 TF (this year). • It's not about these peak numbers; it's about CPU hours/year. • In five years, the computing power available to us per year has gone up 100x. • INCITE allows us to run >50M hours/year. • A one-day simulation on the 270 TF machine produces >60 TB of data. • How do we manage all of this data and get the science out of it?
Vision • Problem: managing the data from a petascale simulation, debugging the simulation, and extracting the science involves: • Tracking • the codes: simulation, analysis • the input files/parameters • the output files from the simulation and analysis programs • the machines and environments the codes ran on. • Gluing all of the pieces together with workflow automation to automate the mundane tasks. • Monitoring the simulation in real time. • Analyzing and visualizing the results without requiring users to know all of the file names, in the same place where we monitor the codes. • Fast I/O which can be easily tracked. • Moving data to remote locations without babysitting the data movement.
Vision • Requirements. • Want all enabling technologies to play well together. • Componentized approach for building pieces in SEIF. • Components should work well by themselves, and work inside of framework. • Fast • Scalable • Easy to use!!!! • Simplify the life of application scientists!!!
General Architecture (diagram) — components: Analytics, Computations, Control Panel (Dashboard) & Display, local/remote networking ("cloud"), Orchestration (Kepler), Data/Databases/Provenance/Storage, Adaptable I/O.
SEIF (SDM End-to-end Integration Framework) — foundation and enabling technologies (diagram components: Dashboard, Visualization, Wide-area Data Movement, Workflow, Code Coupling, Provenance and Metadata, Adaptable I/O). • Workflow engine: Kepler. • Provenance support: code provenance, data provenance, system provenance, workflow provenance. • Wide-area data movement: SRM, SRM-lite. • Code coupling: provenance tracking. • Visualization: in situ with ADIOS; monitoring with VisIt and AVS/Express; analysis with VisTrails. • Advanced analysis: Parallel R, VisTrails. • Adaptable I/O: fast I/O. • Dashboard: collaboration. • Approach: place highly annotated, fast, easy-to-use I/O methods in the code, which can be monitored and controlled; have a workflow engine record all of the information; visualize this on a dashboard; move desired data to the user's site; and have everything reported to a database.
ADIOS Overview • Allows plug-ins for different I/O implementations. • Abstracts the API from the method used for I/O. • Simple API, almost as easy as an F90 write statement. • Best-practice/optimized I/O routines for all supported transports, "for free." • Componentization: • Thin API. • External metadata (XML file): data groupings with annotation, I/O method selection, buffer sizes. • Common tools: buffering, scheduling. • Pluggable I/O routines. • (Diagram: scientific codes call the ADIOS API; buffering, scheduling, and feedback sit behind it; transport methods include pHDF-5, MPI-IO, MPI-CIO, pnetCDF, POSIX I/O, viz engines, LIVE/DataTap, and other plug-ins.)
ADIOS Overview • ADIOS is an I/O componentization which allows us to: • Abstract the API from the I/O implementation. • Switch from synchronous to asynchronous I/O at runtime. • Change from real-time visualization to fast I/O at runtime. • It combines: • Fast I/O routines. • Ease of use. • A scalable architecture (hundreds of cores to millions of processes). • QoS. • Metadata-rich output. • Visualization applied during simulations. • Analysis and compression techniques applied during simulations. • Provenance tracking.
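To make the "thin API" claim concrete, here is a minimal sketch of a write routine using the ADIOS 1.x-style C API. The group name "restart", the variable names, the file name pattern, and the XML fragment in the comment are hypothetical examples, and the exact call signatures vary between ADIOS versions; the point is that the code names no I/O method, so switching between MPI-IO, POSIX, or an asynchronous transport such as DataTap is an edit to the XML file, not a code change.

```c
/* Minimal ADIOS write sketch (hypothetical group/variable names).
 * adios_init("config.xml", ...) is assumed to have been called at startup.
 *
 * A matching XML descriptor might look roughly like:
 *   <adios-group name="restart">
 *     <var name="nx"   type="integer"/>
 *     <var name="zion" type="double" dimensions="nx"/>
 *   </adios-group>
 *   <method group="restart" method="MPI"/>   <!-- swap for POSIX, DATATAP, ... -->
 *   <buffer size-MB="100"/>
 */
#include <mpi.h>
#include <stdio.h>
#include <stdint.h>
#include "adios.h"

void write_restart(int nx, double *zion, int timestep, MPI_Comm comm)
{
    int64_t  fd;            /* ADIOS file/group handle                   */
    uint64_t total_size;    /* filled in by adios_group_size             */
    char     fname[256];

    snprintf(fname, sizeof(fname), "restart_%05d.bp", timestep);

    /* open the "restart" group declared in the XML file */
    adios_open(&fd, "restart", fname, "w", comm);

    /* tell ADIOS how much this process writes so it can buffer/schedule */
    adios_group_size(fd, sizeof(int) + nx * sizeof(double), &total_size);

    /* named writes -- no HDF5/netCDF/MPI-IO calls appear in user code */
    adios_write(fd, "nx",   &nx);
    adios_write(fd, "zion", zion);

    /* the transport selected in the XML performs the actual output */
    adios_close(fd);
}
```

Because the method is bound at runtime from the XML file, the same routine can feed synchronous MPI-IO for production output or an in-memory transport for real-time visualization, which is the runtime switching described above.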
Initial ADIOS performance • June 7, 2008: 24-hour GTC run on Jaguar at ORNL. • 93% of the machine (28,672 cores). • MPI-OpenMP mixed model on quad-core nodes (7,168 MPI procs). • Three interruptions total (simple node failures), with two 10+ hour runs. • Wrote 65 TB of data at >20 GB/sec (25 TB for post-analysis). • I/O overhead ~3% of wall-clock time. • Mixed I/O methods of synchronous MPI-IO and POSIX I/O configured in the XML file. • DART: <2% overhead for writing 2 TB/hour with the XGC code. • DataTap vs. POSIX: • 1 file per process (POSIX). • 5 seconds for GTC computation. • ~25 seconds for POSIX I/O. • ~4 seconds with DataTap.
Chimera I/O Performance (chart): roughly 2x scaling; plotted values are the minimum from 5 runs with 9 restarts per run; error bars show the maximum time for each method.
ADIOS challenges • Faster reading. • Faster writing on new petascale/exascale machines. • Work with file system experts to refine our file format for ‘optimal’ performance. • More characteristics in the file by working with analysis and application experts. • Index files better using FASTBIT. • In situ visualization methods.
Controlling metadata in simulations • Problem: codes are producing large amounts of • files, • data in the files, • information in the data. • We need to keep track of this data to extract the science from the simulations. • Workflows need to keep track of file locations, what information is in the files, etc. • It is easier to develop generic (template) workflows if we can obtain this information inside Kepler. • Solution: provide a link from ADIOS into the provenance system (Kepler, …).
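As a rough illustration only, the record below sketches the kind of per-output metadata such a link could hand from the I/O layer to Kepler and the provenance database. It is not an actual ADIOS or Kepler data type; all field names are assumptions.

```c
/* Hypothetical metadata record passed from the I/O layer to the
 * workflow/provenance side -- illustrative only, not an ADIOS type. */
#include <stdint.h>

struct output_record {
    char     path[1024];       /* where the file landed                   */
    char     group[64];        /* ADIOS group name, e.g. "restart"        */
    char     variables[4096];  /* names of the variables in the file      */
    int      timestep;         /* simulation step the data belongs to     */
    uint64_t bytes;            /* size written, for transfer planning     */
    double   vmin, vmax;       /* simple per-variable characteristics for */
                               /*   monitoring plots and later indexing   */
    char     host[64];         /* machine/environment the code ran on     */
};
```

With records like this flowing into Kepler, a template workflow can locate files and decide what to plot, convert, or archive without hard-coded file names.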
Current Workflows. • Pre-production. (Uston) • Starting to work on this. • Production workflows. (Norbert) • Currently our most active area of use for SEIF. • Analysis Workflows. (Ayla, Norbert) • Current area of interest. • Need to work with 3D graphics + parallel data analysis.
Monitoring a simulation + archiving (XGC) • NetCDF files • Transfer files to e2e system on-the-fly • Generate plots using grace library • Archive NetCDF files at the end of simulation • Binary files (BP, ADIOS output) • Transfer to e2e system using bbcp • Convert to HDF5 format • Start up AVS/Express service • Generate images with AVS/Express • Archive HDF5 files in large chunks to HPSS • Generate movies from the images
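Sketched below, for the binary (BP) branch only, is what one pass of this pipeline might look like if driven by plain commands; bbcp, AVS/Express, and HPSS are named above, while the paths, the "bp2h5" converter, the AVS/Express batch driver, and the movie tool are hypothetical placeholders. In practice, Kepler actors run the equivalent steps on the fly as the simulation produces output.

```c
/* One pass of the XGC monitoring/archiving pipeline (BP branch),
 * written as a simple command driver.  Paths and several tool names
 * are hypothetical; Kepler performs these steps on-the-fly. */
#include <stdio.h>
#include <stdlib.h>

static void step(const char *cmd)
{
    printf("[workflow] %s\n", cmd);
    if (system(cmd) != 0)
        fprintf(stderr, "step failed: %s\n", cmd);  /* Kepler would report/retry */
}

int main(void)
{
    /* 1. move the latest ADIOS (BP) output to the end-to-end system */
    step("bbcp jaguar:/lustre/xgc1/run042/xgc.3d.bp /ewok/run042/");

    /* 2. convert BP to HDF5 for the visualization service (hypothetical tool) */
    step("bp2h5 /ewok/run042/xgc.3d.bp /ewok/run042/xgc.3d.h5");

    /* 3. render images with an AVS/Express batch session (hypothetical driver) */
    step("express_render /ewok/run042/xgc.3d.h5 /ewok/run042/images/");

    /* 4. archive the HDF5 files to HPSS in large chunks (hsi assumed) */
    step("hsi put /ewok/run042/xgc.3d.h5 : run042/xgc.3d.h5");

    /* 5. stitch accumulated frames into a movie for the dashboard (tool assumed) */
    step("ffmpeg -i /ewok/run042/images/frame%04d.png /ewok/run042/movie.mpg");

    return 0;
}
```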
Coupling Fusion codes for Full ELM, multi-cycles • Run XGC until unstable conditions. • M3D coupling data from XGC. • Transfer to end-to-end system. • Execute M3D: compute new equilibrium. • Transfer back the new equilibrium to XGC. • Execute ELITE: compute growth rate and test linear stability. • Execute M3D-MPP: study unstable states (ELM crash). • Restart XGC with the new equilibrium from M3D-MPP.
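The cycle is easier to see as a control loop. The sketch below is purely illustrative: every helper is a hypothetical stub standing in for a Kepler actor that transfers files or submits one of the codes named above.

```c
/* Illustrative control flow of the XGC <-> M3D / ELITE full-ELM coupling
 * cycle; all helpers are hypothetical stubs for Kepler actors. */
#include <stdbool.h>
#include <stdio.h>

static bool run_xgc_until_unstable(void)       { puts("XGC: evolve until unstable"); return true; }
static void transfer_coupling_data_to_e2e(void){ puts("transfer coupling data -> e2e system"); }
static void run_m3d_omp_equilibrium(void)      { puts("M3D-OMP: compute new equilibrium"); }
static void transfer_equilibrium_back(void)    { puts("transfer equilibrium -> XGC"); }
static bool elite_reports_unstable(void)       { puts("ELITE: growth rate + linear stability"); return true; }
static void run_m3d_mpp_elm_crash(void)        { puts("M3D-MPP: study the ELM crash"); }
static void restart_xgc(void)                  { puts("restart XGC with new equilibrium"); }

int main(void)
{
    const int max_cycles = 3;                   /* arbitrary bound for the demo  */
    for (int cycle = 0; cycle < max_cycles; ++cycle) {
        if (!run_xgc_until_unstable()) break;   /* 1. run XGC until unstable     */
        transfer_coupling_data_to_e2e();        /* 2. move coupling data         */
        run_m3d_omp_equilibrium();              /* 3. new equilibrium (M3D)      */
        transfer_equilibrium_back();            /* 4. return it to XGC           */
        if (elite_reports_unstable())           /* 5. ELITE stability decision   */
            run_m3d_mpp_elm_crash();            /* 6. M3D-MPP models the crash   */
        restart_xgc();                          /* 7. restart and repeat         */
    }
    return 0;
}
```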
External tools orchestrated by the workflow • C codes: transfer files between hosts (bbcp, scp); BP-to-HDF5 conversion tool; NetCDF split and merge tools; NetCDF 1D variable to 1D plot (xmgrace). • Bash shell scripts: variables in NetCDF to plots; implement single complete steps in the coupling (M3D-OMP run; ELITE run + stability decision + images; M3D-MPP preparation and run). • Python scripts: variables in HDF5 to images; info generation for the dashboard. • AVS/Express: HDF5 2D variable to 2D image. • Other viz tools: gnuplot, IDL.
Provenance + Data Movement • Data movement (diagram: SRM-lite, configured by srmlite.xml, moves data between disk caches on the dashboard site and the remote (user's) site, sending SSH requests to an SSH server that runs local commands, with GridFTP/FTP/SCP transfers). • Given an OTP firewall at one site, where the local files reside, we need a client program that "pushes" files to users and: automates movement of multiple files; runs concurrent transfers to utilize bandwidth; uses various transfer protocols; supports entire directory transfers; recovers from mid-transfer interruptions; can be invoked from the dashboard; shows which files are to be transferred; provides an asynchronous service (the user can log out of the dashboard); shows transfer progress asynchronously; supports monitoring from anywhere. • Process provenance: the steps performed in the workflow, the progress through the workflow control flow, etc. • Data provenance: history and lineage of each data item associated with the actual simulation (inputs, outputs, intermediate states, etc.). • Workflow provenance: history of the workflow evolution and structure. • System provenance: machine and environment information; compilation history of the codes; information about the libraries; source code; run-time environment settings.
Security Problem with OTP • Currently some leadership-class machines do not allow certificate-based logins. • Work with ORNL to support this functionality: • Enable long-running workflows. • Enable job submission through the dashboard. • Enable HPSS retrieval for data-analysis workflows via the dashboard.
Security (diagram): the dashboard (running as the apache user on ewok-web) uses the user's grid certificate plus OTP passcode to obtain an NCCS proxy certificate from a MyProxy server, then connects via GSISSH to the Ewok and Jaguar login nodes; the Kepler workflow and the simulation run as PBS jobs on the Ewok and Jaguar compute nodes, respectively.
GRAM will be added in a second phase to support workflow systems other than Kepler and to allow Kepler to use the "standard" way of accessing resources in Grids. (Diagram: the same architecture as the previous slide, with GRAM added alongside GSISSH on the Ewok and Jaguar login nodes.)
Workflow and other challenges • Make workflows easier to build and debug! • Finish the link from ADIOS to Kepler. • Create 1 workflow that works with S3D, GTC, GTS, XGC1 for code monitoring. • No changes in Kepler for these codes! • Analysis workflows integrated with advanced visualization and data analysis. • Queries of simulations. • We want to query data from multiple simulations.
Machine monitoring • Allow for secure logins with OTP. • Allow for job submission. • Allow for killing jobs. • Search old jobs. • See collaborators' jobs.
Dashboard challenges • Run simulations through the dashboard. • Allow interaction with data from HPSS. • Run advanced analysis and visualization on the dashboard. • Access to more plug-ins for data analysis. • 3D visualization. • More interactive 2D visualization. • Query multiple simulations/experimental data through the dashboard for comparative analysis. • Collaboration.
Vision for the future • Tiger teams: • Work on one code with several experts in all areas of data management. • Replace its I/O with high-performance I/O. • Integrate analysis routines into its analysis workflows. • Create monitoring and analysis workflows. • Track codes. • Integrate into SEIF. • One code at a time, four months per code. • The rest of the team works on core technologies. • (Team diagram: a team leader coordinating workflow, analysis, I/O, provenance, and dashboard efforts.)
Long-term approach for SDM • Grow the core technologies for exascale computing. • Grow the core by working with more applications. • Don't build infrastructure if we can't see 'core applications' benefiting from it after 1.5 years of development (one at a time). • Teamwork! • Build our R&D team that works with the codes. • Build mature tools: • Better software testing before we release our software. • The framework needs to live with just a few support people. • Componentize everything together. • Allow separate pieces to live without SEIF. • Make sure the software scales to yottabytes! • But first make it work on MBs. • Look into better searching of data across multiple simulations.