Computing Needs Assessment: Methodology and Practice
Karen Petraska, NASA Office of the CIO
James McCabe, Computing Consultant
August 2011

Outline
Methodology
• Introduction
• Objectives
• Proving and Scaling the Process
• Preliminary Work
Practice
• Interviewed Scientists from the NASA ESM Community
• Whiteboard Sessions
• Use-Case Scenarios
• Describing Workflows
Practice (Cont)
• Process is Providing Valuable Insight
• Found Varying Degrees of Complexity in ES Models
• Generalizing Earth Science Modeling and Analysis
• There are a Number of Interesting Topics that are Emerging
• Future Work
• Appendices

Introduction
Computing needs assessment is similar to a requirements analysis process and consists of:
• Targeting NASA Mission Communities
• Conducting in-depth interviews and whiteboard sessions with scientists and engineers
• Learning how missions get done, on which IT assets, and what is needed, when and where, throughout scientists’ entire project lifecycles
• Using these data to characterize mission IT workflows, developing use-case scenarios and models to express general characteristics
• Recommending refinements to NASA’s IT system to reflect the evolving IT needs of missions

Objectives
• Ensure IT assets are available when needed
• Optimize investments in IT assets for mission purposes
• Balance overall performance in mission computing and analysis
• Identify and address evolving needs of missions for IT
• Develop processes and templates to assess computing needs throughout NASA
• Go beyond “more, bigger, faster” to understand system-wide needs
• Determine what is common across mission groups, as well as what is unique for each group
Our primary working assumption is that the most valuable resource is scientists’ and engineers’ time; whatever we can do to improve their effectiveness will pay off in improved mission results.

Proving and Scaling the Process
• NASA Mission Communities are very large; it is highly unlikely that a few individuals could interview them all
• Therefore, we are working on interview questions and templates that others can use, to parallelize this process
• To prove this process, we decided to start small with a focus group: the SMD Earth Science Modeling (ESM) Community
• We began in June with preliminary work:
  • Developing draft interview questions
  • Defining important terms for science IT
  • Laying the foundation to describe science workflows
• Interviewing scientists throughout July and August
At this point we are prepared to discuss our methodology and some early findings, and we will deliver a full set of results at the December American Geophysical Union meeting.

Preliminary Work: Draft Interview Questions
Categories of Topics Discussed with ESM Scientists:
• Work Patterns and Behaviors
  • How, when, and where do scientists receive, generate, transport, analyze, store, and visualize data? How can science be done better?
• Computing Job Characteristics
  • What do their computing and analysis jobs need in order to run? How could jobs be run more effectively? How can they make better use of the systems?
• Environments and Support
• Enhancements, Evolution, and Revolution
  • What can be done from an IT perspective to improve science?
• Computing, Analysis, Visualization Characterizations
  • What do the various systems that they use look like?
Lists of questions associated with these topics are provided in Appendix A.

Preliminary Work: Defining Important Terms
High-End Computing
The application of specialized capabilities and large amounts of processing power (tens of thousands of processors) per job to solve the largest and most complex problems of that period (e.g. Grand Challenge class problems)
Mid-Range Computing
Large computing clusters (on the order of thousands of nodes) that may have some of the specialized capabilities of high-end computing, but are generally smaller and less specialized. As such, they operate earlier in the project lifecycle, on subsets of Grand Challenge class problems
Low-End Computing
Departmental, project, and individual computing resources, ranging from a few to tens of processors, with little to no specialization

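As a rough illustration of how these three definitions partition jobs by scale, the sketch below maps a processor count to a tier. The cut-offs (`MID_RANGE_MIN`, `HIGH_END_MIN`) and the names `Tier` and `classify` are assumptions made for this example. The definitions give only orders of magnitude, describe mid-range clusters in nodes rather than processors, and also depend on specialization, which the sketch ignores.

```python
from enum import Enum


class Tier(Enum):
    LOW_END = "low-end"
    MID_RANGE = "mid-range"
    HIGH_END = "high-end"


# Hypothetical cut-offs, chosen only to separate the orders of magnitude
# named in the definitions; the source gives no exact boundaries.
MID_RANGE_MIN = 1_000    # "on the order of thousands"
HIGH_END_MIN = 10_000    # "tens of thousands of processors" per job


def classify(processors_per_job: int) -> Tier:
    """Map a job's processor count to a computing tier (rough heuristic)."""
    if processors_per_job < 1:
        raise ValueError("processor count must be positive")
    if processors_per_job >= HIGH_END_MIN:
        return Tier.HIGH_END
    if processors_per_job >= MID_RANGE_MIN:
        return Tier.MID_RANGE
    # Anything below the mid-range cut-off, including the undefined
    # "hundreds" band, is treated as low-end in this sketch.
    return Tier.LOW_END
```

For instance, `classify(16)` falls in the low-end tier and `classify(20_000)` in the high-end tier under these assumed cut-offs.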
Preliminary Work: Describing Science Modeling Workflows
• Based on coding models from Biegel and Kepner
• Used as a starting point to begin discussions with scientists
• For example, the model for running codes: Initial Run, Scale and Optimize, Full Run

Interviewed Scientists from the NASA Earth Science Modeling Community
LaRC
• Atmospheric Sciences
GSFC
• Atmospheric Chemistry
• Global Modeling and Assimilation Office (GMAO)
• Global Modeling Initiative (GMI)
• NASA Unified Weather Research and Forecasting (NU-WRF)
ARC
• NASA Earth Exchange (NEX)
JPL
• Ice Sheet System Model (ISSM)
• Estimating the Circulation and Climate of the Ocean (ECCO)
• Carbon Monitoring
• Climate
A list of scientists interviewed to date is provided in Appendix B.

Whiteboard Sessions
Using our preliminary work, we held whiteboard sessions with individuals and groups, in which they outlined their science, their workflows, and the IT assets they use.
Example: NASA Unified Weather Research and Forecasting (NU-WRF)
LIS: Land Information System
GOCART: Goddard Chemistry Aerosol Radiation and Transport
SDSU: Satellite Data Simulator Unit
RAD: Radiation

Use-Case Scenarios
For each group interviewed, we developed a Use-Case Scenario that describes their:
• Work and science
• Workflows
• IT assets used
• Issues
• Desirables and potential impacts
• Future IT needs

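One way to picture such a scenario as a reusable template is a simple record with one field per item in the list above. This is only a sketch: the class name `UseCaseScenario`, the field names, and the `missing_sections` helper are hypothetical and are not taken from the assessment's actual templates.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class UseCaseScenario:
    """One record per interviewed group; field names are illustrative."""
    group: str
    work_and_science: str = ""
    workflows: list[str] = field(default_factory=list)
    it_assets: list[str] = field(default_factory=list)
    issues: list[str] = field(default_factory=list)
    desirables: list[str] = field(default_factory=list)
    future_it_needs: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Names of sections still empty, e.g. to plan follow-up interviews."""
        return [name for name, value in vars(self).items() if not value]


# Placeholder values only; no real interview data is shown here.
scenario = UseCaseScenario(group="example group", issues=["placeholder issue"])
print(scenario.missing_sections())
```

Keeping every scenario in the same shape would make it easier to compare groups and to see which sections still need follow-up.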
Describing Workflows
When possible, steps in the process of developing science data are captured, along with any IT-related information.
Example: Cloud Modeling at LaRC
NCAR: National Center for Atmospheric Research
ASDC: Atmospheric Sciences Data Center
GrADS: Grid Analysis and Display System
C3M: CERES, CALIPSO, CloudSat, MODIS

Process is Providing Valuable Insight
• Found varying degrees of complexity in Earth Science Models, with corresponding implications for IT
• Generalizing Earth Science Modeling and Analysis
  • Characterizes computing, analysis, and visualization
  • Identifies key interfaces within and between components, where transport services apply
  • These may be strategic locations from an IT architecture perspective
  • Also need to capture unique characteristics
• There are a number of interesting topics that are emerging
  • Science data
  • Processing queues
  • Security
  • Support to scientists
  • Environments
Need a much larger sample size

Found Varying Degrees of Complexity in Earth Science Models
Increasingly complex science models stress the performance levels of the underlying IT infrastructure.
LIS: Land Information System
GOCART: Goddard Chemistry Aerosol Radiation and Transport
SDSU: Satellite Data Simulator Unit
RAD: Radiation

Generalizing Earth Science Modeling and Analysis
Earth Science Model Processing and Analysis

There are a Number of Interesting Topics that are Emerging
• Science data
  • Growth in science data set sizes is outpacing our transport capabilities
  • Data set sizes are approaching the petabyte (PB) scale
  • Possible architectural options include: co-locating resources with data sets, improving transport, or replicating data sets
• Processing queues
  • Queue wait times increase time to solution by up to 300%
• Security
  • Security implementations reduce end-to-end performance; the two need to be better balanced
• Support to scientists
  • Technical support across the project lifecycle is critical to scientists’ success
• Environments (Processors, Storage, Communications, Libraries, Compilers)
  • As model complexities increase, it is becoming harder for scientists to integrate across systems

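To give a feel for the scale behind the science data and processing queue findings, here is a back-of-the-envelope sketch. The 10 Gb/s link rate, the 10-hour run, and the function names are assumptions chosen for illustration, not figures from the interviews. "Up to 300%" is read here as an increase on top of the baseline, so the total is four times the original.

```python
def transfer_days(dataset_bytes: float, link_bits_per_s: float,
                  efficiency: float = 1.0) -> float:
    """Days needed to move a data set over a link at the given efficiency."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = dataset_bytes * 8 / (link_bits_per_s * efficiency)
    return seconds / 86_400


def time_to_solution(run_hours: float, increase_pct: float) -> float:
    """Total hours when queue wait inflates time to solution by increase_pct."""
    return run_hours * (1 + increase_pct / 100)


PB = 1e15  # decimal petabyte, in bytes

# Assumed figures for illustration only: a 10 Gb/s link and a 10-hour run.
print(f"{transfer_days(PB, 10e9):.1f} days to move 1 PB at 10 Gb/s")
print(f"{time_to_solution(10, 300):.0f} hours for a 10-hour run at +300%")
```

Under these assumptions, moving 1 PB takes about 9.3 days even at full line rate, which is why co-locating resources with the data sets appears among the architectural options.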
Future Work
Near-Term – Completing Assessment of Earth Science Modeling
• Gather and derive more data to improve the performance of NASA IT assets for science communities
• Round out scope of NASA ESM Community
  • Other SMD ESM groups (Goddard Institute for Space Studies)
  • ESM at other agencies (Los Alamos and Oak Ridge National Labs), for calibration purposes
• Compare and contrast results
Results from this work will be presented at the American Geophysical Union meeting in December.

Future Work
Longer Term – Expanding Scope to Other Missions
At the end of August we will begin rolling out the assessment process to other directorates in NASA. This will consist of:
• Identifying points of contact in each mission to lead this effort
• Conducting workshops to introduce process templates and questions, and to teach the assessment process
• Working with leads to facilitate applying the process to their subject groups
• Assisting with processing the data and synthesizing it into agency results

Appendix A – List of Interview Questions
Work Patterns and Behaviors:
• Work breakdown – categorizing major work components
• Time to solution (intermediate and final)
• How, when, and where do they receive, generate, process, store, and display data
• Interactions with computing facilities and personnel
• Other, non-science skills needed
Computing Job Characteristics:
• Numerical schemes
• Degree of parallelism
• Interprocessor latency
• Internal and external communications
• Characterization of data sets
• Levels of precision
• Provenance and logging
• Use of metadata
• Locality of data
• Level of data sensitivity
• Mission-critical jobs
• Time-critical jobs

Appendix A – List of Interview Questions (Cont)
Resource Utilization:
• Scheduling of computing resources
• Degree of interactivity with resources
• Billing units and usage
• Issues with using computing resources
Enhancements and Evolution:
• Possible ways to enhance work
• Potential value of enhancements
• Technologies of interest and why
• Desired computing platforms and why
Software, Tools, and Utilities:
• Languages and compilers
• Tools and utilities
• COTS software
• Specialized software
• Software engineering needed
• Customizations to any of the above
Visualization Characterization:
• Locality of visualization resources
• Real-time visualization

Appendix A – List of Interview Questions (Cont)
Computing Characterization:
• Interconnect type
• Interprocessor communications speed
• Average/Peak processing loads
• Current computing hardware types
• Architectures: shared or single memory
• Number of processors, cores
• Types of processors
• Per-processor and total memory
• Local cache
• Local storage size and type
Computing Characterization (Cont):
• Locality of computing resources
• Specialized hardware
• Throughput
• Latency/Delay
• Reliability/Operational Availability
• Job management
Storage Characterization:
• Locality of storage resources
• Quantity and types of storage

Appendix B – List of Scientists Interviewed to Date
LaRC
• Atmospheric Sciences
  • Kuan-Man Xu
  • Anning Cheng
GSFC
• NU-WRF
  • Robert Burns
  • Jim Geiger
  • Joe Santanello
  • Sujay Kumar
  • Toshi Matsui
• Atmospheric Chemistry
  • Qian Tan
• GMAO
  • Michele Rienecker
  • Ron Gelaro
  • Arlindo da Silva
  • Bill Putman
ARC
• NEX
  • Rama Nemani

Appendix B – List of Scientists Interviewed to Date (Cont)
JPL
• ISSM
  • Eric Larour
  • Mathieu Morlighem
  • Helene Seroussi
• ECCO
  • Ichiro Fukumori
  • Benny Cheng
  • Ou Wang
• Carbon Monitoring
  • Kevin Bowman
  • Robert Ferraro
  • Frank Li