

Data- and Compute-Driven Transformation of Modern Science How e-Infrastructure & Policy Support Paradigm Shifts in Research. Edward Seidel Senior Vice President, Research and Innovation Skolkovo Institute of Science and Technology.


Presentation Transcript


  1. Data- and Compute-Driven Transformation of Modern Science How e-Infrastructure & Policy Support Paradigm Shifts in Research Edward Seidel Senior Vice President, Research and Innovation Skolkovo Institute of Science and Technology

  2. Part 1: We are in a period of unprecedented change in Science and Society…the crises & opportunities this creates

  3. Profound Transformation of Science: Collision of Two Black Holes • 1972: Hawking. 1 person, no computer, 50 KB • 1994: 10 people, NCSA Cray Y-MP, 50 MB • 1998: 15 people, NCSA Origin, 50 GB

  4. Community Einstein Toolkit “Einstein Toolkit: open software for astrophysics to enable new science, facilitate interdisciplinary research and use emerging petascale computers and advanced CI.” • Consortium: 92 members, 46 sites, 15 countries • Whole consortium engaged in directions, support, development • Simulation: Luciano Rezzolla, Max-Planck-Institut für Gravitationsphysik (AEI) Many groups can do this: field explodes! Major triumph of Computational Science: solve EEs! Community + software + algorithms + hardware + …

  5. New Frontiers: Relativistic Matter • Nuclear equations of state • Collab. with astrophysicists • General relativistic magnetohydrodynamics • Some groups have ideal MHD • Radiation Transport (neutrinos/photons) • Expensive and complicated! • Requires opacities/emissivities • Chemical reactions (thermonuclear, chemical) • SN community! • Computation: • Multiphysics!! • GRMHD: petascale problem • Radiation transport beyond this Zoom in to just this part: post BH formation and evolution of jet Schnetter et al, PetaScale Computing: Algorithms and Applications, 2007

  6. Post BH Formation/Evolution of Jet Multiphysics framework needed for fluids in astrophysics and porous media… • Multiphysics • GR, Neutrinos, MHD, Nuclear EOS • Computer Science • 10 level AMR, optimized for Blue Waters • Science • 200 s evolution, analyze it all • Blue Waters • 6M Pflop @ 1 PF sustained = 70 days! Schnetter et al.; Rezzolla et al.
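
The Blue Waters estimate on this slide can be checked with a few lines of arithmetic. The sketch below assumes that "6M Pflop" means a total of 6 million peta floating-point operations (6 × 10^21) and that "1 PF sustained" means 10^15 operations per second. Both readings are interpretations of the slide's shorthand, and the names used are illustrative only.

```python
# Assumption: "6M Pflop" = 6e6 peta floating-point operations in total.
TOTAL_OPS = 6e6 * 1e15
# Assumption: "1 PF sustained" = 1e15 floating-point operations per second.
SUSTAINED_OPS_PER_SECOND = 1e15
SECONDS_PER_DAY = 24 * 60 * 60


def runtime_days(total_ops, ops_per_second):
    """Wall-clock days needed to execute total_ops at a steady rate."""
    return total_ops / ops_per_second / SECONDS_PER_DAY


days = runtime_days(TOTAL_OPS, SUSTAINED_OPS_PER_SECOND)
print(f"{days:.1f} days")
```

Under these assumptions the result is about 69.4 days, consistent with the slide's rounded figure of 70 days.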

  7. Theory and Observation of Universe New era of science after a century! Data- and compute-dominated gravitational wave astronomy! • Gravitational Waves! • Complex problems in relativistic astrophysics • Relativity, hydrodynamics, nuclear physics, radiation, neutrinos, magnetic fields: globally distributed collab! • Observe (PB), compute (PF) signals • Gravity and general relativity are transformed • 4 centuries of small science, small data culture • 2-3 decades of radical change in both data (factors of 1000 per ~5 years) and collaboration LIGO/VIRGO/GEO
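
To put the "factors of 1000 per ~5 years" figure in more familiar terms, the sketch below converts it into an equivalent yearly growth factor and a doubling time. It assumes smooth exponential growth, which the slide itself does not claim, and the function names are illustrative.

```python
import math


def annual_factor(total_factor, years):
    """Equivalent constant per-year growth factor."""
    return total_factor ** (1.0 / years)


def doubling_time_years(total_factor, years):
    """Time to double under the same exponential growth."""
    return years * math.log(2) / math.log(total_factor)


print(f"{annual_factor(1000, 5):.2f}x per year")
print(f"{doubling_time_years(1000, 5) * 12:.1f} months to double")
```

Under that assumption, data volumes grow roughly fourfold each year and double about every six months.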

  8. US Council on Competitiveness • Ping Golf • Moved from workstation to Cray, now make prototype only at last stage of design • Too effective: had to simulate “less effective” design! • Procter & Gamble • Pringles “flying” off manufacturing line, causing significant lost product and revenue. Using the CFD codes that Boeing uses, airflow over the Pringle was modeled and the design adjusted so that Pringles did not “lift off”

  9. Part 1a: The Growth of Data “I’m still here…” But I’m your new baby big brother… Data Tsunami With millions of processors…

  10. Going Beyond a Community: Transient & Multi-Messenger Astronomy Will require integration across disciplines, end-to-end • Astronomy 1500-2010 was passive. No longer! • New era: seeing events as they occur • Here now • ALMA, EVLA in radio • IceCube neutrinos • On horizon • 24-42m optical? • LSST = SDSS (40 TB) every night! • SKA = exabytes • Simulations integrate all physics • Data-intensive = compute-intensive Communities need to share data, software, knowledge, in real time

  11. News Flash! NYT 6/3/13: Drug side effects discovered by mining web logs: paroxetine + pravastatin = high blood sugar! Big Data vs The Long Tail of Science How do we harness the power of this long tail? • Many “Big Data” projects are “special” • Tend to be highly organized, have singular sources of data, professionally curated, a lot of attention paid to them • What about the “Long Tail” (the other 99%)? • Thousands of biologists sequencing communities of organisms • Thousands of chemists and materials scientists developing a “materials genome” • Millions of people “Tweeting”… • Characteristics: • Heterogeneous, perhaps hand generated • Not curated, reused, served, etc…

  12. Grand Challenge Communities Combine it All... Where is it going to go? Same CI useful for black holes, hurricanes

  13. Grand Challenge Communities for Complex Problems Social, behavioral and economic sciences will be critical in helping us understand these issues… • Require many disciplines, all scales of collaborations • Individuals, groups, teams, communities • Multiscale Collaborations: Beyond teams • Are dynamic and highly multidisciplinary • Time domain astronomy, emergency forecasting, metagenomics, materials genome… • Drive sharing technologies and methodologies • Researchers collaborate, work by sharing data. Places requirements on e-Infrastructure: • Software, networks, collaborative environments, data, sharing, computing, etc • Scientific culture, reproducibility, access, university structures • “Publications.” What is a modern publication?

  14. Scenarios like this in all fields NEON+GIS

  15. Framing the Challenge: Science and Society Transformed by Data We still think like this… • Modern science • Data- and compute-intensive • Integrative, multiscale • 4 centuries of constancy, 4 decades of 10^9-10^12 change! • Multi-disciplinary Collaborations • Individuals (Galileo!) • Groups, teams, Grand Challenge Communities • Big Data + Long Tail • Sea of Data • Age of Observation Students take note! …But such radical change cannot be adequately addressed with (current) incremental approach!

  16. Part 2: Crises, Challenges, Opportunities Instruments & Facilities Computing Cyber No, we are not… Organizational structures Data Education Software End-to-end Networks

  17. Five Crises the “CDSE” Community Needs to Address • Computing Technology • Multicore: processor is new transistor • Programming model, fault tolerance, etc • New models: clouds, grids, GPUs, … • Data, provenance, and visualization • How do we create “data scientists”? • What is an international data infrastructure? • Software treated as e-Infrastructure • Complex applications on coupled compute-data-networked environments, tools needed • Modern apps: 10^6+ lines, many groups contribute, take decades

  18. Five Crises • Organization for Multidisciplinary & Computational Science • “Universities must significantly change organizational structures: multidisciplinary & collaborative research are needed [for US] to remain competitive in global science” • “Itself a discipline, computational science advances all science…inadequate/outmoded structures within Federal government and the academy do not effectively support this critical multidisciplinary field” • Education • The CI environment is running away from us! • How do we develop a workforce to work effectively in this world? • How do universities transition?

  19. NSF Experts Study PCAST Digital Data Wired, Nature Industry Storage Networking Industry Association (SNIA) 100 Year Archive Requirements Survey Report “there is a pending crisis in archiving… we have to create long-term methods for preserving information, for making it available for analysis in the future.” 80% respondents: >50 yrs; 68% > 100 yrs Data Crisis: Information Big Bang Scientific Computing and Imaging Institute, University of Utah

  20. The Shift Towards a “Sea of Data”: Implications How do we attribute credit for this new publication form? How are data peer reviewed? What is a publication in the modern data-rich world? What is a business model for OA? • Science & society are now data-dominated • Experiment, computation, theory • US mobile phone traffic exceeded 1 exabyte! • Classes of data • Collections, observations, experiments, simulations • Software • Publications • Totally new methodologies • Algorithms, mathematics, culture • Data become the medium for • Multidisciplinarity, communication, publication, science, economic development… Fundamental questions become focused around data: What to curate, how to remove boundaries? How to incentivize sharing? IP?

  21. Part 2a: Recommendations

  22. ACCI Task Force Reports • Final recommendations presented to the NSF Advisory Committee on Cyberinfrastructure Dec 2010 • More than 25 workshops and Birds of a Feather sessions, 1300 people involved • Final reports on-line Grand Challenge Data & Viz Software Campus Bridging Learning HPC “Permanent programmatic activities in Computational and Data-Enabled Science & Engineering (CDS&E) should be established within NSF.” Grand Challenges Task Force “NSF should establish processes to collect community requirements and plan long-term software roadmaps.” Software Task Force “Higher education should adopt criteria for tenure and promotion that reward…the production of digital artifacts of scholarship. Such artifacts include widely used data sets, scholarly services delivered online, and software.” Campus Bridging Task Force

  23. Recommendation of NSF Advisory Committee on Cyberinfrastructure (ACCI) "The National Science Foundation should create a program in Computational and Data-Enabled Science and Engineering (CDS&E), based in and coordinated by the NSF Office of Cyberinfrastructure. The new program should be collaborative with relevant disciplinary programs in other NSF directorates and offices."

  24. Part 3: Universities attempt to respond We have to do all this and revolutionize the state/national economy?

  25. Skoltech: Example of a 21st Century University in the Making

  26. Skoltech at a Glance • A unique Russian institution in international context • This decade: a community of 200 faculty, 300 post-docs, 1200 graduate students • Focused on science, engineering and technology • Addressing problems and issues in IT, Energy, Biomedicine, Space and Nuclear • Interdisciplinary by design; no departments • 15 centers organized around complex problems • With strong programs in support of innovation and entrepreneurship • Creating a culture of innovation in every student, professor, staff member • Important part of the Skolkovo innovation ecosystem Integrated data, compute, instrumentation infrastructure and policy under development for • Interdisciplinary research • Accelerating discovery • Economic development

  27. Part 3a: You can help lead this revolution Kathryn Gray

  28. Modern Research & Education Ecosystem. Education Crisis: I need all of this to start to solve my problem! [Diagram labels: Campus, Track 2, Data, Software, Blue Waters, XSEDE]

  29. The Opportunity (US picture)! • Now have emerging national Integrated, High Performance Research Architecture • Blue Waters and beyond towards exascale: high end • Extraordinary science continues to lead at cutting edge • Traditional and novel large data applications • Few places can house, field, or drive such a facility • XSEDE architecture can connect… • Campus Bridging: campus to national CI… • Campus Assets: MRI, Instruments, DNA sequencers… • Facilities: Supercomputers, telescopes, accelerators, light sources, NEON … • “More silicon than steel” • Networks: end-to-end connectivity • Where are those optical network apps?

  30. Much to do to build CDSE on this Background: address the “5 Crises” • Education • Many new opportunities and challenges • CSE already has its struggles • Now data: what is a “data scientist”? • CDSE emerges • Data opportunities for education and citizen science • Faculty development, curriculum development • Needed on every campus • Talk to NSF, DOE, EC, your national agencies • Recommendations of ACCI, MPSAC, etc • New programs needed: See NSF CDS&E, CI TraCS, CAREER, “LWD”, etc • You can help make this happen

  31. Key Messages • Astounding rate of change of the “Triple Helix” of Research, Education, and Innovation • Computing and Data radically change methods • Culture of collaboration around complex problems • These create many crises and opportunities • From technology to methodology to culture… • Deep integration required for science • Emergence of Computational and Data-enabled Science and Engineering as a discipline and your role! • A key part of the paradigm shift

  32. & Data
