Computer Science Research. Ian Foster, University of Chicago & Argonne National Laboratory, foster@mcs.anl.gov. GriPhyN NSF Project Review, 29-30 January 2003, Chicago.
Computer Science Research • Ian Foster, University of Chicago & Argonne National Laboratory, foster@mcs.anl.gov • GriPhyN NSF Project Review, 29-30 January 2003, Chicago
Computer Science Research • Introduction & Context (Ian Foster: 30 mins) • Vision: Virtual data as e-science enabler • Organization: Structure & interactions • Dissemination: Targets and mechanisms • The nature of future challenges • Computer science research • Virtual data (Mike Wilde: 15) • Scheduling, planning (Ewa Deelman: 15) • Execution (Mike Franklin: 15) • Performance (Valerie Taylor: 15) • Technology delivery (Miron Livny: 15) • Virtual Data Toolkit • Student presentations (60) Ian Foster, U.Chicago foster@mcs.anl.gov
PetaScale Virtual Data Grids (1) [Figure: layered architecture] • Users: production team, research group, individual investigator • Interactive user tools • Virtual data tools; request planning & scheduling tools; request execution & management tools • Resource management services; security and policy services; other Grid services • Transforms; raw data source; distributed resources (code, storage, computers, and network) • Scale labels: PetaOps, Petabytes, Performance
Petascale Virtual Data Grids (2)
Computer Science and GriPhyN [Figure: linkages between computer science research and other communities] • Diagram labels: Partner Physics Projects; Requirements; Prototyping & experiments; Production Deployment; Computer Science Research; Virtual Data Toolkit; Larger Science Community; Techniques & software; Tech Transfer; Globus, Condor, NMI, EU DataGrid, PPDG Communities • Other linkages: work force, CS researchers, industry
Computer Science Challenges (1) • Virtual data: representation, discovery, & manipulation of workflows and associated data & programs • Planning: mapping workflows in an efficient, policy-aware manner to distributed resources • Execution: executing workflows, including data movements, reliably and efficiently • Performance: monitoring aspects of system performance for scheduling & troubleshooting
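As a rough illustration of the planning challenge, the sketch below treats a workflow as a dependency graph and maps each job to a site while respecting a simple per-site quota. Everything in it is an assumption made for illustration: the job names, the site names, the quota policy, and the `plan` function are hypothetical and do not reproduce any GriPhyN planner or VDT component.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each job maps to the set of jobs it depends on.
workflow = {
    "simulate": set(),
    "reconstruct": {"simulate"},
    "analyze": {"reconstruct"},
    "plot": {"analyze"},
}

# Hypothetical policy: maximum number of jobs this user may place on each site.
site_quota = {"siteA": 2, "siteB": 3}


def plan(workflow, site_quota):
    """Return (job, site) pairs in dependency order, respecting per-site quotas."""
    remaining = dict(site_quota)
    schedule = []
    for job in TopologicalSorter(workflow).static_order():
        # Pick the site with the most quota left; ties go to the first name alphabetically.
        site = max(sorted(remaining), key=lambda s: remaining[s])
        if remaining[site] == 0:
            raise RuntimeError(f"no site has quota left for {job!r}")
        remaining[site] -= 1
        schedule.append((job, site))
    return schedule


if __name__ == "__main__":
    for job, site in plan(workflow, site_quota):
        print(f"{job} -> {site}")
```

A real planner would also weigh data location, queue times, and richer policies; the quota here only stands in for "policy-aware" in the simplest possible way.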
Computer Science Challenges (2) • Engage meaningfully with physics groups • Provide educational opportunities • Develop, package, deliver, and support quality software • Achieve outreach to groups outside partner physics experiments
GriPhyN Computer Science Team • U.Chicago: Dumitrescu, Foster, Iamnitchi, Milligan, Ranganathan, Ripeanu, Voeckler, Wilde • USC/ISI: Deelman, Kesselman, Mehta, Patil, Singh, Vahi • NWU -> TAMU: Taylor, Yin • UCB: Franklin, Liu • UCSD: Marzullo, Moore, Zhang, Jagatheesan • UW-Madison: Alderman, Arpaci-Dusseau, Arpaci-Dusseau, Bailey, Bent, Kosar, Livny, Roy, Stanley, Thain • UF: Arbee, George, Jiang, Katageri, Ranka, Rodriguez • UT Brownsville: Campanelli, Morris, Zamora • LBNL: Shoshani • Legend: Faculty/Staff, Student/Postdoc (underlined = present)
Computer Science Research: How Do We Work? • System architecture & virtual data toolkit as two overarching organizational mechanisms • Project activities all defined in relationship to these organizing principles: • Research: Explore new techniques to guide evolution of the system architecture and VDT • Development: Construct VDT software • Evaluation: Apply and evaluate VDT software and/or new techniques in context of application challenges
Computer Science Research: How Are We Coordinated? • The activities of this large, multidisciplinary group are coordinated by frequent and multivalent communications • Face-to-face meetings in large & small groups • Formal and informal documents defining requirements, challenge problems, testbeds • Email, phone calls, videoconferences • Cooperation on challenge problems and technology and application demonstrations • Cooperation on software releases
GriPhyN Architecture/VDT and CS Research Projects [Figure: VDT components beside related research projects; layer labels Planning, Workflow, Execution; column labels VDT, Research] • VDT components: Chimera Virtual Data System + Pegasus Planner; DAGman; Globus Toolkit, Condor, Ganglia, etc. • Research projects: Virtual Data Ontologies (Zhao) • Partial Queries (Liu, Franklin) • Virtual data language design (Voeckler, Wilde) • AI Planning (Deelman, Narang) • Virtual data language applns (Milligan, Zhao) • Decentralized scheduling (Ranganathan) • Prophesy (Taylor, Yin) • Fault-tolerant master-worker (Marzullo) • DAGman enhancements (UW team) • Policy-aware scheduling (Dumitrescu) • Scalable replica location service (UC, ISI team) • NeST Storage mgmt (UW team) • HP monitoring (George)
GriPhyN Arch/VDT and CS Research: Degree of Coupling [Figure: the same VDT components and research projects as on the previous slide, each project marked with its degree of coupling to the VDT; legend: Already, Underway, Pending; per-project markings not recoverable]
Examples of Technology Injection: Chimera R&D Timeline [Figure: timeline with TECH and APPS rows; year labels 2002, 2003, 2004] • TECH • Chimera-0: Derivations only; Grid exec environment (prototype); PERL & PostgreSQL • Chimera-1: Java code & class model; XML VDL; TR/DV model; Compound TRs; General Grid exec env; Optimized DB schema • Chimera-2: Type model; Dataset catalog; Metadata; Hyperlinks; Instance tracking; Performance data • Chimera-3: Knowledge repr.; Policy-driven planners; VD browsers, composers; … • APPS • CMS & ATLAS analysis w/ROOT, CLARENS, JAS • CMS analysis prototype w/ROOT • Sloan cluster-finding science • Bio Grid facility • … • CMS event simulation prototyping • Sloan cluster finding • Sloan near-earth object • ATLAS events-on-demand • CMS official event simulation • LIGO pulsar search
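The TR/DV model listed under Chimera-1 separates transformations (TR, a program template) from derivations (DV, one invocation of a transformation on specific data). The sketch below illustrates that idea under stated assumptions: the class names, the file names, and the `derivation_plan` function are hypothetical, and the code is neither Chimera's VDL nor its API. It shows how recording derivations lets a requested data product be re-derived on demand while reusing whatever is already materialized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transformation:
    """A TR: a named program template (hypothetical, heavily simplified)."""
    name: str


@dataclass(frozen=True)
class Derivation:
    """A DV: one invocation of a TR with concrete input and output files."""
    tr: Transformation
    inputs: tuple
    output: str


def derivation_plan(target, derivations, materialized):
    """List the DVs needed to produce `target`, in execution order.

    Files already in `materialized` are reused rather than re-derived.
    """
    by_output = {dv.output: dv for dv in derivations}
    plan, seen = [], set()

    def visit(name):
        if name in materialized or name in seen:
            return
        if name not in by_output:
            raise KeyError(f"no derivation produces {name!r}")
        seen.add(name)
        dv = by_output[name]
        for inp in dv.inputs:
            visit(inp)  # make sure every input exists before this DV runs
        plan.append(dv)

    visit(target)
    return plan


# Hypothetical catalog: simulate raw events, then reconstruct them.
sim = Transformation("simulate")
reco = Transformation("reconstruct")
catalog = [
    Derivation(sim, ("config.txt",), "events.raw"),
    Derivation(reco, ("events.raw",), "events.reco"),
]

if __name__ == "__main__":
    for dv in derivation_plan("events.reco", catalog, {"config.txt"}):
        print(dv.tr.name, dv.inputs, "->", dv.output)
```

If `events.raw` is already materialized, the same request needs only the reconstruction step, which is the sense in which the data is "virtual".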
Dissemination: Targets • Researchers and educators: facilitate creation of new knowledge • Computer science research community: contribute to knowledge; engage community in solving our problems • Open source community: contribute to open Grid technology base • Industry: contribute to vibrant commercial technology
Dissemination: Mechanisms • Software: VDT (adoption by LHC Computing Grid); Globus Toolkit and Condor systems • Publications and talks: XX papers, YY tech reports, ZZ talks • Workshops and meetings: e.g., “Data Derivation & Provenance”, Oct 02 • Community activities: e.g., advisory committees, GGF standards
Representative Publications • Annis, J., Zhao, Y., Voeckler, J., Wilde, M., Kent, S., Foster, I., Applying Chimera Virtual Data Concepts to Cluster Finding in the Sloan Sky Survey, SC'2002, 2002. • Bent, J., Venkataramani, V., LeRoy, N., Roy, A., Stanley, J., Arpaci-Dusseau, A.C., Arpaci-Dusseau, R.H., Livny, M., Flexibility, Manageability, and Performance in a Grid Storage Appliance, HPDC’11, 2002. • Deelman, E., Blackburn, K., Ehrens, P., Kesselman, C., Koranda, S., Lazzarini, A., Mehta, G., Meshkat, L., Pearlman, L., and Williams, R., GriPhyN and LIGO: Building a Virtual Data Grid for Gravitational Wave Scientists, HPDC’11, 2002. • Foster, I., Voeckler, J., Wilde, M., Zhao, Y., Chimera: A Virtual Data System for Representing, Querying, and Automating Data Derivation, SSDBM, 2002. • Iamnitchi, A., Ripeanu, M., Foster, I., Locating Data in (Small-World?) Peer-to-Peer Scientific Collaborations, 1st Intl. Workshop on Peer-to-Peer Systems, 2002. • Raman, P., George, A., Radlinski, M., Subramaniyan, R., GEMS: Gossip-Enabled Monitoring Service for Heterogeneous Distributed Systems, Technical Report, UF, 2002. • Ranganathan, K. and Foster, I., Decoupling Computation and Data Scheduling in Distributed Data Intensive Applications, HPDC’11, 2002. • Ripeanu, M., Foster, I., Iamnitchi, A., Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design, Internet Computing, 6 (1), 50-57, 2002.
The Nature of Future Challenges • GriPhyN R&D is proving very successful • In terms of “new ideas” • In terms of interest & adoption • Our major challenges as we move forward are to scale and sustain the effort • Research scope: virtual data => KR; planning, execution => x1000 larger; …; … • Software support: we need NMIx10! • Infrastructure & application support • See Atkins cyberinfrastructure report!
Summary • CS has made significant contributions both to experiments and to knowledge, e.g. • Virtual data concepts and technologies • Scheduling in large-scale distributed systems • DAGman workflow management & execution • Scalable replica location services • VDT (& underlying Globus Toolkit & Condor systems) is a good technology transfer vehicle • Adoption by major science projects • Adoption of Grid concepts within industry • Major challenge: exploiting opportunities