
The Grid – The Next 10 Years

The Grid – The Next 10 Years. CCGS, September 2002. Dr. Francine Berman, Director, NPACI and SDSC; Professor, Department of Computer Science and Engineering, UCSD.


Presentation Transcript


  1. The Grid – The Next 10 Years • CCGS, September 2002 • Dr. Francine Berman, Director, NPACI and SDSC; Professor, Department of Computer Science and Engineering, UCSD

  2. Grid Computing in the News

  3. Has the Grid been oversold?

  4. Today’s Presentation • Grids Today -- Where are we now? (Trends: Science and Technology, Applications; State of the Art) • Has the Grid been oversold? • The Grid – the next 10 years (New application paradigms; New devices; Policy and social dynamics; New research)

  5. Science and Technology Trends • Proliferation of resources (everyone has computers; multiple IP addresses per person) • Increasing complexity (multi-scale; multi-disciplinary) • Immense amounts of data • Heterogeneity • Figures: the Arpanet, 1969; the Internet, 2002

  6. Science and Technology Trends • Coordination/collaboration is default mode of interaction • The Internet • Globalization, virtualization • Open source movement • At the largest scales, heterogeneity is a fact of life • Source: Dave Turek, IBM

  7. Application Trends: Distribution • Walmart Inventory Control • Satellite technology used to track every item • Bar code information sent to remote data centers to update inventory database and cash flow estimates • Satellite networking used to coordinate vast operations • Inventory adjusted in real time to avoid shortages and predict demand • Data management, prediction, real-time, wide-area synchronization

  8. Distributed Entertainment • Everquest • 45 communal “world servers” (26 high-end PCs per server) supporting 430,000 players • Real-time interaction, individualized database management, back channel communication between players • Data management adapted to span both client PC and server to mitigate communication delays • Game masters interact with players for real-time game management

  9. Distributed Data Mining • SETI@home • 3.8M users in 226 countries • 1200 CPU years/day • 38 TeraFlops sustained (Japanese Earth Simulator is 40 TF peak) • 1.7 zettaflop over last 3 years (1.7 × 10^21 floating-point operations) • Highly heterogeneous: >77 different processor types
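
The SETI@home figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below is an illustration, not part of the original talk. It assumes the three-year total means 1.7 × 10^21 floating-point operations and uses a 365-day year. The function names are hypothetical.

```python
# Back-of-the-envelope check of the SETI@home figures quoted on the slide.
SECONDS_PER_YEAR = 365 * 24 * 3600  # ignores leap years


def equivalent_cpus(cpu_years_per_day: float) -> float:
    """CPUs that would have to run around the clock to deliver this rate."""
    return cpu_years_per_day * 365


def total_operations(sustained_flops: float, years: float) -> float:
    """Operations performed if a sustained rate is held for the whole period."""
    return sustained_flops * years * SECONDS_PER_YEAR


# 1200 CPU years/day expressed as continuously busy CPUs.
cpus = equivalent_cpus(1200)

# Upper bound: 38 TFLOPS held for all 3 years.
upper_bound = total_operations(38e12, 3)

# Average rate implied by 1.7e21 operations spread over 3 years (assumed reading).
average_rate = 1.7e21 / (3 * SECONDS_PER_YEAR)

print(f"equivalent CPUs: {cpus:,.0f}")
print(f"3 years at 38 TFLOPS: {upper_bound:.2e} operations")
print(f"average rate for 1.7e21 operations: {average_rate / 1e12:.1f} TFLOPS")
```

Under these assumptions, three years at the full 38 TFLOPS would give roughly 3.6 × 10^21 operations, so a 1.7 × 10^21 total corresponds to an average of about 18 TFLOPS. That is consistent with a project whose throughput grew over the period.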

  10. Distributed Applications on the Grid • The Grid should accelerate progress • Many applications currently developed as stand-alone entities • Availability of Grid services will allow designers to build on existing infrastructure and evolving technologies • Applications developed for the Grid will contribute to community infrastructure, standards, progress • Stability/performance of multiple applications must be addressed • Grid applications will necessitate policies for resource sharing

  11. Unifying the Infrastructure: A Community Grid Model • Common agreement on interfaces, service architecture, standards • Diagram labels: NPACI, TeraGrid Grid Applications; Applications; NPACI Grid Middleware (user-focused and targeted grid middleware, tools, and services); Common Infrastructure layer (NMI, GGF standards, OGSA etc.); Resources

  12. Grids 2002 – State of the Art • DISCOM • SinRG • APGrid • IPG • …

  13. How are we using the Grid? • To manage computation • To manage data • To provide access to resources

  14. Encyclopedia of Life – State-of-the-Art for Modern Grid applications • The Human Genome Project provided a “parts list” for the human species • We are beginning to understand the parts • We don’t know what all the parts do • We don’t know how all the parts fit together • Ultimate goal: to understand how the parts fit together to create life’s processes for all species • Each species’ genome is made up of thousands of genes • Each gene holds the recipe for a single protein • Proteins provide the what and how of virtually all of life’s processes • A first step towards our ultimate goal is to understand the proteins associated with each known genome – to create an Encyclopedia of Life

  15. The Encyclopedia of Life Project • For each of 800+ complete or partial publicly available genomes, execute a computational pipeline to define protein annotation and model 3D structure • This will give information on the what and how of the proteins associated with each genome • Code for each genome is pipeline of parallel filtering steps • Protein information stored in a web-accessible database • EOL will be intensively data-mined by scientists for next generation of comp bio and genomics results

  16. EOL Computational Pipeline • ~800 genomes with 10k-20k ORFs (Open Reading Frames) • Genome protein sequences • Sequence info: NR, PFAM • Structure info: SCOP, PDB • 104 entries • Building FOLDLIB: PDB chains; SCOP domains; PDP domains; CE matches PDB vs. SCOP; 90% sequence non-identical; minimum size 25 aa; coverage (90%, gaps <30, ends <30) • Prediction of: signal peptides (SignalP, PSORT); transmembrane (TMHMM, PSORT); coiled coils (COILS); low complexity regions (SEG) • Create PSI-BLAST profiles for protein sequences • Structural assignment of domains by PSI-BLAST on FOLDLIB • (Only sequences w/out A-prediction) • Structural assignment of domains by 123D on FOLDLIB • (Only sequences w/out A-prediction) • Functional assignment by PFAM, NR, PSIPred assignments • Domain location prediction by sequence • FOLDLIB • Store assigned regions in the DB
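
One way to read the pipeline on this slide is as a cascade: each structural-assignment stage only sees the sequences that earlier stages left unassigned ("only sequences w/out A-prediction"). The sketch below illustrates that control flow only. It is an assumption-laden toy, not the EOL code: the `Protein` class, `run_cascade`, and both stage functions are hypothetical, and the stages are simple string checks that stand in for PSI-BLAST and 123D rather than calling them.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Protein:
    name: str
    sequence: str
    assignment: Optional[str] = None  # label of the stage that assigned it


# A stage inspects one protein and returns a label, or None if it finds nothing.
Stage = Callable[[Protein], Optional[str]]


def run_cascade(proteins: List[Protein],
                stages: List[Tuple[str, Stage]]) -> List[Protein]:
    """Apply stages in order; later stages see only still-unassigned proteins."""
    for stage_name, stage in stages:
        for protein in proteins:
            if protein.assignment is not None:
                continue  # already assigned by an earlier stage
            label = stage(protein)
            if label is not None:
                protein.assignment = f"{stage_name}:{label}"
    return proteins


# Toy stand-ins (hypothetical): they do NOT run PSI-BLAST or 123D.
def toy_profile_search(p: Protein) -> Optional[str]:
    return "fold-A" if "MKV" in p.sequence else None


def toy_threading(p: Protein) -> Optional[str]:
    return "fold-B" if len(p.sequence) >= 8 else None


proteins = run_cascade(
    [Protein("p1", "MKVLAAGT"), Protein("p2", "GGSTPLLQNA"), Protein("p3", "ACD")],
    [("profile", toy_profile_search), ("threading", toy_threading)],
)
for p in proteins:
    print(p.name, p.assignment or "unassigned")
```

Because each genome can be processed independently under this reading, many such cascades could run side by side on different resources, which may be part of what makes the workload a natural fit for a Grid.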

  17. We’ve already started – Annotation of Arabidopsis thaliana Proteins • “Almost everything in the body … is either made of proteins or made by them.” (Matt Ridley, “Genome: The Autobiography of a Species in 23 Chapters”) • Types of questions which can be addressed by EOL • Is protein X found in anthrax? • Is protein X a drug target, that is, does it exist predominantly in pathogenic bacteria or is it found in eukaryotes also? • Has caspase-1 (a protein involved in cell death and aging) been identified in any plants, and if so, in what species, and do the proposed protein structures look similar? • Give me all available information on caspase-1 • Arabidopsis annotation: joint work with SDSC and Ceres

  18. TeraGrid: State-of-the-Art for High-end Grids • TeraGrid Team: Berman (SDSC), Catlett (ANL), Foster (ANL), Messina (Caltech), Reed (NCSA), Stevens (ANL) … and a cast of hundreds • Pushing the envelope in Grid resources • Over 13.6 trillion calculations per second • Over 600 trillion bytes of immediately accessible data • 40 gigabit per second network speed • TeraGrid will provide new limits for cluster computing and new paradigms for data-oriented computing • Critical for disaster response, genomics, environmental modeling, …
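
To put the TeraGrid numbers in proportion, the sketch below estimates how long it would take to move the full 600 trillion bytes over a single 40 gigabit per second link. It is an illustrative calculation only. It assumes decimal units, a fully utilized link, and no protocol overhead, none of which the slide states.

```python
# Idealized transfer time for the data volume and link speed quoted on the slide.
DATA_BYTES = 600 * 10**12        # "600 trillion bytes"
LINK_BITS_PER_SECOND = 40 * 10**9  # "40 gigabit per second"


def transfer_seconds(num_bytes: int, link_bits_per_second: int) -> float:
    """Seconds to send num_bytes over a link, assuming 100% utilization."""
    return num_bytes * 8 / link_bits_per_second


seconds = transfer_seconds(DATA_BYTES, LINK_BITS_PER_SECOND)
hours = seconds / 3600

print(f"{seconds:,.0f} s, about {hours:.1f} hours")
```

Under these assumptions the transfer takes 120,000 seconds, a little over 33 hours. Even at this network speed, moving the whole data store is a day-scale operation, which suggests why data placement matters for data-oriented computing.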

  19. From the laboratory to the high end • Laboratory clusters: Linux based; Globus enabled; roll-your-own data; performance models tuned to low end • High-end distributed clusters: Linux based; Globus enabled; basic and advanced data services; reasonably direct path from low-end performance models to high end • Site labels: Caltech: Data collection analysis; ANL: Visualization; SDSC: Data-Intensive; NCSA: Compute-Intensive • Hardware labels: 256p HP X-Class; 128p HP V2500; 92p IA-32; HPSS; Myrinet; 1176p IBM SP Blue Horizon; Sun E10K

  20. Measures of Success • Use a single node on TeraGrid: portals, SW, scheduling should allow access to designated individual resources • Use as a wide-area cluster computer: use multiple designated resources of the same type for a single computation • Use as a simple grid: use multiple resources of different types in a staged or concurrent computation • Use as a full grid: use multiple nodes as an ensemble via advanced SW environment

  21. The Grid is more than just a development and integration project • E.g.: TeraGrid was developed as a vision for the future, which needs to be accomplished • Within a short time frame (3 years) • Using current and emerging products • Leveraging current research • Targeting a current set of cutting edge applications • There are many questions not addressed by TeraGrid and other projects that must be addressed to develop a usable and useful Grid information infrastructure

  22. We have barely scratched the surface on • Program development environments • Debugging, compiling, performance tuning • Fault tolerance • Modeling of dynamic, unpredictable environments • Grid market economy (allocation, accounting, cost models) • New Grid programming paradigms • Extreme heterogeneity (sensors, supercomputers, cell phones, cars) etc.

  23. Has the Grid been oversold? The promise of the Grid has not been oversold, but the difficulty of developing the requisite Grid infrastructure has been underestimated.

  24. Grids: The Next 10 Years • [Picture of earthquake and bridge] • [Picture of digital sky]

  25. Ultimate Goal: A useful, usable, stable Grid that is • High-capacity (rich in resources) • High-capability (rich in options) • Evolutionary (able to adapt to new technologies and uses) • Persistent (promoting stable infra + knowledgeable workforce) • Scalable (growth must be a part of the design) • Adequately supported (both in funding and commitment) • More cooperative than competitive • Useful, able to support/promote new science • Usable (accessible, robust, easy-to-use)

  26. Applications are key to the Grid’s success • Applications will use whatever parts of the infrastructure that can really deliver • Apps developers are willing to be dedicated and creative but it has to be worth their while • Goal is for Grid infrastructure to some day be as natural a part of the picture as the OS • The Grid will be considered “oversold” if the only people who can productively use it are the techies …

  27. Focus on New Application Paradigms for the Grid • Next generation Grid applications: • Adaptive applications (run where you can find resources satisfying criteria X) • Real-time applications (do something right now) • Coordinated applications (dynamic programming, branch and bound) • Poly-applications (choice of resources for different components) • We still can’t “throw any application at the grid” and have SW determine where and how it will run
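
The "adaptive application" idea on this slide (run where you can find resources satisfying criteria X) can be sketched as a filter followed by a ranking. This is a minimal illustration under stated assumptions, not any real Grid scheduler's interface: the `Resource` fields, the `select_resource` function, the example machines, and the cost model are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Resource:
    name: str
    free_cpus: int
    memory_gb: float
    load: float  # 0.0 = idle, 1.0 = saturated


def select_resource(resources: Iterable[Resource],
                    satisfies: Callable[[Resource], bool],
                    cost: Callable[[Resource], float]) -> Optional[Resource]:
    """Keep resources meeting the criteria, then pick the cheapest, if any."""
    candidates = [r for r in resources if satisfies(r)]
    return min(candidates, key=cost) if candidates else None


# Hypothetical snapshot of available machines.
resources = [
    Resource("lab-cluster", free_cpus=16, memory_gb=32.0, load=0.10),
    Resource("big-sp", free_cpus=512, memory_gb=1024.0, load=0.85),
    Resource("viz-node", free_cpus=8, memory_gb=64.0, load=0.05),
]


# "Criteria X": at least 16 free CPUs and 32 GB of memory.
def meets_criteria(r: Resource) -> bool:
    return r.free_cpus >= 16 and r.memory_gb >= 32.0


# Toy cost model: inverse of the CPU capacity left after accounting for load.
def estimated_cost(r: Resource) -> float:
    return 1.0 / (r.free_cpus * (1.0 - r.load))


chosen = select_resource(resources, meets_criteria, estimated_cost)
nothing = select_resource(resources, lambda r: r.free_cpus > 10_000, estimated_cost)

print(chosen.name if chosen else "no resource satisfies the criteria")
```

In a real Grid the snapshot would go stale quickly, so an adaptive application would repeat this selection as conditions change. The slide's closing point is that software cannot yet make this decision automatically for arbitrary applications.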

  28. Focus on Performance: Grid Programming Environments • The GrADS (Grid Application Development Software) Project • Design and development of a Grid program development and execution environment • Grid-friendly libraries, compilers, schedulers, performance tools • Program performance through adaptation • Contract-based grid performance economy • Diagram labels: Source application; PSE; whole program compiler; libraries; Software components; Config. object program; negotiation; Scheduler/Service Negotiator; Dynamic optimizer; Grid runtime System (Globus); Realtime perf monitor; Performance feedback; Perf problem
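
The slide mentions performance contracts, a real-time performance monitor, and feedback to a scheduler/negotiator. The sketch below shows one simple way such a loop could work. It is a hypothetical illustration and does not reproduce the GrADS software or its interfaces: `PerformanceContract`, `first_renegotiation_point`, the tolerance, and the patience threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PerformanceContract:
    expected_rate: float  # e.g. iterations per second agreed at launch
    tolerance: float      # fraction of shortfall that is still acceptable

    def violated_by(self, measured_rate: float) -> bool:
        """True if the measured rate falls below the tolerated minimum."""
        return measured_rate < self.expected_rate * (1.0 - self.tolerance)


def first_renegotiation_point(contract: PerformanceContract,
                              measurements: Iterable[float],
                              patience: int = 3) -> Optional[int]:
    """Index of the sample at which to request rescheduling, or None.

    A request is raised only after `patience` consecutive violations,
    so a single slow sample does not trigger a costly migration.
    """
    consecutive = 0
    for index, rate in enumerate(measurements):
        if contract.violated_by(rate):
            consecutive += 1
            if consecutive >= patience:
                return index
        else:
            consecutive = 0
    return None


contract = PerformanceContract(expected_rate=100.0, tolerance=0.2)
samples = [95.0, 70.0, 85.0, 60.0, 75.0, 50.0]

trigger = first_renegotiation_point(contract, samples)
print("renegotiate at sample", trigger)
```

Requiring several consecutive violations is a design choice for this sketch. It trades slower reaction for fewer spurious reschedules in an environment where load fluctuates.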

  29. Focus on Data: A “Killer App” for the Grid • Over the next decade, data will come from everywhere • Scientific instruments • Experiments • Sensors and sensornets • New devices (personal digital devices, computer-enabled clothing, cars, …) • And be used by everyone • Scientists • Consumers • Educators • General public • SW environment will need to support unprecedented diversity, globalization, integration, scale, and use • Figure labels: Data from sensors; Data from instruments; Data from simulations; Data from analysis

  30. From Data to Information to Knowledge (SDSC Data and Knowledge Systems Program) • Applications (Medical informatics, Biosciences, Ecoinformatics, …): How do we combine data, knowledge and information management with simulation and modeling? • Visualization: How do we represent data, information and knowledge to the user? • Data Mining, Simulation Modeling, Analysis, Data Fusion: How do we detect trends and relationships in data? • Knowledge-Based Integration, Advanced Query Processing: How do we obtain usable information from data? • Grid Storage, Filesystems, Database Systems: How do we collect, access and organize data? • High speed networking, Networked Storage (SAN), instruments, sensornets, Storage hardware: How do we configure computer architectures to optimally support data-oriented computing?

  31. Next generation Grids will include new technologies • New devices: PDAs, sensors, cars, clothes, smart dust, smart bandaids, … • Wired and Wireless • HPWREN, Roadnet (Hans-Werner Braun, Frank Vernon et al.) • 45 Mbps between Mount Laguna telescope and SDSU, wireless access to Pala, Rincon, La Jolla Indian Reservations, etc. • Roadnet expanding to waterways, etc.

  32. Global Information Infrastructure: A Grid of Grids • Diagram labels: Online System; HPSS

  33. The Global Information Infrastructure must cross technical, political, social boundaries • Diagram labels: New devices; Sensors; Wireless; Common policies; Grid Economy; Global-area Networking; NPACI, TeraGrid Grid Applications; Grid Applications; NPACI Grid Middleware (user-focused and targeted grid middleware, tools, and services); Common Infrastructure layer (NMI, GGF standards, OGSA etc.); Global Resources

  34. New Research • Fault tolerance • Compilers, performance prediction, scheduling • Agent-based computing • Location-independence • Extreme heterogeneity • Applications which push the envelope • Applications with dependences • Adaptivity, poly-algorithms, commercial applications • Policy and economics for grid environments • Sharing as a default mode of interaction • Trust, policy, negotiation, payment • Usability and performance • Programming environments for the Grid, portals • Adaptivity as the prevalent mode for performance • We’ve made a great start but there is much farther to go • Drivers Wanted: We should be developing a new generation of scientists, technologists and solutions to address the challenges of a Global Grid Infrastructure

  35. Thank You • Diagram labels: New devices; Sensors; Wireless; Common policies; Grid Economy; Global-area Networking; NPACI, TeraGrid Grid Applications; Grid Applications; NPACI Grid Middleware (user-focused grid middleware, tools, and services); Common Infrastructure layer (NMI, GGF standards, OGSA etc.); Global Resources
