Notes On the GAE

Notes On the GAE. Harvey B. Newman California Institute of Technology Grid-enabled Analysis Environment Workshop June 24, 2003. GAE Workshop Goals (1). “Getting Our Arms Around” the Grid-Enabled Analysis “Problem”


Presentation Transcript


  1. Notes On the GAE Harvey B. Newman California Institute of Technology Grid-enabled Analysis Environment Workshop June 24, 2003

  2. GAE Workshop Goals (1) • “Getting Our Arms Around” the Grid-Enabled Analysis “Problem” • Review Existing Work Towards a GAE: Components, Interfaces, System Concepts • Review Client Analysis Tools; Consider How to Integrate Them • User Interfaces: What does the GAE Desktop Look Like? (Different Flavors) • Look At Requirements, Ideas for a GAE Architecture • A Vision of the System’s Goals and Workings • Attention to Strategy and Policy • Develop (Continue) a Program of Simulations of the System • For the Computing Model, and Defining the GAE • Essential for Developing a Feasible Vision; Developing Strategies, Solving Problems and Optimizing the System • With a Complementary Program of Prototyping

  3. GAE Collaboration Desktop Example • Four-screen Analysis Desktop 4 Flat Panels: 5120 X 1024; RH9 • Driven by a single server and single graphics card • Allows simultaneous work on: • Traditional analysis tools (e.g. ROOT) • Software development • Event displays (e.g. IGUANA) • MonALISA monitoring displays; Other “Grid Views” • Job-progress Views • Persistent collaboration (e.g. VRVS; shared windows) • Online event or detector monitoring • Web browsing, email

  4. GAE Workshop Goals (2) • Architectural Approaches: Choose A Feasible Direction • For example a Managed Services Architecture • Be Prepared to Learn by Doing; Simulating and Prototyping • Where to Start, and the Development Strategy • Existing and Missing Parts of the System [Layers; Concepts] • When to Adapt Existing Components, Or to Re-Build Them “from Scratch” • Manpower Available to Meet the Goals; Shortfalls • Allocation of Tasks; Including Generating a Plan • Linkage Between Analysis and Grid-Enabled Production • Planning for Closer Relationship with LCG, Trillium, and the Experiments’ starting Efforts in this area

  5. HENP Grids: Services Architecture Design for a Global System • Self Discovering, Cooperative • Registered Services, Lookup Services; self-describing • “Spaces” for Mobile Code and Parameters • Scalable and Robust • Multi-threaded: with a thread pool managing engine • Loosely Coupled: errors in a thread don’t stop the task • Stateful: System State as well as task state • Rich set of “problem” situations: implies Grid Views, and User/System Dialogues on what to do • For Example: Raise Priority (Burn Quota); or Redirect Work • Eventually may be increasingly automated as we scale up and gain experience • Managed; to deal with a Complex Execution Environment • Real time higher level supervisory services monitor, track, optimize and Revive/Restart services as needed • Policy and strategy-driven; Self-Evaluating and Optimizing • Investable with increasing intelligence • Agent Based; Evolutionary Learning Algorithms

  6. Getting Started Towards a Workable GAE (1) • Work on Computing Model (Essential) in Parallel • Focus on a Few Scenarios for Doing Analysis • “Grid Enabled PROOF” [in CMS; in ATLAS] • Start with Existing Analysis Applications: Can they be recast in GAE Form? • Make Some Starting Assumptions • Need some simple picture of persistency • Supplementary considerations: • Multiuser situation (e.g. with avatars; then Analysis Challenges) • Coming to a few Either/Or Decisions • List of rudimentary analysis tools, and way of working • “External” to the application considerations: • Job planning • Key role of query estimation (not only beforehand) • Transparency versus tracking

  7. Getting Started Towards a Workable GAE (2) • Session or Sessions on the Desktop • Three Modes of Working; All in the GAE • Immediate (within a few seconds) • In the background (seconds to a few minutes) • Spawn batch job or jobs (minutes to hours) • Decisions and tradeoffs • Lay out the strategies and consequences (time, quota etc) • Present Choices • Monitor progress or get “alarms” and be prepared to re-strategize
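The choice among the three working modes on this slide could be driven by an estimated run time. The Python sketch below is illustrative only: the `Mode` enum, the `choose_mode` function, and the cut-off values of 5 seconds and 300 seconds are assumptions chosen to match "a few seconds" and "a few minutes", not figures from the presentation.

```python
from enum import Enum


class Mode(Enum):
    IMMEDIATE = "immediate"    # within a few seconds
    BACKGROUND = "background"  # seconds to a few minutes
    BATCH = "batch"            # minutes to hours


def choose_mode(estimated_seconds, immediate_limit=5.0, background_limit=300.0):
    """Pick a working mode from an estimated run time.

    The two limits are assumed thresholds, not values given in the slides.
    """
    if estimated_seconds < 0:
        raise ValueError("estimated_seconds must be non-negative")
    if estimated_seconds <= immediate_limit:
        return Mode.IMMEDIATE
    if estimated_seconds <= background_limit:
        return Mode.BACKGROUND
    return Mode.BATCH


if __name__ == "__main__":
    for seconds in (2, 90, 7200):
        print(seconds, "->", choose_mode(seconds).value)
```

In a fuller system the chosen mode would be presented to the user together with its consequences (time, quota), as the slide suggests, rather than applied silently.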

  8. Getting Started Towards a Workable GAE (3) • Smart Caching: Of Methods, of Data, or of Time-to-Process Info. • Intelligence in the system does not only mean problem solving • Need to apply intelligence/experience to progressively improve system performance • Time-to-completion estimation: process a small amount of data to get a realistic first estimate.
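The time-to-completion idea on this slide (process a small amount of data, then extrapolate) can be sketched in a few lines of Python. Everything named here is hypothetical: `estimate_time_to_completion`, the `sample_size` default, and the injectable `clock` parameter exist only for illustration, and the linear extrapolation assumes the sampled events are representative of the rest.

```python
import time


def estimate_time_to_completion(events, process, sample_size=100,
                                clock=time.perf_counter):
    """Process the first `sample_size` events and extrapolate linearly.

    Returns the estimated seconds needed for the events not yet processed.
    Assumes the sample is representative of the full data set.
    """
    sample = events[:sample_size]
    if not sample:
        return 0.0
    start = clock()
    for event in sample:
        process(event)
    elapsed = clock() - start
    per_event = elapsed / len(sample)
    remaining = len(events) - len(sample)
    return per_event * remaining


if __name__ == "__main__":
    data = list(range(100_000))
    estimate = estimate_time_to_completion(data, lambda e: e * e, sample_size=1_000)
    print(f"estimated remaining time: {estimate:.4f} s")
```

The estimate could be refreshed as the job runs, which fits the earlier remark that query estimation matters "not only beforehand".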

  9. 3 Slides About Building a Computing Model & the GAE System • These Slides Focus on Simulation/Prototyping, as an Integral part of designing and building distributed systems for the GAE, and the Grid-Enabled Production Environment (GPE) as well.

  10. Building a Computing Model and an Analysis Strategy (I) • Generate a Blueprint: A “Computing Model” • Tasks → Workload, Facilities, Priorities & GOALS • Persistency; Modes of Accessing Data (e.g. Object Collections) • What runs where; when to redirect • The User’s Working Environment • What is normal (managing expectations)? • Guidelines for dealing with problems: based on which information? • Performance and problem reporting/tracking/handling? • Known Problems: Strategies to deal with those • Set up, code a Simulation of the Model • Develop mechanisms and sub-models as needed • Set up prototypes to measure the performance parameters where not already known to sufficient precision

  11. Building a Computing Model and an Analysis Strategy (II) • Run simulations (avatars for “actors”; agents; tasks; mechanisms) • Analyze and evaluate performance • General performance (throughput; turnaround) • Ensure “all” work is done: learn how to do this: within a reasonable time; compatible with the Collaboration’s guidelines • Vary Model to Improve Performance • Deal with bottlenecks and other problems • New strategies and/or mechanisms to manage workflow • Represent key features and behaviors, for example: • Responses to Link or Site failures • User input to redirect data or jobs • Monitoring information gathering • Monitoring and management agent actions and behaviors in a variety of situations • Validate the Model • Using Dedicated setups • Using Data Challenges (measure, evaluate, compare; fix key items) • Learn of new factors and/or behaviors to take into account
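A toy version of the simulation loop described on this slide, covering only the "responses to site failures" item, might look like the Python sketch below. It is a deliberately simplified assumption, not the simulation program the talk refers to: the `simulate` function, the per-site failure probabilities, and the `max_redirects` policy are all hypothetical.

```python
import random


def simulate(num_jobs, site_failure_prob, max_redirects=2, seed=0):
    """Toy simulation of jobs being redirected after site failures.

    `site_failure_prob` is a hypothetical mapping of site name -> probability
    that a job sent there fails. A failed job is redirected to a randomly
    chosen site, up to `max_redirects` times, and is then abandoned.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    sites = list(site_failure_prob)
    stats = {"completed": 0, "redirected": 0, "abandoned": 0}
    for _ in range(num_jobs):
        redirects = 0
        while True:
            site = rng.choice(sites)
            if rng.random() >= site_failure_prob[site]:
                stats["completed"] += 1
                break
            if redirects == max_redirects:
                stats["abandoned"] += 1
                break
            redirects += 1
            stats["redirected"] += 1
    return stats


if __name__ == "__main__":
    # Site names and probabilities are made up for the example.
    print(simulate(1000, {"site_a": 0.05, "site_b": 0.20, "site_c": 0.50}))
```

Varying `max_redirects` or the failure probabilities and comparing the resulting counts is a small-scale analogue of the "vary model to improve performance" step.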

  12. Building a Computing Model and an Analysis Strategy (III) • MAJOR Milestone: Obtain a first picture of a Model that Seems to Work • This may or may not involve changes in the computing resource requirements-estimates; or Collaboration policies and expectations • It is hard to estimate how long it will take to reach this milestone [most experiments until now have reached it after the start of data taking] • Evolve the Model to • Distinguish what works and what does not • Incorporate evolving site hardware and network performance • Progressively incorporate new and “better” strategies, to improve throughput and/or turnarounds, or fix critical problems • Take into account experience with the actual software-system components as they develop • In parallel with the Model evolution keep developing the overall data analysis + Grid + monitoring “system”; represent it in the simulation • And the associated strategies
