
The Grid Observatory: goals and challenges

The Grid Observatory: goals and challenges. C. Germain-Renaud (CNRS/LRI & LAL) EGEE’07 Conference Budapest, Hungary 1-5 October 2007. Overview. NA4 cluster in EGEE-III proposal


Presentation Transcript


  1. The Grid Observatory: goals and challenges C. Germain-Renaud (CNRS/LRI & LAL) EGEE’07 Conference Budapest, Hungary 1-5 October 2007

  2. Overview • NA4 cluster in EGEE-III proposal • Integrate the collection of data on the behaviour of the EGEE grid and users with the development of models and of an ontology for the domain knowledge Application Track - Grid Observatory

  3. Some immediate questions • Resource allocation • Performance of the gLite scheduling hierarchy • Published waiting time • Reactive grids – Everybody's grid • Dimensioning • Patterns and trends in requests and usage • Anticipate peaks • On-line fault management • Detection • Diagnosis • Prevention

  4. The big picture • "Considering current technologies, we expect that the total number of device administrators will exceed 220 million by 2010" (Gartner, June 2001) • No more Moore’s Law free lunch: much more complex software & applications • The Virtual Organization concept creates common goods

  5. Autonomic Computing "Computing systems that manage themselves in accordance with high-level objectives from humans." Kephart & Chess, The Vision of Autonomic Computing, IEEE Computer, 2003 • Self-*: configuration, optimization, healing, protection • Of open, non-steady-state dynamic systems

  6. Autonomic Computing "Computing systems that manage themselves in accordance with high-level objectives from humans." Kephart & Chess, The Vision of Autonomic Computing, IEEE Computer, 2003 • Self-*: configuration, optimization, healing, protection • Of open, non-steady-state dynamic systems • Academia and industry are both involved

  7. Autonomic Grids [Diagram: monitor, analyze, plan, execute loop around shared knowledge] • Statistical analysis • Data mining • Machine learning • DATA REQUIRED

  8. Data Collection and Publication • Acquisition, consolidation, long-term conservation of traces of EGEE activities • Permanent storage of reliable, exhaustive, filtered information • Exhaustive: added value in snapshots of the inputs and grid state, e.g. workload and available services during a relevant time range • Filtered: from operational to structured [Figure: L&B schema, annotated "No join!"]

  9. Data Collection and Publication • Acquisition, consolidation, long-term conservation of traces of EGEE activities • Permanent storage of reliable, exhaustive, filtered information: from operational to structured • No monitoring development: rich ecosystem of sources, with very different scopes, deployment and institutional status • Centralized [Figure, sources: CIC tools (GOCDB, SAM, SFT,…); core gLite (L&B, BDII,…); sites (Maui/PBS logs); gLite integrators (R-GMA, Job Provenance); experience integrators (DashBoard); external software (MonALISA)]

  10. Data Collection and Publication • Acquisition, consolidation, long-term conservation of traces of EGEE activities • Permanent storage of reliable, exhaustive, filtered information: from operational to structured • No monitoring development: rich ecosystem of sources, with very different scopes, deployment and institutional status • The major challenge is exhaustiveness • Some data are outside the scope: external traffic on shared resources • Inside the scope, we need snapshots of the grid state and inputs • Privacy-related legal constraints • Scientific usage will help • Interaction with EGI • Long-term: privacy-preserving data mining

  11. Data Collection and Publication • Publication service: navigation and querying • Integration of independent sources • Indexing according to the needs of the user communities • Scheduling: ongoing work with CoreGrid • Jobs: ongoing work with KD-Ubiq • Ontology • The Glue Information Model: an ontology of the resources • Concepts for the grid dynamics, e.g. job lifecycle or user relations • Expert concepts as prior knowledge of non-trivial correlations: workflows, failure modes,… [Figure labels: Job, Resource]

  12. Models • Intrinsic characterizations of "grid traffic": (distribution of) e.g. job arrival rate, running time, application data locality • Likely to be similar to IP traffic: many short jobs and a significant number of long ones, at all scales • Long-range dependencies

  13. Models • Intrinsic characterizations of "grid traffic": (distribution of) e.g. job arrival rate, running time, application data locality • Likely to be similar to IP traffic: many short jobs and a significant number of long ones, at all scales • Long-range dependencies • Characterizations of middleware-dependent metrics, e.g. queuing delays, overhead, SE load

  14. Models • Intrinsic characterizations of "grid traffic": (distribution of) e.g. job arrival rate, running time, application data locality • Likely to be similar to IP traffic: many short jobs and a significant number of long ones, at all scales • Long-range dependencies • Characterizations of middleware-dependent metrics, e.g. queuing delays, SE load • Inference of models for middleware components and applications, users and usage profiles, user interactions
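
The "many short, a significant number of long" picture of job running times is what a heavy-tailed distribution looks like. One common way to quantify such a tail is the Hill estimator of the tail index; the slides do not name it, so it is used here purely as an illustration. The function names and the synthetic Pareto data below are assumptions, not part of the Grid Observatory.

```python
import math


def hill_tail_index(samples, k):
    """Hill estimate of the tail index alpha from the k largest samples.

    A small alpha (roughly below 2) suggests a heavy tail: a few very long
    jobs carry a large share of the total running time.
    """
    if not 0 < k < len(samples):
        raise ValueError("k must satisfy 0 < k < len(samples)")
    ordered = sorted(samples, reverse=True)
    threshold = ordered[k]  # the (k+1)-th largest value
    if threshold <= 0:
        raise ValueError("tail samples must be positive")
    log_excess = sum(math.log(x / threshold) for x in ordered[:k])
    if log_excess == 0:
        raise ValueError("the k largest samples are all equal")
    return k / log_excess


def pareto_quantiles(alpha, n):
    """Deterministic synthetic 'running times': n quantiles of a Pareto law."""
    return [(1.0 - (i + 0.5) / n) ** (-1.0 / alpha) for i in range(n)]


if __name__ == "__main__":
    runtimes = pareto_quantiles(alpha=1.5, n=10_000)  # hypothetical data
    print(hill_tail_index(runtimes, k=1_000))
```

The choice of k (how many of the largest samples count as "the tail") is a judgment call; in practice one would plot the estimate against k and look for a stable region.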

  15. Autonomic dependability • On-line failure detection and anticipation • Passive vs active probing: a lot of information is available from user work • Black-box • On-line statistics from "similar" actions (executions, data access, middleware modules)
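
A minimal sketch of what passive, black-box, on-line statistics over "similar" actions could look like: running mean and variance (Welford's update) are kept per group of actions, and a new observation is flagged when it lies far from what the group has shown so far. The grouping key, the z-score rule and the names `RunningStats` and `PassiveMonitor` are assumptions made for this example; the slides do not specify a method.

```python
import math
from collections import defaultdict


class RunningStats:
    """Welford's on-line mean and variance; no samples are stored."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def push(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


class PassiveMonitor:
    """Flags observations that deviate from their group's history."""

    def __init__(self, z_limit=3.0, min_samples=10):
        self.z_limit = z_limit          # assumed threshold
        self.min_samples = min_samples  # warm-up before flagging anything
        self.stats = defaultdict(RunningStats)

    def observe(self, group, value):
        """Return True if value is suspicious for this group, then record it."""
        s = self.stats[group]
        suspicious = (
            s.n >= self.min_samples
            and s.std > 0
            and abs(value - s.mean) > self.z_limit * s.std
        )
        s.push(value)
        return suspicious


if __name__ == "__main__":
    monitor = PassiveMonitor()
    key = ("some-site", "job_run")  # hypothetical group of similar actions
    for minutes in [9.0, 11.0] * 10:
        monitor.observe(key, minutes)
    print(monitor.observe(key, 100.0))
```

Here the flagged value is still folded into the statistics; a real detector might exclude it or use a robust estimator so that one failure does not mask the next.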

  16. Evaluation • Assessing performance at the grid scale is a challenge • Need a snapshot of the inputs and grid state e.g. workload and available services during a relevant time range • Classical optimization does not scale • Advanced optimization: anytime algorithms

  17. Abrupt changepoint detection • Page-Hinkley statistics • Time-sequential version of Wald’s statistics, also known as CUSUM • "Intelligent threshold" test which minimizes the expected time before a change detection for a fixed false positive rate • Routine in quality control and clinical trials [Figure labels: VO software bug, Blackhole]
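
The test named on this slide can be sketched in a few lines. This is one common textbook formulation of the Page-Hinkley test for an upward shift in the mean of a stream, for instance a per-job failure indicator; the parameter values `delta` and `threshold` and the helper `first_alarm` are assumptions chosen for illustration, not settings from the talk.

```python
from dataclasses import dataclass


@dataclass
class PageHinkley:
    """Page-Hinkley test for an increase in the mean of a stream."""

    delta: float = 0.05      # tolerated drift (assumed value)
    threshold: float = 5.0   # alarm threshold, often written lambda (assumed)
    n: int = 0
    mean: float = 0.0
    cum: float = 0.0         # cumulative deviation m_t
    cum_min: float = 0.0     # running minimum of m_t

    def update(self, x: float) -> bool:
        """Consume one sample; return True when a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


def first_alarm(samples, **params):
    """Index of the first sample that raises an alarm, or None."""
    detector = PageHinkley(**params)
    for i, x in enumerate(samples):
        if detector.update(x):
            return i
    return None


if __name__ == "__main__":
    # Hypothetical stream: 200 successful jobs (0), then every job fails (1).
    stream = [0.0] * 200 + [1.0] * 50
    print(first_alarm(stream))
```

Raising `threshold` lowers the false alarm rate at the cost of a longer detection delay, which is the tradeoff the "intelligent threshold" bullet refers to.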

  18. Autonomic dependability • On-line failure detection and anticipation • Passive vs active probing: a lot of information is available from user work • Black-box • On-line statistics from "similar" actions (executions, data access, middleware modules) • Supervised and unsupervised learning

  19. Mining the L&B logs • Constructive induction • Double clustering

  20. Autonomic dependability • On-line failure detection and anticipation • Passive vs active probing: a lot of information is available from user work • Black-box • On-line statistics from "similar" actions (executions, data access, middleware modules) • Supervised and unsupervised learning • Active probing • Adaptive on-line test selection for best coverage of possibly faulty components • Experience planning
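
Selecting tests "for best coverage of possibly faulty components" resembles a set-cover problem. The sketch below uses the standard greedy heuristic under a probe budget; it is a simplification that ignores probe cost and outcome probabilities, and the probe and component names are hypothetical.

```python
def select_probes(probes, suspects, budget):
    """Greedy choice of at most `budget` probes covering suspect components.

    probes:   dict mapping probe name -> set of components the probe exercises
    suspects: iterable of components currently considered possibly faulty
    Returns (chosen probe names in order, suspects left uncovered).
    """
    remaining = set(suspects)
    chosen = []
    while remaining and len(chosen) < budget:
        candidates = [name for name in probes if name not in chosen]
        if not candidates:
            break
        # Pick the probe that exercises the most still-uncovered suspects.
        best = max(candidates, key=lambda name: len(probes[name] & remaining))
        if not probes[best] & remaining:
            break  # no probe adds coverage any more
        chosen.append(best)
        remaining -= probes[best]
    return chosen, remaining


if __name__ == "__main__":
    # Hypothetical probes and the components each one touches.
    probes = {
        "submit_job": {"wms", "ce"},
        "copy_file": {"se"},
        "query_info": {"bdii", "ce", "se"},
    }
    print(select_probes(probes, {"wms", "ce", "se", "bdii"}, budget=2))
```

Making the selection adaptive would mean re-running it after each probe result, shrinking or growing the suspect set according to what passed and what failed.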

  21. Goals & Challenges • Contributions to a quantitative approach to grid middleware and architecture, in the RISC sense • Operational impacts on EGEE: evaluation, autonomic dependability • Basic research in autonomic computing • Collaboration between EGEE and national research initiatives and other EU projects: DEMAIN, PASCAL, KD-Ubiq, CoreGrid, and hopefully more • Adequate tradeoff between productivity and sustainability
