Research partially supported by the ARL MSRC (GSA Contract GS00T99ALD0209) & Raytheon

A Probabilistic Approach to Distributed System Management
Randy Schauer, Anupam Joshi

Why is the management of large scale distributed systems a problem?
• New High Performance Computing (HPC) clusters are already running over 100 TeraFLOPS (trillion floating point operations per second) on a consistent basis; the PetaFLOP era is near.
• Systems are becoming too large for system administrators to manage easily.

[Figure: BlueGene/L, 596 TFLOPS, LLNL, Livermore, CA]

How can this problem be solved?
• The system must be able to manage aspects of its configuration without using a central image master, relying only on the knowledge of its peers.
• The system must be able to understand and evaluate its operating environment to catch issues before they become catastrophic problems.

[Figure: LNXI ATC (MJM), 53 TFLOPS, ARL MSRC, Aberdeen, MD]

How can we determine the correct configuration in a distributed system?
• Large clusters require the various commodity components to be tied together operationally through software configurations, resulting in an inability to accurately model all possible configuration parameters.
• Based on the infinite possible configurations and optimal settings for differing environments, a statistical relational learning method is the preferred inference mechanism, specifically Markov Logic Networks.
• Markov Logic Networks provide a first-order predicate knowledge base with a weight applied to each formula, allowing for an initial set of conditions that capture the rules needed to make informed decisions.

File Access Permissions
• Comparisons required to ensure proper permissions for both security and access include majority rule, most restrictive, and time-based differences.
• A statistical approach to solving this issue takes known factors into account and weights them as appropriate, allowing us to minimize uncertainty and determine the most valid option.

Processor Heat Analysis
• Determine if a processor is overheating by comparing the temperatures being reported on the neighboring nodes and in the nodes residing in the same rack location in neighboring racks.
• Nodes toward the middle and top tend to get hotter than nodes toward the outside and bottom.

So, what have we learned so far?
• We understand that the ability to diagnose and recover from performance and configuration issues without resorting to a centralized knowledge base is the next great stride in allowing systems to self-manage their reliability and stability.
• Preliminary results show this is a good approach to using logic for probabilistic model-based diagnosis.
• The results are promising, especially for such a radical change in the approach to system management, but for production deployment, further refinement is necessary in order to obtain statistically significant results.
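The two peer-comparison examples on the poster (file access permissions and processor heat) can be illustrated with a minimal sketch. This is not the authors' implementation and it is not a full Markov Logic Network: it only borrows the idea of weighted rules by scoring each candidate with the exponential of a weighted sum of rule matches and then normalizing. The function names, the weights, the 10-degree margin, and the "fewest permission bits set" definition of most restrictive are all assumptions made for illustration.

```python
import math
from collections import Counter
from statistics import median

# Hypothetical rule weights; in a Markov Logic Network these would be learned.
WEIGHTS = {"majority": 2.0, "most_restrictive": 1.0, "most_recent": 0.5}


def score_permissions(reports, weights=WEIGHTS):
    """Score candidate file modes reported by peers.

    reports: list of (mode, mtime) tuples, one per peer.
    Returns a dict mapping each reported mode to a normalized score.
    """
    counts = Counter(mode for mode, _ in reports)
    total = len(reports)
    # Assumption: "most restrictive" means the fewest permission bits set.
    most_restrictive = min(counts, key=lambda m: bin(m).count("1"))
    # Time-based rule: the mode on the most recently modified copy.
    most_recent = max(reports, key=lambda r: r[1])[0]

    raw = {}
    for mode, n in counts.items():
        s = weights["majority"] * (n / total)
        s += weights["most_restrictive"] * (mode == most_restrictive)
        s += weights["most_recent"] * (mode == most_recent)
        raw[mode] = math.exp(s)  # log-linear form: exp(sum of weighted rules)
    z = sum(raw.values())
    return {mode: v / z for mode, v in raw.items()}


def is_overheating(node_temp, peer_temps, margin=10.0):
    """Flag a node whose temperature exceeds the median of its peers
    (neighboring nodes and same-slot nodes in neighboring racks) by more
    than a hypothetical margin, in the same units as the readings."""
    return node_temp - median(peer_temps) > margin


if __name__ == "__main__":
    peers = [(0o644, 100), (0o644, 100), (0o644, 100), (0o600, 90), (0o666, 200)]
    for mode, p in sorted(score_permissions(peers).items()):
        print(f"{mode:o}: {p:.3f}")
    print(is_overheating(85.0, [62.0, 64.0, 66.0]))
```

Comparing against the median of peers rather than a fixed threshold reflects the observation above that rack position affects normal operating temperature. Changing the weights shifts which of the three permission rules dominates, which is the role learned formula weights would play in the approach the poster describes.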


