The Grid Observatory. Operated by Laboratoire de Recherche en Informatique, Laboratoire de l'Accélérateur Linéaire, and Imperial College London. With the support of France Grilles (French NGI of EGI), EGI-Inspire, and the Ile de France council (Software and Complex Systems programme).
Operated by • Laboratoire de Recherche en Informatique • Laboratoire de l'Accélérateur Linéaire • Imperial College London
With the support of • France Grilles – French NGI of EGI • EGI-Inspire • Ile de France council (Software and Complex Systems programme) • INRIA – Saclay (ADT programme) • CNRS (PEPS programme) • University Paris Sud (MRM programme)
Production since October 2008 (CCGrid 2011)
Traces available through the portal
No grid certificate required
On top of EGI monitoring, anonymized
Sources: Torque CE, Logging & Bookkeeping, BDII, IC RTM, WMS
Access: SQL, LDAP, HTTP, SFTP
Pipeline: Incoming, Anonymisation, Upload, Grid Observatory Portal
Storage: DPM Storage Elements via HTTPS
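The slides do not say how the Anonymisation step works. As an illustration only, here is a minimal sketch of one common approach, replacing sensitive fields with keyed-hash pseudonyms so that the same user always maps to the same opaque token. The record layout, the field names, and the key are hypothetical and are not taken from the Grid Observatory.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Map an identifier to a stable pseudonym using HMAC-SHA256."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    # Keep a 16-hex-character prefix so traces stay compact.
    return digest.hexdigest()[:16]


def anonymise_record(record: dict, sensitive_fields: tuple, secret_key: bytes) -> dict:
    """Return a copy of the record with sensitive fields replaced by pseudonyms."""
    cleaned = dict(record)
    for field in sensitive_fields:
        if field in cleaned:
            cleaned[field] = pseudonymise(str(cleaned[field]), secret_key)
    return cleaned


# Hypothetical job record; the field names are illustrative only.
example_record = {"user": "alice", "queue": "short", "walltime_s": 120}
example_key = b"site-local-secret"  # assumption: a key kept private by the operator
anonymised = anonymise_record(example_record, ("user",), example_key)
```

A keyed hash rather than a plain hash makes it harder to recover identities by hashing a list of known user names, while still letting analysts group jobs by (pseudonymous) user.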
Lessons learned: Sociology • Running a production system for use by computer science and engineering is nearly uncharted territory: we are a few explorers • Verified that 80% of the cost of data mining is in pre-processing
Lessons learned: Technique • Build on existing monitoring tools • No fancy technology: the goal is usage, not the tool
The first barrier to improved energy efficiency is the difficulty of collecting data on the energy usage of individual components, and the lack of overall data collection
The GCO monitors energy usage at a large computing center and publishes the measurements through the Grid Observatory.
A second barrier is making the collected data usable, consistent and complete. GCO adopts an ontological approach in order to rigorously define the semantics of the data and the context of their production.
The GRIF-LAL computing room: 240 machines, 2200+ cores, 500 TB of storage. Mainly a Tier 2 in the EGI grid, but also includes local services and the StratusLab Cloud testbed. An accessible approximation of a data center
Sensors: 1-minute sampling period
Source: http://www.netways.de/uploads/media/Werner_Fischer_-The-Power-Of-IPMI.pdf
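With a 1-minute sampling period, energy over an interval can be estimated from the power readings. The sketch below shows one way to do it, assuming evenly spaced readings in watts and trapezoidal integration. The function name and the sample values are illustrative and do not come from the GCO data.

```python
def energy_kwh(power_watts, sampling_period_s=60.0):
    """Integrate evenly spaced power samples (W) into energy (kWh).

    Uses the trapezoidal rule between consecutive samples.
    """
    if len(power_watts) < 2:
        return 0.0
    joules = 0.0
    for previous, current in zip(power_watts, power_watts[1:]):
        joules += 0.5 * (previous + current) * sampling_period_s
    return joules / 3.6e6  # 1 kWh = 3.6e6 J


# Illustrative data: a constant 200 W load sampled every minute.
constant_load = [200.0] * 61  # 61 samples span 60 one-minute intervals
one_hour = energy_kwh(constant_load)  # about 0.2 kWh
```

Real sensor streams usually contain gaps and irregular timestamps, so a production version would integrate over the recorded timestamps rather than assume a fixed period.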
Dealing with non-stationarity • Adaptive clustering with application to fault diagnosis: Toward Autonomic Grids: Analyzing the Job Flow with Affinity Streaming. SIGKDD 2009 • MDL segmentation applied to workload: Discovering Piecewise Linear Models of Grid Workload. CCGRID 2010
Intelligibility: how to build knowledge? • Supervised learning? No reference, and experts are too rare • Let's build it on-line! Model-free policies, e.g. Reinforcement Learning • Unfortunately, tabula rasa policies and vanilla ML methods are too often defeated [Rish & Tesauro 2006]. Exploration/exploitation tradeoff
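To make the exploration/exploitation tradeoff concrete, here is a minimal sketch of an epsilon-greedy policy on a toy multi-armed bandit. This is a textbook illustration, not the method used in the cited work. The arm reward probabilities, the step count, and the value of epsilon are arbitrary assumptions.

```python
import random


def epsilon_greedy(true_means, steps=2000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy policy on Bernoulli arms.

    Returns the number of pulls per arm and the estimated mean reward per arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates


# Hypothetical arms, e.g. three candidate scheduling actions.
counts, estimates = epsilon_greedy([0.2, 0.5, 0.8])
```

With epsilon set to 0 the policy never explores and can lock onto the first arm that pays off. With a large epsilon it wastes pulls on arms already known to be poor. A learner starting from scratch pays this exploration cost on the live system, which is one reason tabula rasa policies struggle in production settings.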
Intelligibility • Fault models: Distributed Monitoring with Collaborative Prediction. CCGRID 2012 • Cloud management: Characterizing E-Science File Access Behavior via Latent Dirichlet Allocation. UCC 2011