SAM Architecture 15.05.2013 team
Contents
• Architecture overview
  • basic architecture
  • the whole picture
• Components overview
• Summary
[Figure: Basic architecture]
[Figure: Architecture, the whole picture]
Components overview
• ATP (Aggregated Topology Provider)
  • polls information sources to gather topology
    • services, flavours, sites, downtimes, vo-mappings, capacity, federations, tiers…
  • Web API
  • local and central deployments
  • updated twice per hour
  • Python + (MySQL, PL/SQL)
• POEM (Profile Management)
  • stores profile definitions
  • synchronizes instances via poem_sync daemon
  • namespace support
  • web admin interface
  • web API
  • local and central deployments
  • Python + Django
[Figure: definition for ATLAS_CRITICAL]
• NCG (Nagios Configuration Generator)
  • reads from ATP and POEM via API
  • generates Nagios configuration to
    • set up which metrics to run
    • in which services for which sites
  • configures metric attributes
    • test parameters (SE path, CE queue…)
    • Nagios execution flags (Passive check, obsess…)
  • specifies metrics to import from other nodes
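As a rough illustration of the NCG step, the sketch below joins a topology listing with a profile and emits Nagios service definitions. The input shapes, the function name `render_services`, and the sample host and metric names are hypothetical and are not taken from SAM. Only the directive names (`host_name`, `service_description`, `active_checks_enabled`, `passive_checks_enabled`, `obsess_over_service`) are standard Nagios object configuration. The output is deliberately incomplete: a real service definition needs further directives such as `check_command`.

```python
# Hypothetical inputs; in SAM these would be fetched from the ATP and POEM APIs.
TOPOLOGY = [
    {"site": "SITE-A", "host": "ce01.example.org", "flavour": "CE"},
    {"site": "SITE-A", "host": "se01.example.org", "flavour": "SRMv2"},
]

# Hypothetical profile: which metrics to run for which service flavour.
PROFILE = {
    "CE": [{"metric": "example.CE-JobSubmit", "passive": True}],
    "SRMv2": [{"metric": "example.SRM-Put", "passive": False}],
}


def render_services(topology, profile):
    """Return one Nagios 'define service' block per (host, metric) pair."""
    blocks = []
    for service in topology:
        for entry in profile.get(service["flavour"], []):
            directives = [
                ("host_name", service["host"]),
                ("service_description", entry["metric"]),
                # Passive metrics are not scheduled by Nagios itself.
                ("active_checks_enabled", 0 if entry["passive"] else 1),
                ("passive_checks_enabled", 1),
                ("obsess_over_service", 1),
            ]
            body = "\n".join(f"    {key:<24}{value}" for key, value in directives)
            blocks.append("define service {\n" + body + "\n}")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(render_services(TOPOLOGY, PROFILE))
```

Keeping the topology and the profile as separate inputs mirrors the split between ATP and POEM: changing which metrics a profile contains does not require touching the topology data.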
• Nagios and probes
  • patched and packaged
  • probes encapsulate tests, which are run periodically
  • probes are provided by different parties
    • SAM supports only the SAM probe
    • Product Teams provide their own probes
  • imports test results from other Nagios instances
  • special probe distributes metric results
    • send_to_db
    • send_to_msg
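To make "probes encapsulate tests" concrete, here is a minimal sketch of a probe written as a Nagios plugin. The exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are the standard Nagios plugin convention. The latency test, its thresholds, and the function names are hypothetical and do not describe any actual SAM probe.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def run_test(latency_s, warn_s=5.0, crit_s=30.0):
    """Map a measured latency (a hypothetical test result) to a Nagios state."""
    if latency_s is None:
        return UNKNOWN, "no measurement available"
    if latency_s >= crit_s:
        return CRITICAL, f"latency {latency_s:.1f}s >= {crit_s:.1f}s"
    if latency_s >= warn_s:
        return WARNING, f"latency {latency_s:.1f}s >= {warn_s:.1f}s"
    return OK, f"latency {latency_s:.1f}s"


def main(latency_s):
    """Print the one-line status text and return the plugin exit code."""
    state, message = run_test(latency_s)
    print(f"{LABELS[state]} - {message}")
    return state
```

A real probe would finish with `sys.exit(main(...))` so that Nagios receives the state as the process exit code.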
• MRS (Metric Results Store)
  • aggregates Nagios results
    • stores all metric results
    • summarizes service status from metric results
  • keeps track of status changes
    • per metric and service
    • per service and profile
  • keeps track of missing and removed metrics
  • bootstraps from POEM every hour
    • which metrics are to be expected for each service and profile?
  • local and central deployments
  • MySQL and Oracle
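The MRS responsibilities listed above can be sketched in a few lines. The "worst status wins" rule, the severity ordering, the `MISSING` label, and the function names are assumptions made for this sketch; the slides do not specify how MRS combines metric results or represents missing metrics.

```python
# Severity order used to pick the worst status; an assumption for this sketch.
SEVERITY = {"OK": 0, "WARNING": 1, "UNKNOWN": 2, "MISSING": 3, "CRITICAL": 4}


def service_status(expected_metrics, latest_results):
    """Summarize one service's status for one profile.

    expected_metrics: metric names the profile expects (as bootstrapped from POEM).
    latest_results: dict mapping metric name -> latest status string.
    """
    # An expected metric with no result is flagged as MISSING.
    statuses = [latest_results.get(m, "MISSING") for m in expected_metrics]
    if not statuses:
        return "UNKNOWN"
    return max(statuses, key=SEVERITY.__getitem__)


def record_change(history, timestamp, status):
    """Append (timestamp, status) only when the status differs from the last one."""
    if not history or history[-1][1] != status:
        history.append((timestamp, status))
    return history
```

Storing only the transitions, rather than one row per check, keeps the status history small and makes it straightforward to derive how long a service stayed in each state.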
• ACE (Availability Computation Engine)
  • summarizes MRS statuses
    • translates status changes into status evolution
    • hourly, daily, weekly and monthly granularities
    • service, flavour and site level aggregations
  • generates availability values using a profile algorithm
    • uses logic operations on status values
    • e.g.: (ARC-CE + CE) * SRMv2 * BDII
  • takes downtime into account to generate reliability values
  • runs every hour
  • Python + Oracle SQL
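The example algorithm `(ARC-CE + CE) * SRMv2 * BDII` can be evaluated as below. This is a sketch under stated assumptions: `+` is read as logical OR (best status wins) and `*` as logical AND (worst status wins), with `UNKNOWN` ranked between `CRITICAL` and `OK`. The availability and reliability formulas are a common formulation (known hours only, scheduled downtime excluded from the reliability denominator) and are not confirmed by the slides. All function names are hypothetical.

```python
# Ranking for three-valued logic on statuses (assumed for this sketch).
RANK = {"CRITICAL": 0, "UNKNOWN": 1, "OK": 2}


def status_or(*statuses):
    """Logical OR: the best status wins."""
    return max(statuses, key=RANK.__getitem__)


def status_and(*statuses):
    """Logical AND: the worst status wins."""
    return min(statuses, key=RANK.__getitem__)


def site_status(flavours):
    """Evaluate (ARC-CE + CE) * SRMv2 * BDII for one time slot."""
    return status_and(
        status_or(flavours["ARC-CE"], flavours["CE"]),
        flavours["SRMv2"],
        flavours["BDII"],
    )


def availability_and_reliability(hourly):
    """hourly: list of (status, in_scheduled_downtime) tuples, one per hour.

    Hours with UNKNOWN status are left out of both denominators; hours in
    scheduled downtime are never counted as up.
    """
    known = [(s, d) for s, d in hourly if s != "UNKNOWN"]
    up = sum(1 for s, d in known if s == "OK" and not d)
    outside_downtime = sum(1 for _, d in known if not d)
    availability = up / len(known) if known else None
    reliability = up / outside_downtime if outside_downtime else None
    return availability, reliability
```

With this reading, a site stays `OK` as long as at least one of its computing-element flavours is `OK` and both the storage and information-system flavours are `OK`.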
• MyWLCG
  • Visualization tool for SAM data
    • metric results
    • service, flavour and site status, availability and reliability
  • Reads from ATP, POEM, MRS and ACE via database
  • Other applications
    • availability trends, experiment usage, topology view…
  • Exposes SAM results via web API
  • Report generation
  • Python + Django
• Messaging clients
  • multiple, heterogeneous clients
    • send_to_msg, consume_to_db
    • msg_to_handler, recv_from_queue
    • wnjob
    • atp_synchro
  • transports metric data from one instance to another
  • integrates third party monitoring systems
  • MEG (Message Groove)
    • common messaging client framework
  • Python + stompclt
Summary
• ATP provides topology
• POEM defines profiles
• NCG configures Nagios
• Nagios runs the probes
• Messaging transports results
• MRS aggregates metric results into status
• ACE aggregates status into availability
• MyWLCG displays and exposes data