Monitoring OpenStack – The Relationship Between Nagios and Ceilometer
Konstantin Benz, Researcher @ Zurich University of Applied Sciences
benn@zhaw.ch

Introduction & Agenda
About me
- Working as a researcher @ Zurich University of Applied Sciences
- OpenStack / Cloud Computing
- Engaged in monitoring and High Availability systems
- Currently working on a Europe-wide cloud federation: XIFI – eXtensible Infrastructure for Future Internet (http://www.fi-xifi.eu)
  - 17 nodes / OpenStack clouds
  - Test environment for Future Internet (FI-WARE) applications
  - Infrastructure for smart cities, public healthcare, traffic management…
  - European-wide L2-connected backbone network
  - Nagios as the main monitoring tool of that project

Introduction & Agenda
What are you talking about in this presentation?
- How to use Nagios to monitor an OpenStack cloud environment
- Integrate Nagios with OpenStack
Anything else?
- Cloud monitoring requirements
- OpenStack cloud management software and Ceilometer
- Comparison between Nagios and Ceilometer:
  - Technological paradigms
  - Commonalities and differences
- How to integrate Nagios with Ceilometer
Can't wait!

Cloud Monitoring Requirements
Cloud ≈ virtualization + elasticity
Types of clouds:
- IaaS: VMs and virtual network devices, elasticity in number/size of devices
- PaaS: virtual, elastically sized platform
- SaaS: software provided by employing virtual, elastic resources
A cloud is a collection of virtual resources provided on physical infrastructure
A cloud provides resources elastically

Cloud Monitoring Requirements
Why should someone use clouds?
- Cloud consumer can outsource IT infrastructure
  - No fixed costs for cloud consumer
  - Pay for resource utilization
- Cloud provider is responsible for building and maintaining physical infrastructure
- Cloud provider can rent out unused IT infrastructure
  - Eliminate waste
  - Get money back for overcapacity

Monitoring OpenStack
OpenStack Architecture
- Open source cloud computing software
- Consists of multiple services:
  - Keystone: OpenStack identity services (authentication, authorization, accounting)
  - Cinder: management of block storage volumes
  - Nova: management and provision of virtual resources (VM instances)
  - Glance: management of VM images
  - Swift: management of object storage
  - Neutron: management of network resources (IPs, routing, connectivity)
  - Horizon: GUI dashboard for end users
  - Heat: orchestration of virtualized environments (important for providing elasticity)
  - Ceilometer: monitoring of virtual resources

Monitoring OpenStack
Things to monitor
- Operation of OpenStack itself:
  - Services: Cinder, Glance, Nova, Swift ...
  - Infrastructure: hardware and operating system where OpenStack services are running
- Operation of virtual resources provided by OpenStack:
  - Resource availability: VMs, virtual network devices
  - Resource utilization: VM uptime, CPU / memory usage
→ Virtual resources are commonly monitored by Ceilometer
→ Ceilometer gathers data through the API of OpenStack services

Monitoring OpenStack
Why is Ceilometer not enough?
→ Ceilometer monitors virtual resources through the APIs of OpenStack components, BUT NOT the operation of the OpenStack components themselves

Comparison Nagios / Ceilometer
Nagios operational model
- Configuration:
  - Check interval (and retry interval) to poll system status and update the frontend GUI
  - Remote execution of monitoring clients (usually Nagios plugins)
  - Thresholds that result in "Okay", "Warning", "Critical" status messages which are sent back to the Nagios server (and "Unknown" if the status is not measurable)
- Main usage:
  - Effective monitoring solution for physical servers
  - System administration console that allows for fast reaction in case of problems
- Strength: extensibility and customizability
- Nagios must be extended in order to monitor virtual resources inside administered systems

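The threshold model above can be sketched in a few lines. This is only an illustration, assuming the usual Nagios plugin exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN); the function name `classify` and the sample values are made up for this example.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(value, warning, critical):
    """Map a measured value to a Nagios status code.

    `value` is None when the status could not be measured.
    """
    if value is None:
        return UNKNOWN
    if value > critical:
        return CRITICAL
    if value > warning:
        return WARNING
    return OK


if __name__ == "__main__":
    # Hypothetical CPU utilization samples checked against 50 % / 80 %.
    for sample in (12.5, 63.0, 91.2, None):
        print(sample, STATUS_NAMES[classify(sample, 50.0, 80.0)])
```

A plugin would report the resulting code to the Nagios server as its exit status.
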
Comparison Nagios / Ceilometer
Ceilometer operational model
- Configuration:
  - Polling services check metrics
  - OpenStack objects generate event notifications automatically
  - All events and metrics are collected in a database
- Main usage:
  - OpenStack integrated metrics collector and database
  - Temporal database that can be used for rating, charging and billing of virtual resource utilization
- Strength: fully integrated in OpenStack, collecting the most important metrics and storing their change history
- Weakness: does not monitor physical hosts

Nagios / OpenStack Integration
Alternative 1: Ceilometer Plugin in Nagios
Use the Nagios server as frontend for Ceilometer:
- Nagios plugin that queries the Ceilometer database
- Virtual resource utilization data collected by Ceilometer
- Nagios server responsible for monitoring non-virtual resources
Benefits:
- Simple and easy to implement
- No extra Nagios plugins required to monitor virtual devices that are managed within OpenStack
- Ceilometer tool can be left unchanged
Drawbacks:
- Monitoring data is stored at 2 different places: Nagios flat file and Ceilometer database

Nagios / OpenStack Integration
Alternative 1: Ceilometer Plugin in Nagios
Implementation:
- Nagios plugin on the client which hosts the Ceilometer API (code sample below)
- Initialization with default values, OpenStack authentication:

    #!/bin/bash
    # initialization with default values
    SERVICE='cpu_util'
    THRESHOLD='50.0'
    CRITICAL_THRESHOLD='80.0'
    # get OpenStack token to access ceilometer-api
    export OS_USERNAME="youruser"
    export OS_TENANT_NAME="yourtenant"
    export OS_PASSWORD="yourpassword"
    export OS_AUTH_URL=http://yourkeystoneurl:35357/v2.0/

Nagios / OpenStack Integration
Alternative 1: Ceilometer Plugin in Nagios
The plugin should receive parameters for:
- Resource to be monitored (VM)
- Service (Ceilometer metric)
- Warning threshold
- Critical threshold

    # printusage is a helper function of the plugin that is not shown on this slide
    while getopts ":hr:s:t:T:" opt
    do
        case $opt in
            h ) printusage;;
            r ) RESOURCE=${OPTARG};;
            s ) SERVICE=${OPTARG};;
            t ) THRESHOLD=${OPTARG};;
            T ) CRITICAL_THRESHOLD=${OPTARG};;
            ? ) printusage;;
        esac
    done

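For comparison, the same four parameters could be parsed with Python's `argparse`. This is a sketch rather than part of the published plugin: the option letters and defaults mirror the shell script, while `parse_args` and the attribute names are chosen here for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the plugin parameters; -h (usage) is provided by argparse itself."""
    parser = argparse.ArgumentParser(
        description="Check one Ceilometer metric of one VM (illustrative sketch)."
    )
    parser.add_argument("-r", dest="resource", required=True,
                        help="resource (VM) to be monitored")
    parser.add_argument("-s", dest="service", default="cpu_util",
                        help="service (Ceilometer metric)")
    parser.add_argument("-t", dest="threshold", type=float, default=50.0,
                        help="warning threshold")
    parser.add_argument("-T", dest="critical_threshold", type=float, default=80.0,
                        help="critical threshold")
    return parser.parse_args(argv)


if __name__ == "__main__":
    # Hypothetical invocation: monitor "myvm" with a warning threshold of 60.
    args = parse_args(["-r", "myvm", "-t", "60"])
    print(args.resource, args.service, args.threshold, args.critical_threshold)
```

Declaring the thresholds with `type=float` rejects non-numeric input at parse time instead of failing later in the comparison.
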
Nagios / OpenStack Integration
Alternative 1: Ceilometer Plugin in Nagios
Query the Nova API to get the resource to monitor (the VM to be monitored):

    RESOURCE=$(nova list | grep "$RESOURCE" | tail -2 | head -1 | awk -F '|' '{print $2}')
    RESOURCE=$(echo $RESOURCE)   # strips surrounding whitespace

Query the metric on that resource (multiple entries are possible, which requires an iterator):

    ITERATOR=$(ceilometer meter-list -q "resource_id=$RESOURCE" | grep -w "$SERVICE" | awk 'END{print NR}')

Initialize with return code 0 (no warning or error):

    RETURNCODE=0

Nagios / OpenStack Integration
Alternative 1: Ceilometer Plugin in Nagios
Iterate through the metric:

    for (( C=1; C<=ITERATOR; C++ ))
    do
        # pick name, unit and resource ID from the C-th matching row of the table
        METER_NAME=$(ceilometer meter-list -q "resource_id=$RESOURCE" | grep -w "$SERVICE" | awk -F '|' -v var="$C" 'NR == var {print $2}')
        METER_UNIT=$(ceilometer meter-list -q "resource_id=$RESOURCE" | grep -w "$SERVICE" | awk -F '|' -v var="$C" 'NR == var {print $4}')
        RESOURCE_ID=$(ceilometer meter-list -q "resource_id=$RESOURCE" | grep -w "$SERVICE" | awk -F '|' -v var="$C" 'NR == var {print $5}')
        # latest sample of that meter
        ACTUAL_VALUE=$(ceilometer sample-list -m $METER_NAME -q "resource_id=$RESOURCE" -l 1 | grep $RESOURCE_ID | head -4 | tail -1 | awk -F '|' '{print $5}')

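The `awk -F '|'` calls above pick columns out of the pipe-delimited tables that the command-line clients print. A minimal Python sketch of the same idea follows; the function `parse_table` is hypothetical and the sample table is invented for illustration, so the real client output may contain more columns.

```python
def parse_table(text):
    """Turn pipe-delimited CLI table output into a list of dicts keyed by column header."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines such as +------+------+
        # Drop the empty strings before the first and after the last pipe.
        cells = [cell.strip() for cell in line.split("|")[1:-1]]
        if header is None:
            header = cells  # the first pipe-delimited line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


# Invented sample output, shaped like a meter listing.
SAMPLE = """\
+----------+------------+------+-------------+
| Name     | Type       | Unit | Resource ID |
+----------+------------+------+-------------+
| cpu      | cumulative | ns   | vm-1        |
| cpu_util | gauge      | %    | vm-1        |
+----------+------------+------+-------------+
"""

if __name__ == "__main__":
    # Exact match on the meter name, comparable to `grep -w`.
    for row in parse_table(SAMPLE):
        if row["Name"] == "cpu_util":
            print(row["Name"], row["Unit"], row["Resource ID"])
```

Addressing columns by header name rather than by position keeps the lookup stable if the client adds or reorders columns.
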
Nagios / OpenStack Integration
Alternative 1: Ceilometer Plugin in Nagios
Update the return code if the value of one metric is above a threshold:

        if [ $(echo "$ACTUAL_VALUE > $THRESHOLD" | bc) -eq 1 ]
        then
            # warning threshold exceeded: raise the return code to at least 1
            if (( RETURNCODE < 1 ))
            then
                RETURNCODE=1
            fi
            if [ $(echo "$ACTUAL_VALUE > $CRITICAL_THRESHOLD" | bc) -eq 1 ]
            then
                # critical threshold exceeded: raise the return code to at least 2
                if (( RETURNCODE < 2 ))
                then
                    RETURNCODE=2
                fi
            fi
        fi

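Nagios / OpenStack Integration
Alternative 1: Ceilometer Plugin in Nagios
Output the return code:

        STATUS=$(echo "$METER_NAME on $RESOURCE_ID is: $ACTUAL_VALUE $METER_UNIT")
        echo $STATUS
    done
    echo $RETURNCODE
    # Nagios evaluates the plugin's exit status (0 = OK, 1 = warning, 2 = critical)
    exit $RETURNCODE
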
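When one resource reports several meters, the plugin keeps the most severe result. A compact sketch of that aggregation, assuming readings are already available as tuples; the function name `worst_status` and the sample readings are hypothetical.

```python
def worst_status(readings, warning, critical):
    """Return (exit_code, status_lines) for (meter, resource_id, value, unit) readings.

    The exit code is the most severe one seen: 0 = OK, 1 = warning, 2 = critical.
    """
    code = 0
    lines = []
    for meter, resource_id, value, unit in readings:
        if value > critical:
            code = max(code, 2)
        elif value > warning:
            code = max(code, 1)
        lines.append("%s on %s is: %s %s" % (meter, resource_id, value, unit))
    return code, lines


if __name__ == "__main__":
    # Invented readings for two VMs.
    readings = [("cpu_util", "vm-1", 42.0, "%"), ("cpu_util", "vm-2", 85.5, "%")]
    code, lines = worst_status(readings, warning=50.0, critical=80.0)
    print("\n".join(lines))
    print("exit code:", code)
    # A real plugin would finish with sys.exit(code) so that Nagios sees the status.
```
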
Nagios / OpenStack Integration
Alternative 1: Ceilometer Plugin in Nagios
The plugin can be downloaded from GitHub: https://github.com/kobe6661/nagios_ceilometer_plugin.git
Additionally:
- NRPE plugin: remote execution of Nagios calls to Ceilometer
- Install NRPE on the Nagios Core server and on the server that hosts the Ceilometer API
- Change nrpe.cfg to include the call to the VM metric

Nagios / OpenStack Integration
Alternative 1: Implementation
OpenStack installed on 3 nodes:
- Management node: responsible for monitoring the other OpenStack nodes
- Controller node: responsible for management and configuration of cloud resources (VMs, network)
- Compute node: provisions virtual resources

Nagios / OpenStack Integration
Alternative 2: Nagios OpenStack Plugins
Nagios as a tool to monitor OpenStack services and VMs:
- Plugins to monitor the health of OpenStack services
- As soon as new VMs are created, Nagios should monitor them
- Requires elastic reconfiguration of Nagios
Benefits:
- No data duplication, Nagios is the only monitoring tool required to monitor OpenStack
Drawbacks:
- Elastic reconfiguration
- Rather complex Nagios configuration

Nagios / OpenStack Integration
Alternative 2: Nagios OpenStack Plugins
Problem:
- Dynamic provisioning of resources (Virtual Machines)
- Dynamic configuration of hosts in the Nagios server required

Nagios / OpenStack Integration
Alternative 2: Nagios OpenStack Plugins
Problem:
- What happens if a VM is terminated by the end user?
- Nagios assumes a host failure and produces a critical warning

Nagios / OpenStack Integration
Alternative 2: Nagios OpenStack Plugins
Solution:
- Nova-API triggers reconfiguration of Nagios if VMs are created or terminated

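What such a trigger has to work out is which hosts to add to Nagios and which to drop. A minimal sketch of that comparison, assuming both lists are available as plain VM names; `plan_reconfiguration` is a name invented for this example.

```python
def plan_reconfiguration(monitored_hosts, running_vms):
    """Compare the hosts Nagios currently monitors with the VMs that exist right now.

    Returns (to_add, to_remove): VMs that still need a Nagios host entry, and
    host entries whose VM has been terminated.
    """
    monitored = set(monitored_hosts)
    running = set(running_vms)
    to_add = sorted(running - monitored)
    to_remove = sorted(monitored - running)
    return to_add, to_remove


if __name__ == "__main__":
    # Hypothetical state: "web01" was terminated, "db02" was newly created.
    to_add, to_remove = plan_reconfiguration(["web01", "web02"], ["web02", "db02"])
    print("add:", to_add)
    print("remove:", to_remove)
```

Removing the entry of a terminated VM is what keeps Nagios from reporting a deliberate shutdown as a host failure.
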
Nagios / OpenStack Integration
Alternative 2: Nagios OpenStack Plugins
Another problem:
- VMs must have Nagios plugins installed when they are created
Solution:
- Use only VM images that contain Nagios plugins for VM creation
OR
- Use package management tools like Puppet, Chef…

Nagios / OpenStack Integration
Alternative 2: Nagios OpenStack Plugins
Trigger for dynamic Nagios configuration:
- Find available resources via nova-api (requires name of host and IP address)

    #!/bin/bash
    NUMLINES=$(nova list | wc -l)
    # skip the 3 header lines; the last remaining line is the bottom border of the table
    NUMLINES=$((NUMLINES-3))
    for (( C=1; C<NUMLINES; C++ ))
    do
        VM_NAME=$(nova list | tail -n "$NUMLINES" | awk -F'|' -v var="$C" 'NR==var {print $3}')
        IP_ADDRESS=$(nova list | tail -n "$NUMLINES" | awk -F'|' -v var="$C" 'NR==var {print $7}' | sed 's/[a-zA-Z0-9]*[=|-]//g')

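The `sed` expression above strips the network name from the networks column so that only the address remains. An alternative sketch in Python uses a regular expression that collects every IPv4 address in the field; the function name `extract_ips` and the sample field values are assumptions made for illustration.

```python
import re

# Four dot-separated groups of 1-3 digits; does not validate the 0-255 range.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_ips(networks_field):
    """Return all IPv4 addresses found in a networks column such as 'private=10.0.0.2'."""
    return IPV4.findall(networks_field)


if __name__ == "__main__":
    # Invented examples of what the networks column may contain.
    print(extract_ips("private=10.0.0.2"))
    print(extract_ips("private=10.0.0.2, 172.24.4.227"))
```

Returning a list makes the case of a VM with several addresses explicit, so the caller can decide which one Nagios should use.
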
Nagios / OpenStack Integration
Alternative 2: Nagios OpenStack Plugins
Trigger for dynamic Nagios configuration:
- Create a config file including VM name and IP address from a template (e.g. vm_template.cfg)

        CONFIG_FILE=$(echo $VM_NAME).cfg
        sed "s/<vm_name>/$VM_NAME/g" vm_template.cfg > named_template.cfg
        sed "s/<ip_address>/$IP_ADDRESS/g" named_template.cfg > "$CONFIG_FILE"

- Set Nagios as owner of the file and move the file to the Nagios configuration directory

        chown nagios:nagios "$CONFIG_FILE"
        chmod 644 "$CONFIG_FILE"
        mv "$CONFIG_FILE" /usr/local/nagios/etc/objects/"$CONFIG_FILE"

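The template step amounts to two placeholder substitutions. A sketch of it in Python follows; the content of vm_template.cfg is not shown in the slides, so the host definition in `TEMPLATE` is an assumption, as is the function name `render_host_config`.

```python
# Assumed template content; the real vm_template.cfg may define more directives.
TEMPLATE = """\
define host {
    use        linux-server
    host_name  <vm_name>
    address    <ip_address>
}
"""


def render_host_config(template, vm_name, ip_address):
    """Fill the <vm_name> and <ip_address> placeholders of a host template."""
    return template.replace("<vm_name>", vm_name).replace("<ip_address>", ip_address)


if __name__ == "__main__":
    # Hypothetical VM name and address.
    print(render_host_config(TEMPLATE, "web01", "10.0.0.5"))
```

Plain string replacement avoids the escaping problems that `sed` has when a value contains a slash.
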
Nagios / OpenStack Integration
Alternative 2: Nagios OpenStack Plugins
Trigger for dynamic Nagios configuration:
- Add the config file to nagios.cfg

        echo "cfg_file=/usr/local/nagios/etc/objects/$CONFIG_FILE" >> /usr/local/nagios/etc/nagios.cfg
    done   # closes the loop over all VMs

- Restart Nagios

    service nagios restart

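Appending with `>>` adds the same `cfg_file` line again every time the trigger runs for an existing VM. One possible guard, sketched here on the file content as a string, adds the entry only if it is missing; `add_cfg_file` is a hypothetical helper and the sample content is invented.

```python
def add_cfg_file(nagios_cfg_text, cfg_path):
    """Return nagios.cfg content that contains a cfg_file entry for cfg_path exactly once."""
    entry = "cfg_file=%s" % cfg_path
    if entry in (line.strip() for line in nagios_cfg_text.splitlines()):
        return nagios_cfg_text  # already registered, nothing to do
    if nagios_cfg_text and not nagios_cfg_text.endswith("\n"):
        nagios_cfg_text += "\n"
    return nagios_cfg_text + entry + "\n"


if __name__ == "__main__":
    cfg = "log_file=/usr/local/nagios/var/nagios.log\n"
    path = "/usr/local/nagios/etc/objects/web01.cfg"
    cfg = add_cfg_file(cfg, path)
    cfg = add_cfg_file(cfg, path)  # a second call leaves the content unchanged
    print(cfg)
```
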
Nagios / OpenStack Integration
Alternative 2: Nagios OpenStack Plugins
Why restart Nagios?
- Nagios must know that a new VM is present or that an old VM has been terminated
- Reconfigure and restart Nagios (!)

Nagios / OpenStack Integration
Alternative 2: Nagios OpenStack Plugins
Trigger for dynamic Nagios configuration:
- Add trigger to Nova-API:
  - Nagios Event Broker module: Check_MK: http://mathias-kettner.de/checkmk_livestatus.html
- Reconfigure Nagios dynamically:
  - Edit nagios.cfg and restart Nagios: bad idea (!!) in a cloud environment
  - Autoconfiguration tools: NagioSQL: http://www.nagiosql.org/documentation.html

Nagios / OpenStack Integration
Alternative 2: Nagios OpenStack Plugins
What other ways exist to dynamically reconfigure Nagios?
- Puppet master that triggers:
  - VMs to install Nagios NRPE plugins and
  - Nagios server to update its configuration
- The same can be done with Chef, Ansible…
- Drawback: Puppet scalability if 1,000s of servers have to be (de-)commissioned dynamically

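Nagios / OpenStack Integration
Alternative 2: Nagios OpenStack Plugins
What other ways exist to dynamically reconfigure Nagios?
- Python Fabric with Cuisine to trigger:
  - VMs to install Nagios NRPE plugins and
  - Nagios server to update its configuration
- Get the list of VMs

    from novaclient.client import Client

    # VERSION, USERNAME, PASSWORD, PROJECT_ID and AUTH_URL are placeholders
    nova = Client(VERSION, USERNAME, PASSWORD, PROJECT_ID, AUTH_URL)
    servers = nova.servers.list()

- Write the VM list to a file

    # write one VM name per line (write() expects strings, not Server objects)
    with open('servers', 'w') as f:
        for server in servers:
            f.write(server.name + '\n')
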
Nagios / OpenStack Integration
Alternative 2: Nagios OpenStack Plugins
What other ways exist to dynamically reconfigure Nagios?
- Python Fabric with Cuisine to trigger:
  - VMs to install Nagios NRPE plugins and
  - Nagios server to update its configuration
- Create fabfile.py and define which servers should be configured

    from fabric.api import *
    # the recipe modules live next to fabfile.py
    import vm_recipe, nagios_recipe

    env.use_ssh_config = True
    # read one host per line, dropping newlines and empty lines
    with open('servers') as servers:
        serverlist = [line.strip() for line in servers if line.strip()]
    env.roledefs = {
        'vm': serverlist,
        'nagios_server': ['xx.xx.xx.xx'],   # placeholder for the Nagios server address
    }

Nagios / OpenStack Integration
Alternative 2: Nagios OpenStack Plugins
Assign recipes

    @roles('vm')
    def configure_vm():
        vm_recipe.ensure()

    # the role name must match the key used in env.roledefs
    @roles('nagios_server')
    def configure_nagios():
        nagios_recipe.ensure()

Nagios / OpenStack Integration
Alternative 2: Nagios OpenStack Plugins
Create vm_recipe.py and nagios_recipe.py

    from fabric.api import *
    import cuisine

    # is_installed() and install() are not shown on this slide
    def ensure():
        if not is_installed():
            puts("Installing NRPE...")
            install()
        else:
            puts("NRPE already installed")

    def install_prerequisites():
        cuisine.package_ensure("nrpe")

Choice of Alternatives
Which option should we choose?
- Implementation advantages and drawbacks

Conclusion
What did you talk about?
- How to use Nagios to monitor an OpenStack cloud environment
- Cloud monitoring requirements:
  - Elasticity, dynamic provisioning of virtual machines
- OpenStack monitoring tools Nagios and Ceilometer
  - Nagios as an extensible monitoring system
  - Ceilometer captures data through Nova-API
- Nagios/OpenStack integration
  - Alternative 1: Ceilometer monitors VMs with Nagios as graphical frontend
  - Alternative 2: Nagios monitors VMs and is automatically reconfigured
- Discovered need for dynamic reloading of Nagios configuration
- Discussed advantages/drawbacks of different implementations

Questions? Any questions? Thanks!
The End Konstantin Benz benn@zhaw.ch