This document outlines the workflow steps for achieving a "production-like" GENI campus, including experimenter support, resource deployment, and monitoring. It provides guidance and support for campuses deploying similar resources.
GENI Campus Ops Workflow • Chaos Golubitsky • San Juan, Puerto Rico • Mar 16 2011 • www.geni.net
Outline • Introduction • Experimenter support • Resources • Monitoring
Towards a more “production-like” GENI • Some Spiral 3 ops goals: • Resources are easier for experimenters to find and use • Provisioning an experiment doesn’t require picking up the phone (as often) • Resources are more reliably available • Problems with resources are easier to detect and resolve • Here are some steps we think will be useful
Campus ops workflow? • A workflow is a set of steps to achieve a goal: • Become a production GENI campus! • This process will change as more campuses try it • Proposed workflow steps we think will be useful • Three categories: • Experimenter support • Resource deployment • Monitoring • There’s more than one way to do this; input is welcome!
GPO as reference campus • We try things out, test, and provide guidance and support to campuses deploying similar things • And pass along ideas for other reference campuses • We hope to help: • Small testbeds with diverse resources (OpenFlow, MyPLC, ProtoGENI, L2 backbone connectivity) • Campuses who want to create testbeds • Bigger testbeds (where we can) • We’re working on: • Experimenter support • More (and more GENI-like) resources • Useful monitoring • Templates for transitioning to GENI operations
Workflow Steps for Experimenter Support • Subscribe to response-team@geni.net: http://lists.geni.net/mailman/listinfo/response-team • Report your outages • Answer questions from experimenters • Tell GPO (gpo-infra@geni.net) you’re willing to support some experimenters: http://groups.geni.net/geni/wiki/ProductionResources • Create a page advertising each of your aggregates: http://groups.geni.net/geni/wiki/GeniAggregate/YourSiteAggregate • What resources do you have? • Who can use them? • How do they use them? • Resources don’t need to be fully open to the public to be advertised here • Template: http://groups.geni.net/geni/wiki/TemplateAggregatePage
Experimenter Support at GPO http://groups.geni.net/geni/wiki/GeniAggregate/GpoLabProtoGeni
Workflow Steps for Adding Resources • Connectivity • Aggregates: • Give local users access to your resources • Run software that supports the GENI AM API • Give remote users access to your resources (consistent with your site policy) • Configuration management: • Know what you’re running • Especially if it’s GENI software (things change fast) • Allows you to help experimenters better • Allows us (and other campuses) to help you better
Resources at GPO • GPO can provide templates and help for aggregates we have experience with • Things we have: • Connections to NLR and I2 backbones • OpenFlow switches (HP/NEC/Quanta), FlowVisors, controllers, GENI AM API support • Reference installation of WiMAX software • ProtoGENI cluster • A simple resource you can deploy: • MyPLC plus SFA to support the GENI AM API: http://groups.geni.net/geni/wiki/GpoLab/MyplcReferenceImplementation
Workflow Steps for Monitoring (1) • Two consumers of monitoring data: • Operators and experimenters • Operators: • Goals: • Detect and resolve outages quickly • Plan for the future • Monitoring steps: • Polling and trending of local resources • Alerting on local resource outages • Visibility into status of connected remote resources • Visibility into many remote resources in a consistent format
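The operator steps of polling local resources and alerting on outages can be sketched roughly as below. This is a minimal illustration, not GPO's tooling: the function names, the TCP-connect reachability test, and the `{name: is_up}` poll format are all assumptions made for the example.

```python
import socket


def poll_tcp(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    A TCP connect is only one possible reachability test (assumed here).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def outage_alerts(previous, current):
    """Compare two polls ({name: is_up}) and list resources that just went down.

    Alerting only on up-to-down transitions avoids repeating the same
    alert on every polling cycle while a resource stays down.
    A resource with no previous state is treated as having been up.
    """
    return sorted(
        name
        for name, is_up in current.items()
        if not is_up and previous.get(name, True)
    )


def recoveries(previous, current):
    """List resources that were down in the previous poll and are up now."""
    return sorted(
        name
        for name, is_up in current.items()
        if is_up and previous.get(name) is False
    )
```

A polling loop would call `poll_tcp` for each local resource, keep the previous result, and pass both polls to `outage_alerts`; every poll result can also be stored for trending.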
Workflow Steps for Monitoring (2) • Experimenters: • Goals: • Identify problems affecting the slice • Collect measurement data for their slice • Monitoring steps: • Status of available resources (how many nodes?) • Status of resources I’m using (is my node up?) • External characteristics of slice (CPU usage? Network bandwidth?) • Internal characteristics of slice (I&M working session Thursday)
Monitoring at GPO • Strategy: • Collect as much data as possible from our site now: http://monitor.gpolab.bbn.com • Integrate our data with collectors (GMOC, aggregates) • Tactics: • Trending is more important than alerting: • Remote operators and experimenters are casual consumers • Don’t want alerts for resources which may not be relevant • Do want historical availability information on request • Collect numeric trending data in a consistent format: • Using ganglia to collect data in rrdtool format for now • Generate webpages that format ganglia’s data more meaningfully
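To illustrate "historical availability information on request", here is a small sketch that computes availability from trending samples. It assumes samples have already been read out of the trending store as `(timestamp, value)` pairs, with a positive value meaning "up" and unknown samples given as `None` or NaN; the function name and that data layout are hypothetical, not part of ganglia or rrdtool.

```python
import math


def availability(samples, start, end):
    """Fraction of known samples in [start, end) where the resource was up.

    samples: iterable of (timestamp, value); value > 0 means up.
    Unknown samples (None or NaN) are ignored rather than counted as down.
    Returns None when the window holds no known samples.
    """
    known = [
        value
        for timestamp, value in samples
        if start <= timestamp < end
        and value is not None
        and not math.isnan(value)
    ]
    if not known:
        return None
    up_count = sum(1 for value in known if value > 0)
    return up_count / len(known)
```

Ignoring unknown samples is a design choice: it separates "the resource was down" from "the collector had no data", which matters to remote operators and experimenters reading the history later.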
Monitoring at GPO: Collecting GENI Data • Active testing: • Use simple scripts to run tests and report results to ganglia • Test recent values for freshness and sanity • GPO uses this to monitor reachability across the NLR and Internet2 OpenFlow backbone • Collecting external slice data: • Run locally on aggregate manager • Query aggregate data: slice names, node counts • Query operational data: packet counters, node state, CPU usage
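The active-testing idea (run a simple test, report the result to ganglia, then check recent values for freshness and sanity) might look like the sketch below. It assumes ganglia's `gmetric` command-line tool is installed on the reporting host; the metric name, thresholds, and helper names are hypothetical and chosen only for illustration.

```python
import subprocess


def check_sample(value, timestamp, now, max_age=300.0, low=0.0, high=100.0):
    """Return a list of problems with a reported value; empty means OK.

    Freshness: the sample must be no older than max_age seconds.
    Sanity: the value must be a number within [low, high].
    """
    problems = []
    if now - timestamp > max_age:
        problems.append("stale")
    if value is None or not (low <= value <= high):
        problems.append("out_of_range")
    return problems


def gmetric_command(name, value, units="", value_type="float"):
    """Build (but do not run) a gmetric invocation that reports one metric."""
    return [
        "gmetric",
        "--name", name,
        "--value", str(value),
        "--type", value_type,
        "--units", units,
    ]


def report(command):
    """Run a command built by gmetric_command; raises if gmetric fails."""
    subprocess.run(command, check=True)
```

For example, a reachability test could call `report(gmetric_command("backbone_reachable", 1, value_type="uint8"))` after a successful probe, and a separate checker could run `check_sample` on the most recent stored value to catch tests that have silently stopped reporting.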
Summary • Spiral 3 ops goals: • Test operations across several unaffiliated campuses • Ramp up GENI-wide experiment support • GPO is trying to be an example campus, but there are many others • If you do only two things, please: • Join response-team@geni.net • Make sure we (gpo-infra@geni.net) know what you would like to support this year, and what we can do to help