GENI I&M and Monitoring GENI Engineering Conference 14 Boston, MA

Sarah Edwards, Chaos Golubitsky, Jeanne Ohren • July 9, 2012 • www.geni.net

Introduction • Useful data lives everywhere in GENI • Relationships: slices, slivers, users, resources • Counters: interface traffic, OpenFlow flowspace rules • Measurements: CPU load, memory, bandwidth, latency • Health status: reachability, API functionality • We can use this information to… • Troubleshoot issues • Optimize configurations • Help experimenters understand their slice resources • Help experimenters analyze their experiments • How do we help each other bring it all together?

Agenda • Introduction • Sarah Edwards, GPO • Guest Speakers: • Kevin Bohan, GMOC • GMOC Monitoring Demonstration • Anirban Mandal, RENCI • Client Authentication & Authorization for GENI XMPP Messaging Service • Martin Swany, Indiana University • GEMINI: Active Network Measurement • Prasad Calyam, OSC • Measurements on Layer 2 and OpenFlow Paths • Bringing It All Together • Jeanne Ohren, GPO • Discussion

GMOC Monitoring Demonstration • Kevin Bohan, GRNOC

Client Authentication & Authorization for GENI XMPP Messaging Service • Anirban Mandal, RENCI

GEMINI: Active Network Measurement • Martin Swany, Indiana University

Measurements on Layer 2 and OpenFlow Paths • Prasad Calyam, OSC

Bringing It All Together • Jeanne Ohren, GPO

Bringing It All Together • Useful data lives everywhere in GENI • Relationships: slices, slivers, users, resources • Counters: interface traffic, OpenFlow flowspace rules • Measurements: CPU load, memory, bandwidth, latency • Health status: reachability, API functionality • We can use this information to… • Troubleshoot issues • Optimize configurations • Help experimenters understand their slice resources • Help experimenters analyze their experiments • How do we help each other bring it all together?

Bringing It All Together • Let’s discuss a couple of examples of issues to consider when working on projects • Data Naming • Data Transport • Let’s walk through some of the types of data that are being collected or are planned to be collected soon

Data Naming Example: Scenario 1 • Scenario 1 – Consistent naming of resources and devices • Resources on two aggregates share a network link, each referencing an endpoint. • Each aggregate names its endpoint when submitting data about the link. • The names must be consistent so that the consumer can relate the data from both endpoints. • [Figure: Aggregate A and Aggregate B sharing a network link]

Data Naming Example: Scenario 2 • Scenario 2 – Globally unique and consistent naming • Two aggregates are reporting data on their active slivers, including the slice to which each sliver belongs. • Aggregate A reports a sliver on the slice by URN (e.g. urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+joslice) • Aggregate B reports a sliver on the slice by UUID (e.g. 550e8400-e29b-41d4-a716-446655440000) • The experimenter who created the slice may report I&M data on that slice by slice name (e.g. joslice). • [Figure: Slice (urn:publicid:IDN+pgeni.gpolab.bbn.com+slice+joslice, 550e8400-e29b-41d4-a716-446655440000) with Sliver A and Sliver B]

Data Naming Example: Scenario 2 (cont’d) • Scenario 2 – Globally unique and consistent naming • The consumer of the data may need to determine whether these two slivers belong to the same slice. • Without consistent naming and namespaces, the consumer of the data has to figure out if and how the two slivers and the experiment data relate. • GENI AM API v3 already addresses this by using the combination of URN and UUID. Monitoring and some I&M projects are adopting the same slice naming. • URN + UUID provides uniqueness over time and space. • How does this affect other projects? • What are some other examples?
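A minimal sketch of how a data consumer might key slices by the URN + UUID pair and group sliver reports accordingly. It assumes the URN layout shown in the example above (urn:publicid:IDN+authority+type+name); the names `GeniUrn`, `parse_geni_urn`, and `group_slivers_by_slice`, and the report tuple layout, are hypothetical and not part of any GENI API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple

URN_PREFIX = "urn:publicid:IDN+"


class GeniUrn(NamedTuple):
    """Parts of a URN such as urn:publicid:IDN+authority+type+name."""
    authority: str
    kind: str
    name: str


def parse_geni_urn(urn: str) -> GeniUrn:
    """Split a URN into authority, type, and name (assumed 3-part layout)."""
    if not urn.startswith(URN_PREFIX):
        raise ValueError(f"not a GENI-style URN: {urn!r}")
    parts = urn[len(URN_PREFIX):].split("+")
    if len(parts) != 3:
        raise ValueError(f"expected authority+type+name in {urn!r}")
    return GeniUrn(*parts)


def group_slivers_by_slice(
    reports: Iterable[Tuple[str, str, str]],
) -> Dict[Tuple[str, str], List[str]]:
    """Group (sliver, slice_urn, slice_uuid) reports by the URN + UUID pair.

    The report layout is an assumption made for this sketch.
    """
    grouped: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for sliver, slice_urn, slice_uuid in reports:
        parse_geni_urn(slice_urn)  # reject malformed URNs early
        # UUID text is case-insensitive, so normalize before comparing.
        grouped[(slice_urn, slice_uuid.lower())].append(sliver)
    return dict(grouped)
```

If every reporter supplies both identifiers, slivers from different aggregates land under the same key without the consumer having to guess how a bare slice name, URN, or UUID relate.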

Data Transport Example: Scenario • Scenario • As an aggregate, I collect data about the slivers that I manage, the resources assigned to those slivers, the resources that I have available, etc., and report that data to GMOC. • As an experimenter, I am interested in what resources are available at each of the aggregates. • As an operator, I am interested in statistics on the slivers that have been created/deleted over a period of time.

Data Transport Example: How data is accessed today • How does each of these consumers access this data? • Aggregates (ExoGENI, InstaGENI, MyPLC) • Push data to GMOC at regular intervals using the GMOC APIs • Currently access control is using non-GENI credentials • GENI Clearinghouse (Future) • Will provide an API to pull data on slices, users, and projects. • IMF and others • Provide a pub/sub interface to allow interested parties with the appropriate credentials to subscribe to data events • I&M (GEMINI, GIMI, INSTOOLS) • Provide the ability for the user to push data to an archive on iRODS with metadata. • iRODS account holders can control and track who has access to archived data
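To make the pub/sub access model concrete, here is an in-memory sketch of a broker that only lets holders of an approved credential subscribe to a topic. It is not the IMF or XMPP interface; the `Broker` class, the topic name, and the string credentials are all hypothetical stand-ins for illustration.

```python
from typing import Any, Callable, Dict, List, Set


class Broker:
    """Toy pub/sub broker with a per-topic credential allow-list."""

    def __init__(self, allowed: Dict[str, Set[str]]) -> None:
        # topic -> credentials permitted to subscribe (hypothetical model)
        self._allowed = allowed
        self._subscribers: Dict[str, List[Callable[[Any], None]]] = {}

    def subscribe(self, topic: str, credential: str,
                  callback: Callable[[Any], None]) -> None:
        """Register a callback if the credential is allowed on the topic."""
        if credential not in self._allowed.get(topic, set()):
            raise PermissionError(f"credential not authorized for {topic!r}")
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic: str, event: Any) -> int:
        """Deliver an event to every subscriber; return how many received it."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(event)
        return len(callbacks)
```

A real deployment would verify signed credentials rather than compare strings, but the shape is the same: the check happens at subscribe time, so publishers do not need to know who is listening.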

Data Transport Example: Access Control and Reliability • Access control • How do we ensure that the appropriate people are able to access the data? • How do we ensure that the wrong people do not get to the data? • How do we keep the access control from getting too complicated for the users? • Reliability • How do we ensure the data makes it to the other end uncorrupted? • How do we ensure that the data is getting recorded correctly? • How can we all walk away from the table with access to good, reliable data?
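One common way to check that data arrives uncorrupted is to send a digest alongside the payload and recompute it on receipt. The sketch below uses SHA-256 over a canonical JSON encoding; the message layout and the function names `package` and `verify` are assumptions for illustration, not a format used by any GENI service.

```python
import hashlib
import json


def package(record: dict) -> dict:
    """Serialize a record deterministically and attach its SHA-256 digest."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"payload": payload, "sha256": digest}


def verify(message: dict) -> dict:
    """Recompute the digest on receipt; raise if the payload was altered."""
    actual = hashlib.sha256(message["payload"].encode("utf-8")).hexdigest()
    if actual != message["sha256"]:
        raise ValueError("payload digest mismatch")
    return json.loads(message["payload"])
```

A plain digest only detects accidental corruption. Guarding against deliberate tampering would need a keyed MAC or a signature, which ties back to the access control questions above.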

Data Sources • Relational data collected by GMOC • Time-series data collected by GMOC • Active network measurement data collected by I&M tools • Passive host measurement data collected by I&M tools • Measurement Data Object Descriptor • Other independent monitoring tools

Data Sources • Relational data collected by GMOC • Physical location of aggregate resources • Points of Contact (POC) for each aggregate • Slice Authority Info • type, version, operating organization, etc. • Aggregate Info • name, version, type, etc. • Slivers for each aggregate • Sliver data • who created them, when they were created, current state, etc. • Data about resources within each aggregate • VM servers, routers, etc. • Mapping of resources to slivers • Data about interfaces on resources • MAC/IPv4/IPv6 addresses, VLAN tags, netmask, etc. • Schema: http://groups.geni.net/geni/attachment/wiki/GENIMetaOps/gmocv3.rng

Data Sources • Time-series data collected by GMOC • CPU utilization • Disk Utilization • per partition • Number of active VMs • for hypervisors • Interface traffic counters • TX/RX pps, TX/RX bps • OpenFlow stats (per datapath and per sliver) • ports, RO/RW rules, TX/RX messages, breakdown of messages by type • Health checks • AM is accessible via AM API • Details: http://groups.geni.net/geni/wiki/GENIMetaOps/DraftMonitoringMetrics
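As an illustration of how a bps figure can be derived from interface traffic counters, the sketch below converts two samples of a byte counter into a rate and tolerates a single counter wraparound. It assumes raw octet counters sampled at a known interval; whether a given aggregate reports counters or precomputed rates is not stated here, and `rate_bps` is a hypothetical helper.

```python
def rate_bps(prev_bytes: int, curr_bytes: int, interval_s: float,
             counter_bits: int = 64) -> float:
    """Bits per second between two samples of a wrapping byte counter."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    modulus = 1 << counter_bits
    # Modular subtraction yields the right delta across one wraparound.
    delta_bytes = (curr_bytes - prev_bytes) % modulus
    return delta_bytes * 8 / interval_s
```

For example, `rate_bps(1000, 2000, 10)` gives 800.0 bps. A counter reset (such as a device reboot) is indistinguishable from a wrap in this sketch and would show up as a spurious spike.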

Data Sources • GEMINI • Provides tools to collect active network measurements • Bandwidth, latency • Provides tools to collect passive network and host measurements • CPU utilization, memory usage, network traffic count • Data will be stored in measurement store service (coming soon) • Will provide pub/sub interface and support high-rate data transfers • Experiment topology and service data stored in UNIS service • Queryable history of topology changes • Data can be pushed to iRODS archive • Command line interface with access control • Web interface with access control • Searchable

Data Sources • GIMI • Provides tools to collect data from experiment nodes • bandwidth, delay jitter, datagram loss data • CPU load, memory usage, per-process state, system usage data • Collected on OML server • Data can be pushed to iRODS archive • Command line interface with access control • Web interface with access control • Searchable

Data Sources • Measurement Data Object Descriptor (MDOD) • Measurement data objects have associated metadata that provides information on the schema and provenance of the data • Would like to extend MDOD to cover all types of objects, i.e., software images • Would like to use MDOD schema to define Event Record schema • Plan to archive measurement data objects in an archive system based on iRODS • Facilitates searching and correlating data • I&M group has completed v1 of MDOD schema • Working towards a simpler v2

Data Sources • Other Independent Monitoring Data Sources • PlanetLab Monitoring - CoMon • http://comon.cs.princeton.edu • Provides monitoring statistics at both a node level and a slice level • Only covers regular PLC nodes • ProtoGENI Monitoring • Node Control Center: https://www.emulab.net/nodecontrol_list.php3?showtype=pcs • Shared Pool: https://www.emulab.net/showpool.php • Testbed Node Availability Stats: https://www.emulab.net/node_usage/ • Experiment Information Listing: https://www.emulab.net/showexp_list.php3?showtype=all&sortby=name&thumb=1 • Encourage new independent tools that provide monitoring or I&M info • more accessible and usable across all of GENI if people collaborate and use interfaces like those we are reviewing today

Discussion • Data Naming • How has the lack of globally unique and consistent naming affected other projects? • What are some other data naming examples? • Data Transport • What are you using that others might find useful? • How can we all walk away from the table with access to good, reliable data? • What other data sharing issues have you encountered? • Data Resources • What other data resources should we all know about?
