Explore the coexistence of Event Correlation and Life Cycle Management in the NFV world, from network monitoring to service assurance, challenges, opportunities, and future trends. Learn how these concepts intertwine for efficient operations and improved incident resolution.
Event Correlation & Life Cycle Management: How Will They Coexist in the NFV World? May 10, 2017. Dale Sorsby, Bill Coward, Michael Evenchick
Introduction to NFV • Network Function Virtualization, as defined by Wikipedia, is: • "a network architecture concept that uses the technologies of IT virtualization to virtualize entire classes of network node functions into building blocks that may connect, or chain together, to create communication services." • NFV is not itself a Virtual Network Function (VNF); it is the environment in which VNFs exist.
Monitoring – What? • Cloud Infrastructure (Compute, Network, Storage) • Management Infrastructure (VIM, VNFM, NFVO) • Virtual resources (CPU, memory, disk) • Virtual Instances (VNFs) • Virtual Services (full service chain through the cloud) • Overlay networks (tunnel to cloud, cloud networks) • Underlay networks (interfaces and connectivity of devices) • Physical network devices (routers, switches, etc.)
Network Monitoring [Diagram: end-to-end NFV architecture. Components shown: Cross Domain Orchestrator (Collection, Correlation, Visualization), SDWAN Controller, Network Orchestrator, NFV Orchestrator, VNF Manager, OpenStack VIM, SDN Controller (SDNC), Central Data Center, Regional Data Center, Provider Core, Branch/Campus, Managed Access, Internet, Servers, VNF, Distributed Cloud, Centralized Cloud, SDWAN.]
Infrastructure Monitoring [Diagram: the same end-to-end architecture as the Network Monitoring slide, highlighting the infrastructure monitoring scope.]
Application Monitoring [Diagram: the same end-to-end architecture as the Network Monitoring slide, highlighting the application monitoring scope.]
Service Monitoring – Coordinated Effort [Diagram: the same end-to-end architecture as the Network Monitoring slide, with an added Internet (IPsec) connection.]
Operationalizing NFV • Don't reinvent the wheel • Use built-in intelligence • Use the strengths of existing systems • Brownfield • Integrate with existing assurance systems • Don't introduce new applications or views to operators
Service Assurance - Collection and Challenges • Integration • How do you know everything has been gathered? • Incomplete alerts/notifications • Reliable • Comprehensive • New approach for monitoring NFV environments – things change
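One partial answer to "How do you know everything has been gathered?" is to reconcile the inventory the orchestrators believe exists against what the collectors actually hear from. The Python below is a minimal sketch under stated assumptions: that the orchestration layer can export a set of element names and that the collector keeps a last-seen timestamp per element. The element names, the 60-second silence threshold, and `find_collection_gaps` are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionGap:
    element: str
    reason: str


def find_collection_gaps(
    expected: set[str],
    last_seen: dict[str, float],
    now: float,
    max_silence_s: float = 60.0,  # assumed threshold, tune per source type
) -> list[CollectionGap]:
    """Compare inventory (expected) with what collectors heard from.

    last_seen maps element name -> epoch seconds of its latest event or
    heartbeat. Hypothetical data model, not any product's API.
    """
    gaps: list[CollectionGap] = []
    for element in sorted(expected):
        ts = last_seen.get(element)
        if ts is None:
            gaps.append(CollectionGap(element, "never reported"))
        elif now - ts > max_silence_s:
            gaps.append(CollectionGap(element, f"silent for {now - ts:.0f}s"))
    # Elements that report but are missing from inventory usually mean the
    # inventory is stale (e.g. a VNF re-instantiated outside the NFVO).
    for element in sorted(set(last_seen) - expected):
        gaps.append(CollectionGap(element, "reporting but not in inventory"))
    return gaps


gaps = find_collection_gaps(
    expected={"vnf-fw-1", "vnf-rtr-1", "compute-07"},
    last_seen={"vnf-fw-1": 1000.0, "compute-07": 900.0, "vnf-old-3": 995.0},
    now=1000.0,
)
for gap in gaps:
    print(gap.element, "-", gap.reason)
```

Silence alone does not prove a failure; it only says the assurance view is incomplete, which is the problem the slide raises.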
Event Correlation - Definition • Event Correlation: an automated process of understanding and revealing relationships between complex system events. • Requires holistic awareness: telemetry/intelligence • Integrate multiple data sources, data types, and protocols • Scale: 1000s of physical and logical resource elements • Event relationships can be topological, temporal, or service-based … • Data, Information, Knowledge, Wisdom
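To make "topology" and "temporal" relationships concrete, here is a minimal Python sketch that combines the two: an event is attributed to the furthest-upstream element that also raised an event within a time window. The dependency map, element names, and the 30-second window are hypothetical; service relationships would need a further map and are not covered.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str  # element that raised the event
    kind: str    # free-form event type, e.g. "link_down"
    ts: float    # epoch seconds


# Hypothetical topology: element -> the element it depends on.
DEPENDS_ON = {
    "vnf-fw-1": "compute-07",
    "vnf-rtr-1": "compute-07",
    "compute-07": "tor-switch-2",
}


def upstream(element: str, depends_on: dict[str, str]) -> list[str]:
    """Dependency chain toward the underlay, nearest first (cycle-safe)."""
    chain: list[str] = []
    while element in depends_on and depends_on[element] not in chain:
        element = depends_on[element]
        chain.append(element)
    return chain


def correlate(events: list[Event], depends_on: dict[str, str],
              window_s: float = 30.0) -> dict[Event, Event]:
    """Map each event to its probable root cause: the furthest-upstream
    element that also raised an event within window_s seconds of it.
    Events with no such upstream match are their own root."""
    by_source: dict[str, list[Event]] = {}
    for e in events:
        by_source.setdefault(e.source, []).append(e)
    roots: dict[Event, Event] = {}
    for e in events:
        root = e
        for element in upstream(e.source, depends_on):
            for candidate in by_source.get(element, []):
                if abs(candidate.ts - e.ts) <= window_s:
                    root = candidate  # later matches are further upstream
                    break
        roots[e] = root
    return roots


events = [
    Event("tor-switch-2", "link_down", 100.0),
    Event("compute-07", "host_unreachable", 105.0),
    Event("vnf-fw-1", "vnf_unreachable", 110.0),
    Event("vnf-rtr-1", "vnf_unreachable", 112.0),
    Event("vnf-lb-9", "cpu_high", 500.0),
]
roots = correlate(events, DEPENDS_ON)
for e in events:
    print(f"{e.source}:{e.kind} -> root {roots[e].source}:{roots[e].kind}")
```

Here five raw events reduce to two probable root causes; the other three can be attached to the switch event as symptoms rather than opened as separate incidents.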
Event Correlation - Challenges • Challenge: make sense of events (or their absence) so they are actionable. • Internal OpenStack telemetry (Ceilometer, RabbitMQ, Nagios, Heka…) may have scalability and sizing challenges • Integration with and access to multiple systems and data types/formats • Healing collisions: NFVO, VNFM, Heat … must yield to the CDO • Open source event correlation tools: Simple Event Correlator (SEC), Drools, RiverMuse… SP/CG ?
Event Correlation - Opportunity • Closed-loop monitoring, alerting, and healing • Event aggregation, suppression, prioritization, routing, and enrichment • Historical knowledge support; identify service impact and ripple effects • Effective event correlation supports decision-making and automation/LCM • Reduced operating expense; improved inter-department incident coordination • Improved event-to-incident resolution process and time • Work smarter, not harder
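Three of the listed functions (aggregation, suppression, prioritization) fit in a few lines. A minimal Python sketch, assuming events arrive as `(source, kind, severity, ts)` tuples and that severities use the four names below; both are assumptions, and routing and enrichment are left out.

```python
from __future__ import annotations

from dataclasses import dataclass

# Lower rank = more urgent. Severity names are an assumption.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2, "warning": 3}


@dataclass
class Alert:
    source: str
    kind: str
    severity: str
    first_ts: float
    last_ts: float
    count: int = 1


def aggregate(raw_events, suppressed_sources=frozenset()):
    """raw_events: iterable of (source, kind, severity, ts) tuples.

    - Aggregation: repeats of the same (source, kind) collapse into one
      Alert with a count and first/last timestamps.
    - Suppression: events from suppressed sources (e.g. elements in a
      maintenance window) are dropped.
    - Prioritization: the worst severity seen is kept, and alerts are
      returned most severe first, then oldest first.
    """
    alerts: dict[tuple[str, str], Alert] = {}
    for source, kind, severity, ts in raw_events:
        if source in suppressed_sources:
            continue
        alert = alerts.get((source, kind))
        if alert is None:
            alerts[(source, kind)] = Alert(source, kind, severity, ts, ts)
            continue
        alert.count += 1
        alert.first_ts = min(alert.first_ts, ts)
        alert.last_ts = max(alert.last_ts, ts)
        if SEVERITY_RANK[severity] < SEVERITY_RANK[alert.severity]:
            alert.severity = severity
    return sorted(alerts.values(),
                  key=lambda a: (SEVERITY_RANK[a.severity], a.first_ts))


raw = [
    ("vnf-fw-1", "cpu_high", "minor", 10.0),
    ("vnf-fw-1", "cpu_high", "major", 20.0),
    ("vnf-fw-1", "cpu_high", "minor", 30.0),
    ("compute-07", "disk_full", "critical", 25.0),
    ("vnf-lab-1", "link_down", "critical", 5.0),
]
for a in aggregate(raw, suppressed_sources={"vnf-lab-1"}):
    print(a.severity, a.source, a.kind, f"x{a.count}")
```

Five raw events become two alerts for an operator to look at, with the critical one first.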
Event Correlation – Future • ONAP: Data Collection, Analytics, and Events (DCAE) • OPNFV: VNF Event Streaming (VES) project, a common data model • Open source: Vitrage, Monasca, Zabbix • Assisted/machine learning and A.I. capabilities • Need for a comprehensive framework for life cycle management
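Part of the appeal of a common data model such as VES is that every source emits the same envelope, so the correlation engine parses one format. The Python below is a rough sketch of normalizing an internal alarm into a VES-style fault event. The field names follow the VES `commonEventHeader`/`faultFields` layout as we understand it, but required fields and version numbers differ between VES releases, so treat the shape, the `eventName` scheme, and the version values as assumptions to check against the spec rather than a validated payload.

```python
from __future__ import annotations

import json
import time
import uuid

# Severities mapped to "High" priority; the mapping itself is an assumption.
_HIGH_PRIORITY = {"CRITICAL", "MAJOR"}


def to_ves_fault(source: str, condition: str, problem: str, severity: str,
                 sequence: int, start_us: int | None = None,
                 reporter: str = "example-collector") -> dict:
    """Wrap an internal alarm in a VES-style fault event (sketch only)."""
    now_us = int(time.time() * 1_000_000)
    if start_us is None:
        start_us = now_us
    return {
        "event": {
            "commonEventHeader": {
                "domain": "fault",
                "eventId": str(uuid.uuid4()),
                "eventName": f"Fault_{condition}",  # hypothetical naming scheme
                "sourceName": source,
                "reportingEntityName": reporter,
                "priority": "High" if severity in _HIGH_PRIORITY else "Normal",
                "sequence": sequence,
                "startEpochMicrosec": start_us,
                "lastEpochMicrosec": now_us,
                "version": 3.0,  # placeholder: match your collector's version
            },
            "faultFields": {
                "alarmCondition": condition,
                "specificProblem": problem,
                "eventSeverity": severity,
                "eventSourceType": "virtualNetworkFunction",
                "vfStatus": "Active",
                "faultFieldsVersion": 2.0,  # placeholder
            },
        }
    }


event = to_ves_fault("vnf-fw-1", "linkDown", "eth1 lost carrier", "MAJOR",
                     sequence=1)
print(json.dumps(event, indent=2))
```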
Life Cycle Management, Where? And When? • It is great to automate! So... • Everything wants to perform life cycle management • VIM: "OpenStack Heat" will perform life cycle management when it detects issues with the items that it orchestrated. • The SDN Controller (e.g. Contrail) can perform life cycle management under some conditions, similar to Heat, when it owns the service instances. • The VNFM will perform life cycle management of the VNF • The NFVO will perform life cycle management of the NFV service through the cloud • The Cross Domain Orchestrator (CDO) can also perform life cycle management of NFV services and possibly of the underlay network • The result is confusion and overlap • So which system is really responsible for what? • And where is the correlation?
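One way to answer "which system is really responsible for what?" is an explicit swim-lane table in which each scope has exactly one layer allowed to heal and every other layer only reports. The Python below is a minimal sketch. The scope names and the owner chosen for each scope are hypothetical; the deck does not prescribe a particular assignment.

```python
from enum import Enum


class Layer(Enum):
    HEAT = "OpenStack Heat (VIM)"
    SDNC = "SDN Controller"
    VNFM = "VNF Manager"
    NFVO = "NFV Orchestrator"
    CDO = "Cross Domain Orchestrator"


# Hypothetical swim lanes: one healing owner per scope. Any other layer that
# detects a problem in that scope reports it instead of acting on it.
HEALING_OWNER = {
    "vm_instance": Layer.HEAT,
    "virtual_network": Layer.SDNC,
    "vnf_application": Layer.VNFM,
    "nfv_service": Layer.NFVO,
    "end_to_end_service": Layer.CDO,
}


def action_for(layer: Layer, scope: str) -> str:
    """'heal' only for the owning layer; unknown scopes are never auto-healed."""
    return "heal" if HEALING_OWNER.get(scope) is layer else "report"


for layer in Layer:
    print(f"{layer.value:28} vnf_application -> "
          f"{action_for(layer, 'vnf_application')}")
```

Because the table is a dict keyed by scope, a scope cannot accidentally be given two owners; the harder part in practice is getting every product to honor it.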
Trouble Scenario – VNF Fails • Heat determines the failure and attempts to restore the VNF • The SDN Controller determines the failure and attempts to restore the VNF • The VNFM determines the failure and attempts to restore the VNF • The NFVO determines the service outage and attempts to restore the service • The Cross Domain Orchestrator determines the service outage and attempts to restore the service • Will all five try to heal? Probably not, but experience has shown that multiple elements do attempt to heal, resulting in failed healing and systems falling out of sync. • Example: Heat determines the failure and instantiates a new VNF • The new VNF claims ports on the network • The NFVO determines the failure and attempts to destroy the Heat stack • This fails because not all of the ports on the network can be freed • Unwanted result: systems are now out of sync and the status of the service is in question. If the VNF was brought up by Heat, any customer-specific configuration placed on it by the VNFM will be missing.
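The collision above happens because Heat and the NFVO act on the same VNF at the same time without knowing about each other. A minimal Python sketch of one mitigation: a per-resource healing lease that every healer must obtain before acting. `HealingArbiter`, the resource key format, and the 300-second lease are hypothetical; no product in the deck provides this out of the box, and it only helps if every healer consults the same arbiter.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass


@dataclass
class Lease:
    holder: str
    expires_at: float


class HealingArbiter:
    """Grants at most one healing lease per resource at a time.

    Leases expire after ttl_s so a crashed healer cannot block recovery
    forever. Timestamps are passed in explicitly to keep the sketch testable.
    """

    def __init__(self, ttl_s: float = 300.0):
        self._ttl = ttl_s
        self._leases: dict[str, Lease] = {}
        self._lock = threading.Lock()

    def try_acquire(self, resource: str, healer: str, now: float) -> bool:
        with self._lock:
            lease = self._leases.get(resource)
            if lease and lease.expires_at > now and lease.holder != healer:
                return False  # another system is already healing this resource
            self._leases[resource] = Lease(healer, now + self._ttl)
            return True

    def release(self, resource: str, healer: str) -> None:
        with self._lock:
            lease = self._leases.get(resource)
            if lease and lease.holder == healer:
                del self._leases[resource]


arb = HealingArbiter(ttl_s=300.0)
print(arb.try_acquire("svc-42/vnf-fw-1", "heat", now=0.0))   # True: Heat heals
print(arb.try_acquire("svc-42/vnf-fw-1", "nfvo", now=10.0))  # False: NFVO waits
```

The sketch is first-come-first-served. Making healing yield to the CDO, as the Event Correlation - Challenges slide suggests, would be a policy change inside `try_acquire`.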
Trouble Scenario – VNF Management Network Failure • The VNFM determines that the VNF has failed and attempts to heal it • The NFVO determines there are service issues and attempts to heal the service • Will these occur? It depends on what is monitored on the VNF and how, but experience shows that they do. If either happens, a service that may actually have been fine will be impacted. • Unwanted result: a 5-minute outage caused entirely by the systems put in place to minimize customer impact.
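The root problem here is treating "VNFM lost management access" as "VNF failed". A minimal Python sketch of cross-checking the management-plane signal against data-plane evidence before healing. The three boolean inputs and the `Action` values are hypothetical; in practice they would come from correlated events rather than single flags.

```python
from enum import Enum


class Action(Enum):
    NO_ACTION = "no action"
    FIX_MGMT_NETWORK = "open management-network incident; do not heal the VNF"
    CHECK_NETWORK = "correlate with underlay/overlay events before healing"
    HEAL_VNF = "hand to the owning layer for healing"


def decide(mgmt_reachable: bool, data_plane_ok: bool,
           instance_running: bool) -> Action:
    """Hypothetical inputs:
    mgmt_reachable   - the VNFM can reach the VNF over its management network
    data_plane_ok    - service probes / traffic counters look healthy
    instance_running - the VIM reports the VM as running
    """
    if data_plane_ok:
        # Customer traffic is flowing: losing management access alone is
        # not a reason to rebuild the VNF.
        return Action.NO_ACTION if mgmt_reachable else Action.FIX_MGMT_NETWORK
    if not instance_running:
        return Action.HEAL_VNF
    # VM is up but traffic is failing: the network may be at fault, not the VNF.
    return Action.CHECK_NETWORK


print(decide(mgmt_reachable=False, data_plane_ok=True,
             instance_running=True).value)
```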
Trouble Scenario – Network Outage • A traffic flow issue is detected • The VNFM determines that the VNF has failed and attempts to heal it • The NFVO determines there are service issues and attempts to heal the service • Does the Cross Domain Orchestrator also determine there is an issue with the service and attempt to heal it? • Will these occur? It depends on what is monitored on the VNF and how, but experience shows that they do. If any of them happens, a service that may actually have been fine will be impacted. • Unwanted result: a 5-minute outage caused entirely by the systems put in place to minimize customer impact, when in reality the customer might have seen only a limited 10-30 second outage.
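The arithmetic on this slide (a 10-30 second network blip turned into a 5-minute rebuild) suggests a hold-down timer: only trigger healing once a failure has persisted longer than typical transients. A minimal Python sketch; the class name and the 60-second default are assumptions, the latter chosen to sit above the 10-30 second outages mentioned and well below a 5-minute rebuild.

```python
from __future__ import annotations


class HoldDown:
    """Trigger healing only after a failure persists for hold_down_s seconds.

    Any healthy observation resets the timer, so transient outages
    never trigger a rebuild.
    """

    def __init__(self, hold_down_s: float = 60.0):  # assumed threshold
        self.hold_down_s = hold_down_s
        self._failing_since: float | None = None

    def observe(self, healthy: bool, now: float) -> bool:
        """Record one health sample; return True if healing should start."""
        if healthy:
            self._failing_since = None
            return False
        if self._failing_since is None:
            self._failing_since = now
        return now - self._failing_since >= self.hold_down_s


# A 20-second blip sampled every 10 s: never triggers healing.
blip = HoldDown()
print([blip.observe(h, t) for t, h in [(0, False), (10, False), (20, False), (30, True)]])

# A sustained failure: triggers once it has lasted 60 s.
outage = HoldDown()
print([outage.observe(h, t) for t, h in [(0, False), (30, False), (60, False)]])
```

Once the threshold is passed, `observe` keeps returning True on every unhealthy sample, so it should be paired with something like a single-owner or lease check so that only one system acts on it.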
Cross Domain Orchestration • It is still in its infancy but could be part of the answer. • It will need to ensure all of the necessary events are collected • It will need to integrate tightly with correlation • It will need to integrate tightly with all Domain Orchestrators to have an end-to-end view of the service • It will need to permit each Domain Orchestrator to control its own domain
Life Cycle Management & Existing Correlation • Lessons Learned • Work with Operations and the systems they use: the best tools are only the best because they get used! • Having systems work with other systems is the only way to achieve success with end-to-end life cycle management, so plan to integrate with other systems from the beginning, not as an afterthought! • Existing systems will need to learn and adapt to the new world, or they will eventually have to be replaced! • "Stay in your swim lane but understand the world is bigger than you!"