Towards Decentralized Network Management and Reliability. Lynn Jones & Tamitha Carpenter, Stottler Henke Associates, Seattle, WA. Security & Management (SAM’02) • Las Vegas • 6/25/02. MASRR: Multi-Agent System for network Resource Reliability.
MASRR • Multi-Agent System for network Resource Reliability • Defense Advanced Research Projects Agency (DARPA) contract DAAH01-00-C-R220.
About SHAI • Stottler Henke Associates, Inc. • Artificial Intelligence research, development and consulting. • Founded 1988. • Extensive experience: hundreds of fielded systems; variety of AI techniques and application areas.
MASRR Goals • Detection of events not previously seen. • Adaptation to changing usage characteristics. • Operation in heterogeneous environments. • Real-time performance. • Scalability in deployment and operation. • Autonomous / semi-autonomous operation. • Robustness.
Related commercial work • Aprisma’s SPECTRUM suite: event correlation and model-based reasoning; SNMP MIB and other data • SRI’s EMERALD (eBayes): hybrid signature-based / anomaly detection monitoring; tcpdump data and derived events
Related research • Cabrera et al.: look for differences in behavior of selected “key variables” • INBOUNDS: statistical modeling using “abnormality factors” and “standardization factors” • Eskin et al.: automatic outlier partitioning and learned model replacement
Inter-agent communication • Agents use monitor data and messages from each other to form and share “impressions” of network health. • Information fusion reduces redundant messages and alarms and increases certainty. • Absence of response is treated as information and does not hinder agent reasoning.
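The fusion idea above can be illustrated with a minimal sketch. The slides do not say how impressions are represented or combined, so everything here is an assumption made for illustration: impressions are (suspicion, weight) pairs with suspicion in [0, 1], fusion is a weighted average, and a peer that was asked but did not answer contributes a weak, hypothetical "silence" impression instead of blocking the computation.

```python
def fuse_impressions(own, peer_reports, expected_peers,
                     silence_suspicion=0.7, silence_weight=0.5):
    """Combine one agent's impression with its peers' impressions.

    own            -- (suspicion, weight) held by this agent
    peer_reports   -- dict mapping peer name -> (suspicion, weight)
    expected_peers -- peers that were asked for an impression
    The silence_* defaults are illustrative values, not from the slides.
    """
    total = own[0] * own[1]
    weight = own[1]
    for peer in expected_peers:
        if peer in peer_reports:
            suspicion, w = peer_reports[peer]
        else:
            # No response: treat the silence itself as weak evidence.
            suspicion, w = silence_suspicion, silence_weight
        total += suspicion * w
        weight += w
    if weight == 0:
        raise ValueError("at least one impression must carry weight")
    return total / weight


own = (0.6, 1.0)
reports = {"agent_b": (0.8, 1.0)}           # agent_c never answered
fused = fuse_impressions(own, reports, ["agent_b", "agent_c"])
print(round(fused, 2))
```

A single fused value per network element is also what lets an agent raise one alarm rather than one per contributing message.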
Under the hood • World Model: “thumbprint”, action selection, history, learning • Mailroom: action translation, command execution, results preparation, receiving unsolicited messages • [Diagram labels: shared memory, localhost, actions, network, messages, admin, change detection, receive polling data]
Thumbprint anomaly detection • Characterize “normal” or expected behavior. • Learned in-place, using observed data. • Employ SHAI’s ChAD data mining for detecting departures from normal. • Changes may represent anomalies.
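As a rough illustration of learning a thumbprint in place and flagging departures from it, the sketch below keeps a running mean and variance for one monitored variable and reports how many standard deviations a new observation lies from normal. This is a simple statistical stand-in, not SHAI's ChAD, whose internals the slides do not describe; the class name, threshold, and minimum sample count are all assumptions.

```python
import math


class Thumbprint:
    """Running model of one variable's normal behavior (Welford's method)."""

    def __init__(self, threshold=3.0, min_samples=10):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0                   # sum of squared deviations from the mean
        self.threshold = threshold      # illustrative sensitivity parameter
        self.min_samples = min_samples  # data needed before judging anything

    def update(self, x):
        """Fold one observed value into the thumbprint."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def deviation(self, x):
        """Distance of x from normal, in standard deviations."""
        if self.n < self.min_samples:
            return 0.0                  # not enough data to call anything unusual
        std = math.sqrt(self.m2 / (self.n - 1))
        if std == 0.0:
            return 0.0 if x == self.mean else math.inf
        return abs(x - self.mean) / std

    def is_change(self, x):
        return self.deviation(x) > self.threshold


tp = Thumbprint()
for value in [100, 102, 98, 101, 99, 100, 103, 97, 101, 99]:
    tp.update(value)
print(tp.is_change(101), tp.is_change(250))
```

A flagged change is only a candidate anomaly, in line with the last bullet above; deciding what it means is left to later reasoning.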
Case-based reasoning • Evaluating thumbprints (and their deviations), messages, and other observations yields indices that key into action templates. • Actions contain strategies for execution.
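One way to picture indices keying into action templates is a small case library searched by similarity. The sketch below is hypothetical throughout: the observation labels, the two example cases, and the use of Jaccard similarity for retrieval are illustrative choices, not details taken from MASRR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    index: frozenset   # observations that key this case
    action: str        # name of the action template
    strategy: tuple    # ordered steps for carrying the action out


def similarity(index, observations):
    """Jaccard overlap between a case index and the current observations."""
    union = index | observations
    return len(index & observations) / len(union) if union else 0.0


def select_action(observations, library):
    """Return the best-matching case, or None if nothing overlaps."""
    obs = frozenset(observations)
    best = max(library, key=lambda c: similarity(c.index, obs), default=None)
    if best is None or similarity(best.index, obs) == 0.0:
        return None
    return best


# Hypothetical case library for illustration only.
LIBRARY = [
    Case(frozenset({"latency_deviation", "peer_silent"}), "probe_link",
         ("ping_peer", "poll_interface", "notify_admin")),
    Case(frozenset({"cpu_deviation"}), "inspect_host",
         ("list_processes", "notify_admin")),
]

chosen = select_action({"latency_deviation", "peer_silent", "disk_ok"}, LIBRARY)
print(chosen.action if chosen else "no matching case")
```

Returning None when nothing matches leaves room for a fallback such as logging the observations for later case adaptation.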
History • Embodies “line of reasoning”: multiple, concurrent diagnoses; reconciliation of local views. • Provides record: learning best actions, case adaptation, and predicting outcomes; degradation avoidance and early response.
Why MASRR will work • Thumbprints closely fit network behavior at specific points and times. • Models tolerate routine fluctuations in network usage. • Each agent is sensitive to small changes (slow changes, changes across few variables) on the element(s) it monitors.
Why MASRR will work • Correlating and reasoning about small changes yields higher accuracy than centralized monitoring and analysis. • Absence of response or data is also used as information. • Combines general anomaly detection with root cause analysis.
Why MASRR will work • More general than eBayes: can detect various kinds of anomalies across different variables. • “Key variable” signatures not required as in Cabrera et al. (similar rules might be used for fault/attack identification). • Decentralized analysis more sensitive than INBOUNDS’ centralized system.
Known issues • Overhead - processing, disk space. • Getting the sensitivity parameters right. • Are parameters universal? Or do they depend on the data? • Amount of data needed. • What about pre-existing conditions? • Feature selection.
Contact info • Lynn Jones • lwjones@shai-seattle.com • http://64.81.14.30/ReliabilityWeb/ • SHAI, 1107 NE 45th St., Suite 427, Seattle, WA 98105