290 likes | 383 Views
Immune System Metaphors Applied to Intrusion Detection and Related Problems. by Ian Nunn, SCS, Carleton University inunn@digitaldoor.net. Overview of Presentation. Review of immune system properties of most interest Algorithm design and the representation of application domains
E N D
Immune System Metaphors Applied to Intrusion Detection and Related Problems by Ian Nunn, SCS, Carleton University inunn@digitaldoor.net
Overview of Presentation • Review of immune system properties of most interest • Algorithm design and the representation of application domains • Examples of two recognition algorithms • Overview of application areas • Focus on intrusion detection systems (IDS) • Advantages of IS models and future research • The IS model as a swarm system
Immune System Characteristics of Interest • The human immune system (IS) is a system of detectors (principally B and T cells) that: • After initial negative selection (tolerization), does not recognize elements of the body (self) • Is adaptable in that it can recognize over time, any foreign element (non-self) including those never before encountered • Remembers previous foreign element encounters • Dynamically regenerates its elements • Regulates the population size and diversity of its elements • Is robust to input signal noise (recognition region) and detector loss • Is distributed in nature with no central or hierarchical control • Is error tolerant in that self recognition does not halt the system • Is self-protecting since it is part of self
Recognition is based on matching of structural regions called epitopes on antigens and paratopes on antibodies • Shape space model: a parameterized representation (genotype) of the conformational form of self/non-self elements (phenotype) Representation of Self/Non-Self • IS elements involved are cellular proteins and their peptide sequences
IS Application Algorithm Design • Requires a deep understanding of the problem domain • Self/non-self discrimination the fundamental IS principle • Steps in designing an IS algorithm: • Identification of features allowing correct and complete self/non-self discrimination* • Representation or encoding of features, particularly of continuous real-valued parameters*. Ab and Ag feature strings of same length facilitate algorithm performance analysis • Determination of a matching or fitness function. Important for evolution of Ab populations (affinity maturation) • Selection of IS principles to apply, e.g. negative selection, costimulation, affinity maturation, etc. * This is hard stuff and an important step in applying any modeling technique whether genetic algorithms or swarm simulations (recall for army ants the problem of deciding what parameters to assign to the ants and to the environment and what values to allow).
Approach to Feature Selection and Representation • Antibodies and antigens represented by strings of features: • The set of actual values observed such as sensor readings, voltages, ASCII text is called the application’s phenotype • The coded representation is called the application’s genotype • A feature is encoded by symbols from a finite alphabet • Some application feature domains: • Binary variables: digital signals in computer systems • Discrete real variables: ASCII character text • Continuous real variables: real world sensors • Continuous domains must be mapped onto discrete domains since we work with finite alphabets to ensure finite Ab/Ag population spaces
Phenotype Representation: Change Detection Problem Domains • OS (UNIX) processes: sequences of top level system calls • Program execution: alphabet symbols represent op codes • File system: reduction to ASCII or binary strings • User behavior and interface use: keystrokes, mouse clicks • Time series data representation of a physical (plant) processes: x/y position of a milling machine tool • Memory accesses: memory address calls • Local network traffic: TCP/IP packets: addresses, ports • Network traffic through routers and gateways: TCP/IP packets, addresses, volume
Many matching (fitness) functions possible, e.g. for lia contiguous substring of l 1’s in the complementary match (Hamming distance) IS Phenotype Encoding and Matching Using a Binary Model1 • Genotype = Phenotype : 32 bit string on a binary alphabet
Example: Use of a Binary Model with a GA for Clonal Selection1 • Start with randomly generated Ag and possibly incomplete Ab populations • For each Ab in turn, compute its average match (fitness) with a random fixed-size Ag subpopulation • Use a standard GA with mutation but no crossover to evolve successively better generations of antibodies • Niches observed to develop in coverage space for genetic commonalities (bacterial polysaccaride coating) if the initial populations have a bias • Self recognition minimized (without negative selection) by selecting for more Ag specific instead of more general antibodies – less likely to match self
Establishing Antibody Fitness Random sub-population
GA Evolution of Antibodies1 ________________ 1011010111110111
Use of a Negative Selection Algorithm for Clonal Selection5,2 • Want explicit self-filtering (tolerization) • Algorithm: • Generate the set S of self (sub)strings • Generate a set R0of random strings • Match each string from R0 against S : • Match (non-complementary) on at least r contiguous locations: reject • No match: add string to detector set R • How to generate detectors efficiently an issue • Match detector set against target strings to detect intruder • Strings can be on any alphabet
Self string to be protected: 1011 0111 0011 0000, length of contiguous match substring r = 2 Negative Selection Algorithm2 Match Ab1: 10xx Match Ab4: xx00
Let s1 and s2 be two antibodies matching over r-1 contiguous bits and h1 and h2 be two antigens The Problem of Holes6 • For a particular choice of matching rule and Ab repertoire, some non-self strings may not be found causing a hole in the coverage space • A detector that matches any r contiguous bits in h1 will also match either s1 or s2 for the same feature string. The same for h2. So h1 and h2 are undetectable.
Major Application Categories of Immune System Theory • Machine learning and pattern recognition: limited but promising work done to date • Associative memories: limited work done to date • Elimination of identified elements: • IS model: use the B cell and Tk cell “kill” disable viruses • Use a phagocyte analogue for cleanup and garbage collection • IBM virus lab and Forrest’s group at UNM have looked at this • Recovery, repair and augmentation of identified elements : • IS model: use the B cell and Tk cell analogue to deliver a positive payload to an agent • Very little work done to date
Application Areas (cont.) • Detection problems – where most of the work has occurred: • Fault: failure of a self element (industrial plant systems) • Change: any change in self (tumors) • Anomaly: unusual presentation of a self element • Virus: presence of a non-self element • Intrusion: attempt to gain access by non-self element • Many of the classical issues of computer and network security involve some element of detection or self/non-self discrimination
The Intrusion Detection Problem • Two classical types of intrusion detection systems (IDS): • Host-based: domain is a single machine possibly on a network • Network-based: domain is a network of hosts • Two classes of problem: • Anomaly detection: deviations from normal local resource use and network traffic • Misuse detection: usage identified with known system vulnerabilities and security policies
Essential Requirements of a Network-based IDS • Robustness to host failures and noisy signals (anomalous behavior) • Easy (self-)configurability of hosts • Easy extendibility to new hosts • Scalability: extendible to large networks without degradation of performance • Adaptability: dynamically able to recognize new anomalies • Efficiency: simple and low overhead operation • Global analysis: able to correlate local events to form global patterns
Network Representation • Commonly represent problem as the connection events (not message content) between computers • Kim and Bentley4: • Phenotype: 35 real-valued fields in four categories - connection identifier, port vulnerabilities, TCP handshaking, traffic intensity • Genotype: 35 genes. A detector gene has three “nucleotides” (cluster number in (0, 9), min offset, max offset). An antibody or antigen has a single real value. • Cluster and offset tables established for each host at start • A matching function maps an Ag or Ab value to a cluster and takes the distance to the nearest cluster bound as the measure of similarity • Use positive detection events to evolve the offsets for clusters
Kim and Bentley Model4 New lower interval bound for cluster 2 New upper interval bound for cluster 2
Network Representation (cont.) • Hofmeyr and Forrest3: • Phenotype: 3 integer fields (source IP address, destination IP address, service or port number) • Genotype: for a detector, 49 bit binary string + state
Algorithmic Refinements3 • Detectors may have a lifetime at the end of which they are replaced if they have not matched – maintain diversity • Activation threshold and time decay on activation level to deter limited autoimmune reaction to rare self strings • Local activation causes a message to be sent to other hosts decreasing their activation levels (cytokine costimulation) • Matching rule may result in holes in coverage. A randomly assigned permutation mask to control packet presentation helps avoid this (MHC molecule host diversity) contributing to population diversity. • Each host has a unique detector set contributing to diversity and self-protection across a population of hosts
The Hofmeyr and Forrest Model3 Antigen pheotype Host-based refinement fields Detector state
Problem Posed by Computer Applications • The repertoire of human self proteins is fixed over a lifetime • In networks, valid hosts are added and deleted without notice so what “self” is constantly changes • Among a fixed set of hosts, valid usage patterns may change without notice • One solution: costimulation by a trusted (human) authority both at start and subsequent operation • Much work needs to be done
Advantages of IS Models • Adaptability through the ability to recognize foreign patterns never before encountered • Distributed detection contributes to: • Diversity (shape space coverage) • Robustness (failure of individual hosts) • Scalability and extendibility • Quick response to new variants of old attacks • Ability to reproduce detectors of increasing fitness while self-regulating the overall population
IS as a Swarm System • The IS model has a number of characteristics in common with swarm systems: • Large populations of independent agents of characterizable classes • Each agent has at most a very few characteristic simple behaviors: • Bind with another appropriate agent and activate (B and T cells) • Kill something (killer T cell) • Clone myself (B cell) • Secrete a signaling chemical or an antibody (T and B cells) • Live for a long time (memory B and T cells) • Simple interactions with the environment: • Special things that happen in lymphoid organs • Secreting signal chemicals which alter environmental properties (cytokines and inflammation) • Self-organizing as an emergent property • No centralized control over the system
Areas for Additional Research • Matching rules with good computational properties, perhaps application specific ones • Self/non-self representation and encoding • Algorithms for generating detector sets • Other selection algorithms • Incorporation of additional IS characteristics • Detector set populations: evolution, dynamics and emergent properties at the species level
References • Forrest, Smith, Javornik and Perelson. Using Genetic Algorithms to Explore Pattern Recognition in the Immune System. Evolutionary Computation, 1(3):191-211, 1993. • Forrest, Allen, Perelson and Cherukuri. Self-Nonself Discrimination in a Computer. Proceeding of the 1994 IEEE Symposium on Research in Security and Privacy, Los Alamitos CA, 1994. • Hofmeyr and Forrest. Immunity by Design: An Artificial Immune System. In Proceedings of 1999 GECCO Conference, 1999. • Kim and Bentley. Negative selection and niching by an artificial immune system for network intrusion detection. In Late Breaking Papers at the 1999 Genetic and Evolutionary Computation Conference, Orlando, Florida, 1999.
References (cont.) • Forrest, Allen, Perelson and Cherukuri. A Change-Detection Algorithm Inspired by the Immune System. Submitted to IEEE Transactions on Software Engineering, 1995. • D'haeseleer, Forrest and Helman. An Immunological Approach to Change Detection: Algorithms, Analysis, and Implications. Proceeding of the 1994 IEEE Symposium on Research in Security and Privacy, 1996.