Optimal XOR Hashing for a Linearly Distributed Address Lookup in Computer Networks • Christopher Martinez, Wei-Ming Lin, Parimal Patel • The University of Texas at San Antonio • October 28, 2005
Outline • Motivation • Hashing Background • Linear Distribution • Optimal Hashing • Simulation • Conclusion
Motivation • All network applications require some searching • Switches, routers and intrusion detection systems must search for IP addresses or subnet IDs • Searching should be based on the distribution of the records in the database • For computer networks, searching needs to be real-time
Motivation (cont.) • A capture of network traffic shows a non-uniform distribution of class C IP addresses • Since the IP addresses entering the network are non-uniformly distributed, searching should take this into account
Hashing Background • Straightforward sequential searching is impractical for large databases • Hashing reduces the database into small subsets • Searching subsets reduces search time • Predictable search time is needed for real-time applications
Hashing Background (cont.) • Hashing algorithms are well researched; we look to provide new insight based on the probability distribution • This work is not concerned with collision resolution; each hash key will have the same number of collisions in a linked list • Hashing that uses knowledge of the probability distribution should limit the average number of searches in the linked list
Linear Distribution • From our captured network traffic, we can approximate the non-uniform distribution by a linear probability distribution function
XOR Hashing for Linear Distribution • We wanted a straightforward hashing scheme that can be used for any database size and hashing space • Define the hashing function as $P = (g_{m-1}, g_{m-2}, \ldots, g_0)$, where $g_i$ is the number of key bits XORed together to form hash bit $i$ • Hashing functions are compared against each other by the value δ • δ measures how close to uniform the hashed distribution is
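The grouping $P$ can be made concrete with a minimal sketch. The slides do not state which key bits belong to which group, so this sketch assumes contiguous groups, with $g_{m-1}$ taking the most significant bits; the function name `xor_hash` is hypothetical.

```python
def xor_hash(key: int, groups: tuple[int, ...]) -> int:
    """Fold an n-bit key into an m-bit hash, one XOR (parity) bit per group.

    groups = (g_{m-1}, ..., g_0), with n = sum(groups) and m = len(groups).
    Assumption (not from the slides): groups are contiguous and g_{m-1}
    covers the most significant key bits.
    """
    shift = sum(groups)
    h = 0
    for g in groups:
        shift -= g
        chunk = (key >> shift) & ((1 << g) - 1)  # the g key bits in this group
        parity = bin(chunk).count("1") & 1       # XOR of those bits
        h = (h << 1) | parity                    # append as the next hash bit
    return h


key = 0b1011
for P in [(2, 2), (3, 1), (1, 3)]:
    print(P, format(xor_hash(key, P), "02b"))
# (2, 2) 10
# (3, 1) 01
# (1, 3) 10
```

The three groupings are the 4-bit to 2-bit cases used as examples in the slides; a different bit-to-group assignment would give different hash values.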
XOR Hashing for Linear Distribution: 4-bit to 2-bit Example, P = (2,2)
XOR Hashing for Linear Distribution: 4-bit to 2-bit Example, P = (3,1)
XOR Hashing for Linear Distribution: 4-bit to 2-bit Example, P = (1,3)
XOR Hashing Observation • Observations: • $g_i > 1$: leads to equal partitioning • $g_i = 1$: leads to unequal partitioning • δ: difference between the highest hash distribution density and the mean • To find δ, we need to determine the highest final hash distribution density
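One way to see why the two cases behave differently is sketched here. It assumes the density is exactly linear in the key value, $f(k) = a + bk$, and that key bit $j$ contributes weight $w_j = 2^j$ to $k$. The symbols $a$, $b$, $w_j$ and $x_j$ are introduced here for illustration and do not appear in the slides.

$$
g \ge 2:\qquad
\sum_{x \,\text{even parity}} \; \sum_{j=1}^{g} w_j x_j
\;=\;
\sum_{x \,\text{odd parity}} \; \sum_{j=1}^{g} w_j x_j
\;=\;
2^{\,g-2} \sum_{j=1}^{g} w_j
$$

$$
g = 1:\qquad
\sum_{x \,\text{even parity}} w_1 x_1 = 0,
\qquad
\sum_{x \,\text{odd parity}} w_1 x_1 = w_1
$$

In a group of two or more bits, each bit is set in exactly $2^{g-2}$ patterns of either parity, so both parity classes carry the same total key value and a linear density splits evenly between them. A single-bit group separates keys with the bit clear from keys with the bit set. Two bins that differ only in that hash bit then differ in probability mass by $b \, w_1 \, 2^{n-m}$, since each bin holds $2^{n-m}$ keys.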
Optimal XOR Hashing for Linear Distribution • Hashing consists of $m$ steps (from step $m-1$ to step 0) • $p_i$: highest density value after step $i$ • Derive $p_i$ from $p_{i+1}$ at step $i$ • $p_m = A = 1/2^n$ (original mean before hashing) • $\delta = p_0 - 1/2^m$
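As a brute-force cross-check of δ (not the step-by-step recurrence on $p_i$ described in the slides), the sketch below sums the density over every key in each bin. It makes two assumptions not found in the slides: a hypothetical linear density proportional to $k + 1$, and contiguous bit groups with $g_{m-1}$ on the most significant bits.

```python
from fractions import Fraction


def xor_hash(key, groups):
    """One parity bit per group; groups = (g_{m-1}, ..., g_0), MSB group first (assumed)."""
    shift, h = sum(groups), 0
    for g in groups:
        shift -= g
        chunk = (key >> shift) & ((1 << g) - 1)
        h = (h << 1) | (bin(chunk).count("1") & 1)
    return h


def delta(groups, density):
    """Highest bin probability minus the uniform mean 1/2^m, computed exactly."""
    n, m = sum(groups), len(groups)
    total = sum(density(k) for k in range(2 ** n))
    bins = [Fraction(0)] * (2 ** m)
    for k in range(2 ** n):
        bins[xor_hash(k, groups)] += Fraction(density(k), total)
    return max(bins) - Fraction(1, 2 ** m)


def linear(k):
    """Hypothetical linear density: weight grows linearly with the key value."""
    return k + 1


for P in [(2, 2), (3, 1), (1, 3)]:
    print(P, delta(P, linear))
# (2, 2) 0
# (3, 1) 1/68
# (1, 3) 2/17
```

Under these assumptions the grouping with no single-bit group reaches δ = 0, while a single-bit group on a high-order bit costs more than one on a low-order bit.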
δ vs. P for Linear Distribution • The optimal solution is obtained when every group XORs more than 1 bit
Simulation • Goal: Demonstrate that lower δ leads to better search performance • Hashing: map from $2^n$ to $2^m$ • Each simulation performs $2^m$ hash lookups
Simulation • Three performance measurements • Number of Empty Bins (NEB) • Average maximum Search Length (ASL) • Maximum Search Length (MSL)
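A rough sketch of such an experiment follows; it is not the authors' simulator. It assumes keys are drawn with replacement from a hypothetical linear density proportional to $k + 1$, that every draw adds one entry to its bin's chain, and that bit groups are contiguous with $g_{m-1}$ on the most significant bits. Only NEB and MSL are computed, because the slides do not define precisely how ASL is averaged.

```python
import random
from collections import Counter


def xor_hash(key, groups):
    """One parity bit per group; groups = (g_{m-1}, ..., g_0), MSB group first (assumed)."""
    shift, h = sum(groups), 0
    for g in groups:
        shift -= g
        chunk = (key >> shift) & ((1 << g) - 1)
        h = (h << 1) | (bin(chunk).count("1") & 1)
    return h


def simulate(groups, seed=0):
    """Draw 2^m keys from a linear density and return (NEB, MSL) for one run."""
    n, m = sum(groups), len(groups)
    rng = random.Random(seed)
    keys = range(2 ** n)
    weights = [k + 1 for k in keys]              # hypothetical linear density
    draws = rng.choices(keys, weights=weights, k=2 ** m)
    chains = Counter(xor_hash(k, groups) for k in draws)
    neb = 2 ** m - len(chains)                   # bins that received no key
    msl = max(chains.values())                   # longest chain in any bin
    return neb, msl


# Compare a grouping with no single-bit group against one with seven of them.
for P in [(2,) * 8, (9, 1, 1, 1, 1, 1, 1, 1)]:
    print(P, simulate(P))
```

A single seeded run is noisy; averaging over many seeds would be needed before comparing groupings in any meaningful way.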
Simulation • Improvement from best δ over worst δ • NEB: 18% • ASL: 12% • MSL: 17%
Future Work • Find the optimal XOR hashing for exponential and partially linear distributions • Investigate in more depth which applications exhibit a linear distribution • Measure the performance gain of using this hashing scheme in an intrusion detection system
Conclusion • Network applications exhibit non-uniform distributions, making known search techniques less than optimal • A linear distribution can benefit from the XOR folding property • The optimal XOR grouping can be easily identified to minimize error in the hashing distribution • The theory for the linear case can be applied to other non-uniform distributions