Advanced topics in Computer Networks
Lecture 9: Tree-based lookup
University of Tehran, Dept. of EE and Computer Engineering
By: Dr. Nasser Yazdani
Adv. topics in Computer Network
Outline
• Issues
• Multiway and multicolumn search
• DMP-Tree
• Some implementation issues
Issues
• How to sort prefixes
• Prefixes as ranges
• Comparing prefixes
  • Based on length
  • Add extra bits at the end
  • New definition (DMP-tree)
• How to apply tree structures such as a binary tree or an m_way tree to prefixes
Multiway tree lookup
• Proposed by G. Varghese and his students.
• Consider prefixes as ranges.
• First try: pad the prefixes with 0s so that a binary search can be applied. Consider the prefixes {1*, 101*, 10101*}, padded to 6 bits:
  100000
  101000
  101010
• Searching for the addresses 101011, 101110 and 111110, the binary search ends after the last entry (101010) in every case, although the three addresses should match different prefixes. Binary search fails for all of them!
Multiway tree lookup (cont)
• Two problems in the previous example:
  • The search can end far away from the matching prefix.
  • Multiple addresses matching different prefixes end up in the same region.
• Solution: treat prefixes as ranges and also put the end of each range in the table (L = low end of a range, H = high end):
  100000 L
  101000 L
  101010 L
  101011 H
  101111 H
  111111 H
• We now have the explicit ranges. A search for 101011, 101110 or 111110 maps to one range only.
Multiway tree lookup (cont)
• Range table: 100000 (L), 101000 (L), 101010 (L), 101011 (H), 101111 (H), 111111 (H); search addresses: 101011, 101110, 111110.
• For 101011, we try to find the first L that is not followed by an H.
• For the rest, we can use a stack operation to find the first L.
• Problem: a linear search is needed to find the L.
Multiway tree lookup (cont)
• Solution: precompute the prefixes corresponding to the ranges (P1 = 1*, P2 = 101*, P3 = 10101*).

  Entry         >     =
  P1) 100000    P1    P1
  P2) 101000    P2    P2
  P3) 101010    P3    P3
      101011    P2    P3
      101111    P1    P2
      111111    -     P1

• The ">" column gives the matching prefix for an address that falls after the entry; the "=" column gives it for an address equal to the entry. For example, 111110 falls after 101111, so 1* (P1) is the matching prefix.
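The precomputed range table can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names (`prefix_range`, `brute_force_match`, `build_table`, `lookup`) are hypothetical, prefixes are given as bit strings, and the ">" and "=" columns are filled by brute force at build time rather than with the stack method mentioned on the slides.

```python
from bisect import bisect_right

WIDTH = 6  # address width of the slide example (assumption for this sketch)

def prefix_range(prefix, width=WIDTH):
    """Low and high end of the address range covered by a prefix."""
    low = int(prefix.ljust(width, "0"), 2)
    high = int(prefix.ljust(width, "1"), 2)
    return low, high

def brute_force_match(prefixes, addr, width=WIDTH):
    """Longest prefix whose range contains addr (used only at build time)."""
    best = None
    for p in prefixes:
        low, high = prefix_range(p, width)
        if low <= addr <= high and (best is None or len(p) > len(best)):
            best = p
    return best

def build_table(prefixes, width=WIDTH):
    """Sorted range endpoints plus the precomputed '>' and '=' columns."""
    points = sorted({end for p in prefixes for end in prefix_range(p, width)})
    eq = [brute_force_match(prefixes, pt, width) for pt in points]
    gt = [brute_force_match(prefixes, pt + 1, width) if pt + 1 < 2 ** width else None
          for pt in points]
    return points, gt, eq

def lookup(table, addr):
    """One binary search, then pick the '=' or '>' entry of the hit."""
    points, gt, eq = table
    i = bisect_right(points, addr) - 1
    if i < 0:
        return None  # address lies below every range
    return eq[i] if points[i] == addr else gt[i]

table = build_table(["1", "101", "10101"])
```

With this table, `lookup(table, 0b101110)` lands on the entry 101011 and reads its ">" column, so no linear scan for an unmatched L is needed.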
DMP-Tree
• Comparing prefixes
• Sorting prefixes
• Binary prefix tree
• M_way prefix tree
Trie structure
• Trie or radix tree
Sorting prefixes
• Question: why can't well-known tree structures be applied to the longest prefix matching problem?
• Answer: there is no well-known method for sorting prefixes.
• Definition: assume A = a1a2…an and B = b1b2…bm are two prefixes over an alphabet that has a designated separator character.
  1. If n = m, the numerical values of A and B are compared.
  2. If n ≠ m (assume n < m), the two substrings a1a2…an and b1b2…bn are compared. If they are equal, the (n+1)th character of B is checked: B > A if bn+1 comes after the separator character, and B ≤ A otherwise. For binary prefixes this means B > A if bn+1 = 1.
Sorting prefixes (cont)
• Example: assume the separator character is M. Then BOAT is smaller than GOAT, and SAD is bigger than BALLOON. CAT is considered bigger than CATEGORY since the fourth character in CATEGORY, E, is smaller than M.
• Sorting is a function that determines the position of each prefix.
• The prefixes of the table are sorted as: 00010*, 0001*, 001100*, 01001100*, 0100110*, 01011001*, 01011*, 01*, 10*, 10110001*, 1011001*, 10110011*, 1011010*, 1011*, 110*
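A minimal sketch of this comparison for binary prefixes, assuming prefixes are Python bit strings and that a next bit of 1 makes the longer prefix the bigger one (the binary reading of the definition). The name `compare_prefixes` is hypothetical.

```python
from functools import cmp_to_key

def compare_prefixes(a, b):
    """Return -1, 0 or 1 for a < b, a == b, a > b under the prefix ordering."""
    if len(a) > len(b):
        return -compare_prefixes(b, a)
    # From here on len(a) <= len(b): compare a with the first len(a) bits of b.
    head = b[:len(a)]
    if head != a:
        # Equal-length bit strings compare like their numerical values.
        return -1 if a < head else 1
    if len(a) == len(b):
        return 0
    # a is a proper prefix of b: the next bit of b decides.
    return -1 if b[len(a)] == "1" else 1

# The 15 prefixes of the lecture's example table, in table order.
table_prefixes = ["10", "01", "110", "1011", "0001", "01011", "00010", "001100",
                  "1011001", "1011010", "0100110", "01001100", "10110011",
                  "10110001", "01011001"]
ordered = sorted(table_prefixes, key=cmp_to_key(compare_prefixes))
```

Sorting the table this way reproduces the order listed on the slide, with each enclosure (for example 1011) placed after the longer prefixes whose next bit is 0.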
Binary prefix tree
• Unfortunately, it fails for the address 101100001000. Why?
• Prefixes are ranges, not just data points in the search space.
Binary prefix tree (cont)
• Definition: prefixes A and B are disjoint if neither is a prefix of the other.
• Definition: prefix A is called an enclosure if there exists at least one element in the set such that A is a prefix of that element.
• We modify the sort structure:
  • Each enclosure has a bag to hold the data elements it contains.
  • Sort the remaining elements.
  • Distribute the bag elements to the right and left according to the sort definition.
  • Apply the algorithm recursively.
Binary prefix tree (cont)
• Example: the prefixes in Table 1.
• First step.
• Second step.
• Note: enclosures are at a higher level than the elements they contain. (Important!)
Binary prefix tree (cont)
• The final tree structure
Sorting prefixes (cont)
• Sorting algorithms
  • Based on bubble sort
  • Based on radix sort

  tmp = MinLength(list)
  for all i in list except tmp do
    compare i with tmp
    if i matches tmp then put i in tmp's bag
    if i < tmp then put i in leftList
    if i > tmp then put i in rightList
  endfor
  list = Sort(leftList) + tmp + Sort(rightList)
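The recursive idea (take the shortest prefix as the pivot, send everything else left or right by the prefix ordering, recurse) can be sketched as a small binary prefix tree builder with a lookup. This is an illustrative sketch under assumptions, not the lecture's implementation: the bag step is folded into the left/right distribution, nodes are plain tuples, and the names `compare`, `build` and `lookup` are hypothetical.

```python
def compare(a, b):
    """Prefix ordering for bit strings: -1, 0 or 1."""
    if len(a) > len(b):
        return -compare(b, a)
    head = b[:len(a)]
    if head != a:
        return -1 if a < head else 1
    if len(a) == len(b):
        return 0
    return -1 if b[len(a)] == "1" else 1

def build(prefixes):
    """Build a (prefix, left, right) tree with the shortest prefix on top."""
    if not prefixes:
        return None
    pivot = min(prefixes, key=len)  # an enclosure is never longer than its contents
    rest = [p for p in prefixes if p != pivot]
    left = [p for p in rest if compare(p, pivot) < 0]
    right = [p for p in rest if compare(p, pivot) > 0]
    return (pivot, build(left), build(right))

def lookup(node, addr):
    """Walk one root-to-leaf path and keep the longest prefix that matches."""
    best = None
    while node is not None:
        prefix, left, right = node
        if addr.startswith(prefix) and (best is None or len(prefix) > len(best)):
            best = prefix
        node = left if compare(addr, prefix) < 0 else right
    return best

tree = build(["10", "01", "110", "1011", "0001", "01011", "00010", "001100",
              "1011001", "1011010", "0100110", "01001100", "10110011",
              "10110001", "01011001"])
```

Because every enclosure sits above the elements it contains, the address 101100001000 passes through 1011 on its way down and that match is recorded, which is what a plain binary search over the sorted list misses.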
M_way prefix tree
• Problems with the binary prefix tree:
  • Two-way branching.
  • The structure is not dynamic, and insertion may cause problems.
• Divide by m after sorting the strings
  • Static m_way tree.
• Build a dynamic data structure like a B-tree.
  • How to guarantee that an enclosure is at a higher level than the elements it contains.
  • Define node splitting and insertion.
M_way prefix tree (cont)
• Node splitting: finding the split point.
  • Take the median if the data elements are disjoint.
  • If there is an enclosure containing other elements, take it as the split point.
  • Otherwise, take an element that gives the best splitting result.
• Note: this does not guarantee that the final tree will be balanced.
M_way prefix tree (cont)
• Insertion:
  • If the new element is not an enclosure of others, find its place and insert it in the corresponding leaf, as in a B-tree.
  • Otherwise, replace the closest element with the new element and reinsert the replaced elements.
  • Re-sort the resulting subtree (space division) if necessary.
• Building the tree is similar to building a B-tree.
M-way prefix tree (cont)
• Example

  Prefix      Abbrv.    Prefix       Abbrv.
  10          -         1101110010   K
  01          -         10001101     L
  110         -         11101101     M
  1011        -         01010110     N
  0001        -         00100101     O
  01011       -         100110100    P
  00010       -         101011011    Q
  001100      A         11101110     R
  1011001     B         10110111     S
  1011010     C         011010       T
  0100110     D         011011       U
  01001100    E         011101       V
  10110011    F         0110010      W
  10110001    G         101101000    X
  01011001    H         101101110    Y
  001011      I         00011101     Z
  00111010    J         011110110    II
M-way prefix tree (cont)
• We insert prefixes randomly.
• The tree uses a branching factor of 5 (at most 4 prefixes in each node).
• Insert 01011, 1011010, 10110001 and 0100110. Then adding 110 causes an overflow, and the node is split (all elements are disjoint):
  Root: 10110001
  Children: (0100110, 01011) (1011010, 110)
M-way prefix tree (cont)
• Insert 10110011, 1101110010 and 00010. Adding 1011001 causes an overflow (case 3 of splitting):
  Root: 10110001, 1011010
  Children: (00010, 0100110, 01011) (1011001, 10110011) (110, 1101110010)
• Later, adding 1011 causes a problem. It is the case of adding an enclosure, so we will have space division.
M-way prefix tree (cont)
• The final tree
• The tree generalizes the B-tree; that is, the B-tree is a special case of this tree. Hence, when the data elements are relatively disjoint, the height of the tree is log_M N.
DMP-Tree
[Plot: max. height vs. no. of data]
• BF is the branching factor in the internal nodes.
• No. of data is in 1000s.
DMP-Tree
[Plot; axis: no. of data]
• Number of prefixes on the right.
DMP-Tree
• Height of the tree for 100K data prefixes.
[Plot: height vs. branching]
DMP-Tree: analysis of results
• With increasing branching factor (BF), the height decreases.
• The results are for the worst case (max. height); the average case is much lower.
• Beyond BF = 9, increasing the branching factor does not decrease the max. height.
• The results are for sets of 50,000 to 100,000 prefixes with lengths from 8 to 31. The number of actual prefixes in use is around 50,000, with lengths of 8 to 31.
DMP-Tree: memory utilization
• Memory utilization is 0.64 to 0.67 (64% to 67%) without considering the tree branching overhead.
• Memory utilization is 0.53 to 0.62 (53% to 62%) with the tree branching overhead (pointers).
• Without considering branching pointers, memory utilization decreases as the branching factor increases.
• Total memory utilization increases as the branching factor increases.
DMP-Tree: therefore,
• The longest matching prefix of a network address can be determined in 5 steps with a branching factor of 9 or more.
• In the worst case, we need at most 2 times the total prefix data size in memory to implement the scheme. For instance, for 50,000 prefixes of 32 bits, we need at most 3.2 Mbit of memory.
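The 3.2 Mbit figure follows directly from the numbers on the slide. A quick check, with hypothetical variable names:

```python
prefix_count = 50_000      # number of prefixes, from the slide
prefix_bits = 32           # bits per prefix, from the slide
worst_case_factor = 2      # "at most 2 times the total prefix data size"

raw_bits = prefix_count * prefix_bits            # 1.6 Mbit of prefix data
worst_case_bits = worst_case_factor * raw_bits   # upper bound on memory
worst_case_mbit = worst_case_bits / 1e6
```

A factor of 2 corresponds to a memory utilization of at least 50%.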
Overall Design
• All operations first need a search in the tree structure.
• There are two search procedures: one for the longest matching prefix and another for update.
• The prefix tree data structure is on chip.
• The policy table is in off-chip memory.
• There is a port to data link layer mapping module.
Tree Nodes
[Figure: layout of internal nodes (branching factor) and leaf nodes]
• Internal nodes.
  • Each prefix has a left and a right pointer, which point to the left and right subtrees respectively.
  • We can have N prefixes in each internal node; N+1 is then the branching factor.
  • The bigger N, the faster the search, but the more logic is needed.
  • Port is the address of the switch port to which the packet will be sent.
Tree Nodes
• Leaf nodes.
  • There are no left and right subtree pointers.
  • The number of prefixes in a leaf node is M.
  • The leaf nodes are stored in off-chip memory to make the scheme scalable to a large number of prefixes.
Branching Factor
• What is the best number for N (the branching factor)?
  • The bigger N, the faster the search process. (Fact 1)
  • The bigger N, the more memory pins and usually the more memory bandwidth are needed. (Fact 2)
  • The bigger N, the more logic we need to process the node. (Fact 3)
• Simulation results show:
  • The bigger N, the better the memory utilization.
  • For N > 8, the max. height of the tree does not decrease considerably.
Simulation result
• Total memory: assuming one memory block and OC-192.
Branching Factor
• It seems any number between 8 and 16 is reasonable, but N = 9 gives a better search time and memory size.
• Assuming a branching factor of 9 in the internal nodes, 50% node utilization and 128K prefixes, we need at most 128K / 4.5 ≈ 28.4K addresses. Then 15-bit addresses for the left and right pointers are more than enough, but we need more bits for off-chip addressing.
• The number of switch ports is usually limited, around 64. We can assume 256; then 8 bits are enough to address them.
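The pointer and port widths above can be rechecked with a short calculation. This sketch assumes K = 1024 and uses hypothetical variable names; the inputs are the slide's own figures.

```python
import math

prefixes = 128 * 1024          # 128K prefixes (K taken as 1024, an assumption)
branching_factor = 9
node_utilization = 0.50        # 50% node utilization, from the slide

avg_prefixes_per_node = branching_factor * node_utilization   # 4.5
node_addresses = prefixes / avg_prefixes_per_node             # about 28.4K
pointer_bits = math.ceil(math.log2(node_addresses))           # bits per pointer

switch_ports = 256
port_bits = math.ceil(math.log2(switch_ports))                # bits per port id
```

Since 2^15 = 32768 exceeds the roughly 29,000 node addresses, 15 bits cover the on-chip pointers.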
Branching Factor
• In order to make the internal node branching and the leaf node branching even, M = 10.
• If we want to read a node at once, we will need 41 × 10 = 410 pins, which is difficult to support in one chip.
• We can divide a node in two and read/write it in two clock cycles. This reduces the memory pins to 205, which is affordable.
Memory requirement
• Prefix tree: assuming 128K prefixes, N = 9 (BF) and M = 10 (BF in leaves). The majority of prefixes, 80%, will be in leaves; assume 65% node utilization.
  Average # of prefixes in a leaf node = 10 × 0.65 = 6.5
  # of leaf nodes = 128K × 80% × 2 / 6.5 = 31.5K; with 10% overhead, 35K
  Total off-chip memory = 35K × 205 (mem. BW) = 7.2 Mbits
  Then we need 16 bits for addressing and 1 bit for internal/external.
  # of internal nodes = 128K × 20% / 5.8 = 4.41K; with 10% overhead, 4.9K
  Total on-chip memory = 4.9K × 529 ≈ 2.6 Mbits
• Port to link address mapping table: for each port, the corresponding link address. Max. 256 ports, on chip, plus some memory for indexing.
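The memory budget can be retraced step by step. The sketch below only replays the slide's arithmetic: the factor of 2 in the leaf count, the 5.8 prefixes per internal node and the 529 bits per internal node are taken from the slide as given, K is treated as 1000 (an assumption), and the variable names are hypothetical.

```python
K = 1000  # the slide's "K", treated as 1000 in this sketch (assumption)

prefixes = 128 * K
leaf_share, internal_share = 0.80, 0.20
leaf_capacity = 10            # M
node_utilization = 0.65
overhead = 1.10               # 10% overhead

# Off-chip (leaf) memory
avg_per_leaf = leaf_capacity * node_utilization            # 6.5
leaf_nodes = prefixes * leaf_share * 2 / avg_per_leaf      # factor 2 as on the slide
leaf_nodes_budget = leaf_nodes * overhead                  # slide rounds up to 35K
off_chip_bits = 35 * K * 205                               # 205 bits per access

# On-chip (internal node) memory
internal_nodes = prefixes * internal_share / 5.8           # 5.8 as on the slide
internal_budget = internal_nodes * overhead                # slide rounds to 4.9K
on_chip_bits = 4.9 * K * 529                               # 529 bits per node

off_chip_mbit = off_chip_bits / 1e6
on_chip_mbit = on_chip_bits / 1e6
```

Both totals round to the slide's figures of 7.2 Mbits off chip and 2.6 Mbits on chip.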
Memory requirement
• In summary: [table]
• Note:
  • Branching factor is the number of branches in internal nodes.
  • The size of the memory scales with the size of the data, i.e. the number of prefixes.
  • Power dissipation depends on the read/write frequency, current and core voltage.
  • Considering Faraday memory modules, the size of a 10K × 32 bit single-port memory is 36 × 1.45 mm².
Overall Design
[Block diagram. Blocks: Search (to/from NP), Update (insertion, update, delete), Mem. Ctrl, Memory (root addr, root content), Port2Addr, CPU interface (to/from CPU), Output mem Ctrl (to/from out mem).]
Search Path
[Block diagram of the search path. Modules: Input, Piping, GetLen, Compare, Next, Root Node, Mem Ctrl (to/from off-chip memory), OutMemAddr, Caching, Dispatch (to scheduler). Signals: SOA[1], Prty[1], InClk[1], Addr[32], IpAddr[32], PackAdd[29], RdAdd[19], MemAddr[14], Node Data[32], Len[Nx6], CResult[1], Match[1], First[1], Found[1], Port[8], LinkAddr[48], DataOut[32].]
• There are data assertion signals between blocks that are not shown everywhere because of space limitations.
Search Path
• Input Module:
  • Gets the packet destination addresses from the parser.
  • Does parity checking.
  • It has the following input signals:
    • Input data, which is 32 bits.
    • Start of Address, 1 bit (SOA).
    • Parity, 1 bit (Prty).
    • Input clock (InClk).
  • It gets data in two clock cycles: first the IP address, and then the packet address in memory or the packet id (cid).
Search Path
• Input Module: 29 bits are used for the packet address and the last 3 bits for the policy. Then 512 Mbytes can be supported to store packets before sending them out.
• The 2nd clock cycle data format:
  [Figure: bits 31 to 3 carry PackAddr or cid; bits 2 to 0 carry the policy]
• The timing:
  [Timing diagram: InClk, SOA, Data (IpAddr, then PackAddr or cid)]
Search Path
• Piping Module:
  • Pipelines the search process.
  • For new elements from the input block:
    For each new IP address do
      If found in the hash table
        send the packet memory address to the dispatcher
      Else
        enter the IP address and the policy into the pipe FIFO
    End do
Search Path
• Piping Module: for elements in the FIFO
  For the first IP address in the FIFO do
    If the IP address is new then
      assert the First signal and send the IP address and policy out.
    Else if the next address is on chip
      send the next node address to Mem. Ctrl.
    Else
      send the next node address to OffMemCtrl.
    Send the IP address and policy to the pipe.
  For the recirculated address
    If the node was a leaf then
      send the longest matching address to OutMemCtrl;
      send the policy to Extract Port and the packet address to Dispatch.
    Else
      put the IP address into the FIFO;
      replace the longest matching prefix address if a new one is found.
Search Path
• Piping Module:
  • FIFO: keeps the current information of the IP addresses.
  • LMPA = Longest Matching Prefix Address.
  • New = 1 for a new address, 0 for an old one.
  • If the packet is new, the next address will be zero and we can read the root cache content instead of reading from memory.
  • The address is off chip if the first (most significant) bit is 1; otherwise it is on chip.
Search Path
• GetLen Module: this module gets the length of prefixes. We append a 1 to the end of a prefix and then pad it with 0s to make it 33 bits. Ex.: 11011010 → 110110101000…0 (33 bits). Scanning from the right, the first 1 we meet is the marker; the bits to its left are the prefix, which gives the prefix length. GetLen can be implemented as a multiplexer with a case statement (32 cases), and it can be done in one clock cycle.
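The marker encoding can be illustrated in software. This is a sketch of the encoding only, not of the hardware multiplexer; the names `encode`, `get_len` and `decode` are hypothetical, and prefixes are given as bit strings.

```python
WIDTH = 33  # 32 prefix bits plus one marker bit

def encode(prefix):
    """Append the marker 1 and pad with 0s up to 33 bits."""
    return int((prefix + "1").ljust(WIDTH, "0"), 2)

def get_len(word):
    """Prefix length = 32 minus the number of 0s to the right of the marker."""
    trailing_zeros = (word & -word).bit_length() - 1  # position of the lowest 1
    return WIDTH - 1 - trailing_zeros

def decode(word):
    """Recover the prefix bit string from its 33-bit encoding."""
    length = get_len(word)
    if length == 0:
        return ""
    return format(word >> (WIDTH - length), "b").zfill(length)

word = encode("11011010")
```

The expression `word & -word` isolates the lowest set bit, which is the software counterpart of scanning from the right for the first 1.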
Search Path
• Compare Module: compares two prefixes A and B with lengths L1 and L2. Assume L1 > L2 and let A[1:L2] be the first L2 bits of A. Then:
  • If A[1:L2] = B, then A and B match. If A[L2+1] = 0, then A < B; otherwise A > B.
  • Otherwise, if A[1:L2] > B then A > B, else A < B.
• One of the prefixes here is the IP address, with length 32.
• We assume there are no two identical elements in the tree.
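A bit-level sketch of this comparison for an address against one stored prefix. It is an illustration under assumptions: the function name `compare_module` is hypothetical, the prefix is passed as the integer value of its bits plus its length (at least 1), and the second output says whether the prefix is the bigger of the two.

```python
def compare_module(addr, prefix, length, width=32):
    """Compare a width-bit address with a prefix of `length` bits (1..width).

    Returns (match, prefix_bigger):
      match         - True if the prefix is a prefix of the address
      prefix_bigger - True if the prefix is bigger than the address
    """
    head = addr >> (width - length)          # first `length` bits of the address
    if head != prefix:
        return False, prefix > head
    if length == width:
        return True, False                   # identical values
    next_bit = (addr >> (width - length - 1)) & 1
    return True, next_bit == 0               # next bit 0: the address is smaller

# 12-bit example address from the binary prefix tree slide
addr = 0b101100001000
result_1011 = compare_module(addr, 0b1011, 4, width=12)
result_110 = compare_module(addr, 0b110, 3, width=12)
result_10 = compare_module(addr, 0b10, 2, width=12)
```

In hardware, one such comparison would presumably run in parallel for each of the N prefixes in a node.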
Search Path
• Next Module: gets the next node address to read, and also the matching prefix and its corresponding port number.
  • It gets two signals for each prefix, Match and ComResult (compare): Match = 1 means the prefix matches; ComResult = 1 means the prefix is bigger.
  • It takes the left address of the first prefix, from the left, whose ComResult signal is 1.
  • It compares the lengths of the matching prefixes and takes the one with the largest length.
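The selection logic can be sketched as follows. The `Entry` record and the function name `next_module` are hypothetical, and one behavior is an assumption not stated on the slide: when no prefix in the node is bigger than the address, the sketch falls through to the right pointer of the last prefix.

```python
from typing import NamedTuple

class Entry(NamedTuple):
    length: int       # prefix length
    left: int         # address of the left subtree
    right: int        # address of the right subtree
    port: int         # output port of this prefix
    match: bool       # Match signal
    com_result: bool  # ComResult signal: True if the prefix is bigger

def next_module(entries):
    """Return (next node address, port of the longest matching prefix or None)."""
    next_addr = entries[-1].right             # assumed fall-through case
    for entry in entries:                     # scan from the left
        if entry.com_result:
            next_addr = entry.left
            break
    matches = [e for e in entries if e.match]
    best = max(matches, key=lambda e: e.length, default=None)
    return next_addr, (best.port if best else None)

node = [
    Entry(length=2, left=10, right=11, port=1, match=True, com_result=False),
    Entry(length=4, left=11, right=12, port=2, match=True, com_result=True),
    Entry(length=3, left=12, right=13, port=3, match=False, com_result=True),
]
next_addr, port = next_module(node)
```

Here the second prefix is the first one that is bigger than the address, so its left pointer is followed, and it is also the longest match in the node.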
Search Path
• Dispatch Module: forms the Routing Group Address (RGA) from the port number and sends it with the packet stored memory address (PSMA) or CID. RGA is a 64-bit bitmap in which the bit corresponding to the port number is set to 1. PSMA is dispatched first, and the port and DLL address follow.
• Caching Module: keeps a cache of IP addresses and their corresponding ports.
Search Path
• Caching Module: the cache is kept as a FIFO, and its depth depends on the technology.
  Check the IP address in the FIFO.
  If the address is found then
    assert the Found signal;
    write the IP address on top of the FIFO if it is not there already.
  Else
    write the IP address on top of the FIFO.
• The caching system always removes the least recently referenced IP address from the cache.
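In software terms this behaves like a small least-recently-used cache. A minimal sketch using the standard library's `OrderedDict`; the class name `LookupCache` and its methods are hypothetical, and the stored value is assumed to be the output port.

```python
from collections import OrderedDict

class LookupCache:
    """Fixed-depth cache of IP address -> port, most recent entry on top."""

    def __init__(self, depth):
        self.depth = depth
        self.entries = OrderedDict()  # last item = most recently referenced

    def lookup(self, ip):
        """Return (found, port) and move a hit to the top."""
        if ip in self.entries:
            self.entries.move_to_end(ip)
            return True, self.entries[ip]
        return False, None

    def insert(self, ip, port):
        """Write the address on top; evict the least recently referenced one."""
        self.entries[ip] = port
        self.entries.move_to_end(ip)
        if len(self.entries) > self.depth:
            self.entries.popitem(last=False)

cache = LookupCache(depth=2)
cache.insert("10.0.0.1", 3)
cache.insert("10.0.0.2", 5)
hit = cache.lookup("10.0.0.1")      # refreshes 10.0.0.1
cache.insert("10.0.0.3", 7)         # evicts 10.0.0.2
```

The depth parameter stands in for the FIFO depth, which the slide says depends on the technology.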