Recursive Design of Hardware Priority Queues
Liron Schiff* (TAU)
Joint work with Yehuda Afek (TAU) and Anat Bremler-Barr (IDC)
*Supported by European Research Council (ERC) Starting Grant no. 259085

Priority Queue (PQ)
• Interface:
  • PQ.Insert(x)
  • The higher the priority of x, the smaller x is
  • PQ.GetMin(): remove and return the minimum
  • PQ.Delete(): just remove
  • PQ.Peek(): just return the minimum
[Figure: a Priority Queue with Insert and GetMin]

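The interface above can be sketched in software as follows. This is only an illustrative stand-in built on Python's `heapq`, not the hardware design; the class name `SimplePQ` is hypothetical, and treating `Delete()` as "drop the minimum without returning it" is an assumption, since the slide does not say which item is removed.

```python
import heapq


class SimplePQ:
    """Software stand-in for the PQ interface; a smaller key means higher priority."""

    def __init__(self):
        self._heap = []

    def insert(self, x):
        heapq.heappush(self._heap, x)

    def peek(self):
        # Return the minimum without removing it.
        return self._heap[0]

    def get_min(self):
        # Remove and return the minimum.
        return heapq.heappop(self._heap)

    def delete(self):
        # Assumption: Delete() removes the minimum and returns nothing.
        heapq.heappop(self._heap)

    def __len__(self):
        return len(self._heap)


pq = SimplePQ()
for key in (14, 33, 9):
    pq.insert(key)
```

With these three keys, `peek()` and `get_min()` both see 9 first, because 9 is the smallest key and therefore the highest priority.
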
Priority Queue Applications
• Networking: scheduling packets
  • Many flows (1M)
  • High rate (100 Mpps)
• More applications: scientific simulators, databases
[Figure: packets tagged with priority values passing through a Priority Queue (scheduler)]

Our Approach: The Powering Technique
• Merge-Sort concept: 3 x small HW PQ + RAM = larger PQ
• The small HW PQ is the Base Priority Queue (BPQ)
[Figure labels: Sort, Merge]

The Powering Technique
• Insert(x) uses the Input BPQ
[Figure, animation frames: items 3, 0 and 5 are inserted one after another; Input BPQ above, Exit BPQ below]

The Powering Technique
• When Input gets full, move its items to Exit.
[Figure, animation frames: the lists (5, 3, 0), (8, 7, 4) and (6, 2, 1) accumulate one after another between the Input BPQ and the Exit BPQ]

The Powering Technique
• Get_min() extracts the min of Exit or Input
• and we update the Exit (if needed).
[Figure, animation frames: lists (8, 7, 4), (6, 2, 1), (5, 3, 0) and item 9; the minimum, 0, is extracted]

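The three steps above (insert into Input, move a full Input to Exit, take the minimum of Exit or Input) can be sketched as below. This covers the simple idea only: both BPQs are simulated with `heapq`, the names `SimplePoweredPQ`, `bpq_size` and `ram` are hypothetical, and the sketch deliberately does not bound the Exit BPQ or move a full Input in O(1) operations.

```python
import heapq


class SimplePoweredPQ:
    """Simple idea only: one Input BPQ, one Exit BPQ, sorted lists kept in RAM."""

    def __init__(self, bpq_size):
        self.bpq_size = bpq_size  # hypothetical capacity of the Input BPQ
        self.input = []           # Input BPQ: receives every inserted key
        self.exit = []            # Exit BPQ: one (head key, list id) per RAM list
        self.ram = {}             # list id -> remaining keys, descending, so pop() gives the next min
        self.next_id = 0

    def insert(self, x):
        heapq.heappush(self.input, x)
        if len(self.input) == self.bpq_size:
            self._move_input_to_exit()

    def _move_input_to_exit(self):
        # Turn the full Input BPQ into one sorted list; only its head enters Exit.
        keys = sorted(self.input, reverse=True)
        self.input = []
        head = keys.pop()
        self.ram[self.next_id] = keys
        heapq.heappush(self.exit, (head, self.next_id))
        self.next_id += 1

    def get_min(self):
        # The global minimum is either the Input minimum or the Exit minimum.
        if not self.exit or (self.input and self.input[0] <= self.exit[0][0]):
            return heapq.heappop(self.input)
        key, list_id = heapq.heappop(self.exit)
        rest = self.ram[list_id]
        if rest:
            # Update Exit with the next item of the list that lost its head.
            heapq.heappush(self.exit, (rest.pop(), list_id))
        else:
            del self.ram[list_id]
        return key


pq = SimplePoweredPQ(bpq_size=3)
for key in (3, 0, 5, 8, 7, 4, 6, 2, 1, 9):
    pq.insert(key)
drained = [pq.get_min() for _ in range(10)]
```

The keys are the ones shown in the slide figures; draining the queue returns them in ascending order.
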
Outline
• Difficulties with the simple idea
• Applying the construction recursively
• Exemplifying on TCAM base units
• Evaluation

Two difficulties with the simple idea
• Too many lists in the Exit module (as lists are emptied while capacity N is maintained)
• Moving a list from Input to Exit in O(1) operations
[Figure: Input and Exit modules]

Difficulty 1
• Maintaining capacity N, while lists are shrinking
• We continually merge inactive lists during Insert
[Figure, animation frames: partially emptied lists are merged step by step while new items (10, 11) are inserted]

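One way to spread a merge over many Insert operations is to give each Insert a small, fixed budget of merge steps. The sketch below illustrates that general pattern only; the class `IncrementalMerge`, the `budget` parameter and the example lists are hypothetical, and the slides do not specify how inactive lists are chosen or how many steps each Insert performs.

```python
class IncrementalMerge:
    """Merge two ascending lists a bounded number of steps at a time."""

    def __init__(self, a, b):
        self.a, self.b = a, b
        self.i = self.j = 0   # read positions in a and b
        self.out = []         # merged list under construction

    @property
    def done(self):
        return self.i == len(self.a) and self.j == len(self.b)

    def step(self, budget):
        """Move at most `budget` items to the output; return True when finished."""
        for _ in range(budget):
            if self.done:
                break
            take_a = self.j == len(self.b) or (
                self.i < len(self.a) and self.a[self.i] <= self.b[self.j]
            )
            if take_a:
                self.out.append(self.a[self.i])
                self.i += 1
            else:
                self.out.append(self.b[self.j])
                self.j += 1
        return self.done


merge = IncrementalMerge([3, 9, 10], [4, 5, 11])
calls = 0
while True:
    calls += 1  # each call stands for the merge work attached to one Insert
    if merge.step(budget=2):
        break
```

Because each call does a constant amount of work, the cost added to any single Insert stays constant regardless of the list lengths.
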
Difficulty 2
• Moving all items from input to RAM in O(1) time
• Use two Input BPQs and switch between them
[Figure, animation frames: Buffers, two Input BPQs that alternate roles, and the Exit BPQ]

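The switching idea can be sketched as a double buffer: one Input BPQ accepts new keys while the other, already full, is emptied one item per Insert. The drain schedule (exactly one extracted item per Insert) and the names `DoubleBufferedInput`, `bpq_size`, `buffer` and `lists` are assumptions made for illustration; both BPQs are simulated with `heapq`.

```python
import heapq


class DoubleBufferedInput:
    """Two Input BPQs: one accepts inserts while the other is drained to RAM."""

    def __init__(self, bpq_size):
        self.bpq_size = bpq_size
        self.active = []     # Input BPQ currently receiving keys
        self.draining = []   # full Input BPQ being emptied
        self.buffer = []     # ascending list being written to RAM
        self.lists = []      # completed sorted lists in RAM

    def insert(self, x):
        heapq.heappush(self.active, x)
        self._drain_one()
        if len(self.active) == self.bpq_size:
            # The other BPQ is empty by now: it had bpq_size items and lost
            # one per insert, so the two BPQs can simply swap roles.
            self.active, self.draining = self.draining, self.active

    def _drain_one(self):
        # Constant work per insert: move one minimum from the draining BPQ.
        if self.draining:
            self.buffer.append(heapq.heappop(self.draining))
            if not self.draining:
                self.lists.append(self.buffer)
                self.buffer = []


inp = DoubleBufferedInput(bpq_size=3)
for key in (3, 0, 5, 8, 7, 4, 6, 2, 1):
    inp.insert(key)
```

Each Insert performs one BPQ insertion and at most one BPQ extraction, so no single operation pays for moving a whole list.
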
Block Size – Time Tradeoff
• Apply the construction recursively
• We used Exit and Input
• We can use Exit and Input
• We can build each Input recursively
[Figure, animation frames: each Input BPQ is replaced by a nested structure of Input BPQs and an Exit BPQ; an Insert enters through the innermost Input BPQ]

Block Size – Time Tradeoff
• A systolic-array-like design:
[Figure: a chain of stages, each with an Input BPQ, a Buf, RAM and Exit BPQs]

Ternary CAMs (TCAMs)
• Associative memory chips
• Properties:
  • Ternary values ('0', '1' and '*')
  • Already used in routers (IP lookup, classification)
  • High throughput (300M ops per sec for 1Mb TCAM)
  • Latency and costs increase dramatically with size

entry index   entry data
0             0*10**1*
1             00100111
2             11***011
...
m             01010110

in: 00100111   out: 0

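The lookup shown in the table can be mimicked in software as follows. A real TCAM compares the key against all entries in parallel and reports the first matching index; the sequential loop here is only a simulation, and the function names `tcam_match` and `tcam_lookup` are hypothetical. Entry m from the table is left out for brevity.

```python
def tcam_match(entry, key):
    """True if every entry symbol is '*' or equals the corresponding key bit."""
    return len(entry) == len(key) and all(
        e in ("*", k) for e, k in zip(entry, key)
    )


def tcam_lookup(entries, key):
    """Return the lowest index whose entry matches the key, or None."""
    for index, entry in enumerate(entries):
        if tcam_match(entry, key):
            return index
    return None


entries = ["0*10**1*", "00100111", "11***011"]
first_hit = tcam_lookup(entries, "00100111")
```

The key `00100111` matches both entry 0 and entry 1, and the lookup reports index 0, as in the slide's example.
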
TCAM-based Priority Queue
• Implied by Panigrahy & Sharma (2003)
• Three versions:
  • O(1) time but O(w) entries per item (where w is the width of a priority value in bits)
  • O(log w) time
  • "Empirical O(1)" time but O(w) in the worst case
• Our results:
[Table: results of applying Powering to the TCAM-based BPQ]

Powering the TCAM BPQ
• Using small TCAM-based PQs
  • Faster TCAM access
  • Feasible even when N is large
• Well suited to backbone routers
  • TCAMs are already used for IP lookup

Results for TCAM-based PQ
[Figure: results plot with curves for k=1 and k=2, a size limit, and marked points A, B and C]

Applying to Shift-Registers
• Considering a HW PQ implementation of R. Chandra and O. Sinnen.
[Figure: results plot with curves for Original, K=1 and K=2, and a size limit]

Summary
• The Powering Technique
  • Combines small HW queues and RAM
  • Allows space-time tradeoffs
• Powering TCAMs
  • Smaller TCAMs, shorter operation time
  • Matches lower bound for sorting with TCAM
• Also works for Shift Registers