
Recursive Design of Hardware Priority Queues

Liron Schiff* (TAU). Joint work with Yehuda Afek (TAU) and Anat Bremler-Barr (IDC). *Supported by European Research Council (ERC) Starting Grant no. 259085.


  1. Recursive Design of Hardware Priority Queues • Liron Schiff* (TAU) • Joint work with Yehuda Afek (TAU) and Anat Bremler-Barr (IDC) • *Supported by European Research Council (ERC) Starting Grant no. 259085

  2. Priority Queue (PQ) • Interface: • PQ.Insert(x) • The higher the priority of x, the smaller x is • PQ.GetMin(): remove and return the minimum • PQ.Delete(): just remove • PQ.Peek(): just return the minimum (Figure: Priority Queue with Insert and GetMin.)
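The interface on this slide can be mimicked in software. The following is a minimal sketch, not the hardware design from the talk: it assumes priorities are plain integers where a smaller value means higher priority, and it assumes `delete` removes the minimum without returning it (the slide does not say which item Delete removes). The class name `SoftwarePQ` is hypothetical.

```python
import heapq


class SoftwarePQ:
    """Software stand-in for the PQ interface (illustrative only)."""

    def __init__(self):
        self._heap = []  # binary min-heap of priority values

    def insert(self, x):
        """Add item x; a smaller x means a higher priority."""
        heapq.heappush(self._heap, x)

    def peek(self):
        """Return the minimum without removing it."""
        return self._heap[0]

    def get_min(self):
        """Remove and return the minimum."""
        return heapq.heappop(self._heap)

    def delete(self):
        """Assumption: remove the minimum without returning it."""
        heapq.heappop(self._heap)


pq = SoftwarePQ()
for value in (14, 33, 9):
    pq.insert(value)
print(pq.peek())
```

A heap gives O(log n) per operation in software; the talk is about hardware designs that target constant time per operation.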

  3. Priority Queue Applications • Networking: scheduling packets • Many flows (1M) • High rate (100 Mpps) • More applications: scientific simulators, databases (Figure: numbered packets entering and leaving a Priority Queue (scheduler).)

  4. Two Existing Approaches

  5. Our Approach: The Powering Technique • Merge-Sort concept: a large PQ = RAM + 3 × smaller HW PQs, each a Base Priority Queue (BPQ) • The BPQs perform the sort and merge steps

  6. The Powering Technique • Insert(x) uses Input (Figure: Input BPQ and Exit BPQ; item 3 inserted.)

  7. The Powering Technique • Insert(x) uses Input (Figure: Input BPQ and Exit BPQ; items 3 and 0 inserted.)

  8. The Powering Technique • Insert(x) uses Input (Figure: Input BPQ and Exit BPQ; items 3, 0 and 5 inserted.)

  9. The Powering Technique • When Input gets full, its contents move to Exit. (Figure: the list 0, 3, 5 leaves the Input BPQ.)

  10. The Powering Technique • When Input gets full, its contents move to Exit. (Figure: two sorted lists, 0, 3, 5 and 4, 7, 8.)

  11. The Powering Technique • When Input gets full, its contents move to Exit. (Figure: three sorted lists, 0, 3, 5 and 4, 7, 8 and 1, 2, 6.)

  12. The Powering Technique • GetMin() extracts the min of Exit or Input (Figure: Input BPQ, three sorted lists and Exit BPQ; the minimum, 0, is marked.)

  13. The Powering Technique • GetMin() extracts the min of Exit or Input • and we update the Exit (if needed). (Figure: 0 has been extracted as the min.)
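The basic idea walked through in slides 6 to 13 can be sketched in software. This is a simplified model under stated assumptions, not the authors' hardware design: both BPQs are modelled with Python heaps, RAM is a dictionary of sorted lists, the Exit BPQ holds only the current head of each list, and the Exit capacity is not enforced (that is the first difficulty discussed later in the deck). The names `PoweredPQ` and `block` are hypothetical.

```python
import heapq
from collections import deque


class PoweredPQ:
    """Toy model: Input BPQ sorts blocks, Exit BPQ merges list heads."""

    def __init__(self, block):
        self.block = block   # assumed capacity of the Input BPQ
        self.input = []      # Input BPQ, modelled as a min-heap
        self.exit = []       # Exit BPQ: heap of (head value, list id)
        self.ram = {}        # list id -> rest of that sorted list
        self._next_id = 0

    def insert(self, x):
        heapq.heappush(self.input, x)
        if len(self.input) == self.block:
            self._move_input_to_exit()

    def _move_input_to_exit(self):
        # The full Input becomes one sorted list in RAM; only its
        # head is placed in the Exit BPQ.
        run = deque(sorted(self.input))
        self.input = []
        list_id = self._next_id
        self._next_id += 1
        heapq.heappush(self.exit, (run.popleft(), list_id))
        self.ram[list_id] = run

    def get_min(self):
        if not self.input and not self.exit:
            raise IndexError("get_min from an empty queue")
        # The minimum is either in the Input or among the list heads.
        if not self.exit or (self.input and self.input[0] <= self.exit[0][0]):
            return heapq.heappop(self.input)
        value, list_id = heapq.heappop(self.exit)
        rest = self.ram[list_id]
        if rest:
            # Update the Exit with the next item of the same list.
            heapq.heappush(self.exit, (rest.popleft(), list_id))
        else:
            del self.ram[list_id]
        return value


q = PoweredPQ(block=3)
for x in [3, 0, 5, 7, 4, 8, 2, 1, 6, 9]:
    q.insert(x)
print([q.get_min() for _ in range(10)])
```

The insert sequence reuses the values visible in the slide figures; with `block=3` the first nine items form three sorted lists and the tenth stays in the Input.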

  14. Outline • Difficulties with the simple idea • Applying the construction recursively • An example using TCAM base units • Evaluation

  15. Two difficulties with the simple idea • Too many lists in the Exit module (lists shrink as they are emptied, yet capacity N must be maintained) • Moving a list from Input to Exit in O(1) operations

  16. Difficulty 1 • Maintaining capacity N while lists are shrinking (Figure: Input BPQ, partially emptied sorted lists, Exit BPQ.)

  17. Difficulty 1 • Maintaining capacity N while lists are shrinking • We continually merge inactive lists during Insert

  18. Difficulty 1 • Maintaining capacity N while lists are shrinking • We continually merge inactive lists during Insert (Figure: two short lists being merged.)

  19. Difficulty 1 • Maintaining capacity N while lists are shrinking • We continually merge inactive lists during Insert (Figure: the merge continues as more items are inserted.)

  20. Difficulty 1 • Maintaining capacity N while lists are shrinking • We continually merge inactive lists during Insert (Figure: the merged list replaces the two short lists.)
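One way to picture "continually merging during Insert" is a merge that advances by a bounded number of items each time it is called, so that no single Insert pays for a whole merge. The sketch below is an illustration under assumptions: the per-call `budget`, the class name `IncrementalMerge`, and the example values are all hypothetical, and the slides do not specify how much merge work each Insert performs.

```python
from collections import deque


class IncrementalMerge:
    """Merge two sorted lists a few items at a time."""

    def __init__(self, a, b):
        self.a = deque(a)   # first sorted input list
        self.b = deque(b)   # second sorted input list
        self.out = []       # merged output built so far

    def step(self, budget):
        """Move up to `budget` items to the output; return True when done."""
        for _ in range(budget):
            if self.a and (not self.b or self.a[0] <= self.b[0]):
                self.out.append(self.a.popleft())
            elif self.b:
                self.out.append(self.b.popleft())
            else:
                break
        return not self.a and not self.b


merge = IncrementalMerge([5, 10], [3, 9, 11])
calls = 0
done = False
while not done:
    done = merge.step(budget=2)  # e.g. one call per Insert
    calls += 1
print(merge.out, calls)
```

Merging two short lists into one frees a slot in the Exit module, which is how the number of lists can stay bounded while the total capacity is preserved.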

  21. Difficulty 2 • Moving all items from Input to RAM in O(1) time (Figure: Input BPQ, Exit BPQ.)

  22. Difficulty 2 • Moving all items from Input to RAM in O(1) time • Use two Input BPQs and switch between them (Figure: buffers, two Input BPQs, Exit BPQ.)

  23. Difficulty 2 • Moving all items from Input to RAM in O(1) time • Use two Input BPQs and switch between them (Figure: buffers, two Input BPQs, Exit BPQ.)

  24. Difficulty 2 • Moving all items from Input to RAM in O(1) time • Use two Input BPQs and switch between them (Figure: buffers, two Input BPQs, Exit BPQ.)

  25. Difficulty 2 • Moving all items from Input to RAM in O(1) time • Use two Input BPQs and switch between them (Figure: buffers, two Input BPQs, Exit BPQ.)
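The switch between two Input BPQs resembles double buffering: one queue accepts new items while the other, already full, is drained into RAM one item per operation. The following sketch assumes exactly one item is drained per Insert and models each BPQ as a heap; the class name `DoubleBufferedInput` and the field names are hypothetical, and GetMin handling across the two queues is left out.

```python
import heapq


class DoubleBufferedInput:
    """Two Input BPQs: one fills while the other drains to RAM."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.active = []     # BPQ currently receiving inserts
        self.draining = []   # full BPQ being emptied in the background
        self.ram = []        # items written to RAM so far, in drain order

    def insert(self, x):
        # Constant extra work per insert: move one item to RAM.
        if self.draining:
            self.ram.append(heapq.heappop(self.draining))
        heapq.heappush(self.active, x)
        if len(self.active) == self.capacity:
            # By now the draining BPQ is empty, so the switch is just
            # an exchange of roles, not a bulk copy.
            self.active, self.draining = self.draining, self.active


buf = DoubleBufferedInput(capacity=3)
for x in [3, 0, 5, 7, 4, 8]:
    buf.insert(x)
print(buf.ram, sorted(buf.draining))
```

Because the draining queue loses one item per insert and the active queue gains one, the draining queue is always empty by the time the active one fills, so no insert ever has to wait for a bulk move.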

  26. Block Size – Time Tradeoff • Apply the construction recursively • We used Exit and Input (Figure: two Input BPQs, Exit BPQ.)

  27. Block Size – Time Tradeoff • Apply the construction recursively • We used Exit and Input • We can use Exit and Input (Figure: two Input BPQs, Exit BPQ.)

  28. Block Size – Time Tradeoff • Apply the construction recursively • We used Exit and Input • We can use Exit and Input • We can build each Input recursively (Figure: each Input BPQ replaced by its own Input BPQs and Exit BPQ.)

  29. Block Size – Time Tradeoff (Figure: recursive arrangement of Input BPQs and Exit BPQs.)

  30. Block Size – Time Tradeoff (Figure: the same arrangement, with Insert shown at the Input BPQs.)

  31. Block Size – Time Tradeoff • A systolic-array-like design (Figure: a chain of stages, each with RAM, a buffer, and Exit BPQs, fed from the Input BPQ.)

  32. Resulting Tradeoffs

  33. TCAM example

  34. Ternary CAMs (TCAMs) • Associative memory chips • Properties: • Ternary values ('0', '1' and '*') • Already used in routers (IP lookup, classification) • High throughput (300M ops per sec for 1 Mb TCAM) • Latency and costs increase dramatically with size (Figure: TCAM with entry indices 0 to m and entry data 0*10**1*, 00100111, 11***011, ..., 01010110; input 00100111 gives output 0.)
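The lookup shown in the figure can be imitated in a few lines. This sketch assumes the common TCAM convention that the lowest-index matching entry wins, and it scans entries sequentially, whereas a real TCAM compares all entries in parallel. The function names are hypothetical; the entry patterns are the ones visible on the slide.

```python
def ternary_match(pattern, key):
    """True if every pattern symbol is '*' or equals the key bit."""
    return len(pattern) == len(key) and all(
        p in ("*", k) for p, k in zip(pattern, key)
    )


def tcam_lookup(entries, key):
    """Return the index of the first matching entry, or None."""
    for index, pattern in enumerate(entries):
        if ternary_match(pattern, key):
            return index
    return None


entries = ["0*10**1*", "00100111", "11***011", "01010110"]
print(tcam_lookup(entries, "00100111"))
```

The key 00100111 matches both entry 0 (through its wildcards) and entry 1 (exactly); under the first-match assumption the result is index 0, which agrees with the output shown in the figure.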

  35. TCAM based Priority Queue • Implied by Panigrahy & Sharma (2003) • Three versions: • O(1) time but O(w) entries per item (where w is the width of a priority value in bits) • O(log w) time • "Empirical O(1)" time but O(w) in the worst case

  36. TCAM based Priority Queue • Implied by Panigrahy & Sharma (2003) • Our results: Powering

  37. Powering the TCAM BPQ • Using small TCAM-based PQs • Faster TCAM access • Feasible even when N is large • Well suited to backbone routers • TCAMs are already used for IP lookup

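  38. Results for TCAM-based PQ (Figure: plot showing the size limit, curves for k=1 and k=2, and marked points A, B and C.)

  39. Applying to Shift-Registers • Considering a HW PQ implementation by R. Chandra and O. Sinnen. (Figure: plot showing the size limit for the original design, K=1 and K=2.)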

  40. Summary • The Powering Technique • Combines small HW queues and RAM • Allows space–time tradeoffs • Powering TCAMs • Smaller TCAMs → shorter operation time • Matches the lower bound for sorting with TCAM • Also works for shift registers
