A Configurable High-Throughput Linear Sorter System • Jorge Ortiz, Information and Telecommunication Technology Center, 2335 Irving Hill Road, Lawrence, KS, jorgeo@ku.edu • David Andrews, Computer Science and Computer Engineering, The University of Arkansas, 504 J.B. Hunt Building, Fayetteville, AR, dandrews@uark.edu
Introduction • Sorting is an important system function • Popular sorting algorithms are not efficient or fast in hardware implementations • Linear sorters are ideal for hardware, but sort at a rate of only 1 value per cycle • Sorting networks offer better throughput, but at a high area and latency cost • Need a better solution for high-throughput, low-latency sorting
Contributions • Expanding the linear sorter implementation and making it versatile, reconfigurable and better suited for streaming input and output • Parallelizing the linear sorter for increased throughput • Implementing the high-throughput linear sorter, and outmatching the performance of current linear sorter approaches
Background • Software quicksort, mergesort and heapsort use divide-and-conquer techniques to achieve efficiency • Hardware sorting is plagued with overhead from data movements, synchronization, bookkeeping and memory accesses • Hardware needs to make better use of concurrent data comparisons and swaps, rather than executing long sequences of assembly instructions as its software counterpart does
Sorting Networks • Swap comparators sort pairs of values • Sink lowest value, then operate on remaining n-1 items • Receive parallel data at inputs • High #PE and latency; re-sort with each new insertion
Bubble Sort animation, input 3 2 5 4 1, state after each swap: 3 2 5 4 1 → 2 3 5 4 1 → 2 3 4 5 1 → 2 3 4 1 5 → 2 3 1 4 5 → 2 1 3 4 5 → 1 2 3 4 5
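The bubble sort animation on this slide can be replayed in software. The following is a minimal sketch, not the authors' hardware design: it models each swap comparator as an adjacent compare-and-swap and records the list after every swap. The function name `bubble_sort_frames` is hypothetical and chosen only for illustration.

```python
def bubble_sort_frames(values):
    """Return the list state after every swap of a plain bubble sort."""
    state = list(values)
    frames = [list(state)]                  # frame 0 is the unsorted input
    n = len(state)
    for end in range(n - 1, 0, -1):         # after each pass, state[end] is final
        for i in range(end):
            if state[i] > state[i + 1]:     # one swap comparator
                state[i], state[i + 1] = state[i + 1], state[i]
                frames.append(list(state))
    return frames


frames = bubble_sort_frames([3, 2, 5, 4, 1])
for frame in frames:
    print(frame)

# A bubble-sort network on n inputs uses n*(n-1)/2 comparators.
comparators = len(frames[0]) * (len(frames[0]) - 1) // 2
print("comparators:", comparators)
```

For five inputs the sketch performs six swaps across ten comparator positions, which illustrates why the processing-element count grows quickly with the number of inputs.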
Linear Sorters • Sorted insertions • Forwards incoming value to all nodes • Each node shifts autonomously depending on neighbors’ values • Single clock latency, small logic & regular structure • Streaming input & output • Serial input, need higher throughput
Animation frames: Input: (empty) → Input: 3 → Input: 2 3 → Input: 5 2 3 → Input: 4 2 3 5 → Input: 1 2 3 4 5 → Input: 1 2 3 4 5, Output: 1 2 3 4 5
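The per-node behavior described on this slide can be sketched as a small software model. This is an illustrative assumption about the node rule, not the authors' circuit: every node sees the broadcast value and, using only its own value and its left neighbor's value, decides to keep its value, take the new value, or take the neighbor's value (a shift). The name `insert` and the use of `None` for an empty node are hypothetical.

```python
def insert(nodes, x):
    """One clock cycle: broadcast x to every node of an ascending linear sorter."""
    old = list(nodes)                       # all nodes decide from the same old state
    new = []
    for i, v in enumerate(old):
        left = old[i - 1] if i > 0 else None
        if v is not None and x >= v:
            new.append(v)                   # x belongs further right: keep own value
        elif i == 0 or (left is not None and x >= left):
            new.append(x)                   # x fits exactly here: capture it
        else:
            new.append(left)                # x went in further left: shift right
    return new                              # a full sorter drops its last value


nodes = [None] * 5                          # sorter depth of 5, initially empty
for value in [3, 2, 5, 4, 1]:
    nodes = insert(nodes, value)
    print(value, "->", nodes)
```

Because each node looks only at its neighbor, the decision logic stays small and regular, and every insertion completes in a single step regardless of where the value lands.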
Configurable Linear Sorter • Increase versatility for linear sorters • Configurable: • Linear sorter depth • Sorting direction • Sort on tags (for example, timestamps) rather than data • User-defined data and tag size
Configurable Linear Sorter • Increase functionality for linear sorters • Detect full conditions • Buffer input while full • Retrieve output serially for streaming • Delete top value, freeing nodes • Augment with left shift functionality • Test tags before deleting them
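The configuration options and added functions listed on the two slides above can be gathered into a behavioral model. This is a rough sketch under stated assumptions: the class name `ConfigurableSorter`, its field names, and the policy of refilling from the input buffer right after a deletion are all hypothetical and are not taken from the authors' implementation.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ConfigurableSorter:
    depth: int = 6                                   # number of sorter nodes
    descending: bool = False                         # sorting direction
    nodes: list = field(default_factory=list)        # (tag, data) pairs, sorted by tag
    pending: deque = field(default_factory=deque)    # inputs buffered while full

    def is_full(self):
        return len(self.nodes) >= self.depth

    def _before(self, a, b):
        return a > b if self.descending else a < b

    def _place(self, tag, data):
        pos = 0
        while pos < len(self.nodes) and not self._before(tag, self.nodes[pos][0]):
            pos += 1
        self.nodes.insert(pos, (tag, data))          # sort on the tag, not the data

    def insert(self, tag, data):
        if self.is_full():
            self.pending.append((tag, data))         # buffer input while full
        else:
            self._place(tag, data)

    def top_tag(self):
        """Inspect the top tag without deleting it."""
        return self.nodes[0][0] if self.nodes else None

    def delete_top(self):
        if not self.nodes:
            return None
        top = self.nodes.pop(0)                      # remaining nodes shift left
        if self.pending:                             # assumed policy: refill at once
            self._place(*self.pending.popleft())
        return top


sorter = ConfigurableSorter(depth=3)
for tag, data in [(5, "a"), (7, "b"), (6, "c"), (2, "d")]:
    sorter.insert(tag, data)
print(sorter.nodes, list(sorter.pending))
first = sorter.delete_top()
print(first, sorter.nodes)
```

Testing the top tag before deleting it is what would let a consumer sort on timestamps and release an entry only once it is due, which is one plausible reading of the last bullet.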
Extended Linear Sorter System
[Animation, clock cycles 0 to 15: a six-node linear sorter (Node 1 to Node 6) shown with its Inserted Tags, the node contents at each clock cycle, and the Sorted Output. The first inserted tags are 5, 7, 6, 2, 1.]
Interleaved Linear Sorter • Increase throughput by using multiple linear sorters with parallel inputs • Interleave parallel inputs into linear sorters through modulo arithmetic • Distribute data evenly among linear sorters to avoid full conditions • Service each linear sorter in round-robin fashion to re-sort their outputs
Interleaved Linear Sorter • ILS width = 4 • Four parallel inputs: {13, 04, 03, 10} • After interleaving mod 4: {04, 13, 10, 03}, with residues [00, 01, 02, 03] • But what happens for inputs {07, 05, 01, 06}?
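The modulo interleaving in this example can be checked with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, and the collision policy (the first tag for a sorter is accepted, later ones wait for another cycle) is an assumption, since the slides only label colliding inputs as preempted. The round-robin drain returns a globally sorted stream only under the further assumption that the tags are consecutive integers, so that each sorter holds exactly one residue class.

```python
def split_cycle(inputs, width):
    """Route one cycle of parallel tags to sorters by tag mod width.

    Returns (accepted, deferred): accepted[k] is the tag taken by sorter k,
    deferred holds tags that collided and must wait (assumed policy).
    """
    accepted = [None] * width
    deferred = []
    for tag in inputs:
        k = tag % width
        if accepted[k] is None:
            accepted[k] = tag
        else:
            deferred.append(tag)
    return accepted, deferred


def drain_round_robin(sorters):
    """Take the smallest remaining tag from each sorter in turn."""
    queues = [sorted(s) for s in sorters]   # each linear sorter keeps its tags sorted
    out = []
    while any(queues):
        for q in queues:
            if q:
                out.append(q.pop(0))
    return out


print(split_cycle([13, 4, 3, 10], 4))   # ([4, 13, 10, 3], [])
print(split_cycle([7, 5, 1, 6], 4))     # ([None, 5, 6, 7], [1])

# Tags 0..15 spread over four sorters by residue, then drained round-robin.
sorters = [[t for t in range(16) if t % 4 == k] for k in range(4)]
merged = drain_round_robin(sorters)
print(merged)
```

In the second input set, tags 05 and 01 both map to sorter 1 while sorter 0 receives nothing, so one of them has to wait for a later cycle.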
Interleaved Linear Sorter
[Animation: for each clock cycle, the Inserted Tags and the contents of four linear sorters (LS 0 to LS 3), with each input marked Inserted or Preempted. Cycle 0 inserts tags 13, 4, 3, 10; by the final frame the four sorters together hold tags 0 to 15.]