Programming Multi-Core Processors based Embedded Systems A Hands-On Experience on Cavium Octeon based Platforms. Lab Exercises. Lab # 1: Parallel Programming and Performance measurement using MPAC. Lab Goals. Objective Performance measurement using MPAC benchmarks
Programming Multi-Core Processors based Embedded Systems: A Hands-On Experience on Cavium Octeon based Platforms
Lab Exercises
Lab # 1: Parallel Programming and Performance Measurement using MPAC
Lab Goals
• Objective
  • Performance measurement using MPAC benchmarks
  • Learning parallel programming using MPAC
Lab Goals
• Parallel Sorting
  • This lab implements two parallel sorting algorithms
    • Quick Sort
    • Bucket Sort
• Objective
  • Partitioning the data array
  • Worker threads sorting the partitioned array
  • Merging the partitioned arrays
  • Performance measurements
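The three steps listed above (partition the array, let worker threads sort the partitions, merge the results) can be sketched with POSIX threads. This is a minimal sketch under stated assumptions, not the MPAC implementation: `parallel_sort`, `sort_worker`, `struct slice`, and `MAX_THREADS` are hypothetical names, and the C library's `qsort` stands in for a hand-written quick sort.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define MAX_THREADS 16  /* hypothetical upper bound on worker threads */

/* One contiguous slice of the array, owned by a single worker. */
struct slice {
    int *data;
    size_t len;
};

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Thread function: sort this worker's slice in place. */
static void *sort_worker(void *arg)
{
    struct slice *s = arg;
    qsort(s->data, s->len, sizeof(int), cmp_int);
    return NULL;
}

/* Sort data[0..n) using nthreads workers. Returns 0 on success, -1 on error. */
int parallel_sort(int *data, size_t n, int nthreads)
{
    pthread_t tid[MAX_THREADS];
    struct slice part[MAX_THREADS];
    size_t pos[MAX_THREADS];

    assert(data != NULL || n == 0);
    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    int *out = malloc((n ? n : 1) * sizeof(int));
    if (out == NULL)
        return -1;

    /* Step 1: partition the array into nthreads nearly equal slices. */
    size_t chunk = n / (size_t)nthreads, extra = n % (size_t)nthreads, start = 0;
    for (int t = 0; t < nthreads; t++) {
        part[t].data = data + start;
        part[t].len = chunk + ((size_t)t < extra ? 1 : 0);
        start += part[t].len;
        pos[t] = 0;
    }

    /* Step 2: one worker thread sorts each slice. */
    int started = 0, rc = 0;
    for (; started < nthreads; started++) {
        if (pthread_create(&tid[started], NULL, sort_worker,
                           &part[started]) != 0) {
            rc = -1;
            break;
        }
    }
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);

    /* Step 3: merge by repeatedly taking the smallest slice head. */
    if (rc == 0) {
        for (size_t i = 0; i < n; i++) {
            int best = -1;
            for (int t = 0; t < nthreads; t++) {
                if (pos[t] < part[t].len &&
                    (best < 0 ||
                     part[t].data[pos[t]] < part[best].data[pos[best]]))
                    best = t;
            }
            out[i] = part[best].data[pos[best]++];
        }
        memcpy(data, out, n * sizeof(int));
    }
    free(out);
    return rc;
}
```

A call such as `parallel_sort(values, count, 4)` would sort with four workers. The merge here is a simple linear scan over the slice heads, which is adequate for a handful of threads.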
Parallel Quick Sort
[Figure: example integer array at stages (1) to (4); label: Thread Function]
Parallel Bucket Sort
[Figure: example integer array at stages (1) to (4), with buckets for the value ranges 1-11, 12-22, 23-33, and 34-44; label: Thread Function]
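The bucket ranges in the figure (1-11, 12-22, 23-33, 34-44) suggest scattering values into equal-width buckets, sorting each bucket in its own thread, and concatenating the buckets in order. The following is a minimal sketch under that assumption, not the lab's code: `bucket_sort`, `bucket_worker`, `struct bucket`, and `NBUCKETS` are hypothetical names, and `qsort` is used inside each bucket for brevity.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 4  /* hypothetical: matches the four ranges in the figure */

struct bucket {
    int *items;
    size_t len;
};

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Thread function: sort one bucket in place. */
static void *bucket_worker(void *arg)
{
    struct bucket *b = arg;
    qsort(b->items, b->len, sizeof(int), cmp_int);
    return NULL;
}

/* Sort data[0..n), where every value lies in [lo, hi].
 * Returns 0 on success, -1 on error. */
int bucket_sort(int *data, size_t n, int lo, int hi)
{
    struct bucket b[NBUCKETS];
    pthread_t tid[NBUCKETS];

    assert(lo <= hi);
    if (n == 0)
        return 0;

    /* Width of each value range, rounded up: 1..44 gives 11 per bucket. */
    int width = (hi - lo) / NBUCKETS + 1;

    /* Each bucket gets room for all n items (simple but wasteful). */
    int *store = malloc(NBUCKETS * n * sizeof(int));
    if (store == NULL)
        return -1;
    for (int i = 0; i < NBUCKETS; i++) {
        b[i].items = store + (size_t)i * n;
        b[i].len = 0;
    }

    /* Step 1: scatter every value into the bucket for its range. */
    for (size_t i = 0; i < n; i++) {
        assert(data[i] >= lo && data[i] <= hi);
        struct bucket *dst = &b[(data[i] - lo) / width];
        dst->items[dst->len++] = data[i];
    }

    /* Step 2: one thread sorts each bucket. */
    int started = 0, rc = 0;
    for (; started < NBUCKETS; started++) {
        if (pthread_create(&tid[started], NULL, bucket_worker,
                           &b[started]) != 0) {
            rc = -1;
            break;
        }
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    /* Step 3: buckets are already ordered by range, so concatenate. */
    if (rc == 0) {
        size_t out = 0;
        for (int i = 0; i < NBUCKETS; i++) {
            memcpy(data + out, b[i].items, b[i].len * sizeof(int));
            out += b[i].len;
        }
    }
    free(store);
    return rc;
}
```

Because the buckets cover disjoint, increasing value ranges, no merge pass is needed after the threads finish.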
Performance Measurement
• Observations
  • Elapsed time decreases as the number of threads increases, indicating better performance
  • Bucket Sort is more efficient than Quick Sort
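Elapsed time for a run can be taken with the POSIX monotonic clock. The helpers below are a minimal sketch of that measurement; `elapsed_seconds` and `time_call` are hypothetical names, and this is not how MPAC itself reports timings.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime */
#include <assert.h>
#include <time.h>

/* Difference end - start, in seconds. */
double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Time a single call of work(arg) using the monotonic clock. */
double time_call(void (*work)(void *), void *arg)
{
    struct timespec t0, t1;

    assert(work != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    work(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_seconds(t0, t1);
}
```

Timing the same sort with 1, 2, 4, and 8 threads and comparing the returned values is one way to make the observation above concrete.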
Lab # 3-5: Packet Sniffing Labs: An Overview
Lab Goals
• Objective
  • Learning parallel programming using threads
  • Utilizing many-core systems efficiently
  • Performance measurement
  • Packet capture / filter / analyze: a case study
• We will use a series of labs to achieve our objectives.
Prerequisites
• Sniffing
  • Capturing network packets arriving at or departing from a network interface
• Mechanism
  • We use raw sockets as follows:
    rawSock = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
  • A socket opened this way receives every packet going out of or coming in on an Ethernet interface
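The raw socket call above can be wrapped with a receive helper as follows. This is a Linux-only sketch: `open_sniffer_socket` and `sniff_frame` are hypothetical names, and opening a packet socket normally requires root or the `CAP_NET_RAW` capability, so the open call may fail with `EPERM` for an ordinary user.

```c
#include <arpa/inet.h>        /* htons */
#include <assert.h>
#include <errno.h>
#include <linux/if_ether.h>   /* ETH_P_ALL */
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a packet socket that sees frames of every protocol.
 * Returns a file descriptor, or -1 with errno set. */
int open_sniffer_socket(void)
{
    return socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
}

/* Read one frame into buf. Returns the number of bytes stored,
 * or -1 with errno set. Retries if a signal interrupts the call. */
ssize_t sniff_frame(int rawSock, unsigned char *buf, size_t cap)
{
    ssize_t n;

    assert(buf != NULL);
    do {
        n = recvfrom(rawSock, buf, cap, 0, NULL, NULL);
    } while (n < 0 && errno == EINTR);
    return n;
}
```

A dispatcher loop would call `sniff_frame` repeatedly on the descriptor and `close` it when done.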
Prerequisites
• Testing
  • You can use the loopback device as a network interface
  • Use Netperf or MPAC to generate traffic on the network interface
Packet Capturing on Many Core
[Figure: 16 cores numbered 0 to 15; labels: CPU Affinity, Packet Sniffer, Data Sender, Receiver, Dedicated Cores]
Sniffing Labs Framework
• Sniffing
  • One thread, called the dispatcher, sniffs packets from the interface and puts each one in one of the workers' queues
• Filtering / Analysis
  • Any kind of processing on a packet is the responsibility of the workers
  • Each worker has its own queue
  • Dispatcher assigns packets to worker queues
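The dispatcher and per-worker queues can be sketched as a mutex-protected ring buffer per worker plus a dispatch function. All names here (`struct packet`, `struct worker_queue`, `queue_push`, `queue_pop`, `dispatch`, `QUEUE_CAP`) are hypothetical, and the round-robin order is taken from the Lab 4 description; the labs' actual queue design may differ.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAP 256  /* hypothetical per-worker queue capacity */

/* A captured frame; the queue stores the pointer, not a copy. */
struct packet {
    const unsigned char *data;
    size_t len;
};

/* Fixed-size ring buffer shared by the dispatcher and one worker. */
struct worker_queue {
    pthread_mutex_t lock;
    struct packet slots[QUEUE_CAP];
    size_t head, tail, count;
};

void queue_init(struct worker_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    q->head = q->tail = q->count = 0;
}

/* Dispatcher side: returns 0 on success, -1 if the queue is full. */
int queue_push(struct worker_queue *q, struct packet p)
{
    int rc = -1;

    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_CAP) {
        q->slots[q->tail] = p;
        q->tail = (q->tail + 1) % QUEUE_CAP;
        q->count++;
        rc = 0;
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}

/* Worker side: returns 0 and fills *out, or -1 if the queue is empty. */
int queue_pop(struct worker_queue *q, struct packet *out)
{
    int rc = -1;

    pthread_mutex_lock(&q->lock);
    if (q->count > 0) {
        *out = q->slots[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        rc = 0;
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}

/* Hand a packet to the next worker in round-robin order.
 * *next is the dispatcher's private cursor into the queue array. */
int dispatch(struct worker_queue *queues, size_t nworkers,
             size_t *next, struct packet p)
{
    assert(nworkers > 0);
    size_t target = *next;
    *next = (*next + 1) % nworkers;
    return queue_push(&queues[target], p);
}
```

Giving each worker its own queue means workers never contend with each other; each lock is shared only between the dispatcher and one worker.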
Lab 3 – Packet Sniffing
• Sniff a frame
  • This lab captures Ethernet packets that are destined to or departing from a specific interface
• Objective
  • Can a dispatcher sniff at the line rate?
  • Hands-on experience of plain sniffing
  • Observing the base-case performance of the dispatcher-worker model
Lab 4 – Packet Filtering
• Objective
  • Use different packet header information to sniff specific types of packets
• Mechanism
  • Dispatcher sniffs frames and puts them in the worker queues in round-robin fashion
  • User specifies the source IP, destination IP, source port, and destination port used to filter TCP packets
Lab 4 – Packet Filtering
• Mechanism
  • Each worker processes the packets residing in its queue
• Observations
  • Observe the throughput with an increasing number of threads
  • Compare the throughput with the Lab 3 throughput
  • Use core affinity and observe the throughput
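A worker's filter check on source IP, destination IP, source port, and destination port can be sketched by reading the fixed header offsets of an untagged Ethernet/IPv4/TCP frame. `struct tcp_filter` and `frame_matches` are hypothetical names, and treating a zero field as "match anything" is an assumption of this sketch, not something the lab specifies. VLAN tags and IPv6 are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN    14      /* untagged Ethernet header */
#define ETHERTYPE_IPV4 0x0800
#define IP_PROTO_TCP   6

/* User-supplied filter in host byte order.
 * Assumption of this sketch: a zero field matches any value. */
struct tcp_filter {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

/* Read big-endian (network order) integers from a byte buffer. */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rd32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 1 if the frame is IPv4/TCP and matches the filter, else 0. */
int frame_matches(const unsigned char *frame, size_t len,
                  const struct tcp_filter *f)
{
    assert(frame != NULL && f != NULL);

    if (len < ETH_HDR_LEN + 20 || rd16(frame + 12) != ETHERTYPE_IPV4)
        return 0;

    const unsigned char *ip = frame + ETH_HDR_LEN;
    size_t ihl = (size_t)(ip[0] & 0x0F) * 4;  /* IP header length in bytes */
    if ((ip[0] >> 4) != 4 || ihl < 20 || ip[9] != IP_PROTO_TCP)
        return 0;
    if (len < ETH_HDR_LEN + ihl + 20)         /* need a full TCP header */
        return 0;

    const unsigned char *tcp = ip + ihl;
    if (f->src_ip && f->src_ip != rd32(ip + 12))
        return 0;
    if (f->dst_ip && f->dst_ip != rd32(ip + 16))
        return 0;
    if (f->src_port && f->src_port != rd16(tcp))
        return 0;
    if (f->dst_port && f->dst_port != rd16(tcp + 2))
        return 0;
    return 1;
}
```

For example, a filter of `{0, 0, 0, 80}` would accept any TCP packet whose destination port is 80.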
Lab 5 – Deep Packet Inspection
• Objective
  • Search for a user-provided string in the payload of TCP-based applications
• Mechanisms
  • Same as Lab 4, except that each worker now searches for a string in the application payload
  • The string to find is provided by the user
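The payload search can be sketched by locating the TCP payload inside the frame and scanning it for the user's string. `tcp_payload` and `payload_contains` are hypothetical names. The sketch assumes an untagged Ethernet/IPv4/TCP frame and uses the captured length rather than the IP total-length field, so any Ethernet padding is searched too. A match split across two TCP segments is not detected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ETH_HDR_LEN 14  /* untagged Ethernet header */

/* Locate the TCP payload of an Ethernet/IPv4/TCP frame.
 * Returns a pointer to it and stores its length in *plen,
 * or returns NULL if the frame is not IPv4/TCP or is truncated. */
static const unsigned char *tcp_payload(const unsigned char *frame,
                                        size_t len, size_t *plen)
{
    if (len < ETH_HDR_LEN + 20 || frame[12] != 0x08 || frame[13] != 0x00)
        return NULL;

    const unsigned char *ip = frame + ETH_HDR_LEN;
    size_t ihl = (size_t)(ip[0] & 0x0F) * 4;   /* IP header length */
    if ((ip[0] >> 4) != 4 || ihl < 20 || ip[9] != 6 ||
        len < ETH_HDR_LEN + ihl + 20)
        return NULL;

    const unsigned char *tcp = ip + ihl;
    size_t doff = (size_t)(tcp[12] >> 4) * 4;  /* TCP header length */
    if (doff < 20 || len < ETH_HDR_LEN + ihl + doff)
        return NULL;

    *plen = len - ETH_HDR_LEN - ihl - doff;
    return tcp + doff;
}

/* Returns 1 if needle occurs in the frame's TCP payload, else 0. */
int payload_contains(const unsigned char *frame, size_t len,
                     const char *needle)
{
    size_t plen = 0, nlen;

    assert(frame != NULL && needle != NULL);
    nlen = strlen(needle);
    const unsigned char *p = tcp_payload(frame, len, &plen);
    if (p == NULL || nlen == 0 || nlen > plen)
        return 0;

    /* Plain byte-by-byte scan; the payload may contain NUL bytes,
     * so strstr cannot be used here. */
    for (size_t i = 0; i + nlen <= plen; i++) {
        if (memcmp(p + i, needle, nlen) == 0)
            return 1;
    }
    return 0;
}
```

The per-packet cost of this scan grows with payload size, which is one reason the Lab 5 throughput may differ from the header-only filtering in Lab 4.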
Lab 5 – Deep Packet Inspection
• Observations
  • Observe the throughput with an increasing number of threads
  • Compare the throughput with the Lab 3 and Lab 4 throughput
  • Use core affinity and observe the performance
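Core affinity on Linux can be set with `sched_setaffinity`. The helpers below are a minimal, Linux-specific sketch; `pin_to_core` and `first_allowed_core` are hypothetical names, and which core numbers are valid depends on the target system.

```c
#define _GNU_SOURCE  /* for CPU_SET and sched_setaffinity; must come first */
#include <assert.h>
#include <sched.h>

/* Pin the calling thread to a single core.
 * Returns 0 on success, -1 with errno set on failure. */
int pin_to_core(int core)
{
    cpu_set_t set;

    assert(core >= 0 && core < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return sched_setaffinity(0, sizeof(set), &set);  /* 0 = calling thread */
}

/* Lowest-numbered core the calling thread is allowed to run on,
 * or -1 if the affinity mask cannot be read. */
int first_allowed_core(void)
{
    cpu_set_t set;

    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int c = 0; c < CPU_SETSIZE; c++) {
        if (CPU_ISSET(c, &set))
            return c;
    }
    return -1;
}
```

Calling `pin_to_core` at the start of the dispatcher and of each worker thread, each with a different core number, gives every thread a dedicated core; throughput can then be compared with the unpinned runs.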