Applications of Non-Blocking Data Structures to Real-Time Systems. Seminar for the degree of Licentiate of Philosophy Håkan Sundell Computing Science Chalmers University of Technology. ARTES project: ”Applications of wait/lock-free protocols to real-time systems” Started in March 1999.
Background: ARTES project "Applications of wait/lock-free protocols to real-time systems". Started in March 1999. One active Ph.D. student. Project leader: Philippas Tsigas.
Schedule: Introduction; Real-Time Systems; Synchronization; Shared Data Objects: Snapshots; Evaluation; The Effect of Using Timing Information; Snapshot; Shared Register; Software engineering part; Conclusions & Future Work.
Real-Time Systems: Uni- or multiprocessor system. Interconnection network, e.g. the Controller Area Network (CAN). [Figure: four CPUs.]
Shared Memory Real-Time Systems: Uniform Memory Access (UMA) and Non-Uniform Memory Access (NUMA). [Figure, UMA: CPUs with caches connected to one memory.] [Figure, NUMA: several groups of CPUs, each group with a cache bus and a memory.]
Real-Time Systems: Cooperating tasks. Timing constraints. Inter-task communication through shared data objects. Needs synchronization. [Figure: tasks T1, T2 and T3 accessing shared data.]
Synchronization: Synchronization using locks. Uses semaphores, spinning, or disabling interrupts. Negative: blocking, priority inversion, risk of deadlock. Positive: execution time guarantees are easy to derive, but pessimistic. Pattern: take lock ... do operation ... release lock.
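The take/operate/release pattern on this slide can be sketched with the Test&Set primitive. This is a minimal illustration, not code from the talk: the names `take_lock`, `release_lock`, `locked_increment` and the shared counter are hypothetical, and C11 `<stdatomic.h>` is assumed.

```c
#include <stdatomic.h>

/* Hypothetical shared object: a counter protected by a Test&Set spin lock. */
static atomic_flag lock_flag = ATOMIC_FLAG_INIT;
static long shared_counter = 0;

static void take_lock(void) {
    /* Test&Set returns the previous flag value; spin while another task holds the lock. */
    while (atomic_flag_test_and_set_explicit(&lock_flag, memory_order_acquire)) {
        /* busy-wait: this is where a task can be blocked by a lock holder */
    }
}

static void release_lock(void) {
    atomic_flag_clear_explicit(&lock_flag, memory_order_release);
}

/* Take lock ... do operation ... release lock. Returns the new counter value. */
long locked_increment(void) {
    take_lock();
    long result = ++shared_counter;
    release_lock();
    return result;
}
```

If the lock holder is preempted inside `locked_increment`, every other caller spins until it resumes, which is the blocking drawback listed above.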
Non-blocking Synchronization: Lock-free synchronization. Retries until the operation is not interfered with by other operations. Interference is usually detected with a shared variable that indicates a busy state or similar. Pattern: change a flag to a unique value, or remember the current state ... do the operation while preserving the active structure ... check for the same value or state and then validate the changes, otherwise retry.
Non-blocking Synchronization: Lock-free synchronization. Negative: no execution time guarantees; an operation can retry forever and thus be starved. Positive: avoids blocking and priority inversion, avoids deadlock, fast execution on average.
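The "remember current state, operate, validate or retry" pattern of lock-free synchronization can be sketched with Compare&Swap. This is an illustrative sketch under assumptions: the shared maximum and the function name `lockfree_update_max` are hypothetical, and C11 atomics are assumed.

```c
#include <stdatomic.h>

/* Hypothetical shared object: the largest value any task has reported so far. */
static _Atomic long shared_max = 0;

/* Lock-free update: returns the maximum as observed when this call completed. */
long lockfree_update_max(long candidate) {
    /* Remember the current state. */
    long seen = atomic_load(&shared_max);
    while (seen < candidate) {
        /* Validate: the swap succeeds only if nobody changed shared_max meanwhile. */
        if (atomic_compare_exchange_weak(&shared_max, &seen, candidate)) {
            return candidate;
        }
        /* Interference (or a spurious failure): seen now holds the current value, so retry. */
    }
    return seen;
}
```

No task ever waits for another task to release anything, but the loop has no fixed bound on the number of retries, which matches the missing execution time guarantee noted above.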
Non-blocking Synchronization: Uses atomic synchronization primitives (Test&Set, Compare&Swap) and shared memory. Wait-free synchronization: an operation always finishes in a finite number of its own steps. Negative: complex algorithms, memory consuming. Techniques: copying, helping, announcing, split operation, ???
Non-blocking Synchronization: Wait-free synchronization. Positive: execution time guarantees, fast execution, avoids blocking and priority inversion, avoids deadlock, avoids starvation, same implementation on both single- and multiprocessor systems.
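As a small illustration of "finishes in a finite number of its own steps", the sketch below gives every task its own slot, so no operation contains a retry loop. It is a toy example under assumptions (a fixed task count `TASKS`, hypothetical names `wf_increment` and `wf_read`), not one of the algorithms evaluated in the talk.

```c
#include <stdatomic.h>

#define TASKS 4  /* hypothetical fixed number of tasks */

/* One slot per task; only task t ever writes counts[t]. */
static _Atomic long counts[TASKS];

/* Increment by the calling task: one load and one store, no loop. */
void wf_increment(int task) {
    long v = atomic_load(&counts[task]);
    atomic_store(&counts[task], v + 1);
}

/* Read the total: exactly TASKS loads, independent of what other tasks do. */
long wf_read(void) {
    long sum = 0;
    for (int t = 0; t < TASKS; t++) {
        sum += atomic_load(&counts[t]);
    }
    return sum;
}
```

The cost is memory proportional to the number of tasks, in line with the "memory consuming" drawback listed for wait-free algorithms.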
Shared Data Objects: Correctness criterion for concurrent operations: linearizability. All concurrent executions can be transformed into an equivalent serial sequence of atomic operations that preserves the partial order. [Figure: a Write on ti, a Read on tj and a Write on tk overlapping in time t, mapped to a serial order.]
Snapshot: A consistent, instantaneous state of a set of several shared variables that are logically related. One reader (scanner) reads the whole set of variables in one atomic step. Many writers (updaters) each write to only one variable at a time.
Snapshot: Correctness. Atomicity / linearizability criteria. [Figures: timelines for a component ci with two writes and a concurrent read, marking the value returned by the scanner; two cases are marked YES and one NO.]
Snapshot: Correctness. Atomicity / linearizability criteria. [Figures: two further cases marked NO, one with a read on ci and one with writes on two components ci and cj.]
What are we evaluating: Wait-free snapshot algorithm by Ermedahl et al. Three register copies for each component. Uses the Test&Set atomic primitive for synchronization. [Figure: register copies marked as used by reader or used by writer.]
Analysis: Real-time system: measured schedulability. Created "realistic" scenarios on a theoretical 68020 uniprocessor system. Real RTOS parameters. Manual WCET analysis at cycle level. 1 scanner (5 components), 24 updaters (10 real-time tasks, 15 interrupts). Fixed-priority response time analysis. The scenarios are schedulable without any synchronization; lock/wait-free or semaphore synchronization is then added.
Experiments: Simulation. RT-simulator written in Erlang by Ermedahl and Sjödin. Fixed-priority preemptive scheduler, semaphores, messages. Subset of the scenarios used in the analysis.
Experiments: Multi-node. Simulation of a CAN bus, 1 MHz. 10 nodes connected using messages. Local snapshots on each node. 1 super-snapshot task on 1 node. Subset of the scenarios used for the single-node analysis.
Timing Information: Previously used by Chen and Burns in 1999. Assumes a system with periodic fixed-priority scheduling. Notation from standard real-time response time analysis. Uses information about periods (T), worst-case computation times (C) and worst-case response times (R).
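For readers unfamiliar with the notation, the standard fixed-priority response time recurrence is R_i = C_i + sum over higher-priority tasks j of ceil(R_i / T_j) * C_j, solved by fixed-point iteration. The sketch below is a textbook version under assumptions: no blocking term, deadlines equal to periods, tasks sorted from highest to lowest priority, and the names `rt_task` and `response_time` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical task record: worst-case computation time C and period T. */
typedef struct {
    long C;
    long T;
} rt_task;

/* Worst-case response time of tasks[i], where tasks[0..i-1] have higher priority.
   Returns -1 if the iteration exceeds the period (deadline assumed equal to T). */
long response_time(const rt_task *tasks, size_t i) {
    long r = tasks[i].C;
    for (;;) {
        long next = tasks[i].C;
        for (size_t j = 0; j < i; j++) {
            /* ceil(r / T_j) preemptions by task j, each costing C_j */
            next += ((r + tasks[j].T - 1) / tasks[j].T) * tasks[j].C;
        }
        if (next == r) {
            return r;       /* fixed point reached */
        }
        if (next > tasks[i].T) {
            return -1;      /* not schedulable under these assumptions */
        }
        r = next;
    }
}
```

For three tasks with (C, T) = (1, 4), (2, 6), (3, 12), the iteration for the lowest-priority task goes 3, 6, 7, 9, 10 and settles at R = 10.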
Snapshot: Back to basics: unbounded memory protocol. The reader increases the global index (snapshotindex) and scans backwards. [Figure: one array per component c1 ... ci ... cc along time t, each shown as v ? ? ? ? w nil nil, where ? = previous values / nil and w = writer position.]
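One way to read the unbounded memory protocol is sketched below: writers store into the slot named by the current global index, and the single scanner advances the index and then scans each component backwards from the previous index to the first non-nil slot. This is an interpretation of the slide, not the author's code; the fixed `SLOTS` array stands in for unbounded memory, values are assumed non-negative so that -1 can act as nil, and all names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define COMPONENTS 3
#define SLOTS 64      /* finite stand-in for the unbounded arrays */
#define NIL (-1)      /* assumes real values are non-negative */

static _Atomic int snapshot_index;
static _Atomic int cell[COMPONENTS][SLOTS];

/* Every component starts with value 0 in slot 0; all later slots are nil. */
void snap_init(void) {
    for (int c = 0; c < COMPONENTS; c++) {
        atomic_store(&cell[c][0], 0);
        for (int s = 1; s < SLOTS; s++) {
            atomic_store(&cell[c][s], NIL);
        }
    }
    atomic_store(&snapshot_index, 0);
}

/* Updater: write the value at the position given by the current index. */
void snap_update(int c, int value) {
    int i = atomic_load(&snapshot_index);
    atomic_store(&cell[c][i], value);
}

/* Scanner (single task): advance the index, then scan backwards per component.
   Returns false once the finite stand-in array is used up. */
bool snap_scan(int out[COMPONENTS]) {
    int last = atomic_load(&snapshot_index);
    if (last + 1 >= SLOTS) {
        return false;
    }
    atomic_store(&snapshot_index, last + 1);
    for (int c = 0; c < COMPONENTS; c++) {
        for (int s = last; s >= 0; s--) {
            int v = atomic_load(&cell[c][s]);
            if (v != NIL) {
                out[c] = v;
                break;
            }
        }
    }
    return true;
}
```

Neither operation retries: an update is one load and one store, and a scan visits at most the slots written so far. The sketch does not argue linearizability; it only shows the data layout and the direction of the scan.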
Snapshot: Bounded memory: cyclical buffers. The needed buffer length depends on how fast the updaters are compared to the scanner. Each component can have a different buffer length.
Timing Information: Bounding the needed buffer length for component k, where Ts is the period of the snapshot task and Tw is the period of the writer tasks. [Formula not preserved in this transcript.] Can be refined even further.
Experiments: Using a Sun Enterprise 10000 multiprocessor computer. 1 scanner task and 10 updater tasks, one on each CPU. Comparing two wait-free snapshot algorithms: one using timing information, one using Test-and-Set synchronization.
Experiments: Scenarios with different ratios between scanner and updater. Measuring response time for scan versus update operations.
Experiments: Scan operation, average response time. [Plot not preserved.]
Experiments: Update operation, average response time. [Plot not preserved.]
Shared Register: Target domain: shared memory (even without cache coherency). Wait-free atomic shared buffer by Vitanyi et al. A matrix of 1-reader 1-writer registers. Each register contains a value/tag pair encoded as one value. [Figure: matrix R11, R12, ..., R21, R22, ..., Rij with writers and readers along the two axes; Rij is written by processor i and read by processor j and holds a tag and a value.]
Shared Register: Algorithm: A reader scans its column for the highest tag and returns the corresponding value. A writer scans its column and writes the next tag together with the new value to its row. The tag field in the value/tag pair has no bounded maximum size. Assume 8 writer tasks with a 10 ms period: the maximum tag after one hour is 2880000, which needs 22 bits!
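The read and write steps described on this slide can be sketched as follows. This follows the slide's simplified description only, not the full published algorithm; the 32-bit tag / 32-bit value packing, the processor count `PROCS`, the tie-breaking rule (first register found wins) and all function names are assumptions made for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

#define PROCS 3  /* hypothetical number of processors */

/* reg[i][j] is written only by processor i and read only by processor j. */
static _Atomic uint64_t reg[PROCS][PROCS];

/* Value/tag pair encoded as one value: tag in the high half, value in the low half. */
static uint64_t pack(uint32_t tag, uint32_t value) {
    return ((uint64_t)tag << 32) | value;
}
static uint32_t tag_of(uint64_t w) { return (uint32_t)(w >> 32); }
static uint32_t value_of(uint64_t w) { return (uint32_t)w; }

/* Scan column j and return the pair with the highest tag. */
static uint64_t newest_in_column(int j) {
    uint64_t best = atomic_load(&reg[0][j]);
    for (int i = 1; i < PROCS; i++) {
        uint64_t w = atomic_load(&reg[i][j]);
        if (tag_of(w) > tag_of(best)) {
            best = w;
        }
    }
    return best;
}

/* Reader j: return the value carrying the highest tag in its column. */
uint32_t reg_read(int j) {
    return value_of(newest_in_column(j));
}

/* Writer i: take the next tag after the highest one seen in its column,
   then write the pair to every register in its row. Returns the tag used. */
uint32_t reg_write(int i, uint32_t value) {
    uint32_t tag = tag_of(newest_in_column(i)) + 1;
    for (int j = 0; j < PROCS; j++) {
        atomic_store(&reg[i][j], pack(tag, value));
    }
    return tag;
}
```

Both operations touch exactly `PROCS` registers, so their step count is fixed. The tag grows by one per write, which is the unbounded growth the slide's 22-bit example quantifies.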
Timing Information: Analyze the maximum difference between tags that a task can observe at two consecutive invocations of the algorithm, in any possible execution, where Tmax is the longest period, Rmax is the longest response time and Twr is the period of the writer tasks. [Formula not preserved in this transcript.] Recycling tags: newer tags can restart from zero when a certain tag value is reached. To be able to decide whether a tag really is newer, a condition must hold. [Formula not preserved in this transcript.] [Figure: tag range from 0 to N with values v1 ... v4 wrapping around.]
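The idea of deciding which of two recycled tags is newer can be illustrated with a generic wrap-around comparison. The condition used here is an assumption chosen for the sketch (any two tags that are compared lie less than n/2 apart in true order); it is not the bound from the slide, whose formula is not available in this transcript. The function name `tag_is_newer` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Tags live in 0..n-1 and wrap to zero after n-1.
   Assumption: compared tags are less than n/2 apart in true (unwrapped) order. */
bool tag_is_newer(uint32_t a, uint32_t b, uint32_t n) {
    /* Number of steps going forward (with wrap-around) from b to a. */
    uint64_t forward = ((uint64_t)a + n - b) % n;
    return forward != 0 && forward < n / 2;
}
```

With n = 16, tag 1 counts as newer than tag 14 because it is three steps ahead after wrapping, while tag 14 is not newer than tag 1. The timing analysis on the slide is what justifies a bound of this kind: it limits how far apart the tags seen by one task can be.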
Examples: Example task scenario on 8 processors. [Scenario details not preserved in this transcript.] The unbounded algorithm would have reached tag 68400 in one hour, needing more than 16 bits.
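The bit counts quoted for the unbounded tags can be checked by hand. The working below assumes that each write increases the tag by one, and that the 8 writers with a 10 ms period from the earlier register slide each write once per period.

```latex
\begin{align*}
\text{8 writers, 10 ms period:}\quad
  & 8 \times \frac{3600\ \text{s}}{0.010\ \text{s}} = 8 \times 360\,000 = 2\,880\,000 \\
  & 2^{21} = 2\,097\,152 < 2\,880\,000 < 2^{22} = 4\,194\,304
    \;\Rightarrow\; 22\ \text{bits} \\[4pt]
\text{Example scenario:}\quad
  & 2^{16} = 65\,536 < 68\,400 < 2^{17} = 131\,072
    \;\Rightarrow\; 17\ \text{bits, i.e. more than } 16
\end{align*}
```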
Background: Multithreaded programming needs communication, typically through shared data structures such as stacks, queues and lists. This needs synchronization! Locks (mutual exclusion) have several drawbacks, especially for real-time systems. Non-blocking solutions are often complex to implement and have non-standard interfaces.
NOBLE: A Non-Blocking Inter-Process Communication Library. Designed with the following properties. Functionality: stacks, queues, lists, snapshot, register, ... with clear specifications. Programmer friendly: #include <noble.h>, NBL<function>. Easy to adapt existing solutions: provides locks as well as non-blocking synchronization.
NOBLE: A Non-Blocking Inter-Process Communication Library. Designed with the following properties (cont.). Efficient: object-oriented design ("virtual functions and inheritance with base classes") in C. Portable: modular design, platform-dependent code separated. Adaptable for different programming languages: C, C++, standard dynamically linked library.
Examples: Start with
    #include <noble.h>
First create a global variable handling the shared data object, for example a stack:
    NBLStack *stack;
    stack=NBLStackCreateLF(10000);
When some thread wants to do some operation:
    NBLStackPush(stack, item);
or
    item=NBLStackPop(stack);
Examples: When the data structure is not in use anymore:
    NBLStackFree(stack);
To change the synchronization mechanism, only one line of code has to be changed!
    stack=NBLStackCreateLF(10000);
replaced with
    stack=NBLStackCreateLB();
Experiment: A set of 50000 random operations performed multithreaded on each data structure, with either low or high contention. Comparing the different synchronization mechanisms and implementations available. Varying the number of threads from 1 to 30. Performed on multiprocessors: Sun Enterprise 10000 with 64 CPUs (Solaris); Compaq PC with 2 CPUs (Win32).
Status: Multiprocessor support: Sun Solaris (Sparc); Win32 (Intel x86); SGI (Mips), evaluation stage; Linux (Intel x86), evaluation stage. Extensive manual. Web site up and running: http://www.cs.chalmers.se/~noble