Real-Time Databases. Krithi Ramamritham, “Real-Time Databases,” International Journal of Distributed and Parallel Databases, 1(2), pp. 199-226, 1993. J. Stankovic, S. H. Son, and J. Hansson, “Misconceptions About Real-Time Databases,” IEEE Computer, vol. 32, no. 6, pp. 29-36, June 1999.
Outline • Motivation • Characteristics of data in RTDB • Characteristics of transactions in RTDB • Relations between active DB and RTDB • Transaction processing in RTDB • Research issues
Motivation • Many applications involve: • time-constrained access to data • data temporal validity • examples: agile manufacturing, stock trading, e-commerce, command and control, network management, target tracking, ... • Requirements • Timely transaction/query processing • Use fresh, i.e., temporally consistent, data
Traditional Databases • (Traditional) DBs • Deal with persistent data • Transactions access (persistent) data, while maintaining the consistency • Serializability is the correctness criterion • Support a good throughput and response time
Background: Serializability • Correctness criterion for concurrent transaction executions • Why concurrent transactions? • Better performance than serial executions • Definition • A concurrent execution of transactions is serializable if it is equivalent to a serial execution of the same transactions • A correct concurrent execution produces the same result as if the transactions were executed one at a time
Background: Conflict Serializability • Two operations conflict if: • they are issued by two transactions; • they access the same data; and • at least one of them is a write • Two transaction schedules are conflict-equivalent if all conflicting operations are in the same order in the two schedules • A concurrent schedule is conflict-serializable if it is conflict-equivalent to a serial schedule
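Background: Conflict Graph • Conflict graph • Nodes: transactions • Directed edges: conflicts (Ti → Tj if an operation of Ti precedes a conflicting operation of Tj) • Example: schedule S = w1(x) r2(x) r3(y) w2(z) r3(z) w3(y) r1(y) • A schedule is conflict-serializable if and only if there is no cycle in its conflict graph • (Figure: conflict graph of S with edges T1 → T2, T2 → T3, T3 → T1)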
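The conflict-graph test can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the helper names `parse_schedule`, `conflict_graph`, and `has_cycle` are hypothetical, and the schedule notation is assumed to be exactly the `w1(x)` / `r2(x)` form used in the example.

```python
import re
from collections import defaultdict

def parse_schedule(text):
    """Parse a string such as 'w1(x)r2(x)' into (op, txn, item) tuples."""
    return [(op, int(txn), item)
            for op, txn, item in re.findall(r"([rw])(\d+)\((\w+)\)", text)]

def conflict_graph(ops):
    """Add an edge Ti -> Tj when an operation of Ti precedes a
    conflicting operation of Tj (same item, different transactions,
    at least one write)."""
    edges = defaultdict(set)
    for i, (op1, t1, x1) in enumerate(ops):
        for op2, t2, x2 in ops[i + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                edges[t1].add(t2)
    return edges

def has_cycle(edges):
    """Depth-first search; state 0 = unvisited, 1 = on the stack, 2 = done."""
    state = defaultdict(int)

    def visit(node):
        state[node] = 1
        for nxt in edges.get(node, ()):
            if state[nxt] == 1 or (state[nxt] == 0 and visit(nxt)):
                return True
        state[node] = 2
        return False

    return any(state[n] == 0 and visit(n) for n in list(edges))

# The schedule S from the slide.
S = parse_schedule("w1(x)r2(x)r3(y)w2(z)r3(z)w3(y) r1(y)")
graph = conflict_graph(S)
print(dict(graph), "cycle:", has_cycle(graph))
```

For S the graph contains a cycle through T1, T2, and T3, so S is not conflict-serializable.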
Background: Concurrency Control - Locking • A transaction should get a lock on a data item before accessing it • Shared lock: More than one transaction can hold a shared lock on a data item at the same time • Exclusive lock: Only one transaction can hold an exclusive lock on a data item at a time • If a data item has a shared lock, other transactions can get a shared lock to read it • If a data item is already locked through either a shared or exclusive lock, another transaction cannot get an exclusive lock on it -> It has to block • This simple mechanism alone does not guarantee conflict-serializability
Background: 2PL (Two Phase Locking) for Conflict Serializability • A transaction execution can be divided into two phases: • Growing phase: The transaction can only acquire locks • Shrinking phase: It can only release locks • Strict 2PL: Hold exclusive locks until the transaction commits • (Figure: #locks held over the two phases)
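As a small illustration of the two-phase rule, the sketch below checks whether one transaction's sequence of lock and unlock actions has a growing phase followed by a shrinking phase. The function name `is_two_phase` and the `("lock", item)` / `("unlock", item)` encoding are assumptions made for this example only.

```python
def is_two_phase(actions):
    """Return True if no lock is acquired after the first unlock.

    `actions` is a list of ("lock", item) or ("unlock", item) pairs
    issued by a single transaction, in order.
    """
    shrinking = False
    for kind, _item in actions:
        if kind == "unlock":
            shrinking = True          # the shrinking phase has begun
        elif kind == "lock" and shrinking:
            return False              # acquiring after releasing violates 2PL
    return True

ok = [("lock", "x"), ("lock", "y"), ("unlock", "x"), ("unlock", "y")]
bad = [("lock", "x"), ("unlock", "x"), ("lock", "y"), ("unlock", "y")]
print(is_two_phase(ok), is_two_phase(bad))
```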
RT systems • Meet timing constraints • Deal with temporal data that become outdated after a certain time • Recall real-time ≠ fast: See the next slide
Real-time ≠ Fast • Time-cognizant transaction scheduling & concurrency control required!
Why RTDB? • RT applications may deal with large amounts of data, e.g., for target tracking, agile manufacturing, stock trading, ... • DB can facilitate: • description of data – schemas help avoid redundancy of data • maintenance of correctness & integrity of data • efficient access to data – indexing • correct execution of transactions in spite of concurrency and failures – ACID properties (Atomicity, Consistency, Isolation, Durability)
RTDB Features • Not all data are permanent; some are temporal, e.g., sensor data or stock prices • Temporally-correct serializable schedules are a subset of serializable schedules • Timeliness is more important than correctness • Tradeoff between timeliness & serializability • Tradeoff between timeliness & atomicity • Monotonic queries and transactions supported by the milestone approach • Tradeoff between timeliness & data temporal consistency • Data similarity concept • Adaptive update policy • Both real-time scheduling & database technologies can be applied to real-time data management
Data Characteristics in RTDB • Temporal data consistency: Keep track of the real-world status • Absolute consistency between the state of the environment, e.g., manufacturing or market status, and its reflection in databases • Relative consistency among the temporal data used to derive other data • e.g., relative consistency of the stock price data used to derive the SP500 index
Absolute consistency • Denote a temporal data item in RTDB by d: (value, avi, timestamp) • d_value denotes the current value of d • d_timestamp denotes the time when d was updated • d_avi denotes d’s absolute validity interval, i.e., the length of the time interval following d_timestamp during which d is considered to have absolute validity • d is absolutely consistent if current time ≤ d_timestamp + d_avi
Relative Consistency • Relative consistency set R: a set of data items used to derive a new data item • Each set R is associated with a relative validity interval (rvi) • Example: • SP500 index is an average of 500 stock prices • Target position can be computed using, e.g., aircraft heading, air speed, wind speed & direction, barometric pressure, ...
Relative Consistency • Assume a data item d in R (relative consistency set) • d has a correct state iff • d_value is logically consistent – it satisfies all integrity constraints • d is temporally consistent • absolute consistency: (current time – d_timestamp) ≤ d_avi • relative consistency: for arbitrary d’ in R, |d_timestamp – d’_timestamp| ≤ R_rvi
Relative Consistency • Examples • temperature_avi = 5, pressure_avi = 10, R = {temperature, pressure}, R_rvi = 2 • If current time = 100, • temperature = {347, 5, 95} ({value, avi, timestamp}) & pressure = {50, 10, 97} are temporally consistent • temperature = {347, 5, 95} & pressure = {50, 10, 92} are not because (95 - 92) > R_rvi = 2, although temperature and pressure meet the absolute consistency requirements
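The two checks can be written out directly from the definitions. The sketch below is illustrative only: the class `TemporalData` and the function names are hypothetical, and the values are the temperature and pressure figures from the example.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class TemporalData:
    value: float
    avi: float        # absolute validity interval
    timestamp: float  # time of the last update

def absolutely_consistent(d, now):
    """(current time - d_timestamp) <= d_avi"""
    return now - d.timestamp <= d.avi

def relatively_consistent(items, rvi):
    """Every pair of timestamps in the set differs by at most rvi."""
    return all(abs(a.timestamp - b.timestamp) <= rvi
               for a, b in combinations(items, 2))

def temporally_consistent(items, rvi, now):
    return (all(absolutely_consistent(d, now) for d in items)
            and relatively_consistent(items, rvi))

temperature = TemporalData(347, 5, 95)
pressure_fresh = TemporalData(50, 10, 97)
pressure_old = TemporalData(50, 10, 92)

print(temporally_consistent([temperature, pressure_fresh], rvi=2, now=100))
print(temporally_consistent([temperature, pressure_old], rvi=2, now=100))
```

The second call fails only the relative check: `pressure_old` is still absolutely consistent at time 100.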
Relative consistency • At time 100, temperature = {347, 5, 95} & pressure = {50, 10, 92} are not temporally consistent because (95 - 92) > R_rvi = 2, although temperature and pressure meet the absolute consistency requirements • Is this good? • Users may expect relative consistency to be satisfied whenever the absolute consistency of all the data in R is met! • avi of pressure should be reduced to 5 to meet the required rvi of 2, and the updates of pressure and temperature should always be done within 2 time units of each other • A better metric is required! But not much work has been done to address this issue!
Transaction characteristics in RTDB • Transaction types • Write-only transactions obtain the real-world status and write it into the RTDB (also called sensor transactions) • Update transactions derive and store new data in the RTDB (also called derived data recomputations) • Read-only transactions, i.e., queries • Read sensor data and compute actuation signals • User transactions that read temporal data and read/write non-temporal data
Transaction characteristics in RTDB • Example transactions • Sample wind velocity every 10s • Update robot positions every 20s • If temperature > 100, add coolant to reactor in 10s • If the average stock price of a user portfolio changes by more than 10%, sell the stocks within 5s
Transaction characteristics in RTDB • Deadlines • Hard: Negative infinite value upon a deadline miss • Soft: Value decreases as time goes on after the deadline • Firm: No value after the deadline miss
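The three deadline types can be read as value functions of the finish time. The sketch below is one possible encoding: the function names are hypothetical, and the linear decay used for the soft deadline is an assumption, since the slide only says that the value decreases after the deadline.

```python
import math

def hard_value(finish, deadline, value=1.0):
    """Hard deadline: a miss has negative infinite value."""
    return value if finish <= deadline else -math.inf

def firm_value(finish, deadline, value=1.0):
    """Firm deadline: a late result has no value."""
    return value if finish <= deadline else 0.0

def soft_value(finish, deadline, value=1.0, decay=0.1):
    """Soft deadline: value shrinks after the deadline.

    Linear decay with a floor at zero is an assumption of this sketch.
    """
    if finish <= deadline:
        return value
    return max(0.0, value - decay * (finish - deadline))

for f in (hard_value, firm_value, soft_value):
    print(f.__name__, f(9, 10), f(12, 10))
```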
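Transaction characteristics in RTDB • How often do we need to execute a sensor transaction to update data x? • Period = 0.5 × avi(x): the half-half principle • If period = avi(x): two consecutive updates can complete almost 2 × avi(x) apart, so x can be stale • If period = 0.5 × avi(x): x is fresh as long as each sensor transaction finishes within its period • (Figure: update timelines for period = avi and period = 0.5 × avi)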
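The reasoning behind the half-half principle can be checked numerically. The sketch below assumes an idealized adversarial pattern (the first job completes at its release time and every later job completes exactly at the end of its period); the helper names are hypothetical.

```python
def worst_case_completions(period, jobs=4):
    """Completion times for an adversarial pattern.

    Job k is released at k * period and must finish by (k + 1) * period.
    Job 0 finishes immediately; later jobs finish right at their deadlines.
    """
    return [0] + [(k + 1) * period for k in range(1, jobs)]

def max_gap(completion_times):
    """Largest interval between two consecutive update completions."""
    return max(b - a for a, b in zip(completion_times, completion_times[1:]))

def stays_fresh(period, avi):
    """x stays fresh if the longest gap between updates is within avi."""
    return max_gap(worst_case_completions(period)) <= avi

avi = 10
print(stays_fresh(period=avi, avi=avi))        # period = avi
print(stays_fresh(period=0.5 * avi, avi=avi))  # period = 0.5 * avi
```

With period = avi the longest gap is 2 × avi, so the data can expire between updates; halving the period brings the gap back down to avi.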
Transaction characteristics in RTDB • How often do we need to recompute a derived data item? • More complex • Ideally, a derived data item should be fresh if recomputed at every rvi • Alternatively, impose precedence constraints on the transactions to conform to the derived-from relationship
Relationship to Active Databases • Basic building block in active DB: Event, Condition & Action (ECA) • On event If condition Do action • Upon the occurrence of the specified event, if the condition holds, then trigger the specified action • Good model for triggering periodic/aperiodic activities based on events and conditions • Timing constraints are not explicitly considered
Relationship to Active Databases • Active DB has necessary features for real-time data management • Timing constraints should be considered • Example: On (10 seconds after “initiating landing preparations”) If (steps are not completed) Do (within 5 seconds “abort landing”)
Transaction Processing in RTDB • Key issue: predictability • Will the transaction meet its timing constraint? • Sources of unpredictability • Processing hard real-time transactions • Processing soft real-time transactions
Sources of unpredictability in DB • Dependence of transaction exec sequence on data values • Very hard to predict the worst case exec time • Avoid using unbounded loops and recursive or dynamically constructed data structures • In RTDB, the data items accessed by a transaction are likely to be known once its functionality in the controlled environment is known
Sources of unpredictability in DB • Data & resource conflicts • Wait for data and resources, e.g., CPU & I/O device • Data consistency requirements exacerbate the problem • Long blocking due to concurrency control • Priority inversion • Deadlock – 2PL is not free of deadlock
Sources of unpredictability in DB • Dynamic paging & I/O • Demand paging in disk-resident databases • Very pessimistic worst case where all data need to be fetched from disk • Disk scheduling & buffering • Main memory databases eliminate these problems
Aborts, rollbacks, and restarts • Transaction aborts, rollbacks, and restarts • A transaction can be aborted and restarted several times before it commits • Total exec time increases. If the total number of aborts cannot be controlled, it can be unbounded • Resources & time needed to deal with aborts & restarts are denied to other transactions
Preanalysis of transactions • Get an estimate of a transaction’s exec time & data/resource requirements • Impossible for complex transactions • Two-phase transaction exec • Pre-fetch phase • A transaction is run once, bringing the necessary data into main memory • Access invariance [15]: A transaction’s exec path does not change due to possible concurrent changes made to the data by other transactions while the transaction is going through its pre-fetch phase • No writes are performed • Conflicts with other transactions are not considered • Determine computation demands
Preanalysis of transactions • Two-phase transaction exec • Try to guarantee the transaction will commit by its deadline in the 2nd phase • Ensure the necessary data & processing resources are available at the appropriate times via planning • If access invariance holds, a transaction will complete by its deadline • No recovery such as undo is necessary if a transaction is unable to execute • How much overhead? Worth it?
Dealing with Hard Deadlines • Must meet all deadlines • Requirements • Transactions should be periodic • WCET & resource requirements must be determined • Many restrictions on the structure & characteristics of RT transactions -> RT scheduling techniques can be applied
Dealing with Soft Deadlines • More leeway • Most DB applications are not hard but soft real-time • Meet as many deadlines as possible • Abort a transaction upon its deadline miss • Don’t waste resources for tardy transactions • Always good? Different application semantics? • Real-time scheduling and conflict resolution are required
Scheduling • EDF (Earliest Deadline First) • Least slack first • Schedule the transaction with the least slack (i.e., deadline – current time – remaining exec. time) first • High overhead • Priority changes very often • Highest value first • Highest value density (value/exec time) • How to determine value? • Longest executed transaction first
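The priority rules above differ only in the key used to pick the next transaction. The sketch below compares three of them on a small ready queue; the `Txn` class, the `pick` function, and the sample numbers are hypothetical, and value density is computed here over the remaining execution time as an assumption.

```python
from dataclasses import dataclass

@dataclass
class Txn:
    name: str
    deadline: float
    remaining: float   # remaining execution time
    value: float = 1.0

def slack(t, now):
    """deadline - current time - remaining exec time"""
    return t.deadline - now - t.remaining

def value_density(t):
    """value per unit of (remaining) execution time"""
    return t.value / t.remaining

def pick(ready, now, policy):
    """Choose the next transaction to run under the given policy."""
    if policy == "edf":
        return min(ready, key=lambda t: t.deadline)
    if policy == "lsf":
        return min(ready, key=lambda t: slack(t, now))
    if policy == "hvd":
        return max(ready, key=value_density)
    raise ValueError(f"unknown policy: {policy}")

ready = [
    Txn("A", deadline=10, remaining=2, value=1),
    Txn("B", deadline=12, remaining=8, value=10),
    Txn("C", deadline=20, remaining=1, value=3),
]
for policy in ("edf", "lsf", "hvd"):
    print(policy, pick(ready, now=0, policy=policy).name)
```

Because slack depends on the current time, a least-slack-first scheduler has to re-evaluate priorities as time advances, which is the overhead the slide points out.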
Conflict resolution: 2PL variations • Priority inheritance • If a high-priority transaction is blocked by a low-priority transaction, the low-priority transaction inherits the high priority • Reduces blocking time; however, • Blocking time = Duration of a transaction under strict 2PL • Priority abort • A high-priority transaction aborts a low-priority transaction upon a data conflict • Better real-time performance than priority inheritance • 2PL-PA/2PL-HP well accepted in RTDB • Low-priority transactions may suffer repeated aborts and restarts, which can be a problem in, e.g., e-commerce
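A minimal sketch of the priority-abort rule follows. It is deliberately simplified and is not a full 2PL-HP implementation: only exclusive locks are modeled, a lower number means a higher priority (e.g., an earlier deadline), and the class name `HighPriorityLockManager` is hypothetical.

```python
class HighPriorityLockManager:
    """Exclusive locks only; a lower priority number is more urgent."""

    def __init__(self):
        self.holder = {}     # item -> id of the transaction holding the lock
        self.priority = {}   # transaction id -> priority number
        self.aborted = []    # transactions aborted by higher-priority ones

    def begin(self, txn, priority):
        self.priority[txn] = priority

    def request(self, txn, item):
        owner = self.holder.get(item)
        if owner is None or owner == txn:
            self.holder[item] = txn
            return "granted"
        if self.priority[txn] < self.priority[owner]:
            # The requester is more urgent: abort the lock holder.
            self.abort(owner)
            self.holder[item] = txn
            return "granted"
        return "blocked"     # the requester is less urgent and waits

    def abort(self, txn):
        self.release_all(txn)
        self.aborted.append(txn)

    def release_all(self, txn):
        for item in [i for i, o in self.holder.items() if o == txn]:
            del self.holder[item]

lm = HighPriorityLockManager()
lm.begin("T_low", priority=50)
lm.begin("T_high", priority=10)
r1 = lm.request("T_low", "x")
r2 = lm.request("T_low", "y")
r3 = lm.request("T_high", "x")   # conflict: T_low is aborted
r4 = lm.request("T_low", "x")    # restarted T_low now has to wait
print(r1, r2, r3, r4, lm.aborted)
```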
Conflict resolution: Optimistic concurrency control • Assume there is no data conflict during a transaction execution • Keep executing a transaction • Upon finishing every operation in a transaction, enter the validation phase • If validation succeeds, the transaction commits • Otherwise, it is aborted
Conflict resolution: Optimistic concurrency control • Backward validation • A validating transaction is aborted if it conflicts with transactions already committed • Characteristics of the validating or ongoing transactions cannot be considered for conflict resolution • Forward validation • A validating transaction aborts ongoing transactions if there is a conflict • More applicable to RTDB • Wait-50: A validating transaction blocks as long as more than half the transactions that conflict with it have earlier deadlines
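Forward validation and the Wait-50 rule can be sketched as two small functions. This is an illustration under simplifying assumptions: a conflict is taken to be an overlap between the validator's write set and an active transaction's read set, the threshold is "more than half" as stated on the slide, and the function names are hypothetical.

```python
def forward_conflicts(write_set, active_read_sets):
    """Active transactions that have read something the validator wrote."""
    return [txn for txn, reads in active_read_sets.items()
            if write_set & reads]

def wait50_decision(validator_deadline, conflicting_deadlines):
    """Decide what a validating transaction does under Wait-50."""
    if not conflicting_deadlines:
        return "commit"
    earlier = sum(1 for d in conflicting_deadlines if d < validator_deadline)
    if earlier > len(conflicting_deadlines) / 2:
        return "wait"    # most conflicting transactions are more urgent
    return "commit_and_abort_conflicting"

active_reads = {"T2": {"x", "y"}, "T3": {"z"}, "T4": {"y"}}
deadlines = {"T2": 5, "T3": 30, "T4": 6}

conflicting = forward_conflicts({"y"}, active_reads)
decision = wait50_decision(10, [deadlines[t] for t in conflicting])
print(conflicting, decision)
```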
Distributed RTDB • Very little work has been done • Challenges • Transaction commitment protocol, e.g., 2PC (Two Phase Commit), has high overhead • Unpredictable network delay • Opportunities • Data & resource availability at remote nodes • Load balancing • Fault/intrusion tolerance
Two Phase Commit (2PC) Protocol • Supports integrity in distributed databases used in, e.g., airline reservation, banking, and stock trading • All participating databases must either commit or abort and roll back • Prepare phase: Each database informs the coordinator whether it will commit or abort a transaction • Commit phase: Commit if every database intends to commit; otherwise, abort & roll back • Drawback • If even one database is unavailable, none of the other databases can commit • Too much overhead for real-time applications • Better approaches are required!
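The coordinator's decision rule can be sketched as follows. This is a toy, single-process illustration with no networking, logging, or timeouts: each participant is modeled as a callable that returns its vote, an unavailable participant is modeled as one that raises an exception, and treating that as an abort vote is an assumption of this sketch.

```python
def two_phase_commit(participants):
    """Run the prepare phase and return (decision, votes).

    `participants` maps a database name to a zero-argument callable
    that returns True (ready to commit) or False (abort).
    """
    votes = {}
    # Prepare phase: collect one vote per participant.
    for name, vote in participants.items():
        try:
            votes[name] = bool(vote())
        except Exception:
            votes[name] = False   # unavailable participant counts as abort
    # Commit phase: commit only if every participant voted to commit.
    decision = "commit" if all(votes.values()) else "abort"
    return decision, votes

def unavailable():
    raise ConnectionError("participant did not respond")

all_ready = {"airline": lambda: True, "bank": lambda: True}
one_down = {"airline": lambda: True, "bank": unavailable}

print(two_phase_commit(all_ready))
print(two_phase_commit(one_down))
```

The second call shows the drawback noted on the slide: a single unavailable database prevents every other participant from committing.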
QoS Tradeoff & Overload Management • APPROXIMATE • Monotonically increase the accuracy of the answer to a query as more exec time is spent • Provide an approximate answer, if necessary, to meet the deadline • Epsilon serializability • Allow transactions to read data while concurrent writes are going on • Bound the error to be below the specified epsilon • Timeliness & security tradeoff • Apply a weaker security mechanism under overload
Research issues • QoS guarantees in RTDB • Transaction timeliness & data freshness • Distributed real-time data management • Security • Access control for RTDB? • New applications • e-commerce: QoS guarantees given dynamic workloads • Embedded applications: Timeliness, data temporal consistency, energy-efficiency, composability, security, real-time data-centric routing and sensor data aggregation, ...