Time in Databases

Time in Databases CSCI 6442 With thanks to Richard Snodgrass, 1985 ACM 0-89791-160-1/85/005/0236

Agenda • Taxonomy of time in databases • Point-in-time databases

The Problem • Suppose you are away from your company for 6 months assigned to work at an outpost in Australia, in the opal mines in Coober Peedy, underground, out of communication. • You are eligible for a raise in the midst of this. When you return they notice this. • Now it is approved and entered into the database within a few days, but you get the extra money from 3 months ago • Which times get recorded in the database?

Snodgrass • Wrote “A Taxonomy of Time in Databases” (1985) • Regarded as the authoritative work • Defines three types of time

Types of Time • There are three times: • Transaction time—when the change is made to the database • Valid time—when the change was approved • User-defined time—when the raise is effective (application-specific)

Static Database • Database that reflects changing values with a snapshot at any particular time • Does not necessarily reflect the state of the real world • We discard past states of the database • Can’t answer questions about what past values were • Can’t show trends in change over time

Static Rollback Database • Adds some time sequence • Ability to back out transactions is provided • Can back out one at a time to previous states of the database • Such a database shows the history of transactions rather than the history of the actual data

What’s The Right Approach? • We might like to look at previous states of the database • We might be more interested in the evolution of the data over time rather than the transactions • In addition, recovery mechanisms provide us the ability to back out any transactions as desired

Point In Time Databases* • A single database models the present and all past states of the data as it matters to the organization • Enough information about past states is stored so that the past state can be obtained using a SQL query • Instead of deleting rows they are marked as no longer valid • When values are updated, the old values are kept as well • This was impractical in 1970 because of cost and efficiency issues • Is it practical today? * Not Snodgrass’s terminology

What We Store • With each change to a row, we insert a new row, instead of changing the previous row • Consider EMP(EMPNO, ENAME, SAL) • We add the attributes START and END • EMP(EMPNO, ENAME, SAL, START, END) • New attributes are start and end times for validity • When we change a row, that time is the END time for validity of the previous row, START time of validity for the new row • These times are usually what Snodgrass calls “user-defined” time because that’s the time that matters to the organization • His “valid time” is just a date of transaction approval • Transaction time is when entered into the database • Let’s call this time “effective time” because it’s the time when the row’s values are effective

Point In Time Operations • Insert: • New row is added • Start time is set • Stop time is left null • Delete: • Stop time is set • Update: • New row added with new values, start time • Old row is kept and stop time is set • Read: • Current values: STOP ISNULL • As of values: ASOF >= START AND (ASOF < STOP OR STOP ISNULL)

Example: Retroactive Pay Increase • Chen is given a salary increase • The salary increase is approved at date t1 • The increase is effective at date t2. • However, the increase does not get entered into the database until some other time, t3>t2 (called a retroactive salary increase). • In this case, t1 is an attribute of the approval process, t2 is the start of effective time and t3is the transaction time. • Most important to the organization is the time when Chen is getting the higher pay

An Example: EMP Schema Definition: EMP(EMPNO,ENAME,JOB,SAL,DEPTNO) Now we add START and STOP for validity dates: EMP(EMPNO,ENAME,JOB,SAL,DEPTNO,START,STOP) • START gives the first time of validity for the values in the row • STOP gives the last time of validity for the values in the row • A row with no value for STOP is the current value

Example To find the status of EMP on 1/1/2008 SELECT * FROM EMP WHERE START <= to_date( ‘01-JAN-2008','dd-mmm-yyyy') “1/1/2008” AND (STOP >= to_date( '01-JAN-2008','dd-mmm-yyyy') OR STOP ISNULL); DELETE statements will now be UPDATEs instead

Complexities • Not every column value needs to be tracked • Can simplify by not tracking some column values • This is a semantic issue • Change in primary key • If primary key changes, hard to relate changed row to original row • Potential case for use of generated key • Cascading changes • Other tables may change because of one change • Use trigger instead of automatic foreign key update • Cascading deletes • Other tables may have rows deleted because of one change • Use trigger instead of automatic foreign key delete

Capabilities • These techniques allow you to implement a database that can show its state as of any time in the past • This can be done while the database is actively being used and even updated by multiple processes • One process can be looking at the database as of two years ago while another is doing current processing • You can record retroactive changes • You can implement applications to process retroactive changes

Bottom Line • This is a practical approach that potentially simplifies data models and processing • One table replaces duplication of data and completely different functions • One somewhat more complex function replaces two (or more) separate functions • Use can be presented a unifying interface that simplifies use of the application • It’s a tradeoff—sometimes use a separate history table

Time in Databases

Time in Databases

Presentation Transcript

Databases in Bioinformatics

High Availability and Fault-Tolerance in Real-Time Databases

Real-Time Databases

Normalization in Databases

Fast Subsequence Matching in Time-Series Databases

In-memory Databases

Introducing Real-Time Databases

Real-Time Databases and Data Services

Survey of Real Time Databases

Misconceptions About Real-Time Databases

Advanced Databases; Current Trends in Databases

Mining Time-Series Databases

Databases in Bioinformatics

Indexing transaction time databases

Real-Time Distributed Databases

Databases in DAFF

PhD in Databases

Databases in Context

Subsequence Matching in Time Series Databases

Misconceptions About Real-Time Databases

Databases in BABAR

Fast Subsequence Matching in Time-Series Databases