380 likes | 600 Views
COMP5338 – Advanced Data Models. Week 12: Temporal Data Models. Richard T. Snodgrass, Time-Oriented Database Applications, Chapter 2,Chapter 5. Outline. Motivation Time and Temporal Database Concept Implementation using SQL Temporal Queries.
E N D
COMP5338 – Advanced Data Models Week 12: Temporal Data Models Richard T. Snodgrass, Time-Oriented Database Applications, Chapter 2,Chapter 5
Outline • Motivation • Time and Temporal Database Concept • Implementation using SQL • Temporal Queries Material adapted from presentations by A. Fekete, U. Roehm, and S. Chawla COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
Introduction • Most databases are examples of “snapshot” databases. • If a record is updated then the previous state of the record (and therefore the database) is lost. • However, this is a problem especially with the current use of databases as a backend for web pages. Why? • Conventional database systems do not naturally lend themselves to support the current, past and even the future state of reality. • A temporal database is a system which can manage time varying data. • Note, most database systems support the “time” data type. However the interpretation of this data type (in a record) is user-defined (e.g., birth day). COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
Motivating Example • ‘University Information System’ Employees ( tfn, last_name,first_name,bdate,salary) Incumbents( tfn, pcn ) Positions ( pcn, job_title_code1, … ) JobTitles ( job_title_code, job_title ) • This allows to answer simple questions on the current University state such as • What is Bob’s salary? • What is Bob’s position? • What is Bob’s date of birth? • Note:implicit meaning includes ‘current’ (current salary, current position) COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
Motivating Example (cont’d) • We want to be able to answer questions about the history • when which employees were assigned to what positions. • e.g. Which positions did Bob have after the year 2000? or Who were the HoS of the SIT while Alice was professor there? • How to integrate a tracking log that captures changes on the tables? • E.g. When was is recorded that Bob is financial officer? • Or: Give a change history of the Head-of-School position. • Is simply adding some DATE columns sufficient? • What effect does it have on querying, indexing, constraints etc? COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
Outline • Motivation • Time and Temporal Database Concept • Implementation using SQL • Temporal Queries COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
The Concepts of Time • Temporal data types • Instant something happening at an instant of time • Interval a length of time; a duration • Period an anchored duration of time (the 10 days from Jan 1, 2012 to Jan 10, 2012 inclusive) • Kinds of time • User-defined timean uninterpreted time value • Valid time when a fact was true in the modelled reality • Transaction time when a fact was stored in the database COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
Time Instant • When something happened is described by a time instant • Eg Joe’s salary was set to 75000 on 1 January 2008 • Granularity: while physics suggests time as a continuous variable, for db purposes we think of it as a succession of ticks • Many business rules treat each day as an indivisible unit • For other matters, we may “tick” once a second, or once a millisecond • With second granularity, a time instant might be “03:01:24 on 5 March 2010” COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
Complications • Inconsistent formats • “3/5/10” • In USA, this is understood as March 5, 2010 • In Australia/UK, this means “3 May, 2010” • Y2K issue: is a two-digit year “30” to mean 2030 or 1930 (or even 30 C.E.!) • Time Zones • The same instant is given different hour (even day) at different places in the world • Even at a single place, the same hour can occur twice when daylight savings ends • Maybe use UTC (Universal Time, often called GMT) Or record the time zone as part of the instant COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
Interval • Interval is a measure of the separation between two instants • This is important to answer requests like “for how long was Joe a professor?” • Also, it is very common to shift an instant forwards or back by a given amount • Find the date that is three months after the date on which Joe’s salary increased COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
Temporal data type support in SQL • SQL:2011 supports time instants, intervals and periods) • Instant data types: • DATE • SQL-92: day, month and year of a time instant (from year 1 to 9999) • Oracle: date + time with resolution of 1 second (1.1.4712 B.C.E. - 31.12.4712 C.E.) • TIMESTAMP • SQL-92: date + time with variable resolution of fractions of a second (default: 1ms); also available: TIMESTAMP WITH TIMEZONE • TIME • SQL-92: hours, minutes, seconds and optional fractional digits of second • not really a time instant (no date!); not supported by e.g. Oracle • Interval data types: • Year-Month Intervals (INTERVAL YEAR TO MONTH) • Day-Time Intervals (INTERVAL DAY TO SECOND) • Many DBMS only support time instants, but no intervals • Must hence be simulated with two time instants (start + end) COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
Kinds of Time • User-defined time • According to Snodgrass as ‘an uninterpreted time interval’ • E.g. a birthdate • Valid Time & Transaction Time • Cf. following examples • A table can be associated with none, one, two or all three kinds of time COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
Temporal Databases • A temporal database stores information about history • Current information, and also what was true earlier • Allows queries that are interested in the past, or in how we got to • Where we are now! • Lots of research in late 1980s and early 1990s, but this did not get adopted by common dbms platforms • So users need to model the temporal ideas with standard SQL schemas, queries etc • Some supports in commercial dbms • DB2’s Time Travel Queries • Oracle’s Flashback queries COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
Two Interpretations of Time • Suppose we say “Joe was Head of School during 2009” • Now, there are two notions of time that might be involved here • During 2009, Joe was actually Head • During 2009, the database knew that Joe was Head • These are usually but not always the same • Sometimes a fact becomes true, but the database isn’t updated immediately • Sometimes errors are entered into the database, and only corrected after a while • Different use-cases need different aspects COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
Transaction Time and Valid Time • Transaction time records the history of database activity. • Only moves forward (as you cannot go back in history and change things –alas!) • Useful for exploration of how the system behaved in previous state (very important for auditing) • Valid time records the time when the fact is true in the real world. • Can move forward and backward (corrected as we learn better) • The release date of certain movie might be pushed back or forward • Useful for exploring the history of the domain COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
Valid Time Database • A valid-time temporal database stores the history of the domain (as we understand it now) • The position, salaries of employees associated with valid time period • Useful to answer questions about the past reality • Who was Head of School in 2009? • When was Alice Head? • Eg db schema is Positions(Person, Role, ValidPeriod); db contains data • Alice, Head, period(2008) • Joe, Ugrad Director, period(2008) • Joe, Head, period(2009) • Alice, Ugrad director, period(2009-2010) • Phil, Head, period(2010) • Even store facts for the future (as we currently expect it to occur) • Mary, Head, period(2011-2012) Through the year 2008, Alice had the position of Head COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
Transaction Time Database • Transaction time databases retain and provide access to prior states of a database. • Many NoSQL systems keeps versioned data • E.g. HBase • Support of Transaction Time can be found in newer versions of commercial RDBMS • Oracle Flashback Data Archive • DB2 manages system time using a separate history table • In these automated systems, transaction time does not have to be reflected explicitly in the schema, nor does the user have to explicitly inform the system of start and finish times • The platform can manage this directly COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
Transaction Time Database • Typically, though, application schema is chosen to deliberately keep track of when data was current in the db • Eg db schema is Position(Person,Role,TransactionPeriod);db contains data • Alice, Head, period(2008) • Joe, Ugrad Director, period(2008) • Joe, Head, period(2009) • Alice, Ugrad director, period(2009-2010) • Phil, Head, period(2010) Through the year 2008, the db kept information that Alice was in the position of Head COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
Bitemporal Database • Provide support for understanding the previous states of valid time data • Eg db schema is Positions(Person, Role, ValidPeriod,TransactionPeriod); db contains data • Alice, Head, period(2008 till completed), period(2008) • Alice, Head, period(2008), period(2009 till now) • Joe, Ugrad Director, period(2008 till completed), period(2008) • Joe, Ugrad Director, period(2008), period(2009) • Joe, Head, period(2009 till completed), period(2009) • Joe, Head, period(2009), period(2010 till now) Through the year 2008, the db kept information that Alice had the position of Head for 2008 and the future From 2009 onwards, the db kept information that Alice had been Head for 2008 (and completed that year) COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
Outline • Motivation • Time Temporal Database Concept • Implementation using SQL • Temporal Queries COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
Schema • Valid time, transaction time and bitemporal database can be implemented by adding columns to capture the endpoints of the periods Valid period Transaction period COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
Recording Temporal Data • Alice has been the UG director since Jan, 1st 2008, this was recorded in the database on Jan, 1st 2008 • Joe was appointed head of school for the period 2009-2010, the fact was recorded in the database on Oct 1st, 2008 • Joe’s head of school role has been extended to 2011, this was recorded in the database on Nov 1st, 2010 • Alice was asked to take the Head of school role from 2012, this was recorded in the database on Oct, 1st, 2011. Alice UG Director 2008-01-01 Now 2008-01-01 2011-10-01 UC Joe Head 2009-01-01 2010-12-31 2008-10-01 2010-11-01 UC Joe Head 2009-01-01 2011-12-31 2010-11-01 UC Alice UG Director 2008-01-01 2011-12-31 2011-10-01 UC Alice Head 2012-01-01 Now 2011-10-01 UC UC = Until Changed COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
Issues • Temporal Key • “Unfinished” period, or period with unknown end • E.g. Now, UC, .. COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
Temporal Keys • The value of the primary key of a table must be unique • In the “snashot” version of Positions table, we might have primary key Person, indicating a person cannot have the same role twice at any point of time √ X • This becomes complicated when we add time related columns • A person may have the same role twice at different periods of time • The original key (Person,Role) is not, by itself, a primary key of the temporal table… √ COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
valid periods overlap.. Temporal Keys – Simple Solution • In Valid Time database, add Vs, or Ve or both to the primary key, Example: • Positions( Person, Role, Vs, Ve) • In Transaction Time database, add Ts, or Te or both to the primary key • In Bitemporal database, add Vs, and Ts to the primary key • However, does not solve problems with overlapping periods • Example: COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
Sequenced Constraints • Adding Vs, or Ve or both to the primary key did not prevent an person from having a position twice during a time period… • Solution: additional sequenced primary key constraint CREATE ASSERTION seq_primary_key CHECK ( NOT EXISTS ( SELECT * FROM Positions p1 WHERE 1<(SELECT COUNT(Person) FROM Positions p2 WHERE p1.Person = p2.Role AND p1.Person = p2.Role AND p1.Vs <p2.Ve AND p2.Vs <p1.Ve)) p1.Vs p1.Ve p2.Vs p2.Ve p2.Vs p2.Ve COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
Referential Integrity • Neither table is temporal • Normal SQL’s foreign key construct • Only the referencing table is temporal • The same as above • If the referenced table or both table are temporal • The sequenced referential integrity constraint becomes very complex in SQL (cf. chapter 5 of Snodgrass book) • It is possible to implement such constraints at code level, instead of SQL level COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
Now-Relative Tuples • Vsand Ve represent the starting and ending points of fact validity. • we use closed-open time periods: [Vs, Ve) • Ts represents the time when the tuple is inserted into the database, Te represent the time when the tuple is logically deleted. • Database supporting transaction time should be is append only • A delete occurs by modifying the transaction time. These are called “logical deletes” • A tuple is considered current when Te = now/UC • Such a tuple, where the validity of the fact is valid up-to the current time is called now-relative Alice UG Director 2008-01-01 Now 2008-01-01 2011-10-01 UC Joe Head 2009-01-01 2010-12-31 2008-10-01 2010-11-01 UC Joe Head 2009-01-01 2011-12-31 2010-11-01 UC Alice UG Director 2008-01-01 2011-12-31 2011-10-01 UC Alice Head 2012-01-01 Now 2011-10-01 UC COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
Handling NOW • What should the timestamp be for current data? • Four possibilities • Minimum possible timestamp (min-timestamp approach) • E.g. ‘00-00-0000’ - quite counter-intuitive • NULL value • Complicates queries: • Comparisons with NULL return FALSE in SQL… • Use of NULL as ‘unknown’ is no longer possible… • Maximum possible timestamp (max-timestamp approach) • E.g. ‘31-12-9999’ • Simulates ‘the end of time’ or ‘forever’ • Point representation • Recent approach; sets start_time = end_time • Max-timestamp approach is extensively used COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
Bitemporal Table in MAX representation nowand UC = max timestamp (31.12.9999) Alice UG Director 2008-01-01 9999-12-31 2008-01-01 2011-10-01 UC Joe Head 2009-01-01 2010-12-31 2008-10-01 2010-11-01 UC Joe Head 2009-01-01 2011-12-31 2010-11-01 9999-12-31 Alice UG Director 2008-01-01 2011-12-31 2011-10-01 9999-12-31 Alice Head 2012-01-01 9999-12-31 2011-10-01 9999-12-31 COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
Another sample bitemporal table Problem domain: the check out record of a video rental company Which tuples have been logically deleted? What tuples are current? COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
Outline • Motivation • Time and Temporal Database Concept • Implementation using SQL • Temporal Queries COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
Current Queries • Ask about the current state • Valid time period includes NOW • Transaction time period includes NOW • Coding these depends on the representation • In Oracle: use pseudo constant expressions SYSTIMESTAMP, SYSDATE etc to find out the current time instant • Eg (with [closed, open) and MAX for current SELECT <target list> FROM <temporal tables> WHERE <other conditions> AND SYSDATE < v_end COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
Time Travel Queries • Time Travel Query (also called timeslice, flashback, etc) • Allows to explicitly investigate the past state • Query “as of” a timestamp • Eg “who was Head on June 30, 2008?” • Distinguish carefully: valid time as-of, or transaction time as-of • If we have a bitemporal database, we may want best (current) information about the real state of the domain in the past • We must look for rows whose appropriate period includes the given date/time • Coding this depends on the representation • Eg With [closed, open) and MAX in a valid-time table SELECT * FROM Positions WHERE Position=‘Head’ AND Vs <= ‘30-JUN-2008’ AND ‘30-JUN-2008’ < Ve COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)
References • Richard T. Snodgrass, Time-Oriented Database Applications • Chapter 2,Chapter 5 Chapter 5 • PDF downloadable from Snodgrass’ homepage • Christian S. Jensen, Richard T. Snodgrass, Temporal Data Management, IEEE Transactions on Knowledge and Data Engineering, Vol. 11, No. 1, 1999 COMP5338 "Advanced Data Models" - 2014 (Y. Zhou)