490 likes | 678 Views
Data Models and Query Languages of Spatio-Temporal Information. Cindy Xinmin Chen Computer Science Department UCLA February 28, 2001. The Problem. Data models and query languages for spatio-temporal databases: many different approaches proposed complexity of technical problem
E N D
Data Models and Query Languages of Spatio-Temporal Information Cindy Xinmin Chen Computer Science Department UCLA February 28, 2001
The Problem • Data models and query languages for spatio-temporal databases: • many different approaches proposed • complexity of technical problem • diversity of application requirements • Implementation: extensions for spatio-temporal information • zero extensibility in Relational DBMS • Object-Relational systems are better, but still have many limitations
Contribution of This Research • Better data models and query languages for temporal and spatio-temporal information • Multi-layered architecture for spatio-temporal extensions on O-R systems • Support further extensions and customization by end-users via user-defined spatio-temporal aggregates
Outline • Temporal Data Models and Query Languages -- SQLT • Spatio-Temporal Data Models and Query Languages -- SQLST • Implementation of SQLST • More Abstract Representation of Spatio-Temporal Data • Conclusion
State of Art • More than 40 temporal data models according to [Jensen and Snodgrass 99] • Interval-based approach [Lorentzos 97] • same conceptual level and implementation level representations but requires interval coalescing after projection • TSQL2’s implicit time [Snodgrass 95] • temporal joins are specified without ever mentioning the time column in WHERE or SELECT clauses of the query • Point-based approach [Toman 98]
Interval-Based Time Model and Coalescing • A temporal relation contains prescription information • Projection on Name and Physican: • Projection on Name and Drug: Prescription(Melanie, Dr. Jones, Proventil, 3mg, [19960101, 19960131]) Prescription(Melanie, Dr. Jones, Prozac, 3mg, [19960201, 19960229]) Prescription(Melanie, Dr. Bond, Prozac, 3mg, [19960301, 19960331]) (Melanie, Dr. Jones, [19960101, 19960229]) (Melanie, Dr. Bond, [19960301, 19960331]) (Melanie, Proventil, [19960101, 19960131]) (Melanie, Prozac, [19960201, 19960331])
TSQL2 • Bitemporal Conceptual Data Model -- coalesced data model • Two dimensional time -- valid time and transaction time • Implicit time model -- no coalescing • Lack of universality
TSQL2 -- An Example • Schema Definition • Query 1: find the drugs Melanie took in 1996 and the time she took them. CREATE TABLE Prescription (Name CHAR(30), Physician CHAR(30), Drug CHAR(30), Dosage CHAR(30)) AS VALID STATE DAY SELECT Drug VALID INTERSECT(VALID(Prescription), PERIOD(‘[1996]’ ASDAY)) FROM Prescription WHERE Name = “Melanie”
Point-Based Model • Expressive power [Toman 97] • Use user-defined aggregates to express Allen's interval operators • Universality: • uniformly applicable to SQL, QBE and Datalog • use current query languages’ construct types • no new constructs are introduced
SQLT: Schema Definition • Define the Prescription relation CREATE TABLE Prescription (Name CHAR(30), Physician CHAR(30), Drug CHAR(30), Dosage CHAR(30), VTime DATE)
Temporal Selection and Join • Query 1’: find the drugs Melanie took in 1996 and the time she took them. SELECT Drug, VTime FROM Prescription WHERE Name = “Melanie” 19960101 <= VTime AND 19961231 >= VTime
Interval-Oriented Reasoning • Query 2: find the patients who have taken Proventil throughout the time they took Prozac. SELECT P1.Name FROM Prescription AS P1 P2 WHERE P1.Name = P2.Name AND P1.Drug = “Proventil” AND P2.Drug = “Prozac” GROUP BY P1.Name HAVINGDURING(P1.VTime. P2.VTime)
Interval-Oriented Reasoning (cont.) • Query 2 in QBE Prescription Name Physician Drug Dosage VTime P.G._name Proventil _vtime1 _name Prozac _vtime2 Conditions DURING(_vtime1, _vtime2)
Interval-Oriented Reasoning (cont.) • Query 2 in Datalog query2(Name, during<VTime1, VTime2>) prescription(Name, _, “Proventil”, _, VTime1), prescription(Name, _, “Prozac”, _, VTime2).
Implementation of SQLT on DB2 • From point-based representation to interval based representation • Difficulty of support temporal data model and query language extensions on existed O-R systems • only user-defined functions (UDFs) available • UDFs can not access the database tables directly • UDFs are hard to develop and debug
Spatio-Temporal Data Models and Query Languages
Previous Work • Constraint-based approach • Triangulation-based spatial objects + interval-based time [Chomicki 97] • Parametric rectangles + interval-based time [Cai 00] • Time as another dimension in space [Grumbach 98] • Composite spatio-temporal data types: mpoint and mregion [Güting 00] • Orthogonal space and time [Worboy 94]
Previous Work (cont.) • Commercial DBMSs • no spatio-temporal extensions • only spatial DataBlades, Extenders, etc. • provide a predefined library of functions • offer no extensibility
Objective of SQLST • orthogonality, minimality and extensibility • separated temporal and spatial information • minimal extensions to SQL • additional constructs can be built in SQLST
Design and Implementation of SQLST • Define a minimal set of built-in primitives in procedure language • Use user-defined aggregates for further extension • Data types: • Temporal data type -- time interval • Spatial data types -- points, lines (finite straight line segments), and counterclockwise directed triangles
P’ P Counterclockwise Directed Triangle • A triangle is counterclockwise directed if its three vertexes are counterclockwise orientated • Makes point-location problem easy • inside(point, triangle) V3 T V2 V1
Application Example • Cyclone statistics for the northern Hemisphere from NSF Arctic System Science Research Program ID Trajectory Pressure Start Time End Time x1 y1 x2 y2 960001 (1146, 1034, 1303, 1775) 1004 1996-05-01 1996-05-02 960001 (1303, 1775, 1664, 1779) 995 1996-05-02 1996-05-03 960001 (1664, 1779, 1957, 1018) 991 1996-05-03 1996-05-04 day2 day3 day1 day4
SQLST: Schema Definition • Define the Cyclone relation • Define the Island relation CREATE TABLE Cyclone (ID INT, Trajectory LINE, Pressure REAL, Tstart DATE, Tend DATE) CREATE TABLE Island (Name CHAR(30), Region TRIANGLE)
Spatio-Temporal Queries • Query 3: find all cyclones whose high pressure stage (pressure > 1000mb) have lasted more than 3 days. SELECT ID FROM Cyclone WHERE Pressure > 1000 GROUP BY ID HAVINGDURATION(Tstart, Tend) > 3
Spatio-Temporal Queries (cont.) • Query 4: find the cyclones whose trajectory have been enclosed by the island Misfortune. SELECT ID FROM Cyclone, Island WHERE Name = “Misfortune” GROUP BY ID HAVINGCONTAIN(Trajectory, Region)
ST Implementation of SQL
Approach • Define a minimal set of ADTs built in C++ • Use user-defined aggregates to define new spatio-temporal primitives • Allow end-users to extend and customize the system for their application
Built-in Spatial Functions • length(line) • area(triangle) • center_of_mass(triangle) • distance(point, point) • distance(point, line) • intersect(line, line) • intersect(line, triangle) • intersect(triangle, triangle)
User-Defined Aggregates (UDAs) • UDAs provide a more general and powerful mechanism for DB extensions • ease of use • no impedance mismatch of data types and programming paradigms • DB advantages -- scalability, data independence, optimizability, etc.
Aggregate eXtension Language (AXL) [Wang 00] • Stream orientated processing • Three functions expressed in SQL • INTIALIZE: gives an initial value to the aggregate • ITERATE: computes the intermediate aggregate value for each new record • TERMINATE: returns the final value computed for the aggregate • Local tables • state • return • Built on the Berkeley DB storage manager
Duration • Calculates the total length of the time intervals Cyclone(960001, _, _, 19960101, 19960105) Cyclone(960001, _, _, 19960111, 19960115) Cyclone(960001, _, _, 19960121, 19960125) 15 days
Duration (cont.) AGGREGATE DURATION(Tstart DATE, Tend DATE) : INT { TABLE state (i INT); INITIALIZE : { INSERT INTO state VALUES(Tend - Tstart + 1); } ITERATE : { UPDATE state SET i = i + (Tend - Tstart + 1); } TERMINATE : { INSERT INTO return SELECT i FROM state; } }
Contain • Tests if one object contains another • returns 1 if true; returns nothing otherwise contain(O1, O2) triangle t2 O2, vertex v of t2, • triangle t1 O1, • v inside t1
Contain (cont.) AGGREGATE CONTAIN(Object1 TRIANGLE, Object2 TRIANGLE) : INT { TABLE state (b INT) AS VALUES(1); TABLE triangles(Object TRIANGLE); TABLE points(Vertex POINT); INITIALIZE : ITERATE : { INSERT INTO triangles VALUES(Object1); INSERT INTO points VALUES(Object2.Vertex); } TERMINATE : { UPDATE state SET b = 0 WHERE NOT EXIST (SELECT Vertex FROM points, triangles WHEREinside(Vertex, Object) = 1); INSERT INTO return SELECT b FROM state WHERE b = 1; } }
Other UDAs • Overlap • tests if any edges of two objects intersect • Edge_Distance • calculates the minimum distance from the vertexes of one object to the edges of the other object • Moving_Distance • calculates the distance an object has traveled continuously
Key Issue: Performance • Size of data set: • Cyclone table -- 200,000 tuples • Island table -- 1000 tuples • Cases compared • AXL using indexes • AXL not using indexes • C++ using indexes • C++ not using indexes • Index • Tstart on Cyclone table and Name on Island table
Performance -- Duration • Query 5: find the duration of the cyclones occurred in June, 1996. SELECTDURATION(Tstart, Tend) FROM Cyclone WHERE 19960601 <= Tstart AND 19960630 >= Tstart GROUP BY ID
Performance – Contain • Query 6: find the cyclones which occurred in June, 1996 and have been enclosed by the region of the island Misfortune. SELECT ID FROM Cyclone, Island WHERE 19960601 <= Tstart AND 19960630 >= Tstart AND Name = “Misfortune” GROUP BY ID HAVING CONTAIN(Region, Trajectory)
More Abstract Representation of Spatio-Temporal Data
Abstract Model • Objective: flexibility • user can decide which level of abstraction they want • may have more than two layers • Data types: • temporal data type -- time instants • spatial data types – points, lines, and polygons
A Spatio-Temporal Object • The concrete model -- space triangles and time intervals • A more abstract representation • -- sequence of snapshots (S , ((2,2),(6,2),(2,6)), [1,10]) (S , ((2,6),(6,2),(6,6)), [1,10]) 1<=t<=10 6 4 2 S (S, [(2,2),(2,6),(6,6),(6,2)], 1) (S, [(2,2),(2,6),(6,6),(6,2)], 2) …… (S, [(2,2),(2,6),(6,6),(6,2)], 10) 2 4 6 8
Schema Definition • The Cyclone relation • The Island relation CREATE TABLE Cyclone (ID INT, Position POINT, Pressure REAL, Time DATE) CREATE TABLE Island (Name CHAR(30), Extent POLYGON)
Mapping • UDA -- map • Table function -- decompose (Point, Time Instant) (Line, Time Interval) (Polygon) (Triangle)
Spatio-Temporal Queries • Query 3’: find all cyclones whose high pressure stage (pressure > 1000mb) have lasted more than 3 days. SELECT NEW.ID FROM (SELECT ID, MAP(Position, Time) FROM Cyclone WHERE Pressure > 1000 GROUP BY ID) AS NEW(ID, Trajectory, Tstart, Tend) GROUP BY NEW.ID HAVINGDURATION(New.Tstart, New.Tend) > 3
Spatio-Temporal Queries (cont.) • Query 4’: find the cyclones whose trajectory have been enclosed by the island Misfortune. SELECT NEW.ID FROM (SELECT ID, MAP(Position, Time), T.Region FROM Cyclone, Island, TABLE(decompose(Extent)) AS T WHERE Name = “Misfortune” GROUP BY ID, T.Region) AS NEW(ID, Trajectory, Tstart, Tend, Region) GROUP BY NEW.ID HAVINGCONTAIN(New.Region, New.Trajectory)
Conclusion • Better data models and query languages for temporal and spatio-temporal information • Multi-layered architecture for spatio-temporal extensions on O-R systems • Support further extensions and customization by end-users via user-defined spatio-temporal aggregates