Taking Constraints out of Constraint Databases. Dina Goldin, University of Connecticut. Applications of Constraint Databases, Paris, France, June 2004.
Relational Databases • Codd[70] provided an additional level of abstraction between physical data and queries • [Diagram: queries over a customized data layout for each application, contrasted with queries over a table-based logical layer over a physical layer]
Advantages of Relational Model • Data model: Uniform table-based representation for all data at logical level • Data independence: Can modify physical layer without affecting queries • Simple set-of-points semantics; RA = RC (relational algebra and relational calculus are equally expressive) • Efficient indexing methods • A commercial success in the 1980s!
Object-Relational Databases • Disadvantages of RDBs: • only good for traditional, “administrative” data • OO technology corrects this: • encapsulate non-administrative data • provide methods to access it • Object-relational databases provide this technology within a relational framework. They are the latest commercial success.
Outline • Introduction • relational, OR data models • GIS systems: • CDB technology to the rescue • Constraint Databases: • it’s not just about constraints • one more level of abstraction • Constraint-backed databases: • practical considerations • getting constraint-backed technology right
Geographic Information Systems • Until recently, the leading commercial systems for spatial data • Not database systems per se • cannot manage non-geographic data • no ad-hoc querying (users perform built-in operations or execute predefined queries) • single-layered architecture (no data independence when writing queries) • in-memory (no index structures)
Newer Approaches to Managing Spatial Data • Marrying GIS and object-relational databases • Example: Oracle Spatial Data Option • Full power of a relational DB plus… • Spatial data • encapsulated as new data types within the OR framework • same data types as in ARC/Info (leading GIS system) • Spatial operations • as methods over the new data types • based on GIS operations • Spatial data access structures • based on bounding boxes
Data Separation in OR/GIS Databases • Spatial data stored in spatial relations • predefined set of spatial data types (point, region, etc…) • each relation is a set of spatial objects of one type, with a key • predefined set of operations over spatial objects • “Traditional” data stored in regular relations • Including thematic/descriptive data pertaining to spatial objects • Spatial & administrative data are logically separate • only keys of spatial objects to correlate between them • spatial data processing limited to predefined types and operators • Separation applies to query output as well • limited query expressiveness Can constraint databases offer a better solution?
Constraint Databases • Contribution of KKR[90,95] • Key idea: Allow relations that include infinitely many points • “Finite relations are generalized to finitely representable relations” [GK96] • Generalized: original term for tuples and relations with infinite semantics • We now prefer the term constraint for such tuples and relations Goal: next commercial success (for GIS applications)
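To make "finitely representable relation" concrete, here is a minimal Python sketch, assuming two real variables and linear constraints of the form a*x + b*y <= c. The class names (`LinearConstraint`, `ConstraintTuple`, `ConstraintRelation`) are hypothetical and are not taken from KKR or from any constraint database system.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical names, for illustration only.


@dataclass(frozen=True)
class LinearConstraint:
    """One atomic constraint a*x + b*y <= c over two real variables."""
    a: float
    b: float
    c: float

    def holds(self, x: float, y: float) -> bool:
        return self.a * x + self.b * y <= self.c


@dataclass(frozen=True)
class ConstraintTuple:
    """A conjunction of atomic constraints: a finite description of a convex point set."""
    constraints: Tuple[LinearConstraint, ...]

    def contains(self, x: float, y: float) -> bool:
        return all(c.holds(x, y) for c in self.constraints)


@dataclass(frozen=True)
class ConstraintRelation:
    """A finite set of constraint tuples; its meaning is the union of their point sets."""
    tuples: Tuple[ConstraintTuple, ...]

    def contains(self, x: float, y: float) -> bool:
        return any(t.contains(x, y) for t in self.tuples)


# The unit square 0 <= x <= 1, 0 <= y <= 1 as a single constraint tuple.
unit_square = ConstraintTuple((
    LinearConstraint(-1, 0, 0),  # -x <= 0
    LinearConstraint(1, 0, 1),   #  x <= 1
    LinearConstraint(0, -1, 0),  # -y <= 0
    LinearConstraint(0, 1, 1),   #  y <= 1
))

# The unbounded half-plane x + y >= 3, written as -x - y <= -3.
half_plane = ConstraintTuple((LinearConstraint(-1, -1, -3),))

# Two finite tuples, infinitely many points.
region = ConstraintRelation((unit_square, half_plane))
```

The relation stores five inequalities, yet `region.contains` answers membership for any of the infinitely many points it denotes.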
Revisiting the Logical Layer • Components of the logical database layer: • set-of-tuples data semantics • implementation-independent (logical) data representation • Relational databases: • finite semantics • trivial one-to-one correspondence between the two components • Constraint databases: • infinite semantics • correspondence between data semantics and data representation no longer trivial • Infinite semantics of finitely representable data imply an additional level of abstraction; we need to separate the logical layer into two • [Diagram: queries over a table-based logical layer over a physical layer]
Additional Level of Abstraction • RDB to CDB: from two layers to three • RDB Logical Layer (queries defined over this layer): finite set-of-point semantics; table-based representation; implementation-independent • CDB Abstract Logical Layer (queries defined over this layer): infinite set-of-point semantics • CDB Concrete Logical Layer: finite data representation; implementation-independent • Physical Layer: file-based data storage; indexing structures, data access methods; implementation-dependent
Concrete Data Model in CDBs • Requirements for the concrete layer • clean set-of-point semantics • efficient (index-based) data access methods • not required to use constraints (queries are over the abstract layer, so actual choice of representation is transparent to user) • Pure Constraint Databases • concrete layer is constraint-based • examples: CDB/CQA (query algebra), MLPQ (logic programming) • Constraint-backed databases • concrete layer is not purely constraints • data may be represented geometrically
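The transparency requirement above can be sketched as follows: a query is written once against an abstract region, and either a constraint-based or a geometric concrete representation can sit behind it. This is a minimal illustration under stated assumptions, not the design of any particular system; `Region`, `ConstraintRegion`, `PolygonRegion`, `select_inside` and `box` are all hypothetical names.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
HalfPlane = Tuple[float, float, float]  # (a, b, c) meaning a*x + b*y <= c


class Region(ABC):
    """Abstract layer: a region is a (possibly infinite) set of points."""

    @abstractmethod
    def contains(self, x: float, y: float) -> bool:
        ...


class ConstraintRegion(Region):
    """Concrete option 1: a union of convex cells, each a conjunction of half-planes."""

    def __init__(self, cells: Sequence[Sequence[HalfPlane]]):
        self.cells = cells

    def contains(self, x: float, y: float) -> bool:
        return any(all(a * x + b * y <= c for a, b, c in cell) for cell in self.cells)


class PolygonRegion(Region):
    """Concrete option 2: the boundary outline stored as a list of vertices."""

    def __init__(self, vertices: Sequence[Point]):
        self.vertices = vertices

    def contains(self, x: float, y: float) -> bool:
        # Even-odd ray casting. Points exactly on the boundary are not
        # treated specially in this sketch.
        inside = False
        n = len(self.vertices)
        for i in range(n):
            x1, y1 = self.vertices[i]
            x2, y2 = self.vertices[(i + 1) % n]
            if (y1 > y) != (y2 > y):
                x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                if x < x_cross:
                    inside = not inside
        return inside


def select_inside(region: Region, points: Sequence[Point]) -> List[Point]:
    """A query written against the abstract layer only."""
    return [p for p in points if region.contains(*p)]


def box(x0: float, y0: float, x1: float, y1: float) -> List[HalfPlane]:
    """Half-planes for the axis-parallel rectangle [x0, x1] x [y0, y1]."""
    return [(-1, 0, -x0), (1, 0, x1), (0, -1, -y0), (0, 1, y1)]


# The same L-shaped region in both concrete representations.
as_constraints = ConstraintRegion([box(0, 0, 2, 1), box(0, 1, 1, 2)])
as_polygon = PolygonRegion([(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)])

probes = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5), (3, 3)]
from_constraints = select_inside(as_constraints, probes)
from_polygon = select_inside(as_polygon, probes)
```

For probe points away from the boundary, both representations return the same answer, which is what lets the choice of concrete layer stay hidden from the user.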
Practical Considerations of GIS Applications • Data input/output is not based on constraints • data often obtained by digitization (generates points and segments) • geometrical, visual, some standard spatial format… • in pure CDBs, converted to constraints • Spatial features are never straight lines or convex polytopes • many short segments • frequent local change of direction • broken up into many constraint tuples (convex cells) per spatial object • Continuous (real time) data visualization • most users do NOT want to see constraints, but a GUI • visualization requires spatial outline (boundary points) • constraints need to be converted back to geometrical representation • conversions carry heavy performance penalty (not real-time) • Experience shows that practical systems are not pure • E.g. Dedale uses geometrical representations, explicitly translating to the constraint representation for the constraint engine [GSSG03]
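The conversion from a digitized outline to constraints can be sketched in Python as below. The sketch assumes each convex cell is given as vertices in counter-clockwise order and that the decomposition of a non-convex feature into convex cells has already been done elsewhere; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
HalfPlane = Tuple[float, float, float]  # (a, b, c) meaning a*x + b*y <= c


def convex_polygon_to_constraints(vertices: Sequence[Point]) -> List[HalfPlane]:
    """One half-plane per edge of a convex polygon given in counter-clockwise order.

    For an edge from (x1, y1) to (x2, y2) the interior lies to the left, i.e.
    (x2 - x1)*(y - y1) - (y2 - y1)*(x - x1) >= 0, rearranged to a*x + b*y <= c.
    """
    constraints = []
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        a = y2 - y1
        b = -(x2 - x1)
        c = a * x1 + b * y1
        constraints.append((a, b, c))
    return constraints


def feature_to_constraint_tuples(convex_cells: Sequence[Sequence[Point]]) -> List[List[HalfPlane]]:
    """One constraint tuple per convex cell of a spatial feature."""
    return [convex_polygon_to_constraints(cell) for cell in convex_cells]


# An L-shaped parcel: six outline points, but two convex cells once decomposed.
parcel_cells = [
    [(0, 0), (2, 0), (2, 1), (0, 1)],
    [(0, 1), (1, 1), (1, 2), (0, 2)],
]
parcel_tuples = feature_to_constraint_tuples(parcel_cells)
```

Even this small feature goes from six boundary points to two constraint tuples with eight inequalities; a digitized coastline with many short segments would fragment far more, and displaying it would require the reverse conversion.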
Geometric Data Representation • In the physical layer, need for geometry-based representations recognized early on • KKR90 suggested computational geometry algorithms as evaluation primitives • Examples of geometric representations: • Points • Polylines: for trajectories, regions • Triangulated Irregular Networks (TINs): for terrains (2.5 dimensional) • Efficient visualization • Efficient query evaluation • If region R(x,y) is stored as a sequence of points that outline it, the projection π_X(R) can be obtained by finding the extrema of the X-coordinates of these points. • Bounding boxes equally easy to compute.
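The projection and bounding-box computations mentioned above can be sketched in a few lines of Python. The sketch assumes the region is bounded by a single closed polyline, so that its projection onto X is one interval; `project_x` and `bounding_box` are hypothetical names.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def project_x(outline: Sequence[Point]) -> Tuple[float, float]:
    """Projection of the outlined region onto X, as the interval (x_min, x_max)."""
    xs = [x for x, _ in outline]
    return min(xs), max(xs)


def bounding_box(outline: Sequence[Point]) -> Tuple[float, float, float, float]:
    """Axis-parallel bounding box (x_min, y_min, x_max, y_max) of the outline."""
    xs = [x for x, _ in outline]
    ys = [y for _, y in outline]
    return min(xs), min(ys), max(xs), max(ys)


# An L-shaped region stored as its outline points.
outline = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
x_interval = project_x(outline)
bbox = bounding_box(outline)
```

Both functions make a single pass over the stored points, with no constraint manipulation involved.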
Role of Constraints in Constraint-Backed Databases • Define query semantics (abstract level) • for proving query correctness • to spare users from ad-hoc operators with arbitrary restrictions • Provide default data model (concrete level) • one of the available data representations • e.g. when data is truly multidimensional • For data integration • as intermediate representation between non-compatible systems
DEDALE • Not a pure constraint database • Nesting takes place at abstract level: LandUse(lname, geom[x,y]); Flight(fname, traj[t,x,y,a]); Country(cname, geom[x,y,h]) • Queries use nest and unnest operations explicitly • Geometric representation in the concrete layer • geom in Country is represented as a TIN • traj in Flight is represented as a set of sample points along the flight path • Data model does not separate spatial and administrative data
DEDALE vs. CQA/CDB
• Schemas
  DEDALE: LandUse(lname, geom[x,y]); Flight(fname, traj[t,x,y,a]); Country(cname, geom[x,y,h])
  CQA/CDB: LandUse(lname,x,y); Flight(fname,t,x,y,a); Country(cname,x,y,h)
• Over which location were the airplanes flying at time t1?
  DEDALE: MAP λX [X.fname, π_{x,y}(σ_{t=t1}(X.traj))] (Flight)
  CQA/CDB: R0 := SELECT t=t1 from Flight; R1 := PROJECT R0 on fname,x,y
• Return the part of the parcels contained in rectangle Rect(x,y)
  DEDALE: MAP λX [X.lname, X.geom ∩ Rect] (LandUse)
  CQA/CDB: R0 := JOIN LandUse and Rect
• Return all land parcels that have a point in Rect(x,y)
  DEDALE: π_{lname,geom}(MAP λX [X.lname, X.geom, σ_{(x,y) in Rect}(X.geom)] (LandUse))
  CQA/CDB: R0 := JOIN LandUse and Rect; R1 := PROJECT R0 on lname; R2 := JOIN R1 and LandUse
• Output limited to 2 spatiotemporal dimensions (3 in case of interpolated attributes)
• Pure constraint DB not practical
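The first query (location of each airplane at time t1) can be sketched in Python over a nested Flight relation whose trajectories are stored as sample points. This is an illustration only: linear interpolation between consecutive samples is an assumption made here, not DEDALE's documented evaluation method, and the function names and flight identifiers are hypothetical.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float, float, float]  # (t, x, y, a), as in traj[t,x,y,a]
FlightTuple = Tuple[str, List[Sample]]      # (fname, traj)


def position_at(traj: List[Sample], t1: float) -> Optional[Tuple[float, float]]:
    """Select t = t1 on one trajectory, then project on (x, y).

    Assumes samples are sorted by time and interpolates linearly between
    neighbours. Returns None if t1 is outside the sampled time span
    (or if the trajectory has fewer than two samples).
    """
    for (t0, x0, y0, _), (t2, x2, y2, _) in zip(traj, traj[1:]):
        if t0 <= t1 <= t2:
            if t2 == t0:
                return (x0, y0)
            f = (t1 - t0) / (t2 - t0)
            return (x0 + f * (x2 - x0), y0 + f * (y2 - y0))
    return None


def locations_at(flight: List[FlightTuple], t1: float) -> List[Tuple[str, Tuple[float, float]]]:
    """Apply the per-tuple selection and projection to every flight (the MAP step)."""
    result = []
    for fname, traj in flight:
        pos = position_at(traj, t1)
        if pos is not None:
            result.append((fname, pos))
    return result


# Made-up sample data: two flights with two samples each.
flight = [
    ("F1", [(0, 0, 0, 1000), (10, 100, 50, 2000)]),
    ("F2", [(20, 0, 0, 500), (30, 10, 10, 500)]),
]
at_5 = locations_at(flight, 5)
```

Here `locations_at` plays the role of MAP over Flight, and `position_at` combines the selection on t with the projection on x,y for one nested trajectory.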
Getting Constraint-Backed Systems Right • Clean semantics and full expressiveness of constraint databases • Geometrical representation issues not a user concern • though expert users may want to take more control • System support for three-tier architecture • More sophisticated than for pure constraint databases, or for current spatial databases • Query processing engine must • choose the best concrete representation for output queries, among those supported by system • select query evaluation strategies in the presence of a wider mix of possible representations and techniques • take into account storage and visualization • perhaps maintain multiple representations for the same data?