Relative Information Capacity of Simple Relational Database Schemata. Paper by: Richard Hull Presented by: Jose Picado. Outline. Problem: Data relativism and information capacity Definition Examples Importance Hierarchy of dominance measures Basic results Discussion. Data relativism.
Relative Information Capacity of Simple Relational Database Schemata Paper by: Richard Hull Presented by: Jose Picado
Outline • Problem: Data relativism and information capacity • Definition • Examples • Importance • Hierarchy of dominance measures • Basic results • Discussion
Data relativism • Represent the same data in different ways • Represent the same data under different schemas [Figures: Schema 1, Schema 2] Example taken from: Kosky, Anthony. Transforming Databases with Recursive Data Structures, 1996.
Relative information capacity • Expressiveness of a schema • Different schemas representing the same data may have different information capacity [Figures: Schema 1, Schema 2] • Schema 1: • Does not require that the spouse attribute of a man refers to a woman. • Does not require that each spouse attribute in one direction has a corresponding spouse attribute in the other direction. • Schema 2: • Allows unmarried people to be represented in the database. Example taken from: Kosky, Anthony. Transforming Databases with Recursive Data Structures, 1996.
Relative information capacity • Possible solution: • Transform the existing schema into a new schema by structural manipulations • Is the transformation information capacity preserving?
Importance • Schema evolution • None of the information stored in the initial database is lost
Importance • Data integration • All information in one of the component databases is reflected in the integrated database Example taken from: Kosky, Anthony. Transforming Databases with Recursive Data Structures, 1996.
Importance • Database normalization theory • User view construction • Schema simplification • Translation between data models
Hull’s paper • Introduces theoretical tools for studying measures of relative information capacity • Theoretical frameworks at the time were complex • There was no clear definition of the concept • Hull introduced clean ways of comparing schemata and their information capacity • Defines a hierarchy of measures to compare the information capacity of schemata
Hull’s paper • Gives some basic results concerning the previous measures • Considers only non-keyed relations [Figure: non-keyed vs. keyed relations and their instances]
Definitions • A schema P is a set of relations • Relations are composed of attributes, which may be of different basic types • Basic types are domain designators (each has a fixed domain of possible values) • I(P) is the set of instances of P, usually infinite [Figure: schema P and its instances I(P)]
Transformation • P and Q are relational schemata • A transformation from P to Q is a map σ: I(P) → I(Q) • Example rule taking instances of P to instances of Q: PersonInfo(x,y,z) :- Person(x,y), Birth(x,z).
Dominance • P and Q are relational schemata • Q dominates P via (σ, τ), with σ: I(P) → I(Q) and τ: I(Q) → I(P), if the composition of σ followed by τ is the identity on I(P)
Dominance [Figure: schemas P and Q]
Dominance • Take instances of P: I(P)
Dominance • Apply σ to I(P): Male(x) :- Person(x,y,z), y="male". Female(x) :- Person(x,y,z), y="female". Marriage(x,y) :- Person(x,u,y), Person(y,v,x), u="male", v="female".
Dominance • Apply τ to σ(I(P)): Person(x,"male",z) :- Male(x), Marriage(x,z). Person(x,"female",z) :- Female(x), Marriage(z,x).
Dominance • Compare I(P) and τ(σ(I(P)))
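The apply-and-compare steps above can be sketched as a round-trip check. The encoding of instances as dictionaries, the names `sigma`, `tau` and `round_trips`, and the sample data are assumptions made for illustration. The female rule is read here as matching Marriage(z, x), on the assumption that Marriage stores the man first.

```python
def sigma(inst):
    """Person(name, sex, spouse) -> Male, Female, Marriage."""
    person = inst["Person"]
    return {
        "Male": {(x,) for (x, y, z) in person if y == "male"},
        "Female": {(x,) for (x, y, z) in person if y == "female"},
        # Marriage(x,y) needs matching spouse entries in both directions.
        "Marriage": {
            (x, y)
            for (x, u, y) in person
            if u == "male" and (y, "female", x) in person
        },
    }

def tau(inst):
    """Male, Female, Marriage -> Person(name, sex, spouse)."""
    male = {x for (x,) in inst["Male"]}
    female = {x for (x,) in inst["Female"]}
    marriage = inst["Marriage"]
    persons = {(x, "male", z) for (x, z) in marriage if x in male}
    persons |= {(x, "female", z) for (z, x) in marriage if x in female}
    return {"Person": persons}

def round_trips(inst):
    """True when applying sigma and then tau gives back the same instance."""
    return tau(sigma(inst)) == inst

# Made-up instances of P.
married = {"Person": {("bob", "male", "ann"), ("ann", "female", "bob")}}
one_sided = {"Person": {("bob", "male", "ann")}}
```

With these definitions `round_trips(married)` holds while `round_trips(one_sided)` does not. Dominance requires the round trip to be the identity on every instance of P, so a single failing instance shows that this particular pair of maps does not witness dominance.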
Dominance • P and Q are relational schemata • Q dominates P via (σ, τ) if the composition of σ followed by τ is the identity on I(P) • Information structured according to P can be restructured to “fit” into Q, and then restructured again to “fit” back into P without loss • Q has at least as much capacity for storing information as P
Equivalence • P and Q are equivalent (xxx) if they have equivalent information capacity, where xxx stands for one of the dominance measures (calc, gen, int, abs) • P and Q are equivalent (xxx) if • Q dominates P (xxx) and • P dominates Q (xxx)
Information dominance measures (ordered from more restrictive to less restrictive) • Calculous dominance • Generic dominance • Internal dominance • Absolute dominance
Types of equivalence (ordered from more restrictive to less restrictive) • P and Q are equivalent (calc) • P and Q are equivalent (gen) • P and Q are equivalent (int) • P and Q are equivalent (abs)
Level 1: Calculous dominance • Only allow transformations to be relational calculus expressions • Relational calculus: • First order logic or predicate calculus • Predicates: atom, • Each query Q(x1, …, xn) is a predicate P
Level 1: Calculous dominance • Only allow transformations to be relational calculus expressions • If σ and τ are relational calculus expressions, then Q dominates P calculously
Level 2: Generic dominance • Only allow transformations that treat domain elements as “essentially uninterpreted objects” • Treat all elements as equals except some set of constants • Property of all query languages, such as SQL and Datalog
Level 2: Generic dominance • Only allow transformations that treat domain elements as “essentially uninterpreted objects” • If σ and τ treat all elements as equals (except for some set of constants), then Q dominates P generically
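One common way to make "treats elements as uninterpreted objects" concrete is to require that a transformation commute with renamings of domain values that fix the chosen constants. The sketch below tests that property on one instance and one renaming; it is an illustration under that reading, and all names (`rename`, `commutes_with`, the two sample transformations) are hypothetical.

```python
def rename(instance, perm):
    """Apply a renaming of domain values (a dict) to every tuple."""
    return {
        rel: {tuple(perm.get(v, v) for v in row) for row in rows}
        for rel, rows in instance.items()
    }

def commutes_with(transform, instance, perm):
    """Check transform(rename(I)) == rename(transform(I)) for one I and one renaming."""
    return transform(rename(instance, perm)) == rename(transform(instance), perm)

def swap_columns(inst):
    # Uses no particular domain value.
    return {"S": {(b, a) for (a, b) in inst["R"]}}

def keep_if_first_is_a(inst):
    # Mentions the constant "a", so it is generic only relative to that constant.
    return {"S": {row for row in inst["R"] if row[0] == "a"}}

inst = {"R": {("a", "b"), ("b", "c")}}
swap_a_b = {"a": "b", "b": "a"}   # moves the constant "a"
swap_b_c = {"b": "c", "c": "b"}   # leaves "a" fixed
```

On this instance `swap_columns` commutes with both renamings, while `keep_if_first_is_a` commutes only with the renaming that leaves "a" fixed. Passing such spot checks is evidence of genericity, not a proof.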
Level 3: Internal dominance • Only allow transformations that do not invent any data • Inventing data: numerical computations or string manipulations, e.g. performance = goals/games
Level 3: Internal dominance • Only allow transformations that do not invent any data • If σ and τ do not invent data, then Q dominates P internally
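A rough way to spot invented data is to compare the values appearing in the output with those in the input (plus any allowed constants). The following sketch assumes that reading; the helper names and the `Stats` relation are made up, with `add_performance` mirroring the goals/games example.

```python
def values(instance):
    """All domain values that occur anywhere in the instance."""
    return {v for rows in instance.values() for row in rows for v in row}

def invents_data(transform, instance, constants=frozenset()):
    """True if the output contains a value not in the input or the constants."""
    return not values(transform(instance)) <= values(instance) | set(constants)

def project_player(inst):
    # Only reuses values already present in the input.
    return {"Player": {(name,) for (name, goals, games) in inst["Stats"]}}

def add_performance(inst):
    # performance = goals/games creates a value that was not stored anywhere.
    return {"Perf": {(name, goals / games) for (name, goals, games) in inst["Stats"]}}

stats = {"Stats": {("kim", 3, 4)}}
```

Here `invents_data(project_player, stats)` is false and `invents_data(add_performance, stats)` is true. The check is per instance and can miss a computed value that happens to coincide with a stored one.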
Level 4: Absolute dominance • Y: some finite set of values • I_Y(P): instances of P that contain only values in Y • |I_Y(P)|: number of instances of P containing only values in Y • If |I_Y(P)| ≤ |I_Y(Q)| for every sufficiently large finite Y, then Q dominates P absolutely • Easy to compute: based on counting instances instead of finding transformations
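For non-keyed relations the counting is straightforward, because any set of well-typed tuples is an instance: a relation with m possible tuples over Y has 2^m instances. The sketch below assumes that setting; the schema encoding (relation name to list of basic types), the function name `count_instances`, and the sample schemas are hypothetical.

```python
from math import prod

def count_instances(schema, y):
    """Number of instances of a non-keyed schema using only values from y.

    schema: relation name -> list of basic-type names (one per attribute)
    y:      basic-type name -> finite set of allowed values
    """
    total = 1
    for attribute_types in schema.values():
        # Every subset of the possible tuples is a legal relation.
        possible_tuples = prod(len(y[t]) for t in attribute_types)
        total *= 2 ** possible_tuples
    return total

# Made-up schemas over two basic types.
schema_p = {"R": ["NAME", "NUMBER"]}
schema_q = {"S": ["NAME"], "T": ["NUMBER"]}
y = {"NAME": {"a", "b"}, "NUMBER": {1, 2, 3}}

count_p = count_instances(schema_p, y)  # 2 ** (2 * 3)
count_q = count_instances(schema_q, y)  # 2 ** 2 * 2 ** 3
```

For this particular Y, P has 64 instances and Q has 32, so the counts do not support Q dominating P. A single Y is only a spot check; the definition quantifies over many sets Y.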
Basic results • Q dominates P calculously ⇒ Q dominates P generically ⇒ Q dominates P internally ⇒ Q dominates P absolutely
Basic results • Sometimes absolute and internal dominance hold, but generic and calculous dominance do not • Q dominates P (abs, int) • The counting condition holds (abs), and the transformation (int) does not invent data • Q does not dominate P (gen, calc) • There is no pair of transformations (gen, calc) that takes instances of P to Q and then back to P unchanged [Figure: example schemas P and Q]
Basic results • Absolute dominance is useful for showing that calculous dominance does not hold • Q dominates P calculously ⇒ Q dominates P absolutely • P does not dominate Q absolutely ⇒ P does not dominate Q calculously [Figure: example schemas P and Q] *under certain constraints
Basic results • Dominance is preserved by renamings of basic types (homomorphisms) • h(P): image of P under the homomorphism h • If Q dominates P, then h(Q) dominates h(P) for any measure of dominance (calc, gen, int, abs)
Basic results • Calculous dominance does not accurately measure the presence of “semantic correspondence” [Figure: schemas P and Q with relations R1, R2, S1, S2 and T over the basic types NAME and NUMBER] • Q dominates P (calc), but there is no semantic mapping from P to Q
Basic results • For non-keyed relational schemata over a single basic type, all four measures of dominance coincide • Theorem: Let P and Q be non-keyed relational schemata over a single basic type B. Then the following are equivalent: • Q dominates P (calc) • Q dominates P (gen) • Q dominates P (int) • Q dominates P (abs)
Basic results • With any reasonable measure of relative information capacity, two non-keyed relational schemata are equivalent iff they are identical • In the relational model (non-keyed), there is essentially at most one way to represent a given data set
Discussion • Strong points: • ???