Data and metadata modeling Tjalling Gelsema, te.gelsema@cbs.nl (Statistics Netherlands, methodologist)
Contents • Introduction • Objectives • Where we left off ... • Variables revisited • Composition • Product • The big picture • Metadata: terms • Example
Objectives A framework for statistical data and metadata with the following features: • Precision • Aims at uncovering relationships (between a data set and its variables, between micro data and dimensional data, etc.) • Has a precise relationship between data and metadata • Uses algebra in order to reduce complexity • Focuses on machine representation • Bonus: detection of synonyms
Variables revisited A variable is a function v : p → x with domain p and codomain x, represented in a (function) diagram as an arrow labeled v from p to x. For an element e in p, the corresponding value in x is v(e).
Composition In a situation where w : q → p and v : p → x, we define the composition of w and v, denoted v◦w or just vw, as the function vw(d) = v(w(d)) for every d in q. In a diagram, vw is the arrow from q to x obtained by following w and then v.
Composition, examples • Composition of an operation (tax deduction) with a variable (pay) • Composition of a variable (age category) with an object type relation (partner 1)
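The two compositions can be sketched in Python as follows. This is only an illustration: the persons, households, amounts, the 30% deduction rate and the age threshold are all hypothetical and do not come from the slides.

```python
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def compose(v: Callable[[B], C], w: Callable[[A], B]) -> Callable[[A], C]:
    """Return v◦w, the function d ↦ v(w(d))."""
    return lambda d: v(w(d))


# Hypothetical variable "pay" on the object type person.
pay_table = {"ann": 3000, "bob": 2500}


def pay(person: str) -> int:
    return pay_table[person]


def tax_deduction(amount: int) -> int:
    # Hypothetical operation: a flat 30% deduction, in integer arithmetic.
    return amount * 70 // 100


# Operation composed with a variable: person → net amount.
net_pay = compose(tax_deduction, pay)

# Hypothetical object type relation "partner 1": household → person.
partner1_table = {"h1": "ann", "h2": "bob"}


def partner1(household: str) -> str:
    return partner1_table[household]


age_table = {"ann": 34, "bob": 67}


def age_category(person: str) -> str:
    # Hypothetical two-class categorisation of age.
    return "65+" if age_table[person] >= 65 else "<65"


# Variable composed with an object type relation: household → age category.
age_category_partner1 = compose(age_category, partner1)

print(net_pay("ann"))               # 2100
print(age_category_partner1("h2"))  # 65+
```

Note the order of the arguments: in `compose(v, w)` the function `w` is applied first, matching the notation v◦w.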
Product In a situation where z : p → y and v : p → x, we define the product of z and v, denoted ‹z,v›, as the function ‹z,v›(e) = ‹z(e),v(e)› for every e in p. The element ‹z(e),v(e)› is a member of the Cartesian product y×x of y and x, i.e., the set of all pairs of an element from y with an element from x. In a diagram, ‹z,v› is the arrow from p to y×x.
Product, continued The (binary) product is tacitly extended to general situations, such as n functions v1,…,vn with a common domain, in which case the product is ‹v1,…,vn›.
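A minimal Python sketch of the n-ary product, assuming the functions share a domain. The variables `gender`, `income` and `activity` and their values are hypothetical illustrations.

```python
from typing import Any, Callable, Tuple


def product(*fs: Callable[[Any], Any]) -> Callable[[Any], Tuple[Any, ...]]:
    """Return ‹f1,…,fn›, the function e ↦ (f1(e), …, fn(e))."""
    return lambda e: tuple(f(e) for f in fs)


# Hypothetical variables on the object type person.
gender_table = {"ann": "F", "bob": "M"}
income_table = {"ann": 42000, "bob": 38000}
activity_table = {"ann": "retail", "bob": "farming"}


def gender(person: str) -> str:
    return gender_table[person]


def income(person: str) -> int:
    return income_table[person]


def activity(person: str) -> str:
    return activity_table[person]


# Binary product ‹gender,income› and ternary product ‹gender,income,activity›.
pair = product(gender, income)
triple = product(gender, income, activity)

print(pair("ann"))    # ('F', 42000)
print(triple("bob"))  # ('M', 38000, 'farming')

# Tabulating the product over all persons yields one row per person.
rows = [triple(person) for person in sorted(gender_table)]
```

Each value of the product is a tuple, i.e. an element of the Cartesian product of the codomains.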
The big picture Statistical information revolves around the notion of a function. Each of the following is represented as a function: • Variable • Object type relation • Operation • Data set (micro) • Data set (macro)
Terms A term t is a sequence of characters (a string) with an associated type p→q, such that, given a set of elementary symbols (such as activity, gender, income), • each elementary symbol is a term (of appropriate type), and • if t1 and t2 are terms of appropriate type, then ‹t1,t2› is a term (of appropriate type), and • if t1 and t2 are terms of appropriate type, then t1◦t2 is a term (of appropriate type), and • that’s all folks!
Terms, continued Given a set of elementary symbols (such as activity, gender, income), and a mapping h that sends each elementary symbol to the function it names (h(a)=a, etc.), we extend h to all terms such that • h(‹t1,t2›)=‹h(t1),h(t2)›, and • h(t1◦t2)=h(t1)◦h(t2) That’s it! The correspondence between metadata and data is a homomorphism.
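The inductive definition of terms and the extension of h can be sketched in Python as below. This is one possible encoding under stated assumptions, not the author's implementation: the class names `Sym`, `Pair` and `Comp` are invented here, the data behind the elementary symbols is hypothetical, and the type p→q of a term is not checked.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Union


@dataclass(frozen=True)
class Sym:
    """An elementary symbol such as gender or income."""
    name: str


@dataclass(frozen=True)
class Pair:
    """The term ‹t1,t2›."""
    left: "Term"
    right: "Term"


@dataclass(frozen=True)
class Comp:
    """The term t1◦t2 (t2 is applied first)."""
    outer: "Term"
    inner: "Term"


Term = Union[Sym, Pair, Comp]


def show(t: Term) -> str:
    """Render a term as a string (the metadata)."""
    if isinstance(t, Sym):
        return t.name
    if isinstance(t, Pair):
        return f"‹{show(t.left)},{show(t.right)}›"
    return f"{show(t.outer)}◦{show(t.inner)}"


def h(t: Term, base: Dict[str, Callable[[Any], Any]]) -> Callable[[Any], Any]:
    """Map a term to the function it denotes (the data), structurally."""
    if isinstance(t, Sym):
        return base[t.name]
    if isinstance(t, Pair):
        f, g = h(t.left, base), h(t.right, base)
        return lambda e: (f(e), g(e))   # h(‹t1,t2›) = ‹h(t1),h(t2)›
    f, g = h(t.outer, base), h(t.inner, base)
    return lambda e: f(g(e))            # h(t1◦t2) = h(t1)◦h(t2)


# Hypothetical interpretation of the elementary symbols.
base = {
    "gender": {"ann": "F", "bob": "M"}.__getitem__,
    "income": {"ann": 42000, "bob": 38000}.__getitem__,
    "partner1": {"h1": "ann", "h2": "bob"}.__getitem__,
}

term = Pair(Comp(Sym("gender"), Sym("partner1")),
            Comp(Sym("income"), Sym("partner1")))

print(show(term))           # ‹gender◦partner1,income◦partner1›
print(h(term, base)("h1"))  # ('F', 42000)
```

Because `h` recurses on the structure of the term, the two extension rules hold by construction, which is the sense in which the correspondence is a homomorphism.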
Example Given the elementary symbols: the metadata corresponding with the data set is the term ‹de,ap1e,gp1e,ap2e,gp2e› (which points through h at the same data set as ‹d,ap1,gp1,ap2,gp2›e)
Exercise Given the situation: Show that ‹zw,vw›=‹z,v›w (first show that this makes sense, by drawing a functional diagram).
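Before attempting the proof, the identity can be sanity-checked on a small finite example. The sketch below assumes w : q → p, z : p → y and v : p → x, with hypothetical households and persons; agreement on one example is of course not a proof.

```python
def compose(v, w):
    """Return v◦w, the function d ↦ v(w(d))."""
    return lambda d: v(w(d))


def pair(z, v):
    """Return ‹z,v›, the function e ↦ (z(e), v(e))."""
    return lambda e: (z(e), v(e))


# Hypothetical finite sets and functions.
q = ["h1", "h2", "h3"]                                   # households
w = {"h1": "ann", "h2": "bob", "h3": "ann"}.__getitem__  # w : q → p
z = {"ann": "F", "bob": "M"}.__getitem__                 # z : p → y
v = {"ann": 34, "bob": 67}.__getitem__                   # v : p → x

lhs = pair(compose(z, w), compose(v, w))  # ‹zw,vw›
rhs = compose(pair(z, v), w)              # ‹z,v›w

# Both sides are functions q → y×x; compare them pointwise.
agree = all(lhs(d) == rhs(d) for d in q)
print(agree)  # True
```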