Lecture 03: Data Foundations. September 14, 2010 COMP 150-12 Topics in Visual Analytics. Lecture Outline. Data Operations Metadata Structure vs. Value Value Derived Value Derived Structure Structure. Data Foundations Basic Data Types Nominal Ordinal Scale / Quantitative Interval
Lecture 03: Data Foundations September 14, 2010 COMP 150-12 Topics in Visual Analytics
Lecture Outline • Data Foundations • Basic Data Types • Nominal • Ordinal • Scale / Quantitative • Interval • Ratio • Dimensionality • Scalars • Vectors • Matrices • Tensors • Data Operations • Metadata • Structure vs. Value • Value, Derived Value, Structure, Derived Structure
Data Definition • A typical dataset in visualization consists of n records • (r_1, r_2, r_3, … , r_n) • Each record r_i consists of m (m ≥ 1) observations or variables • (v_1, v_2, v_3, … , v_m) • A variable may be either independent or dependent • An independent variable (iv) is not controlled or affected by another variable • For example, time in a time-series dataset • A dependent variable (dv) is affected by a variation in one or more associated independent variables • For example, temperature in a region • Formal definition: • r_i = (iv_1, iv_2, iv_3, … , iv_{m_i}, dv_1, dv_2, dv_3, … , dv_{m_d}) • where m = m_i + m_d
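The record definition above can be sketched in code. This is a minimal illustration, assuming a hypothetical time-series dataset in which the hour is the only independent variable and temperature the only dependent one; the `Record` class and its field names are not from the lecture.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Record:
    """One record r_i = (iv_1..iv_mi, dv_1..dv_md). Hypothetical helper type."""
    iv: Tuple[float, ...]  # independent variables, e.g. time
    dv: Tuple[float, ...]  # dependent variables, e.g. temperature

    @property
    def m(self) -> int:
        # m = m_i + m_d
        return len(self.iv) + len(self.dv)


# Made-up dataset of n = 3 records: (hour,) -> (temperature in Celsius,)
dataset = [
    Record(iv=(0.0,), dv=(12.5,)),
    Record(iv=(1.0,), dv=(12.1,)),
    Record(iv=(2.0,), dv=(11.8,)),
]

n = len(dataset)   # number of records
m = dataset[0].m   # number of variables per record
```

Keeping the independent and dependent variables in separate tuples makes the split m = m_i + m_d explicit rather than a convention about column positions.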
Basic Data Types: Nominal • Def: A set of unordered, non-numeric values • For example: • Categorical (finite) data: {apple, orange, pear}, {red, green, blue} • Arbitrary (infinite) data: {“12 Main St. Boston MA”, “45 Wall St. New York NY”, …}, {“John Smith”, “Jane Doe”, …}
Basic Data Types: Ordinal • Def: A tuple (an ordered set) • For example: • Numeric: <2, 4, 6, 8> • Binary: <0, 1> • Non-numeric: <G, PG, PG-13, R>
Basic Data Types: Scale / Quantitative • Def: A numeric range • Interval: ordered numeric elements on a scale that can be mathematically manipulated, but cannot be compared as ratios • For example: date, current time (Sept 14, 2010 cannot be described as a ratio of Jan 1, 2011) • Ratio: a scale on which there exists an “absolute zero”, so values can be compared as ratios • For example: height, weight
Basic Data Types (Formal) • Nominal (N): {…} • Ordinal (O): <…> • Scale / Quantitative (Q): […] • Q → O • [0, 100] → <F, D, C, B, A> • O → N • <F, D, C, B, A> → {C, B, F, D, A} • N → O (??) • {John, Mike, Bob} → <Bob, John, Mike> • {red, green, blue} → <blue, green, red>?? • O → Q (??) • Hashing? • Bob + John = ?? • Source: Readings in Information Visualization: Using Vision to Think. Card, Mackinlay, Shneiderman, 1999
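The two unproblematic conversions, Q → O and O → N, can be sketched as follows. The grade cut-offs (90/80/70/60) are an assumption made for illustration; the slide only states that [0, 100] maps onto <F, D, C, B, A>.

```python
# Assumed cut-offs (not from the lecture): lower bound of each grade band.
GRADE_BANDS = [(90, "A"), (80, "B"), (70, "C"), (60, "D"), (0, "F")]

# Ordinal type as a tuple: position encodes order, F < D < C < B < A.
GRADE_ORDER = ("F", "D", "C", "B", "A")


def q_to_o(score: float) -> str:
    """Q -> O: bin a quantitative score in [0, 100] into an ordinal grade."""
    if not 0 <= score <= 100:
        raise ValueError("score must lie in [0, 100]")
    for lower, grade in GRADE_BANDS:
        if score >= lower:
            return grade
    raise AssertionError("unreachable: the last band starts at 0")


def o_to_n(ordered: tuple) -> frozenset:
    """O -> N: forget the order, keeping only the set of categories."""
    return frozenset(ordered)


grades = [q_to_o(s) for s in (95, 72.5, 40)]
categories = o_to_n(GRADE_ORDER)
```

Both conversions discard information, which is why they are easy. Going the other way (N → O or O → Q) requires inventing an order or a magnitude, which is what the question marks on the slide point at.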
Operations on Basic Data Types • What are the operations that we can perform on these data types? • Nominal (N) • = and ≠ • Ordinal (O) • >, <, ≥, ≤ • Scale / Quantitative (Q) • everything else (+, -, *, /, etc.) • Consider a distance function
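One way to think about the distance function is to let each data type use only the operations it supports. The sketch below is an assumption about what such functions might look like, not the lecture's definition; in particular, using the rank gap for ordinal values imposes an equal spacing between ranks that the ordinal type itself does not guarantee.

```python
from typing import Sequence


def nominal_distance(a, b) -> int:
    """Nominal values support only = and !=, so the distance is 0 or 1."""
    return 0 if a == b else 1


def ordinal_distance(a, b, order: Sequence) -> int:
    """Ordinal values support comparison; use the gap between ranks."""
    return abs(order.index(a) - order.index(b))


def quantitative_distance(a: float, b: float) -> float:
    """Quantitative values support arithmetic; use the absolute difference."""
    return abs(a - b)


# Ordinal example from the lecture: film ratings in increasing order.
RATINGS = ("G", "PG", "PG-13", "R")
```

For instance, `nominal_distance("apple", "pear")` can only report that the two values differ, while `ordinal_distance("G", "R", RATINGS)` can also report how many ranks apart they are.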
Dimensionality • Scalar • A single value • Vector • A collection of scalars • Matrix • A collection of vectors • Tensor • A collection of matrices
Dimensionality (Programming) • Scalar • 0-dimensional array • Vector • 1-dimensional array • Matrix • 2-dimensional array • Tensor • 3 or more dimensional array
Dimensionality (Technically) • Scalar • 0th order tensor • Vector • 1st order tensor • Matrix • 2nd order tensor • Tensor • nth order tensor (n ≥ 3)
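The programming view of dimensionality can be illustrated with plain nested lists. The `order` helper below is hypothetical and assumes regular (non-ragged), non-empty lists; it simply counts how deeply the lists are nested.

```python
def order(value) -> int:
    """Return the nesting depth: 0 for a scalar, 1 for a vector, and so on.

    Assumes regular (non-ragged), non-empty nested lists.
    """
    depth = 0
    while isinstance(value, list):
        depth += 1
        value = value[0]  # descend into the first element
    return depth


scalar = 3.0                                   # 0-dimensional
vector = [1.0, 2.0, 3.0]                       # 1-dimensional
matrix = [[1.0, 2.0], [3.0, 4.0]]              # 2-dimensional
tensor = [[[1.0, 2.0], [3.0, 4.0]],
          [[5.0, 6.0], [7.0, 8.0]]]            # 3-dimensional
```

The nesting depth here corresponds to the tensor order on the "technically" slide: a vector is a collection of scalars, a matrix a collection of vectors, and so on.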
Example: OLAP • OLAP = Online Analytical Processing • Often referred to as a “data cube” or “hypercube” • Image from Wikipedia: OLAP Cube
OLAP Operations • Slice • Selects a subset of the original n-dimensional cube • Result set could be of any dimensionality • Roll up (consolidate) • Creates a hierarchy based on the dataset • Same as clustering • Drill down • Expands a cluster • Pivot • Changes the orientation of the cube • Can be combined with the 4 basic SQL commands: • SELECT, UPDATE, INSERT, DELETE • Adapted from Wikipedia: OLAP Cube
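The operations above can be sketched on a toy cube stored as a dictionary. Everything here is an assumption for illustration: the dimension names, the sales figures, and the simplification that a slice fixes exactly one dimension and a roll-up sums the measure over one dimension. Real OLAP engines roll up along dimension hierarchies (for example day → month → year).

```python
from collections import defaultdict

# Hypothetical cube: (year, region, product) -> sales
DIMS = ("year", "region", "product")
cube = {
    (2009, "East", "apple"): 10,
    (2009, "West", "apple"): 20,
    (2010, "East", "apple"): 30,
    (2010, "East", "pear"): 40,
}


def slice_cube(cube, dims, dim, value):
    """Slice: fix one dimension to a value, giving an (n-1)-dimensional cube."""
    i = dims.index(dim)
    new_dims = dims[:i] + dims[i + 1:]
    new_cube = {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}
    return new_cube, new_dims


def roll_up(cube, dims, dim):
    """Roll up: consolidate by summing the measure over one dimension."""
    i = dims.index(dim)
    new_dims = dims[:i] + dims[i + 1:]
    totals = defaultdict(int)
    for k, v in cube.items():
        totals[k[:i] + k[i + 1:]] += v
    return dict(totals), new_dims


def pivot(cube, dims, new_order):
    """Pivot: reorder the dimensions without changing any values."""
    idx = [dims.index(d) for d in new_order]
    return {tuple(k[i] for i in idx): v for k, v in cube.items()}, tuple(new_order)


east, east_dims = slice_cube(cube, DIMS, "region", "East")
by_year_region, _ = roll_up(cube, DIMS, "product")
pivoted, pivoted_dims = pivot(cube, DIMS, ("product", "year", "region"))
```

Drill down is the inverse of roll up: starting from `by_year_region`, it amounts to going back to the finer-grained `cube` for a chosen cell.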
OLAP vs. SQL • Often used in business intelligence • Allows for a quick change in perspective on the same data • For example, consider the above case implemented in SQL vs. OLAP • Can be considered an abstract representation of an RDBMS • Supported by many commercial databases • Uses a query language called MDX (MultiDimensional eXpressions) • MDX example from Wikipedia: MDX
Application • Related to your homework 1 • A powerful data representation for analysis. • Is the basis of Tableau Software
Metadata • Introduced by Lisa Tweedie at CHI 1997 (“Characterizing Interactive Externalizations”) • Defined as “data about data” • Extends Bertin's original concept of data values and data structures • Values (low level): variables relevant to a problem • Structures (high level): relations that characterize the data as a whole (e.g. links, equations, constraints)
Metadata – 4 Relationships • Derived Values • Example: average • Derived Structure • Example: sorting a list of variables • The four relationships: • Values → Derived Values • Values → Derived Structure • Structure → Derived Values • Structure → Derived Structure
Values → Derived Values → Derived Structure • Values: a (text) document corpus • Derived values: compute the similarities between the documents • Derived Structure: apply multi-dimensional scaling to plot the documents in a spatial view.
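The first step of this chain (values → derived values) can be sketched as follows. The three-document corpus is made up, and cosine similarity over raw word counts is only one possible similarity measure; the lecture does not say which one is used. The multi-dimensional scaling step that produces the spatial view is omitted here.

```python
import math
from collections import Counter

# Values: a hypothetical three-document corpus.
corpus = [
    "visual analytics of text data",
    "text data analytics",
    "social network graph",
]


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# One bag-of-words vector per document.
counts = [Counter(doc.split()) for doc in corpus]

# Derived values: a symmetric document-by-document similarity matrix.
similarity = [[cosine_similarity(a, b) for b in counts] for a in counts]
```

A layout method such as multi-dimensional scaling would then take the distances implied by `similarity` and place similar documents close together, which is the derived structure.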
Values → Derived Values → Derived Structure • Example: IN-SPIRE by PNNL
Structure → Derived Structure → Derived Values • Structure: a tabular layout of individuals’ relationships with each other • Derived Structure: convert the tabular structure to a graph • Derived Values: compute centrality to identify the importance of the individual in this social network
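This chain can be sketched with a small made-up relationship table. Degree centrality is used here as an assumption; the lecture does not specify which centrality measure is meant, and others (betweenness, closeness, eigenvector) would fit the same pattern.

```python
from collections import defaultdict

# Structure: a hypothetical table of pairwise relationships (one row per tie).
relationships = [
    ("Bob", "John"),
    ("Bob", "Mike"),
    ("Bob", "Jane"),
    ("John", "Mike"),
]

# Derived structure: an undirected graph stored as adjacency sets.
graph = defaultdict(set)
for a, b in relationships:
    graph[a].add(b)
    graph[b].add(a)

# Derived values: degree centrality = neighbours / (number of nodes - 1).
n = len(graph)
centrality = {person: len(nbrs) / (n - 1) for person, nbrs in graph.items()}
most_central = max(centrality, key=centrality.get)
```

The table and the graph hold the same information; converting between them changes the structure, and only the final step produces a new value per individual.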
Structure → Derived Structure → Derived Values Image taken from: http://beth.typepad.com/beths_blog/2009/12
Analysis Flow • [Diagram: Tables 1 to 6 connected by operations] • Create attribute: Age, Income • Pivot by Professions • Operation: Mean • Create classes: Avg_Age, Avg_Income • Pivot by Avg_Income
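The flow above can be sketched on a small made-up table. The rows, the column names, and the reading of "pivot by Avg_Income" as reordering the groups by that derived value are all assumptions, since the diagram itself did not survive extraction.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical input table: one row per person, with Age and Income attributes.
people = [
    {"profession": "teacher", "age": 30, "income": 40000},
    {"profession": "teacher", "age": 50, "income": 60000},
    {"profession": "engineer", "age": 40, "income": 90000},
]

# Pivot by profession: group the rows that share a profession.
groups = defaultdict(list)
for row in people:
    groups[row["profession"]].append(row)

# Operation Mean: derive Avg_Age and Avg_Income for each group.
summary = {
    prof: {
        "Avg_Age": mean(r["age"] for r in rows),
        "Avg_Income": mean(r["income"] for r in rows),
    }
    for prof, rows in groups.items()
}

# Pivot by Avg_Income (assumed meaning): order professions by the derived value.
by_income = sorted(summary, key=lambda p: summary[p]["Avg_Income"])
```

Each intermediate result (`groups`, `summary`, `by_income`) plays the role of one table in the flow, so an analyst could branch from any of them.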
Application • An analyst can continue such a process, backtrack, or branch from any given point. • Related to your homework 1!