A comprehensive model for Data Quality, Value of Data, and User Interface Design • Andrew U. Frank • Geoinformation • TU Vienna • frank@geoinfo.tuwien.ac.at Andrew Frank
What are the most important problems hindering wide use of GIS today? • Gueting said: Support for temporal data • Spaccapietra said: Semantics
What are the most important practical problems for the GI industry? • Consider that the market for GI in Europe is only 1/10 the size of the comparable market in the USA (with approximately the same population). • Impediments for business: • User Interface • Value of Data • Data Quality
Comprehensive model of GI use • Different GIS applications operate with very different concepts of what a GIS produces: • Produce maps (for decision makers) • Analyze situations • Explore data • Each time, a different user interface must be learned, which is a high cost and a large impediment.
Economic value of information • (Geographic) information can only be used to improve decisions. • This is the only situation in which data can produce economic value. • Read: • Varian & Shapiro: Network economy
Model of rational decision making: • A rational man (a.k.a. homo economicus) chooses between actions such that his well-being is optimized.
Multiple critiques: • Not just economic (monetary) optimization, but general well-being. • Bounded rationality: neither the information nor the inference resources are available to make the optimal decision. • …
Model of rational decision making is (only) a model • Descriptive model: it is often used when we rationalize our behavior after the fact. • We explain our actions in terms of optimizing our utility. • Prescriptive model: for administrative decisions the model is used to justify a decision and to communicate the arguments to others.
Core model of rational decision making • Produce all candidate actions. • Exclude actions that fail the non-compensatory criteria. • Evaluate the utility of the remaining candidate actions using compensatory criteria and weights. • Select the best action (i.e. the action with the highest utility).
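The four steps of the core model can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the names `DecisionModel`, `knockouts`, and `weights`, as well as the hotel values, are assumptions invented for the example, and property values are taken to be already normalized to a 0..10 scale.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# A candidate action is described by its normalized property values.
Candidate = Dict[str, float]


@dataclass
class DecisionModel:
    """Hypothetical container for the two kinds of criteria."""
    knockouts: List[Callable[[Candidate], bool]]  # non-compensatory criteria
    weights: Dict[str, float]                     # compensatory criteria

    def utility(self, candidate: Candidate) -> float:
        # Weighted sum: each weight is the utility per unit of a property.
        return sum(w * candidate.get(name, 0.0) for name, w in self.weights.items())

    def decide(self, candidates: Dict[str, Candidate]) -> Optional[str]:
        # Step 2: drop every candidate that fails a non-compensatory criterion.
        remaining = {
            name: c for name, c in candidates.items()
            if all(check(c) for check in self.knockouts)
        }
        if not remaining:
            return None  # the criteria excluded every candidate
        # Steps 3 and 4: evaluate utility and select the highest.
        return max(remaining, key=lambda name: self.utility(remaining[name]))


# Step 1: candidate actions (invented values on a 0..10 scale).
hotels = {
    "A": {"beach": 9, "quiet": 2, "price": 3, "restaurant": 1},
    "B": {"beach": 6, "quiet": 8, "price": 6, "restaurant": 1},
    "C": {"beach": 10, "quiet": 9, "price": 9, "restaurant": 0},
}
model = DecisionModel(
    knockouts=[lambda c: c["restaurant"] >= 1],  # must have a restaurant
    weights={"beach": 0.5, "quiet": 0.3, "price": 0.2},
)
best = model.decide(hotels)
```

In this invented data set hotel C would have the highest weighted utility, but it is excluded because it has no restaurant, so `best` is "B". No amount of utility on other properties can compensate for a failed non-compensatory criterion.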
Actions change the state of the world:
Hotel for a weekend: candidates
My Criteria • Distance to beach • Classification of hotel • Restaurant • Garden • Trail access • Noise • Price
Collection of data for these criteria
Normalize data • Data is collected on different measurement scales (cf. Stevens's paper in Science, 1946). • Make it comparable by normalizing it, for example on a scale 0..10 (or 0..1), but allow positive and negative utility.
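As a hedged sketch of such a normalization, the hypothetical helper `normalize` below rescales a raw measurement linearly so that a chosen `worst` value maps to 0 and a chosen `best` value maps to the top of the scale. The linear form and the reference values are assumptions for illustration. It is meaningful for interval and ratio scales only; ordinal data such as a hotel classification would need an explicit mapping instead.

```python
def normalize(value, worst, best, scale=10.0):
    """Linearly map a raw measurement so that worst -> 0 and best -> scale.

    Values outside the [worst, best] range are deliberately not clipped,
    so a property can contribute negative utility or more than `scale`.
    """
    if best == worst:
        raise ValueError("best and worst must differ")
    return scale * (value - worst) / (best - worst)


# Distance to beach in metres: 0 m is best, 1000 m is worst (assumed bounds).
beach_score = normalize(250, worst=1000, best=0)   # 7.5
# Price per night: 50 is best, 200 is worst; 230 falls below the scale.
price_score = normalize(230, worst=200, best=50)   # -2.0
```

Because `best` may be smaller than `worst`, the same function handles properties where less is better (distance, price, noise) and properties where more is better.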
Non-compensatory criteria • Non-compensatory criteria (a.k.a. K.O. criteria) must be fulfilled for a candidate to be acceptable.
Compensatory criteria • These criteria capture how the properties of a candidate action contribute to its utility. • Each weight gives the contribution to utility per unit of the property.
Unifying criteria
Interaction with the spreadsheet: • The weights are not well determined; this is one of the major critiques of the method. • Too many non-compensatory criteria: no candidates left. • Reduce the non-compensatory criteria. • Many similar solutions: reduce the weight of the common criterion.
User interaction style • The user interface must be “direct manipulation”: it should not require a rational analysis, but give a ‘feeling’ for the connections between criteria and the optimal selection.
User Interface Considerations • Shneiderman has pointed out that the only interface style that works consistently is direct manipulation. Such interfaces exploit human abilities which are not based on verbal (rational) understanding, but use the connection between actions and reactions. • Direct manipulation: • The user has some controls and the result reacts immediately to changes.
Emotional aspects • Experience shows that users play with the weights until the solution feels right. • This means that it is emotionally acceptable. • Modern neurophysiology has observed that actual decision making in human brains is not rational, but emotionally controlled. • Insert a property ‘likable’ and assess each candidate. Then the weight given to this property indicates the emotional influence.
What are the controls in the rational decision model? • Non-compensatory criteria: • Threshold for fulfillment. • Compensatory criteria: • Weight • Which data are considered: those for which either a threshold or a weight is set.
A first sketch of an interface: • Very simple interface. • Interface is completely in the language of the user.
General user interface because model is general • The rational decision model is general; EVERY decision is modeled. • Users have to learn only one conceptual model, not many different ones.
Decision model links directly to user task • Intermediate elements are excluded, which simplifies the conceptualization (less is better!) • Compare with the standard approach: the GIS produces a map, which is used as input to the decision process. • Many details of the map's form must be fixed that are not relevant to the decision process. • The user interface must have controls for these.
Value of decision • In the model of rational decision making, the value of data can be estimated: • The value of the data is the improvement of the decision compared to no information. • For decisions on actions where the actions have a cost, the difference between the highest and the lowest cost can be used as an estimate for the value of the decision.
Value of data • Properties with more weight contribute more to the decision. The value of the decision can be distributed over the data according to the weights.
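A rough sketch of this estimate follows. The function names and all numbers are hypothetical, and taking absolute values of the weights (so that negative weights still receive a share) is an assumption of this example rather than something the slides state.

```python
def value_of_decision(costs):
    """Estimate from the slides: spread between highest and lowest action cost."""
    return max(costs) - min(costs)


def value_per_property(decision_value, weights):
    """Distribute the decision value over properties in proportion to |weight|."""
    total = sum(abs(w) for w in weights.values())
    if total == 0:
        raise ValueError("at least one weight must be non-zero")
    return {name: decision_value * abs(w) / total for name, w in weights.items()}


# Invented costs of three candidate actions and invented weights.
decision_value = value_of_decision([120, 180, 240])          # 120
shares = value_per_property(decision_value,
                            {"beach": 0.5, "quiet": 0.3, "price": 0.2})
# shares is approximately {"beach": 60, "quiet": 36, "price": 24}
```

The shares always sum to the value of the decision, so the data for a heavily weighted property carries most of the value.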
Price of data • The value of the data is not the price at which it can be sold: • Deduct the cost of obtaining and using it. • The price must be set for many users; the value is specific to a decision. • Opportunities for specialized user interfaces, connections to data collections and thus BUSINESS.
Data Quality • Quality of the data is typically measured from the perspective of the data producer. Metadata standards codify this approach. • Observations indicate that users are not using metadata. How should a user decide on the usability of data from metadata?
Data quality from a user perspective: • Data is good if it leads to the best decision. It is bad if it makes me take the wrong decision. • Data quality is measured by the risk of my making the wrong decision.
Can we translate a producer's assessment of data quality to the risk of the user making the wrong decision? • Example: Precision • The producer of the data states that the distance to the beach is 100 m ± 50 m (one standard deviation, which corresponds to 68% of all values lying between 50 and 150 m).
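One possible way to turn this precision statement into a risk, sketched under stated assumptions: the error is taken to be normally distributed, and the user is assumed to have a non-compensatory threshold of at most 150 m to the beach. Both the threshold and the helper names are hypothetical and not taken from the slides.

```python
import math


def normal_cdf(x, mean, std):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def risk_exceeds(threshold, mean, std):
    """Probability that the true value lies above the threshold."""
    return 1.0 - normal_cdf(threshold, mean, std)


# Producer statement: 100 m with a standard deviation of 50 m.
within_one_sigma = normal_cdf(150, 100, 50) - normal_cdf(50, 100, 50)  # about 0.68
# Assumed user threshold: the hotel must be at most 150 m from the beach.
risk = risk_exceeds(150, 100, 50)                                       # about 0.16
```

Under these assumptions, a hotel reported at 100 m has roughly a 16% chance of actually violating the 150 m threshold, which is the kind of figure a user can weigh directly in the decision.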
Translation of completeness to risk • Incomplete data will make us miss the best solution. The risk is comparable to the amount of missing data.
Example: • 50% of the data are missing (realistic in the selection of hotels based on web browsing). • Reduce the value of the data in proportion to the risk.
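The proportional reduction can be written as a one-line discount. In this sketch the risk is simply set equal to the fraction of missing data, which is one reading of "comparable to the amount of missing data"; the function name and the starting value of 120 are invented for the example.

```python
def discounted_value(value, risk):
    """Reduce the value of the data in proportion to the risk (0..1)."""
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must lie between 0 and 1")
    return value * (1.0 - risk)


missing_fraction = 0.5                                  # 50% of the data are missing
remaining_value = discounted_value(120.0, missing_fraction)  # 60.0
```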
Temporal currency • Temporal currency is a standard data quality element. • Temporal currency is not ‘separable’ from other criteria.
Effects of temporal currency • Time passed since collection reduces: • Precision • Completeness (omissions, commissions).
Data does not change, but quality is diluted with time:
Estimate the movement per period and reduce precision proportionally. • Estimate the appearance/disappearance of objects and reduce completeness proportionally.
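A minimal sketch of this conversion, assuming the simplest possible model: the standard deviation grows linearly with an estimated drift per year, and completeness shrinks linearly with an estimated yearly turnover of objects. The linear form, the function name `aged_quality`, and all rates are assumptions for illustration only.

```python
def aged_quality(std, completeness, years, drift_per_year, turnover_per_year):
    """Convert the age of a data set into reduced precision and completeness.

    std               -- standard deviation at collection time
    completeness      -- fraction of objects present at collection time (0..1)
    drift_per_year    -- estimated movement of objects per year (same unit as std)
    turnover_per_year -- estimated fraction of objects appearing/disappearing per year
    """
    aged_std = std + drift_per_year * years
    aged_completeness = max(0.0, completeness * (1.0 - turnover_per_year * years))
    return aged_std, aged_completeness


# Invented example: data collected 4 years ago.
aged_std, aged_completeness = aged_quality(
    std=50.0, completeness=0.5, years=4,
    drift_per_year=5.0, turnover_per_year=0.05,
)
# aged_std is 70.0 and aged_completeness is approximately 0.4
```

The aged values can then be fed into the same precision and completeness translations as fresh data, so temporal currency needs no separate treatment in the decision model.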
Decision model translates data quality to risk • The decision model translates • data quality to risk and • risk to a reduction in the value of the data.
Conclusion • The model of rational decision making gives a single conceptual framework in which three important practical problems of today's use of Geographic Information can be discussed:
User Interface • Decisions can be modeled as the selection of the action which optimizes the utility, given some conditions. • The user must select: • Which elements influence the decision (selection of data layers, themes..). • What the candidate actions are. • What the minimal requirements for each property are. • What his preferences are, translated to weights for each property. • This is the same for many (all?) decision situations.
Value of data • The value of the data lies in the improvement of the decision. The contribution of each data element is comparable to the weight of the corresponding property.
Data quality from a user perspective • Better data reduces the risk of taking a wrong decision. • Precision and completeness can be translated directly to the risk of taking a wrong decision; this risk reduces the value of the data. • Temporal currency is first converted to reduced precision and completeness (this should be done by the data provider).
Closed loop semantics • My answer to the problem of semantics: • Link observation semantics in the database to action semantics in the decision.