
“Reverse Engineering” Statistical Metadata through User Studies



Presentation Transcript


  1. “Reverse Engineering” Statistical Metadata through User Studies Carol A. Hert Syracuse University January 23, 2003

  2. Presentation Overview • Defining metadata (yet again) • Rationale for user studies—reverse engineering of metadata • Two studies of users • Users of statistical tables • Users during statistical integration tasks • Implications for system design

  3. Definition of Metadata Metadata are information entities preserved in artifacts that provide context, designed to help the user create, locate, understand, and use* the entities/data to which the metadata refer (*i.e., help the user manipulate the entity throughout the entity’s lifecycle)

  4. The Metadata Challenge • What information entities are metadata (and what aren’t)? • Which metadata are necessary, essential, optimal for which tasks (and can we acquire them)? • How can we understand metadata use and creation to improve our metadata systems (and other tools for user understanding)?

  5. A Viable Approach • Reverse engineer metadata elements by investigating how users interact with statistical information and determining what information is necessary to support them

  6. The Viability of User Metadata Studies • Plethora of potential metadata • Cost of creating or harvesting, maintaining metadata and metadata systems • Uncertain utility of some metadata

  7. Rationale for User Studies • Examination of users in situ can provide insight into which metadata are used, when, in what formats, etc. • Accepted strategy in social informatics, sociology of technology and work

  8. The User Studies • Study 1: Metadata needs during usage of statistical tables • Study 2: Metadata needs during tasks requiring integration of statistical information Both funded by the U.S. National Science Foundation and the Bureau of Labor Statistics

  9. Exploring Metadata for Understanding Statistical Tables • Task concerned understanding statistical tables • Identified user questions/uncertainties about specific tables • Yielding potential metadata elements • Searched for answers in existing metadata sources • Investigating potential for harvesting metadata

  10. Exploring Metadata for Understanding Statistical Tables • 11 respondents, each worked with 3 tables (mix of electronic and paper) • total 170 uncertainties categorized into 5 major categories

  11. Findings about Metadata for Tables • Most common questions concerned definitions, followed by rationales • Questions related to statistical domain, general table structure, and interface • Rationale questions difficult to answer with existing metadata

  12. Types of Uncertainties • Definitions (of terms, categories, abbreviations, universe) (97 of 170) • Rationales (28 of 170) • Table structure (e.g. format, layout, link structure) (24 of 170) • Lack of information on • Data collection and sources (4 of 170) • Computational methods (4 of 170) • Comparability/relationship of information (6 of 170) • Others (5 of 170) • Other (2 of 170)
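The counts on this slide can be sanity-checked with a quick sketch (category labels are abbreviated here; the grouping follows the slide):

```python
# Tally of the 170 uncertainties reported on slide 12, with a quick check
# that the category counts sum to the stated total.
uncertainties = {
    "definitions": 97,
    "rationales": 28,
    "table structure": 24,
    "data collection and sources": 4,
    "computational methods": 4,
    "comparability/relationships": 6,
    "other lacking information": 5,
    "other": 2,
}
total = sum(uncertainties.values())
print(total)  # 170

# Share of each category, in percent — definitions dominate.
shares = {k: round(v / total * 100, 1) for k, v in uncertainties.items()}
print(shares["definitions"])  # 57.1
```

The arithmetic confirms the slide's claim that definitional questions were by far the most common, accounting for over half of all uncertainties.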

  13. Insights about Metadata • Metadata often difficult to retrieve (due to unstructured format) • Metadata duplicated in multiple places (often manually and with editorial changes) • Metadata needed were agency-, table-, or statistics-specific

  14. And a Tension What is the relationship among metadata and other types of information, and when and how do these sources interact to support particular tasks? (a.k.a. what are metadata?)

  15. Metadata During Integration Tasks • What problems/uncertainties do specific types of users have during tasks involving integration of statistical data? • For the same tasks, what problems/uncertainties do experts perceive as being relevant to usage of the data by the user populations? • How do problems experienced by end-users compare to those identified by experts? • What metadata or other information can be identified to resolve user problems?

  16. Metadata During Integration Tasks • Goals of Study • Extend our knowledge of metadata usage • Inform design of tools that incorporate metadata • Consider metadata tools in conjunction with larger set of statistical literacy tools

  17. Metadata During Integration Tasks • Methodology • Five tasks requiring integration across sources • Users did 1-2 of the tasks • Think-aloud protocols used with follow-up interviews • To date, 14 expert users; a second round of data collection is about to begin

  18. The Tasks • 3 variants of “Find 4-6 economic indicators for a particular county and compare the county’s economic status to its state and the United States as a whole” • While looking at the economic indicators for Nebraska you notice that the unemployment numbers are not the same at the BLS site and at the Nebraska site—try to determine why.

  19. The Tasks • You are interested in building a soybean crushing plant in either Nebraska or South Dakota. Examine natural gas and electricity prices in the states to determine an appropriate location.

  20. The Tasks • You have become increasingly concerned about urban sprawl in North Carolina. You are looking for statistics on loss of farming lands and farming income in Orange, Durham, and Wake counties. Has the loss of farmland in these counties been greater than 50% since 1992? How does the loss of farmland and farm income in the Raleigh-Durham area compare to the loss of farmland and farm income across the nation as a whole?

  21. Findings to Date • Integrating activities of users • Making comparisons • Noting discrepancies (between data, in presentation approach, etc.) and/or asking what the difference is due to • Manipulations (e.g., mathematical, exporting to spreadsheets) • Barriers to integration

  22. More Findings • Strategies used to find and integrate sources and data, and to understand the scope of the task • Knowledge used • Types of questions/uncertainties • Terminology used • Aspects of data that matter to the user during the task

  23. Findings to Date • Comparisons are a critical aspect of integration • Comparison types identified: • Geographic units • Definitional differences in concepts and variables • Across time • Data from different sources (websites, surveys) • Index value comparisons

  24. Barriers to Successful Integration • Definitional and source information lacking • User lack of knowledge of appropriate strategies (e.g., using time series data, types of calculations to perform) • User lack of knowledge about usage of index values, statistical activity purpose and approach • Interface design problems (such as scrolling row and column headers)

  25. Further Barriers • Inconsistent data across sources • Inconsistent interfaces • Inability to determine whether data wanted for comparison are available • Lack of domain knowledge • Lack of knowledge of how to handle inflation, seasonal adjustment • Terminology differences

  26. Other Findings • Terminological variants within/across agencies and between users and agencies • Different approaches suggest different statistics to users • Experts use agency and domain knowledge extensively

  27. Using the Results • Incorporate specific metadata into a variety of tools • Provide answers from metadata sources for specific presentations, tasks, etc. • Issues are specificity of answer, uniqueness of answer • Identifying metadata elements and sources of metadata • Determine tools appropriate to a particular user situation

  28. Tools/Approaches under Development • Glossary lookup • Ontology for cross-walking • Relationship browser • Enables a person to preview websites and datasets by specifying particular relationships (e.g., show me datasets that include unemployment variables and come from surveys of households)
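The relationship-browser idea on this slide can be sketched as a simple filter over a dataset catalog. Everything below is illustrative: the `Dataset` record, the catalog entries, and the `browse` function are hypothetical stand-ins, not an actual agency API.

```python
# Minimal sketch of a "relationship browser": preview datasets by specifying
# relationships such as variable coverage and collection method.
from dataclasses import dataclass, field

@dataclass
class Dataset:
    name: str
    variables: set = field(default_factory=set)
    collection: str = ""  # e.g. "household survey", "establishment survey"

# Hypothetical catalog; entries are illustrative, not real agency records.
catalog = [
    Dataset("CPS basic monthly", {"unemployment", "labor force"}, "household survey"),
    Dataset("CES state and area", {"employment", "earnings"}, "establishment survey"),
    Dataset("LAUS estimates", {"unemployment", "employment"}, "model-based estimate"),
]

def browse(catalog, variable=None, collection=None):
    """Return datasets matching every specified relationship."""
    hits = catalog
    if variable is not None:
        hits = [d for d in hits if variable in d.variables]
    if collection is not None:
        hits = [d for d in hits if d.collection == collection]
    return hits

# The slide's example query: "show me datasets that include unemployment
# variables and come from surveys of households."
matches = browse(catalog, variable="unemployment", collection="household survey")
print([d.name for d in matches])  # ['CPS basic monthly']
```

The point of the sketch is that each user-facing "relationship" maps onto a metadata element (variable list, collection method) that must be structured and machine-readable before a browser like this can filter on it.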

  29. Tools/Approaches Under Development • Relationship browser that will modify itself based on the underlying object classes/variables available • Embedded help via “sticky notes” • Online communities of interest (via communication tools) • Tutorials, scenarios of use

  30. Mapping Needs to Tools • Definitional information: glossary, mappings of agency terminology to user terminology, ontologies • Scoping problems (e.g., what is an economic indicator): example indicators, general definitions • Non-linked explanatory information: mouse-overs at the point of linkage, additional links
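The pairing of a glossary with agency-to-user terminology mappings can be sketched minimally. The term mappings and definitions below are simplified illustrations, not official agency glossary entries.

```python
# Sketch: resolve a user's everyday term to the agency term, then look up
# the agency definition. All entries are illustrative examples.
user_to_agency = {
    "jobless rate": "unemployment rate",
    "cost of living": "Consumer Price Index",
}

glossary = {
    "unemployment rate": "Unemployed persons as a percent of the labor force.",
    "Consumer Price Index": "A measure of the average change over time in "
                            "prices paid by urban consumers.",
}

def define(user_term):
    """Map a user term to agency terminology, then fetch its definition."""
    agency_term = user_to_agency.get(user_term.lower(), user_term)
    return agency_term, glossary.get(agency_term)

term, definition = define("jobless rate")
print(term)  # unemployment rate
```

This two-step lookup reflects the study's finding that terminology varies between users and agencies: a glossary alone fails if the user never types the agency's own term.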

  31. Mapping Needs to Tools • Managing data collected: access to table builders, word processing, spreadsheets • Finding comparable numbers: relationship browser (e.g., geographic, time unit by indicator) • Confusion caused by a large number of text links: relationship browser (show me pages/parts of the site that have economic indicators)

  32. Integrating Metadata Systems with Other Tools • Metadata are one component of a statistical information network • Metadata systems important • Metadata as “organizers, content” of other systems • Metadata systems need to pass metadata to other tools and vice versa • A New Question: How do our metadata systems and repositories interact with other tools?

  33. Further Information Carol A. Hert cahert@syr.edu The overall project: http://ils.unc.edu/govstat
