Terminology for Statistics: How Can End Users Connect?
Stephanie W. Haas
School of Information and Library Science
University of North Carolina at Chapel Hill
stephani@ils.unc.edu
Overview
• Terminology and End User Searching
  • Characteristics of users and searches
  • Types of queries
  • Other sources of confusion
• Ideas for Solutions
  • Goals
  • What needs to be solved
  • Possible tools and structures
• Final Points
Open Forum 2000
Terminology and End User Searching
• Characteristics of users and searches
• Types of queries
• Other sources of confusion
Searching isn’t easy
• “Query matching is effective only when the search is specific, the searcher knows precisely what he or she wants, and the request can be expressed adequately in the language of the system” (Borgman, 1996, p. 494)
• If you don’t know what to call it, you can’t find it.
• If you don’t know what it means, you can’t use it.
The Mapping Problem
[Diagram: Search, linking User’s Information Need → User’s Term(s) → Agency Term(s) → Data Element(s)]
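The mapping problem amounts to two lookups: from a user’s term to one or more agency terms, and from each agency term to the data elements that carry it. The following is a minimal Python sketch. The dictionaries, their entries, and the data element IDs are hypothetical placeholders and do not come from any real metadata registry.

```python
# Hypothetical crosswalk: user term -> agency term(s). Illustrative entries only.
USER_TO_AGENCY = {
    "health care": ["medical care"],
    "inflation": ["consumer price index", "producer price index"],
}

# Hypothetical registry: agency term -> data element IDs (IDs are made up).
AGENCY_TO_ELEMENTS = {
    "medical care": ["DE-001"],
    "consumer price index": ["DE-010", "DE-011"],
    "producer price index": ["DE-020"],
}


def resolve(user_term: str) -> list[str]:
    """Follow user term -> agency term(s) -> data element(s)."""
    term = user_term.strip().lower()
    # A term with no crosswalk entry is tried as if it were already an agency term.
    agency_terms = USER_TO_AGENCY.get(term, [term])
    elements: list[str] = []
    for agency_term in agency_terms:
        for element in AGENCY_TO_ELEMENTS.get(agency_term, []):
            if element not in elements:  # keep order, drop duplicates
                elements.append(element)
    return elements
```

In this sketch, `resolve("Inflation")` reaches the data elements behind both CPI and PPI. `resolve("cyberjobs")` returns an empty list, which is the failure a user sees when neither lookup has an entry.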
Inside the System – Metadata Registry
• Statistical experts’ understanding and usage
• Crisp operational definitions (ideal)
• Unambiguous terms (ideal)
• Minimal or predictable contextual effects
[Diagram: Data Element(s), Agency Term(s)]
Outside the System
• Choice of terms may depend on:
  • user’s domain knowledge
  • user’s search knowledge
  • user’s notion of what is available
  • terms seen elsewhere
  • luck?
[Diagram: User’s Term(s), User’s Information Need]
Users’ Knowledge: Varying sophistication of questions
• What is the universe for this survey question, given the questions leading up to it?
• What is the current unemployment rate? Please send me the answer before my 9:00 class tomorrow.
Types of Queries (user term → agency term)
• Correct (matching) term: consumer price index → consumer price index
• Obvious synonym: health care → medical care (CPI)
• Conceptual cluster of synonyms/near synonyms: woman, female, girls → women
Types of Queries (2)
• “External” terms: common outside the agency, but with no direct data element equivalent inside the agency
  • inflation (generally use CPI or PPI)
  • turnover (retention rate? job or profession tenure?)
  • new jobs (first appearance on payroll?)
Types of Queries (3)
• “Trendy” terms: a subset of external terms
  • cyberjobs (from a magazine article)
  • Webmaster (recent coinage)
  • reinvention
Types of Queries (4)
• Concept access: “Give me everything you have about worker benefits”
  • A good answer requires pulling together information from many sources, which may be compatible to varying degrees. (For an example, see MapStats: http://www.fedstats.gov/mapstats/)
Contributing Factors
• Confusion about basic statistical concepts
  • seasonal adjustment
    “Indicates the adjustment of timeseries data to eliminate the effect of intrayear variations which tend to occur during the same period on an annual basis.” (BLS Selective Access)
    “To seasonally adjust a given economic time series is to eliminate that part of the change in the series which can be ascribed to the normal seasonal variation.”
    “Seasonal adjustment is a mathematical process whereby the effects of recurring non-economic factors are removed from an economic time series.” (Dictionary of U.S. Government Statistical Terms, 1991)
    “A term applied to time series from which periodic oscillations with a period of one year have been removed.” (Cambridge Dictionary of Statistics, 1998)
• What is this number, and what does it mean? Rate, index, ratio, value?
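The seasonal adjustment definitions above share one idea: estimate the part of a series that recurs at the same point every year, then remove it. The sketch below illustrates that idea with the simplest possible method, an additive adjustment that subtracts each calendar position's average deviation from the overall mean. It assumes a stable seasonal pattern and a series that covers whole years. It is an illustration only, not the procedure used by BLS or any other statistical agency, whose methods are far more elaborate.

```python
from statistics import mean


def seasonally_adjust(values: list[float], period: int = 12) -> list[float]:
    """Naive additive seasonal adjustment (illustration only).

    The seasonal effect at each position in the cycle (e.g. each month) is
    estimated as the mean of all observations at that position minus the
    mean of the whole series. That effect is then subtracted from every
    observation. Trend, calendar effects, and outliers are ignored.
    """
    if len(values) < period or len(values) % period != 0:
        raise ValueError("series must cover a whole number of seasonal cycles")
    overall = mean(values)
    effects = [mean(values[i::period]) - overall for i in range(period)]
    return [v - effects[i % period] for i, v in enumerate(values)]
```

With two years of quarterly data, `seasonally_adjust([10, 20, 30, 40, 12, 22, 32, 42], period=4)` returns `[25, 25, 25, 25, 27, 27, 27, 27]`. The within-year climb is removed, and the year-over-year rise remains.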
Contributing Factors (2)
• Major conceptual distinctions and when they apply:
  • Different levels of geographical regions, and the data available at each level (nation, region, state, metropolitan area, county)
  • Establishment data vs. household data
• Note the importance of context in the use of these terms and data.
Contributing Factors (3)
• Inherent ambiguity: the pay concept
  • Carol Hert & John Fieber, search terms from the FedStats Web page (http://www.fedstats.gov/), 11/98: 28,248 unique queries
  • Agency terms used for the pay concept include: income, compensation, earnings, wage, salary
BLS/CPS Terms
• Total combined income
  • “includes money from jobs, net income from business, farm or rent, pensions, dividends, interest, social security payments and any other money income received” (CPS)
• Compensation
  • “sometimes used to encompass the entire range of wages and benefits” (BLS Glossary of Compensation Terms)
BLS/CPS Terms (2)
• Usual weekly earnings
  • “include any overtime pay, commissions, or tips usually received” (CPS concepts)
• Hourly earnings
  • “hourly rate as stated by the employer…does not include tips, commissions, or any other non-hourly wages.” (CPS interviewer manual)
What does this user want? Query: “correction officer, income”
• Monetary income received, including income unrelated to the job
• Compensation, including benefits: the total job package
• Usual weekly earnings, including regular overtime
• Hourly earnings, excluding overtime
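The pay-concept example suggests one interactive remedy. When a user term maps to several agency concepts with different coverage, the system can show the candidates and their distinguishing notes instead of silently picking one. The sketch below encodes the four candidates from the BLS/CPS slides. The data structure, the function name `clarification_prompt`, and the prompt wording are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PayConcept:
    agency_term: str
    source: str
    note: str  # what distinguishes this concept, paraphrased from the slides


# Candidate agency concepts for the user term "income" (from the BLS/CPS slides).
PAY_CONCEPTS = {
    "income": [
        PayConcept("Total combined income", "CPS",
                   "all money income, including income unrelated to the job"),
        PayConcept("Compensation", "BLS",
                   "wages plus benefits: the total job package"),
        PayConcept("Usual weekly earnings", "CPS",
                   "includes overtime pay, commissions, or tips usually received"),
        PayConcept("Hourly earnings", "CPS",
                   "stated hourly rate; excludes tips, commissions, non-hourly wages"),
    ],
}


def clarification_prompt(user_term: str) -> Optional[str]:
    """Return a question listing candidate concepts, or None if there is nothing to disambiguate."""
    candidates = PAY_CONCEPTS.get(user_term.strip().lower(), [])
    if len(candidates) < 2:
        return None
    lines = [f'"{user_term}" can mean several things. Which one do you want?']
    for number, concept in enumerate(candidates, start=1):
        lines.append(f"{number}. {concept.agency_term} ({concept.source}): {concept.note}")
    return "\n".join(lines)
```

Keeping the distinguishing note next to each agency term lets the user make the choice described on the slide, rather than leaving the system to guess.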
Ideas for Solutions
• Goals
• What needs to be solved
• Possible tools and structures
Goals for Possible Solutions
• Maintain the distinction between agency (authority) terms and user terms.
• Note the distinction between a terminology and a user vocabulary.
  • A user vocabulary often lacks structure, stability, or context (although patterns do exist).
Not equally weighted terminologies
[Diagram: terminologies T1 and T2, Data Element Concepts, Data Elements]
Asymmetrical Structure
[Diagram: Agency Terms, User Terms, Data Element Concepts, Data Elements; registry contents marked]
Maintenance Issues
• Indexing is not the primary function of the agency.
• Less than total coverage will still help.
• Can we assume:
  • Agency terms are adopted/defined slowly?
  • User terms are more volatile (especially the “trendy” ones)?
• How often must mapping structures and procedures be updated?
Easing Users’ Pain
• No problem
  • same word(s), same meaning
  • different word(s), different meaning
• Support needed (thesaurus, definitions, explanation)
  • different word(s), same meaning (synonyms)
  • same word(s) or different word(s), some relationship between meanings (e.g., broader term (BT), narrower term (NT), part-of, domain-specific)
• Same word(s) or different word(s), some undefined overlap in meaning
• Can these users be helped?
  • Same word(s), different meaning (if unnoticed by the user)
  • Same word(s) or different word(s), no relationship (wrong source of information?)
Providing Agency Information
• Substituting agency term(s) for user term(s) and/or expanding user term(s)
  • Hidden or overt?
  • Automatic or interactive?
• Displaying conceptual term clusters (e.g., gender, race, occupation)
• Facilitating browsing
• Giving definitions and examples
  • Source?
  • “Official” or basic?
• Highlighting usage notes (the footnotes)
  • Who needs to see them?
  • When?
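The first option under "Providing Agency Information" can be sketched as a small query-rewriting function. A `substitute` flag chooses between two behaviors: replacing a mapped user term, or keeping it alongside the agency terms. An `overt` flag decides whether the explanation of the change is returned for display (overt) or discarded (hidden). An interactive variant would ask the user before applying each change; that is not shown here. The function, its parameters, and the crosswalk entries are hypothetical.

```python
from typing import NamedTuple


class Rewrite(NamedTuple):
    terms: list[str]        # terms actually sent to the search engine
    explanation: list[str]  # human-readable changes; empty when the rewrite is hidden


def rewrite_query(user_terms: list[str], crosswalk: dict[str, list[str]],
                  substitute: bool = False, overt: bool = True) -> Rewrite:
    terms: list[str] = []
    explanation: list[str] = []
    for term in user_terms:
        agency_terms = crosswalk.get(term.lower(), [])
        # Keep the user's own term unless we are substituting and have a mapping.
        if not (substitute and agency_terms) and term not in terms:
            terms.append(term)
        for agency_term in agency_terms:
            if agency_term not in terms:
                terms.append(agency_term)
        if agency_terms:
            action = "replaced by" if substitute else "expanded with"
            explanation.append(f'"{term}" {action} {", ".join(agency_terms)}')
    return Rewrite(terms, explanation if overt else [])


# Hypothetical crosswalk entries based on the query examples earlier in the talk.
CROSSWALK = {
    "health care": ["medical care"],
    "inflation": ["consumer price index", "producer price index"],
}
```

Calling `rewrite_query(["inflation", "housing"], CROSSWALK)` keeps both user terms, adds CPI and PPI, and returns a note the interface could display. With `substitute=True, overt=False`, "health care" is silently replaced by "medical care".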
Crosswalk
• Mapping between agency and user terms
• Asymmetrical: build from the users’ side
• 80/20 principle for coverage
• Multiple sources of terms:
  • Search sessions
  • Interviews with consultants, intermediaries
  • Media reports, textbooks, other “public” sources
Asymmetrical Structure
[Diagram: Agency Terms, User Terms, Data Element Concepts, Data Elements; Crosswalk]
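Building the crosswalk from the users' side with an 80/20 target might start from a query log. Count how often each user term occurs, map the most frequent terms first, and stop once they account for 80% of queries. The sketch below assumes one term per log entry and treats the 80% cutoff as a tunable parameter; neither detail comes from the talk. The function name `terms_to_map` is hypothetical.

```python
from collections import Counter


def terms_to_map(query_log: list[str], target: float = 0.8) -> list[str]:
    """Most frequent user terms, in order, until they cover `target` of all queries."""
    counts = Counter(q.strip().lower() for q in query_log if q.strip())
    total = sum(counts.values())
    chosen: list[str] = []
    covered = 0
    for term, n in counts.most_common():
        if covered >= target * total:
            break  # coverage target reached; leave the long tail unmapped
        chosen.append(term)
        covered += n
    return chosen
```

For a log of six "income" queries, three "inflation" queries, and one "cyberjobs" query, the two most frequent terms already cover 90% of the log. "cyberjobs" is left for later, in line with the point that less than total coverage will still help.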
“Enhanced Indexing”
• Expanding agency pay terms, FedStats Web page (Hert & Haas, preliminary findings)
• Assume that more overlap between terms increases users’ chances of success
• Query sessions where 50% of terms were agency terms:
  • Without expansion = 89%
  • With expansion = 73%
Other Possibilities
• Thesaurus, with relationships such as “see” and “use for”
• Multilingual thesaurus or dictionary, treating terminologies as equal
• Fully incorporate end-user terms into classification or data element concept entries (Desirable?)
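A conventional thesaurus records these relationships in both directions. A non-preferred entry term points to the preferred term (USE, also shown as "see"). The preferred term lists the entry terms it is used for (UF). Broader/narrower term (BT/NT) links provide the hierarchy. The sketch below stores only one direction of each relationship and derives the inverse. The class and the example entries are hypothetical and do not represent any agency's official thesaurus.

```python
from dataclasses import dataclass, field


@dataclass
class Thesaurus:
    # entry (non-preferred) term -> preferred term: the USE / "see" direction
    use: dict[str, str] = field(default_factory=dict)
    # term -> its broader term (BT); narrower terms (NT) are derived
    broader: dict[str, str] = field(default_factory=dict)

    def preferred(self, term: str) -> str:
        """Follow a USE reference; preferred terms map to themselves."""
        return self.use.get(term, term)

    def used_for(self, preferred: str) -> list[str]:
        """UF: entry terms that point to this preferred term."""
        return sorted(entry for entry, target in self.use.items() if target == preferred)

    def narrower(self, term: str) -> list[str]:
        """NT: terms whose broader term is `term`."""
        return sorted(child for child, parent in self.broader.items() if parent == term)


# Hypothetical entries for illustration only.
thesaurus = Thesaurus(
    use={"health care": "medical care", "woman": "women", "female": "women"},
    broader={"hourly earnings": "earnings", "usual weekly earnings": "earnings"},
)
```

Storing a single direction and deriving UF and NT keeps the two sides consistent, which matters given the maintenance concerns raised earlier. A multilingual or "equal terminologies" design would instead need explicit links in both directions.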
Final Points
• Users are inventive in term use.
• Users are easily discouraged.
• Maintenance is a crucial concern.
• Is the 80/20 principle useful?