<odesi> project? Microdata? Say what? TRY Conference May 5, 2008 Suzette Giles, Ryerson University Laine Ruus, University of Toronto
Acronym of “Ontario Data Documentation, Extraction Service and Infrastructure Initiative” • A new product delivered through Scholars Portal • Collaboration between OCUL and Ontario Buys • Web-based resource discovery across a growing collection of Canadian data • 3rd generation statistics and data extraction system • Equalizes access to selected statistics and data for all Ontario universities
A web-based extraction system • Provides equal access to its resources for all Ontario universities • Contains diverse, high-quality numeric data sets (microdata) • Allows data resource discovery, extraction and analysis
First some background about <odesi> • Why is this resource so important that Ontario Buys and OCUL (Ontario Council of University Libraries) are investing over $1.4 million? • How will it support teaching and research in quantitative methods and contribute to statistical literacy? • How will it help me at the Reference Desk?
Why is this resource so important that Ontario Buys and OCUL (Ontario Council of University Libraries) are investing over $1.4 million? • 23 universities & colleges in Ontario belong to the DLI (Statistics Canada's Data Liberation Initiative), but only 7 have full-time staff ‘doing’ data • Therefore, access to data varies widely among faculty and students at different institutions • <odesi> goes some way toward making data and statistics easier to find and access.
How will it support teaching and research in quantitative methods and contribute to statistical literacy? • As a resource discovery tool, it can search across collections at a more finely grained level than Stats Can provides • Access to resources that have not previously been readily accessible, e.g. Canadian Gallup polls • Ability to display descriptive statistics quickly and easily, whether the resource is aggregate statistics or microdata • Web-based access supports use in the classroom as well as 24x7 access for research.
How will it help me at the Reference Desk? • Easy access to a collection of statistics and data in a uniform interface • Blurs the distinctions between aggregate statistics and microdata • Enables the creation, on the fly, of aggregate statistics that have not been published elsewhere
What are microdata and why do I need to know about them? • Microdata are the actual responses that survey or census respondents give to a questionnaire • They are usually translated into a numeric format so that one can do arithmetic with them. For example: • Income: (1) high, (2) medium, (3) low • Average = ???? • Income: (1) $13,725 (2) $118,297 (3) $63,958 • Average = $65,327
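The income example can be checked in a few lines: averaging category codes gives a number that is not an income, while averaging the actual dollar amounts gives a usable statistic. A minimal Python sketch using the three incomes from the slide; the variable names are illustrative only.

```python
from statistics import mean

# Ordinal category codes: 1 = high, 2 = medium, 3 = low.
# Their "average" (2) is not an amount of money and has no useful meaning.
income_codes = [1, 2, 3]
code_average = mean(income_codes)

# Actual reported incomes for the same three respondents, in dollars.
income_dollars = [13_725, 118_297, 63_958]
average_income = mean(income_dollars)

print(f"Average of codes: {code_average}")
print(f"Average income: ${average_income:,.0f}")  # Average income: $65,327
```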
What are microdata? • A person’s sex or gender could be recorded as Male or Female, but instead it will be coded 1 or 2 • 1 = male, 2 = female (or the other way round) • 1 and 2 are the VALUES given to the VARIABLE, in this case sex (or gender) • A microdata file consists of numbers
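Because a microdata file holds only numbers, the documentation (codebook) is what turns each value back into a meaningful label. A minimal sketch, assuming a hypothetical variable named `SEX` coded 1 = male and 2 = female; a real survey documents its own variable names and codes.

```python
# Hypothetical codebook entry for the variable SEX.
# The coding (1 = male, 2 = female) is an assumption; always check the survey's documentation.
CODEBOOK = {
    "SEX": {1: "Male", 2: "Female"},
}

# A tiny microdata file: one row per respondent, values only.
microdata = [
    {"ID": 101, "SEX": 2},
    {"ID": 102, "SEX": 1},
    {"ID": 103, "SEX": 2},
]


def label(variable: str, value: int) -> str:
    """Look up the label for a value of a variable, flagging codes not in the codebook."""
    return CODEBOOK[variable].get(value, f"Unknown code {value}")


for row in microdata:
    print(row["ID"], label("SEX", row["SEX"]))
```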
From microdata are generated descriptive statistics • Microdata • A person is working or not working • Aggregate statistics • A count of the number of persons not working (in a geographic area) • A count of the number of unemployed persons (not working, but in the labour force) divided by the number of persons in the labour force = the unemployment rate
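The two steps on this slide (count, then divide) can be sketched in Python. The records, region names and status labels below are hypothetical; the point is that only persons in the labour force (employed plus unemployed) go into the denominator.

```python
from collections import Counter

# Hypothetical microdata: one labour force status per person.
# "unemployed" = not working but in the labour force;
# "not in labour force" = not working and not counted in the rate.
persons = [
    {"region": "A", "status": "employed"},
    {"region": "A", "status": "unemployed"},
    {"region": "A", "status": "not in labour force"},
    {"region": "A", "status": "employed"},
    {"region": "B", "status": "employed"},
    {"region": "B", "status": "unemployed"},
]


def not_working_count(records, region):
    """Count of persons not working (unemployed or outside the labour force) in one region."""
    return sum(1 for r in records if r["region"] == region and r["status"] != "employed")


def unemployment_rate(records, region):
    """Unemployed persons as a percentage of the labour force in one region."""
    counts = Counter(r["status"] for r in records if r["region"] == region)
    labour_force = counts["employed"] + counts["unemployed"]
    return 100 * counts["unemployed"] / labour_force  # fails if the region has no labour force


print("Region A not working:", not_working_count(persons, "A"))
print(f"Region A unemployment rate: {unemployment_rate(persons, 'A'):.1f}%")
```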
…and more descriptive statistics • Microdata • A family has a gross annual income in year 2005 • Aggregate (descriptive) statistics • Families in a geographic area have an average income • 50% of families in a geographic area have an income above (or below) the median income • The incidence of low income is the % of families in a geographic area whose income falls below the low income cut-off (LICO) for that geographic area and family size
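The three family-income statistics can be generated from the same microdata in one pass. A minimal sketch with hypothetical families and hypothetical cut-offs; the LICO values here are invented for illustration and are not Statistics Canada's published figures.

```python
from statistics import mean, median

# Hypothetical families in one geographic area: (gross 2005 income, family size).
families = [
    (18_000, 1),
    (42_000, 2),
    (25_000, 4),
    (61_000, 3),
    (95_000, 4),
]

# Hypothetical low income cut-offs by family size for this area (made-up numbers).
LICO = {1: 20_000, 2: 25_000, 3: 30_000, 4: 37_000}

incomes = [income for income, _ in families]
avg = mean(incomes)
med = median(incomes)

# Incidence of low income: share of families below the cut-off for their family size.
below_lico = sum(1 for income, size in families if income < LICO[size])
low_income_pct = 100 * below_lico / len(families)

print(f"Average income: ${avg:,.0f}")
print(f"Median income:  ${med:,.0f}")
print(f"Families below LICO: {low_income_pct:.0f}%")
```

Note that the average ($48,200) sits above the median ($42,000) here because one high income pulls the average up, which is one reason both statistics are reported.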
Two systems • <odesi>: available to all Ontario universities; searching metadata across files; descriptive statistics only; download of system files (user needs to have the statistical software) • SDA: 10 universities subscribe, incl. Ryerson & York; more advanced statistical analysis functions; download of raw data & syntax files (user needs to create the system files)
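With raw data plus syntax files, the user runs the syntax in SAS, SPSS or Stata to read the raw columns and build a system file. The core of that job, slicing each fixed-column record into variables according to a record layout, can be sketched in Python. This assumes a fixed-width ASCII raw file (common, but check the documentation); the layout, variable names and records are hypothetical, and the sketch does not write an actual SAS, SPSS or Stata system file.

```python
import io

# Hypothetical record layout, as a syntax file would define it:
# variable name -> (start column, end column), 1-based and inclusive.
LAYOUT = {
    "ID": (1, 4),
    "SEX": (5, 5),
    "AGE": (6, 7),
    "INCOME": (8, 13),
}

# Stand-in for a raw ASCII data file: one respondent per line, no delimiters.
raw_file = io.StringIO(
    "0101234013725\n"
    "0102151118297\n"
    "0103262063958\n"
)


def read_fixed_width(lines, layout):
    """Slice each line into variables using the layout and convert the values to integers."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        records.append({name: int(line[start - 1:end]) for name, (start, end) in layout.items()})
    return records


records = read_fixed_width(raw_file, LAYOUT)
print(records[0])
```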
What’s changed? • In the ‘bad old days’ • Statistics were published in books/periodicals, data were ‘published’ mainly as files of microdata or very extensive aggregate statistics • You needed access to a mainframe or PC/Mac • You needed special software (SAS, SPSS, Stata) • You needed training in the production of descriptive statistics (weighting, types of data and appropriate types of descriptive statistics)
But nowadays! • Statistics are published as Excel files and Beyond 20/20 files, and microdata are available in 3rd generation interfaces • All you need is a computer and a web browser • Generate descriptive statistics with a few mouse clicks, though you still need to know how to interpret them (and knowing about weighting is a good idea too!) • Users can download data 24x7 for further processing on their own workstations with appropriate software
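Why weighting matters: survey respondents usually carry a weight giving the number of people in the population each one represents, and ignoring it can change the answer considerably. A minimal sketch reusing the three incomes from the microdata example; the weights are hypothetical.

```python
# Hypothetical survey respondents: each weight is the number of people
# in the population that the respondent represents.
respondents = [
    {"income": 13_725, "weight": 250.0},
    {"income": 118_297, "weight": 50.0},
    {"income": 63_958, "weight": 200.0},
]


def unweighted_mean(rows):
    """Plain average: every respondent counts once."""
    return sum(r["income"] for r in rows) / len(rows)


def weighted_mean(rows):
    """Sum of (value x weight) divided by the sum of the weights."""
    total_weight = sum(r["weight"] for r in rows)
    return sum(r["income"] * r["weight"] for r in rows) / total_weight


print(f"Unweighted: ${unweighted_mean(respondents):,.0f}")
print(f"Weighted:   ${weighted_mean(respondents):,.0f}")
```

With these weights the estimate drops from about $65,327 to about $44,275, because the low-income respondent stands for far more people than the high-income one.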