Data dissemination at the University of Toronto. Presentation to National Bureau of Statistics, China, by L Ruus <laine.ruus@utoronto.ca>, Data Library Services, Map and Data Library, University of Toronto, 2009-12-08 <http://www.chass.utoronto.ca/datalib/misc/ssb_2009.ppt>
Outline • Introduction • Numeric data/statistics – Laine Ruus • Aggregate statistics • Time-series statistics • Microdata • Spatial data – Marcel Fortin
Support for numeric data/statistics • Data Library Services (DLS) was created in 1988 • Objectives: • acquire, manage and preserve machine-readable data files needed to support empirical or statistical research and teaching activities of the University of Toronto, • provide access to machine-readable data files owned by the University of Toronto, • provide support for users of these machine-readable data files.
UT/DLS collections include • About 5,000 numeric, spatial and textual research data files, primarily but not exclusively in the social sciences • mainly quantitative research data, including microdata, aggregate data and time-series databases
Support for spatial data • GIS position established in 1999 • GIS & Map Library plus Data Library Services combined to form Map and Data Library in 2009
Three major types of quantitative data • Aggregate data (statistics) • Time-series aggregate data (statistics) • Microdata/transaction-level data, from which aggregate and time-series data are created
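As a minimal sketch of how aggregate and time-series statistics can be derived from microdata, the Python below tabulates a handful of made-up records. The records, the variable names (year, sex, age) and both helper functions are hypothetical illustrations, not data or software from the University of Toronto.

```python
from collections import Counter, defaultdict

# Hypothetical microdata: one record per respondent (invented for illustration).
records = [
    {"year": 2001, "sex": "F", "age": 34},
    {"year": 2001, "sex": "M", "age": 51},
    {"year": 2006, "sex": "F", "age": 29},
    {"year": 2006, "sex": "F", "age": 62},
    {"year": 2006, "sex": "M", "age": 45},
]

def aggregate_counts(rows, key):
    """Aggregate data: number of records in each category of `key`."""
    return dict(Counter(row[key] for row in rows))

def mean_by_year(rows, variable):
    """Time-series aggregate: mean of `variable` per year, in year order."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["year"]].append(row[variable])
    return {year: sum(vals) / len(vals) for year, vals in sorted(groups.items())}

counts_by_sex = aggregate_counts(records, "sex")
mean_age_series = mean_by_year(records, "age")
print(counts_by_sex)
print(mean_age_series)
```

The same records feed both outputs: a cross-sectional count (aggregate data) and a per-year mean (time-series aggregate data). This is why microdata is listed as the source of the other two types.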
Service objectives • Access to data/statistics at point of need • Extraction of required data/statistics from larger databases • Manipulation: descriptive or inferential statistics, derived variables, etc • Display or download for further analysis
Aggregate statistics • Acquired from: • Government sources, eg Statistics Canada (various licences), such as census of population 1996 and later • Purchase ($$$) from various sources • Formats: • csv, MS Excel, Beyond 20/20: can be served from the www • Run-time applications: cannot be served from the www
Aggregate statistics • Access • HTML-based finding aids for those that can be served from the www (DDI compliant), eg http://www.chass.utoronto.ca/datalib/inventory/3000/3798.htm • Or download and install (zip format), eg http://www.chass.utoronto.ca/datalib/inventory/3000/3614.htm • CHASS (Computing in the Humanities and Social Sciences) is developing an OLAP-based application for about 2000 pre-1996 census files that are available only as flat ASCII fixed-field format files, eg http://r1.chass.utoronto.ca/olap/
Time-series data • Acquired from • Statistics Canada (various licences) • IMF, UN, OECD, World Bank, commercial producers • Formats: • Complex hierarchical formats • Or run-time applications
Time-series data • Access: • Remote access, with producer interface (eg World Development Indicators, OECD, Datastream, etc.) • CHASS purpose-written interfaces: CANSIM, Canadian company balance sheet or stock-price data, trade data, etc., eg http://dc1.chass.utoronto.ca/
Microdata • [Table illustrating microdata records; column headings: Age, Sex]
Microdata • Acquired from • Statistics Canada (Data Liberation Initiative (DLI) licence): public use microdata files • ICPSR, Roper, etc. memberships • Some are free on the www, eg ICVS, PISA, Pew surveys • Formats: • Usually flat ASCII fixed-field format • Or SPSS, SAS, or Nesstar formats
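A flat ASCII fixed-field file stores each variable at fixed column positions, which a codebook documents. The Python sketch below shows one way such a record could be read. The column layout, variable names and sample lines are assumptions invented for illustration and do not describe any particular file in the collection.

```python
# Hypothetical codebook: variable name -> (start column, end column),
# 1-based and inclusive, as codebooks usually state them.
LAYOUT = {"id": (1, 4), "age": (5, 7), "sex": (8, 8)}

def parse_record(line, layout=LAYOUT):
    """Slice one fixed-field line into a dict of stripped string values."""
    return {
        name: line[start - 1:end].strip()
        for name, (start, end) in layout.items()
    }

# Two made-up records: 4-column id, 3-column age, 1-column sex.
sample = ["0001 34F", "0002 51M"]

parsed = [parse_record(line) for line in sample]
ages = [int(rec["age"]) for rec in parsed]
print(parsed)
print(ages)
```

Values are kept as strings until a variable is needed as a number, because fixed-field files often use blank or special codes for missing values that would not convert cleanly.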
Microdata (continued) • Access: 3 major interfaces – Nesstar, SDA, VDC • Nesstar http://www.nesstar.com/ • Ontario universities: <odesi> project • Nesstar also used by many European data archives • Strong on documentation (DDI 2.0), weak on analysis
Microdata (continued) • SDA http://sda.berkeley.edu/ • Used by ICPSR, Roper Center, IPUMS, etc. • Strong on statistical analysis, weaker on documentation (DDI 2.0 compliant) • Analysis: frequencies, means, ANOVA, correlations, regressions (multiple, logit and probit) • Some graphic display • Design effects • Disclosure control • http://www.chass.utoronto.ca/datalib/misc/mun09/sda_compare.htm
Usage • DLS deals directly with approximately 2,300 to 3,300 users per year • CHASS database usage: • 70,000 to 97,000 hits per year • Up to 58 subscribing universities in Canada and USA • SDA usage: • 858,595 hits in 9,944 visits in 2008 • 14 subscribing universities in Canada