Data Liberation Initiative OVERVIEW OF THE DATA LIBERATION: Licence, Products, & Services Mike Sivyer Ontario DLI Training, April 5, 2004
Data Liberation Initiative Introduction • The DLI is a partnership between Statistics Canada and participating Canadian post-secondary institutions • There are 66 participating institutions • Data are made available on a subscription basis • Major activities and direction of the project are guided by members themselves through the External Advisory Committee
Data Liberation Initiative The Licence • All member institutions must sign a data use licence agreement when joining the project • Under this licence data are made available for: • Teaching • Planning of academic/educational services • Academic Research and Publishing • Use of data in textbooks falls under a different set of STC licences and permissions
Data Liberation Initiative The Licence • Data are made available to educators, students and other institutional staff while they have such status at the institution • E.g., a student who goes to the USA to do a Master's no longer has access to DLI data • Data are not to be used in any commercial or private activities (even if no $$ involved)
Data Liberation Initiative The Licence • There are also conditions indicating what must be done with any data if an institution leaves the program • All use of data obtained must cease • They must destroy or return all data products obtained while a member • Provide Stats Canada with assurances that this has happened
Data Liberation Initiative The Licence • A copy of the Data Use Agreement can be found on the DLI web site • DLI Contact is responsible to ensure eligible use of data • There are specific criteria that must be met in order to determine eligible use of DLI data
Data Liberation Initiative The Licence • Other questions to help determine if use falls under definition of academic research • If for publishing - is use strictly for publishing in academic or scholarly journal? • Is use under a joint project with outside agency/organization? - Any $$ involved?
Data Liberation Initiative The Licence • Did money come through institution’s “grants dept”? • Even if no $$ involved did research project come through regular institutional channels? • Are data expected to be shared with outside agency/organization?
Data Liberation Initiative The Licence • Other important elements of the Licence Agreement: • Data & products offered “as is” • STC remains owner of intellectual property - only access to data is provided • Users must not link data or otherwise try to identify individual respondents
Data Liberation Initiative The Licence • DLI Contact to implement data security measures • May request that users sign agreement before allowing access • If unsure of eligibility send message to Team for consideration • All questions reviewed by DLI Manager & Director as well as Co-Chairs of EAC
Data Liberation Initiative The Products • DLI provides access to Stats Canada data produced as standard electronic products available to the public • These products can be found in Stats Canada’s On-Line Catalogue of Products and Services • There is usually a flag indicating if a specific product is available to the DLI members
Data Liberation Initiative The Products • What is a standard electronic product? • An “off the shelf” electronic product available to the public • Not included are standard publications available in electronic form as these are usually part of DSP
Data Liberation Initiative The Products • These data are digitally encoded and stored in a file structure. These include • Public Use Microdata Files (PUMFs) • Census/Geography Files • Databases • The main focus of our collection is the public use microdata files
Data Liberation Initiative The Products • These are files of RAW DATA that have been anonymized and organized in a file where each record represents the responses of one individual respondent to the survey questions • Need metadata and software to read and understand the data
Data Liberation Initiative The Products • Data files can contain <10,000 to 50,000+ records • Records can contain <50 to 1,000+ variables • Records can be as few as 50 to over 2,000 bytes in length • Documentation can consist of 50 to 600+ pages
Data Liberation Initiative The Products • Need Codebook, Record Layout and other documentation to be able to manipulate data with a statistical software package such as SAS, SPSS, etc • Following are examples of Codebook, and Record Layout
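As a rough illustration of why the record layout matters, the sketch below slices one fixed-width record into named variables. The variable names, column positions, and sample record are invented for illustration and do not describe any actual DLI product; in practice the layout would be transcribed from the product's own documentation, and a statistical package such as SAS or SPSS would normally do this step.

```python
from typing import Dict, List, NamedTuple


class Field(NamedTuple):
    name: str   # variable name as it might appear in a record layout
    start: int  # 1-based starting column
    width: int  # number of columns the variable occupies


# Hypothetical layout: names and positions are assumptions for this sketch.
LAYOUT = [
    Field("PROVINCE", 1, 2),
    Field("AGE_GROUP", 3, 2),
    Field("SEX", 5, 1),
    Field("WEIGHT", 6, 8),
]


def parse_record(line: str, layout: List[Field]) -> Dict[str, str]:
    """Slice one fixed-width record into a dict of variable name -> raw value."""
    return {
        f.name: line[f.start - 1 : f.start - 1 + f.width].strip()
        for f in layout
    }


# One made-up 13-byte record; without the layout it is just a run of digits.
sample = "350410123.450"
record = parse_record(sample, LAYOUT)
print(record)
```

The values come back as coded strings; the codebook is still needed to know what a code such as `35` or `04` stands for.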
Data Liberation Initiative The Products • DLI Collection also contains some products that contain aggregated data in table format • Main focus of DLI Collection on Socio-Economic data: • Health • Education, Literacy • Labour Market, Income • Travel • Justice • Census, Demographic • Etc.
Data Liberation Initiative The Products • Data products supplied by the social side of Stats Canada can be: • raw data in the form of public use microdata files, • aggregate data in the form of Beyond 20/20 tables, etc. • We have only a few products supplied by the business side of Stats Canada
Data Liberation Initiative The Products • Business surveys do not produce public use microdata files as a standard electronic product • This is because most of these surveys are a “census” of the target population and there are confidentiality issues • DLI does include some business products such as: • Trade data • Financial Performance Indicators CD • Inter-Corporate Ownership
Data Liberation Initiative The Products • There are currently over 20,000 files available in the DLI Collection • These include: • Data files • Metadata • Census & Geography • CDs
Data Liberation Initiative The Products • New data products continually being added to Collection • Includes: • Updated data from regular on-going surveys • New data from ad-hoc special surveys (one time only) • Data from new surveys in STC program (on-going)
Data Liberation Initiative The Products • Updates may be provided in a different format than the earlier version: • For example, PUMF → Beyond 20/20 Tables • As new versions are received, have to decide whether to replace data or add to Collection
Data Liberation Initiative The Products • Not all products in DLI Collection are standard STC electronic products • For example we have the KLEMS database • An experimental database of productivity data • We also have data from the Dept. of Fisheries and Oceans
Data Liberation Initiative The Services • DLI was conceived to be an internet-based means of dissemination • The internet is the main mode of data transfer and communications • DLI offers both an FTP and a Web based service for access to Collection
Data Liberation Initiative The Services • Our FTP site is considered to be the main repository for our collection where DLI Contacts download data products • The FTP site is only open to DLI Contacts • The following is an example of the FTP file structure
Data Liberation Initiative The Services • Access to the data and metadata of many of our titles can also be achieved via our Web pages • While the data files are locked and available only to DLI Contacts the metadata files are available to all • The following are examples of the different parts of a small and a large survey
Data Liberation Initiative How large are files?
Data Liberation Initiative The Services • The internet is also used for communication between and among the members and to order products that are available in hardcopy only • DLILIST - forum for making enquiries, sharing of information and general communication between and among members • DLIORDER & WWW DLI ORDER DESK - to order hard copy versions of products not available electronically
Data Liberation Initiative The Services • Our web site not only provides access to the data and metadata but also contains a lot of other information and valuable links
Data Liberation Initiative The Services • Another service is the production of the DLI-Update • A newsletter designed as a means to inform, teach and share information • Articles are written by various DLI Contacts and Team members • Back issues can be found on the web site
Data Liberation Initiative The Services • When a product is received by Team a number of steps are performed before it is placed in the Collection: • First of all we need to check to ensure that all files - data, metadata (French & English) - have been received • Open each file to ensure it is what it says it is (e.g., if a .DOC then the file is a Word file, etc.)
Data Liberation Initiative The Services • Run program against data file to verify: • Number of records • Record length • Overall size of file • Compare results against codebook and/or record layout
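The verification step above could be sketched roughly as follows. This is not the Team's actual program: it assumes a fixed-width text file with one record per line, and the expected record count and record length are placeholders that would be taken from the codebook or record layout.

```python
import os
import tempfile


def check_data_file(path, expected_records, expected_length):
    """Compare a fixed-width data file against the figures in its documentation."""
    lengths = set()
    count = 0
    with open(path, "rb") as fh:
        for raw in fh:
            count += 1
            # Measure the record without its line terminator.
            lengths.add(len(raw.rstrip(b"\r\n")))
    return {
        "records": count,
        "records_ok": count == expected_records,
        "record_lengths": sorted(lengths),
        "length_ok": lengths == {expected_length},
        "size_bytes": os.path.getsize(path),
    }


# Demo on a tiny made-up file: two records of 13 bytes each.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"350410123.450\n240520087.125\n")
    demo_path = tmp.name

report = check_data_file(demo_path, expected_records=2, expected_length=13)
os.remove(demo_path)
print(report)
```

A mismatch in any of these figures usually points to a truncated transfer, a wrong version of the file, or documentation that describes a different release.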
Data Liberation Initiative The Services • If SAS and/or SPSS syntax received, run it against the file • If no SPSS - create it • Rename all files to conform to DLI standards • Create FTP path & directories • Create Web pages
Data Liberation Initiative The Services • Load all files into appropriate places on FTP and Web • Announce addition on DLILIST
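The "open each file to ensure it is what it says it is" check in the intake steps above could be partly automated by comparing a file's first bytes with the signature expected for its extension. The sketch below is an illustration only, not the Team's procedure; the function name and the small signature table are assumptions, and it covers just two formats whose signatures are well known.

```python
from typing import Optional

# Leading bytes ("magic numbers") for two common documentation formats.
SIGNATURES = {
    # OLE compound file: legacy Word .DOC (also used by other legacy Office files)
    ".doc": b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1",
    ".pdf": b"%PDF",
}


def looks_like(extension: str, header: bytes) -> Optional[bool]:
    """Return True/False if the header matches the extension's known signature,
    or None when no signature is on file for that extension."""
    expected = SIGNATURES.get(extension.lower())
    if expected is None:
        return None
    return header.startswith(expected)


# Usage: pass the extension and the first few bytes read from the file.
print(looks_like(".PDF", b"%PDF-1.4 ..."))   # header agrees with extension
print(looks_like(".doc", b"%PDF-1.4 ..."))   # a PDF mislabelled as .DOC
print(looks_like(".sav", b"anything"))       # no signature on file
```

A matching signature only shows the container type is plausible; someone still has to open the file to confirm the content is the right documentation.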
Data Liberation Initiative The Services • Many files have not come with SPSS descriptions - these are created by DLI Team • Often older files do not have French versions of documentation so extremely difficult to create French SPSS • Creation of these SPSS labels can take some time after receipt of documentation, depending on workload, size of file, and if any documentation in electronic format
Data Liberation Initiative The Services • We are starting to receive some kind of SPSS descriptors from author divisions • If and when SPSS supplied by author division they can require major editing to fit with “DLI Users” requirements (e.g. length of variable and value labels) • The preparation and/or verification of SPSS syntax is a major undertaking
Data Liberation Initiative The Services • Who does all this work? • There is a Team of people situated in the Stats Can Library • They are:
Data Liberation Initiative The Services Jackie Godfrey • Responsible for: • Project on-line infrastructure • Data security, i.e., passwords, IP validation, etc. • Listservs • etc.
Data Liberation Initiative The Services Sage Cram • Responsible for: • Communications • Responding to questions on DLILIST • Liaison between DLI members and STC divisions
Data Liberation Initiative The Services André Blondin • Responsible for: • Quality Control of data and metadata • Maintenance of FTP site directories • Loading of files on FTP site • Overseeing creation of SPSS