
Research data workflow

Research data workflow. Practice in Slovenian Social Science Data Archives. SERSCIDA WP4 – WORKSHOP, Ljubljana, September 2013. SIP, AIP, DIP: Submission Information Package (SIP), Archival Information Package (AIP), Dissemination Information Package (DIP).



Presentation Transcript


  1. Research data workflow: Practice in Slovenian Social Science Data Archives. SERSCIDA WP4 – WORKSHOP, Ljubljana, September 2013

  2. SIP, AIP, DIP • Submission Information Package (SIP) • Archival Information Package (AIP) • Dissemination Information Package (DIP) [Diagram: SIP, AIP and DIP, with the AIP held for long-term preservation]

  3. Recommended formats – input

  4. Recommended formats – distribution • STUDY DESCRIPTION: DDI-structured XML • DATA FILE: ASCII + XML; distributed in formats that can be exported from Nesstar • OTHER TEXTUAL MATERIAL: PDF

  5. Recommended formats – archiving • DATA FILE: ASCII (*.txt) + XML with DDI file and data description

  6. Recommended formats – archiving • QUESTIONNAIRE, TEXT MATERIAL: original (any format) + distribution files (PDF) • STUDY DESCRIPTION: DDI-structured XML

  7. Licence Agreement. First variant. Free: • to Share: to copy, distribute and transmit the work • to Remix: to adapt the work • to make commercial use of the work. Under the following conditions: Attribution: You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). Second variant. Free: • to Share: to copy, distribute and transmit the work • to Remix: to adapt the work. Under the following conditions: Attribution: You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). Noncommercial: You may not use this work for commercial purposes.

  8. Naming files and versioning • File name format: StudyID_MaterialType_Language_Version_Subversion.FileFormat • Example: sutr1006_p1_sl_v1_r2.txt • URN format: URN:SI:UNI-LJ-FDV:ADP:StudyID_MaterialType_Language_Version • Example: URN:SI:UNI-LJ-FDV:ADP:sutr1006_p1_sl_v1
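The naming convention on slide 8 can be sketched in Python as a parser that splits a file name into its fields and derives the matching URN. This is only an illustration: the regular expression is inferred from the single example on the slide, so the character rules for each field (for instance a two-letter language code, `v` and `r` prefixes for version and subversion) are assumptions, and `ArchiveFile` and `parse_file_name` are hypothetical names.

```python
import re
from typing import NamedTuple

# Pattern inferred from the slide's one example (sutr1006_p1_sl_v1_r2.txt).
# The allowed characters in each field are an assumption, not an ADP rule.
FILE_NAME = re.compile(
    r"^(?P<study_id>[A-Za-z0-9]+)_(?P<material_type>[A-Za-z0-9]+)_"
    r"(?P<language>[a-z]{2})_v(?P<version>\d+)_r(?P<subversion>\d+)"
    r"\.(?P<file_format>[A-Za-z0-9]+)$"
)

URN_PREFIX = "URN:SI:UNI-LJ-FDV:ADP:"  # prefix exactly as shown on the slide


class ArchiveFile(NamedTuple):
    """Hypothetical container for the fields of one archived file name."""
    study_id: str
    material_type: str
    language: str
    version: int
    subversion: int
    file_format: str

    def urn(self) -> str:
        # Per the slide, the URN stops at the version: it carries neither
        # the subversion nor the file extension.
        return (f"{URN_PREFIX}{self.study_id}_{self.material_type}_"
                f"{self.language}_v{self.version}")


def parse_file_name(name: str) -> ArchiveFile:
    """Split a file name into its fields, or raise ValueError."""
    match = FILE_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised archive file name: {name!r}")
    parts = match.groupdict()
    return ArchiveFile(
        study_id=parts["study_id"],
        material_type=parts["material_type"],
        language=parts["language"],
        version=int(parts["version"]),
        subversion=int(parts["subversion"]),
        file_format=parts["file_format"],
    )


example = parse_file_name("sutr1006_p1_sl_v1_r2.txt")
print(example.urn())
```

For the slide's example file name, the derived URN matches the URN example given on the same slide, which suggests all subversions of one version share a single identifier.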

  9. Managing workflow • Project tracking software • Task for every study, with 29 subtasks covering: general part with email correspondence; managing deposited materials; preparing data file; preparing study description; publishing. http://nesstar2.adp.fdv.uni-lj.si:8080/browse/RAZ-4536

  10. Cleaning data and documentation • Frequencies check • Variable names, values • Missing values • Recode • Weight • Anonymisation • Cumulative dataset

  11. Anonymisation. Sebastian Kočar, Expert Assistant in Social Science Data Archives. SERSCIDA WP4 – WORKSHOP, Ljubljana, September 2013

  12. Anonymisation in the archives – types • basic anonymisation of mostly academic research datasets • anonymisation of Eurostat files • anonymisation of official statistics Public Use Files (PUF)

  13. Basic anonymisation of distributed microdata in archives • deleting variables: direct identifiers (telephone numbers, addresses etc.) are removed • recoding indirect identifiers (while still allowing serious researchers to receive datasets with the indirect identifiers non-recoded): recoding includes removing values and bracketing, i.e. combining the categories of a variable

  14. Anonymisation of Eurostat files (the case of the Eurostat Labour Force Survey) • deleting variables: indirect identifiers and unneeded variables are removed (municipality, wave number etc.) • bracketing: age, marital status, education, years of residence, age of establishment of residence, duration of search for employment, professional status, country & nationality • classification: income figures are not given; respondents are divided into classes based on their income • aggregation: economic activity and occupation values are aggregated at 1-digit level • top-coding: restricting the upper range of a variable (number of hours worked)
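Three of the techniques listed on slide 14 (bracketing, top-coding and 1-digit aggregation) can be sketched in a few lines of Python. This is an illustrative sketch only: the age bands, the 80-hour cap, the sample record and the function names are assumptions invented for the example, not the Eurostat LFS anonymisation criteria.

```python
from bisect import bisect_right


def bracket(value, edges, labels):
    """Bracketing: replace a numeric value by the label of its interval.

    `edges` holds the lower bound of every interval after the first,
    so len(labels) must equal len(edges) + 1.
    """
    return labels[bisect_right(edges, value)]


def top_code(value, cap):
    """Top-coding: values above `cap` are reported as `cap`."""
    return min(value, cap)


def aggregate_code(code, digits=1):
    """Aggregation: keep only the leading digit(s) of a classification code."""
    return str(code)[:digits]


# Hypothetical age bands and cap; the real criteria are not on the slide.
AGE_EDGES = [15, 25, 35, 45, 55, 65]
AGE_LABELS = ["0-14", "15-24", "25-34", "35-44", "45-54", "55-64", "65+"]
HOURS_CAP = 80

# Hypothetical respondent record.
record = {"age": 37, "hours_worked": 92, "occupation": "2411"}

anonymised = {
    "age_group": bracket(record["age"], AGE_EDGES, AGE_LABELS),
    "hours_worked": top_code(record["hours_worked"], HOURS_CAP),
    "occupation": aggregate_code(record["occupation"]),
}
print(anonymised)
```

Each function coarsens one variable without touching the others, so the same helpers could be applied column by column to a whole data file.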

  15. Anonymisation of official statistics Public Use Files for distribution in archives • anonymisation software: μArgus, R (sdcMicro, bethel, sampling packages), Cornell anonymisation toolkit, synthetic data generators • anonymisation techniques: data reduction techniques (global coding, local suppression etc.), data perturbation techniques (micro-aggregation, PRAM etc.), sampling, generating synthetic microdata

  16. Anonymisation – a case study • PUF prepared in cooperation with SORS Sector for General Methodology and Standards • anonymisation procedure which follows Eurostat LFS anonymisation criteria (in SPSS) • calculating individual and global risk (R – sdcMicro) • calculating strata allocation, based on individual risk averages by strata (R – bethel) • stratified sampling, based on the inclusion probability of a certain case (R – sampling – samplecube) • sample weights recalculation • LFS 2010 PUF distributed in August 2013
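Two steps of the case study can be illustrated with a rough Python sketch. The slide does not give the formulas used, so both parts are assumptions: the risk indicator here is a plain count of records sharing the same key-variable combination (a much simpler measure than the risk estimates produced by sdcMicro), and the weight adjustment divides the original weight by the subsampling inclusion probability, which is the usual design-based correction but is not confirmed by the source. All names and data are hypothetical.

```python
from collections import Counter


def sample_frequencies(records, keys):
    """For each record, count the records sharing its key-variable values.

    A count of 1 marks a sample-unique record, i.e. a higher-risk case.
    """
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return [counts[tuple(r[k] for k in keys)] for r in records]


def reweight(weight, inclusion_probability):
    """Adjust a survey weight after subsampling (assumed formula).

    A record kept with probability p now stands in for 1/p records of
    the original file, so its weight is divided by p.
    """
    if not 0 < inclusion_probability <= 1:
        raise ValueError("inclusion probability must be in (0, 1]")
    return weight / inclusion_probability


# Hypothetical records with already-recoded key variables.
records = [
    {"age_group": "25-34", "region": "A", "weight": 100.0},
    {"age_group": "25-34", "region": "A", "weight": 120.0},
    {"age_group": "65+", "region": "B", "weight": 80.0},
]

frequencies = sample_frequencies(records, ["age_group", "region"])
print(frequencies)

# Suppose the third record was kept in the PUF with probability 0.5.
print(reweight(records[2]["weight"], 0.5))
```

In this toy data the third record is the only one with its combination of age group and region, which is the kind of case a risk-based sampling design would include with lower probability.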
