Quantifying Data • Advanced Social Research (soci5013) • Peter Njuguna • Source: Course Pack, Chapter 14 (pages 383–395)
Overview • Data analysis: statistical, quantitative, mostly computer-aided nowadays • Process: mass observations; quantification (through coding); coding error reduction (data cleaning); data analysis
Introduction • Social science data are largely non-numeric • Machine readability and manipulation • Logic of data manipulation in quantitative analysis • Biological and physical science data have mostly numeric attributes (e.g. counts, pH, length, temperature) • Baseline: the logic remains the same even as more powerful technology develops • Computers are tools to enhance research; they understand only the basics
Computers in Social Research • France (1801): Joseph Marie Jacquard (automatic loom; punched cards controlled the weaving patterns) • USA (1790): first decennial (10-year) census, under 4 million people • 1880: about 50 million people (9 years to tabulate!) • 1890: Herman Hollerith's punched-card system (results reported in 6 weeks) • Tabulating Machine Co. + mergers = IBM • Baseline: information coding, storage, retrieval • Today's computer data analysis: converting observations into machine-readable form; electronic data storage, retrieval, manipulation and presentation • Statistical analysis (some programs are designed for social science, e.g. SPSS)
Coding for Quantitative Analysis • Social science methods (interviews, questionnaires, etc.) • Open-ended and closed-ended questions yield non-numeric responses • Coding reduces responses to a limited set of attributes to enable analysis • Using a pre-established coding scheme makes results comparable with other studies • Coding from the data set (the responses) gives flexibility and fuller coverage of responses • The coding system should be appropriate to the theoretical concepts • Data coded in detail can later be combined where the detail is not necessary, but combined codes cannot be split back into detail
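The last point on the slide above can be sketched in a few lines of Python. This is only an illustration: the occupation codes and the broad categories are hypothetical and do not come from the course pack.

```python
from collections import defaultdict

# Hypothetical detailed codes for an open-ended "occupation" question.
DETAILED = {1: "nurse", 2: "physician", 3: "teacher", 4: "professor", 5: "carpenter"}

# Many-to-one mapping from detailed codes to broader categories.
COLLAPSE = {1: "health", 2: "health", 3: "education", 4: "education", 5: "trades"}


def collapse(codes):
    """Recode a list of detailed codes into broad categories."""
    return [COLLAPSE[code] for code in codes]


responses = [1, 3, 2, 5, 4, 1]
broad = collapse(responses)
print(broad)

# Going the other way is ambiguous: each broad category maps back
# to several detailed codes, so the lost detail cannot be recovered.
reverse = defaultdict(set)
for detailed_code, category in COLLAPSE.items():
    reverse[category].add(detailed_code)
print(dict(reverse))
```

Collapsing is a simple lookup, while the reverse mapping returns a set of candidates for each category, which is why detail should be kept at coding time.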
Developing code categories • A well-developed coding scheme is derived from the research purpose • Use an existing coding scheme (comparable) or generate codes from your data • Many schemes are possible (cf. pp. 388, 389), specific to your research purpose • Review for recoding as you progress • Code categories should be exhaustive and mutually exclusive • Coder reliability (including your own) is crucial
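One way to test whether a set of code categories is exhaustive and mutually exclusive is to count how many categories each observed value falls into. The sketch below assumes hypothetical age brackets; the function name `check_scheme` is made up for illustration.

```python
def check_scheme(categories, values):
    """Return (uncovered, overlapping) for a coding scheme.

    categories maps a category name to a test function.
    uncovered lists values matching no category (scheme not exhaustive).
    overlapping lists values matching several (not mutually exclusive).
    """
    uncovered, overlapping = [], []
    for value in values:
        matches = [name for name, test in categories.items() if test(value)]
        if not matches:
            uncovered.append(value)
        elif len(matches) > 1:
            overlapping.append((value, matches))
    return uncovered, overlapping


# Flawed scheme: age 30 fits two categories, and nobody over 59 is covered.
flawed = {
    "under 30": lambda age: age <= 30,
    "30 to 59": lambda age: 30 <= age <= 59,
}

# Repaired scheme: boundaries no longer overlap and a top category is added.
fixed = {
    "under 30": lambda age: age < 30,
    "30 to 59": lambda age: 30 <= age <= 59,
    "60 and over": lambda age: age >= 60,
}

ages = [18, 30, 45, 72]
print(check_scheme(flawed, ages))
print(check_scheme(fixed, ages))
```

An empty pair of lists only shows that the scheme works for the values tested, so the check is best run on the full range of responses.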
Codebook construction • Codebook: describes the location of variables and the assignment of codes to attributes • Primary guide in the coding process • Guide for locating variables and interpreting codes in the data file during analysis • Contains: variable names; full descriptions (cf. exact wording of questions); categorized response options
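A codebook can be represented as a small lookup structure. The sketch below is a minimal illustration, assuming two hypothetical variables (`sex`, `attend`) with made-up question wording and codes; none of these come from the course pack.

```python
# Hypothetical codebook: each variable records its location (column),
# the exact question wording, and the code assigned to each attribute.
CODEBOOK = {
    "sex": {
        "column": 0,
        "description": "What is your sex?",
        "codes": {1: "male", 2: "female", 9: "no answer"},
    },
    "attend": {
        "column": 1,
        "description": "How often do you attend religious services?",
        "codes": {1: "never", 2: "sometimes", 3: "weekly", 9: "no answer"},
    },
}


def decode(record):
    """Translate one coded record (a sequence of ints) into labelled attributes."""
    return {
        name: entry["codes"][record[entry["column"]]]
        for name, entry in CODEBOOK.items()
    }


print(decode([2, 3]))
```

The same structure serves both uses named on the slide: coders read it to assign codes, and analysts read it to locate variables and interpret the stored values.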
Coding and data entry options (1) • Transfer sheets • Useful technique, especially with complex questionnaires and other data sources • Source: Course pack, p. 391
Coding and data entry options (2) • Edge-coding • Direct data entry (pre-coded questionnaires) • Data entry by interviewers, e.g. CATI (computer-assisted telephone interviewing): closed-ended data are ready for analysis; open-ended responses need an additional coding step before analysis • Coding to optical scan sheets: coder error is high and scanner tolerance is low • Direct coding on optical scan sheets by respondents • Connecting with a data analysis program: e.g. SPSS (blank data sheet, entry, analysis), or create a data set (spreadsheet, etc.) and import or export it • Compatibility options are well developed
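The import and export route can be sketched with a plain CSV file, a format that spreadsheets and most statistics packages can read. The example below uses only the Python standard library and an in-memory buffer in place of a real file; the variable names and codes are hypothetical.

```python
import csv
import io

# Hypothetical coded records; codes are kept as strings so that
# identifiers such as "001" keep their leading zeros.
rows = [
    {"id": "001", "sex": "2", "attend": "3"},
    {"id": "002", "sex": "1", "attend": "1"},
]

# Export: write a header row followed by one line per case.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "sex", "attend"])
writer.writeheader()
writer.writerows(rows)

# Import: read the same text back into a list of records.
buffer.seek(0)
loaded = [dict(row) for row in csv.DictReader(buffer)]
print(loaded == rows)
```

With a real file, `open("data.csv", "w", newline="")` would take the place of the buffer.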
Screening and elimination of errors (Data cleaning) • Errors are almost inevitable: incorrect coding, incorrect reading of codes, faulty sensing of marks, etc. • Two types of data cleaning • Possible-code cleaning: checking for errors as data are entered (“beep!”), or testing for illegitimate codes in stored data files • Contingency cleaning: checking that only cases to which an attribute is relevant have entries for it (cf. a number of pregnancies recorded for men, which is inappropriate) • Some errors can be ignored (significance, discretion) • Remember that “dirty” data almost always produce misleading results
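Both cleaning methods can be sketched as simple checks over stored records. The variables, legal codes, and the "not applicable" code 98 below are assumptions made for illustration only.

```python
NOT_APPLICABLE = 98  # hypothetical code for "question does not apply"

# Hypothetical legal codes: sex is 1 (male), 2 (female) or 9 (no answer);
# pregnancies is a count from 0 to 20, 98 (not applicable) or 99 (no answer).
LEGAL = {
    "sex": {1, 2, 9},
    "pregnancies": set(range(0, 21)) | {NOT_APPLICABLE, 99},
}


def possible_code_errors(records):
    """Possible-code cleaning: flag any value outside the legal codes."""
    return [
        (index, variable, value)
        for index, record in enumerate(records)
        for variable, value in record.items()
        if value not in LEGAL[variable]
    ]


def contingency_errors(records):
    """Contingency cleaning: men should carry only the not-applicable code."""
    return [
        index
        for index, record in enumerate(records)
        if record["sex"] == 1 and record["pregnancies"] != NOT_APPLICABLE
    ]


records = [
    {"sex": 2, "pregnancies": 2},
    {"sex": 7, "pregnancies": 0},   # illegitimate code for sex
    {"sex": 1, "pregnancies": 3},   # pregnancies recorded for a man
    {"sex": 1, "pregnancies": 98},
]
print(possible_code_errors(records))
print(contingency_errors(records))
```

The checks only locate suspect cases; deciding whether to correct, recode, or ignore each one remains a judgment call that usually means going back to the original questionnaire.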
AT LONG LAST, YOUR DATA IS READY FOR ANALYSIS ... GO!