Basic Concept of Data Coding Codes, Variables, and File Structures
Two Ways to Think About Coding • Coding “ON” the data source • Use for unstructured narrative data in digital form • Search for themes, key terms and mark on the text • CAQDAS software helps manage the material • Coding “FROM” the data source • Use any data source in any form and any language • Create a database to collect what you find • Code what you need from the source into database • Manage and analyze the data in the database
Steps in Coding FROM Data Source • First think about: • How is your source organized in units? • What do you want to capture from the units? • Then create a structure to hold the data that: • Represents the units in your source • Contains places to put what you want to capture • Uses basic rules to keep data organized
Key Ideas • One ROW=One RECORD=One CASE • One FIELD=One COLUMN=One Variable • A flat file holds • records (rows) • fields (columns)
Flat File Rules 1. Each row or record of data needs a UNIQUE ID number 2. Each column or field holds ONE type of information. Do not try to put different things into one field. Why? 3. Data in one field can be plain text, numbers, or a systematic code. What is the simplest possible code? 4. Quantitative analysis requires codes or numbers that can be counted and compared: variables
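The first two rules can be checked mechanically. The following is a minimal Python sketch, not part of the original slides; the sample records, the field names, and the helper `check_flat_file` are hypothetical and only illustrate the idea.

```python
# Hypothetical flat file: one row = one record, one column = one variable.
records = [
    {"id": 1, "gender": "F", "age": 34},
    {"id": 2, "gender": "M", "age": 51},
    {"id": 3, "gender": "F", "age": 27},
]

def check_flat_file(rows):
    """Return a list of rule violations (an empty list means the file passes).

    Assumes every row has the same fields as the first row.
    """
    problems = []
    # Rule 1: each record needs a unique ID.
    ids = [row["id"] for row in rows]
    if len(ids) != len(set(ids)):
        problems.append("duplicate record IDs")
    # Rule 2: each field holds one type of information.
    for field in rows[0]:
        types = {type(row[field]) for row in rows}
        if len(types) > 1:
            problems.append(f"field '{field}' mixes types")
    return problems

print(check_flat_file(records))
```

A field that mixes numbers and free text cannot be counted or compared reliably, which is one answer to the "Why?" in rule 2.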
Flat File Structure Aids Analysis • Count # of cases of each category in one field • Cross-classify categories in two different fields • Plot one coded variable against another • Standardize raw numbers with percentages • Perform other forms of quantitative analysis
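As a rough illustration of the first, second, and fourth points, the sketch below counts categories in one field, cross-classifies two fields, and converts counts to percentages. It uses only the Python standard library; the rows and field names are made up for the example.

```python
from collections import Counter

# Hypothetical flat file with two coded fields.
rows = [
    {"id": 1, "gender": "F", "fruit": "Apples"},
    {"id": 2, "gender": "M", "fruit": "Pears"},
    {"id": 3, "gender": "F", "fruit": "Apples"},
    {"id": 4, "gender": "M", "fruit": "Apples"},
]

# Count the number of cases in each category of one field.
fruit_counts = Counter(row["fruit"] for row in rows)

# Cross-classify categories in two different fields.
crosstab = Counter((row["gender"], row["fruit"]) for row in rows)

# Standardize raw counts as percentages of all cases.
fruit_percent = {cat: 100 * n / len(rows) for cat, n in fruit_counts.items()}

print(fruit_counts)
print(crosstab)
print(fruit_percent)
```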
Three Kinds of Flat Files • Spreadsheet (Excel) • Statistical Program (SPSS, SAS, Stata) • Relational Database (Access) THEY LOOK SIMILAR BUT DO DIFFERENT THINGS
What Can You Do in Excel? • put data in rows and columns • enter text, numbers, dates, and formulas • add numbers in column or row (VALUES) • enter foreign language text • make charts from columns of data • import and export data in flat file format
What Are the Limitations of Excel? • rows are not stable (oriented to CELLS, not ROWS) • difficult to sort, count, manipulate RECORDS • must repeat all data entry for each row (but can fill) • spelling errors in entry limit finding and sorting • flat file format itself has limitations for some data • what if there are multiple instances for one case?
What Can You Do in SPSS? • put data in rows that are stable as records • primarily useful for numbers and codes • can separately define and label the codes • can count frequencies, do crosstabs, percentages • can collapse or combine codes • can do statistical analyses
Limitations of SPSS Flat Files • need to pre-code data into numeric codes • need to repeat all code fields for each record • problems handling multiple instances per case • what if code cannot be developed yet? • what if actual words need to be preserved? • what if code needs to expand later?
What Can a Relational Database Do? • create stable records as rows • handle numbers, words, dates, notes • handle foreign languages • define data types to reduce errors and standardize entries • LINK different files in one-to-many relations • simplify data entry to avoid repeated entry • preserve words and develop codes later • use lookup tables to standardize codes • create forms to simplify data entry • use queries and reports to extract data
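The one-to-many link and the lookup table can be sketched in a few lines. The slides use Access; the example below swaps in SQLite through Python's standard `sqlite3` module purely for illustration, and the table names, column names, and sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Lookup table: standardizes the codes.
    CREATE TABLE theme_codes (code INTEGER PRIMARY KEY, label TEXT NOT NULL UNIQUE);
    -- Parent table: one row per source document.
    CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- Child table: many mentions can link to one document.
    CREATE TABLE mentions (
        mention_id INTEGER PRIMARY KEY,
        doc_id INTEGER NOT NULL REFERENCES documents(doc_id),
        code INTEGER NOT NULL REFERENCES theme_codes(code),
        verbatim TEXT  -- preserves the actual words
    );
""")
conn.executemany("INSERT INTO theme_codes VALUES (?, ?)", [(1, "health"), (2, "work")])
conn.executemany("INSERT INTO documents VALUES (?, ?)",
                 [(1, "Interview A"), (2, "Interview B")])
conn.executemany(
    "INSERT INTO mentions (doc_id, code, verbatim) VALUES (?, ?, ?)",
    [(1, 1, "felt ill all winter"), (1, 2, "lost the job"), (2, 1, "clinic was closed")],
)

# Query: count mentions per theme, using the lookup table for labels.
theme_counts = conn.execute("""
    SELECT t.label, COUNT(*) FROM mentions m
    JOIN theme_codes t ON t.code = m.code
    GROUP BY t.label ORDER BY t.label
""").fetchall()
print(theme_counts)
```

Document-level data is entered once in `documents`, and each additional instance adds only a row to `mentions`, which is how the design avoids repeated entry.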
Solving Limitations in Access • To create frequencies and crosstabs with %: use queries for quick and dirty counts, or export a flat file to SPSS • To make pretty charts to display data: export to Excel or export to SPSS • To do statistical analysis: export to SPSS • EXPORT AND IMPORT TABLES OR QUERIES
Get Started with a Test Sample • find out what is POSSIBLE in your data • what content does it contain? • what questions could you answer with it? • how can you extract relevant content? • how much effort does it take? • start with a few cases of the text data
Developing Coding Scheme • Think about data source as set of records • Think about different pieces of information • Think about appropriate way to code each • Think about whether data are multilevel • Work interactively with your data • Mistakes are fixable at this stage
A Code is a List of Categories • Divides up content in a systematic, meaningful way • Gender=Male vs. Female • Fruit=Apples, Oranges, Pears, Bananas, Other • May assign numbers to the categories • Such numbers do not have NUMERIC meaning • They simply refer to the different categories • Coding means assigning content to categories • A data field with coded categories is a “variable” • Provides a systematic basis for analysis
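A code of this kind can be written down as a small codebook. The Python sketch below uses the fruit categories from the slide; the specific numbers (including 9 for "Other") and the helper `code_fruit` are assumptions made for illustration.

```python
# Codebook: the numbers are labels for categories, not quantities.
FRUIT_CODES = {1: "Apples", 2: "Oranges", 3: "Pears", 4: "Bananas", 9: "Other"}
LABEL_TO_CODE = {label: code for code, label in FRUIT_CODES.items()}

def code_fruit(text):
    """Assign raw content to a category code; anything unlisted becomes Other."""
    return LABEL_TO_CODE.get(text.strip().title(), LABEL_TO_CODE["Other"])

print(code_fruit("apples"), code_fruit(" PEARS "), code_fruit("kiwi"))
```

Because the numbers only name categories, averaging them would be meaningless; counting how many records fall in each category is the appropriate operation.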
Three Ways to Code “Content” • 1. Each item is a separate field and is coded present or absent in every record. • 2. Various mutually exclusive options are coded in one field. Each record has one code category. • 3. Use a sub-table to collect multiple instances that occur in one record; code in sub-table (requires a relational database)
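The three layouts can be compared side by side. This is a hedged Python sketch using plain lists and dictionaries in place of database tables; every field name and helper function here is hypothetical.

```python
# Way 1: each item is its own field, coded present (1) or absent (0).
record_way1 = {"id": 1, "mentions_apples": 1, "mentions_pears": 0}

# Way 2: one field holding a single, mutually exclusive category.
record_way2 = {"id": 1, "favorite_fruit": "Apples"}

# Way 3: a sub-table with one row per instance, linked by record ID.
sub_table = [
    {"record_id": 1, "fruit": "Apples"},
    {"record_id": 1, "fruit": "Pears"},
    {"record_id": 2, "fruit": "Apples"},
]

def instances_for(record_id):
    """Collect all instances in the sub-table that belong to one record."""
    return [row["fruit"] for row in sub_table if row["record_id"] == record_id]

def to_presence_fields(record_id, items):
    """Convert sub-table rows (way 3) into present/absent fields (way 1)."""
    found = set(instances_for(record_id))
    return {f"mentions_{item.lower()}": int(item in found) for item in items}

print(instances_for(1))
print(to_presence_fields(1, ["Apples", "Pears", "Bananas"]))
```

The conversion helper shows that a sub-table can be flattened into present/absent fields when a flat file is needed for analysis, while the reverse direction would lose the individual instances.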
Code What is There • Some data will be missing—too bad • Resist temptation to code only judgments • Code the evidence into database • Then code your judgment (positive, negative) • This provides evidence for the judgment • Allows for reliability checks of judgments • Can start with some standard codes, add more later • Can enter actual terms, recode later
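One way to picture "enter actual terms, recode later" is to keep the evidence field untouched and derive the code in a separate step. The Python sketch below assumes a hypothetical recode table and sample entries; none of these names or values come from the slides.

```python
# Each entry keeps the evidence as entered, plus the coder's judgment.
entries = [
    {"id": 1, "evidence": "service was excellent", "judgment": "positive"},
    {"id": 2, "evidence": "waited three hours", "judgment": "negative"},
    {"id": 3, "evidence": "Service was EXCELLENT ", "judgment": "positive"},
]

# Recode table developed later, once the actual terms are known.
RECODE = {"service was excellent": "SERVICE_GOOD", "waited three hours": "WAIT_LONG"}

def recode(entry):
    """Add a standard code while leaving the original evidence unchanged."""
    key = entry["evidence"].strip().lower()
    return {**entry, "code": RECODE.get(key, "UNCODED")}

coded = [recode(entry) for entry in entries]
for row in coded:
    print(row["id"], row["code"], row["judgment"])
```

Keeping evidence and judgment in separate fields means a second coder can review the same evidence and the two sets of judgments can be compared.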
Content Coding Questions • How would you code Male and Female? • How would you code key words or phrases? • What if you don’t know all the words now? • What if there can be more than one per record? • How would you code a topic or theme? • What if you don’t know all the topics now? • What if there can be more than one per record?