
Chapter 8 Coding



Presentation Transcript


  1. Chapter 8 Coding Course: Quantitative Research Group members: Catherine, Rainie, Louis

  2. Outline • Preparing data for coding • Transcribing oral data Transcription Conventions Transcription Machines Technology and Transcription • Data Coding Nominal Data Ordinal Data Interval Data

  3. Preparing data for coding Once data are collected, it is necessary to organize and analyze them. • Tools: Excel, SPSS, SAS, JMP, etc. • Digital form (numbers) • Coding involves making decisions: classifying or categorizing the data. Raw data (forms of essays, test scores, diaries, checkmarks on observation schemes) are coded into well-organized data; oral data must first be transcribed.

  4. Transcribing oral data Transcription conventions • present the oral data in a written format • useful for coding & for providing examples. Notations in transcripts: • Italics • [ brackets ] • ( parentheses ) • … sequence of dots • Boldface • “ Quotation marks ”

  5. Example of transcription conventions (p. 224) Emma: uh HONEY I'LL PRAY FOR EV'RYBODY= Lottie: [=Alri:ght,] Emma: [=I:- I ] don't kno:w,hh this: [uh w]orld goes o::n= Lottie: [Yeh.] Emma: =we have to keep ↓goin' do[n't we.] Lottie: [Ye:ah, ] (.) Lottie: [U h h u h ] Emma: [D'you feel h]a:ppy toda:y? (0.4) Lottie: ↑Ye:ah. Emma: Good. (.) http://www.lboro.ac.uk/departments/ss/JP-docs/Transcription%20conventions.htm

  6. Transcription machines • foot pedal and headphones make it easier to transcribe • rewind tapes automatically. Technology and transcription (digital recording equipment): • more reasonably priced and accessible • can automate the bulk of the transcription task ✖ but can't handle nonnative accents

  7. Nominal data • Often used for classifying categorical data with numerical values. Dichotomous variables: gender — 1: male, 2: female. Nondichotomous variables: language — 1: Chinese, 2: English, 3: Spanish
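
The numeric coding of nominal variables described above can be sketched in a few lines of Python. The variables, code values, and responses here are hypothetical illustrations, not data from the chapter:

```python
# Map nominal category labels onto the numeric codes used in analysis software.
# These mappings mirror the slide's examples; the responses are invented.
gender_codes = {"male": 1, "female": 2}                       # dichotomous
language_codes = {"Chinese": 1, "English": 2, "Spanish": 3}   # nondichotomous

responses = ["female", "male", "female"]
coded = [gender_codes[r] for r in responses]  # numeric codes ready for SPSS/Excel
```

The numbers carry no magnitude here: code 2 is not "more" than code 1, it only labels a category.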

  8. Ordinal data • Often used for ranking data • Do not indicate how close together two students are. Advantage of rank groups: • some data (middle-range scores) can be discounted, comparing the top 25% group against the bottom 25% group
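
The rank-group idea above (keep the top 25% and bottom 25%, discount the middle-range scores) can be sketched as follows. The scores and the quartile cut-off logic are hypothetical illustrations:

```python
# Split raw scores into ordinal rank groups: top 25%, middle, bottom 25%.
# Scores are hypothetical test results for eight learners.

def rank_groups(scores):
    """Map each score's index to an ordinal group label."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n = len(scores)
    cut = max(1, n // 4)  # size of the top and bottom quartile groups
    groups = {}
    for pos, idx in enumerate(ranked):
        if pos < cut:
            groups[idx] = "top 25%"
        elif pos >= n - cut:
            groups[idx] = "bottom 25%"
        else:
            groups[idx] = "middle"   # middle-range scores can be discounted
    return groups

scores = [88, 72, 95, 60, 79, 84, 55, 91]
groups = rank_groups(scores)
```

Only rank order matters here; the grouping deliberately throws away the distances between scores, which is exactly what distinguishes ordinal from interval data.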

  9. Interval data • Often used for ranking data and for indicating the distance between rankings • Pay attention to impact (e.g., equal 10-point intervals)

  10. Outline • Coding System - Common coding systems and categories (1) T-units (2) Suppliance in Obligatory Contexts (3) CHAT - Custom-Made Coding System (1) Question Formation (2) Negative Feedback (3) Classroom Interaction (4) Second Language Writing Instruction

  11. 8.3 Coding System • Bernard (1995): the general principle is to code at the “highest level of measurement that you can.” • Data coded at a narrow level can later be collapsed into a broader level of coding. • The categories in coding should therefore be as narrow as possible.

  12. Different coding practices can be used with second language data → deeper understanding of the information researchers have collected. • Coding systems are often referred to as (1) schemes, (2) techniques, (3) sheets, or (4) charts. • Researchers develop a coding scheme based on their specific research questions. • It is helpful to use existing coding schemes where possible, to facilitate easy comparison across studies.

  13. Existing schemes may require: (1) refinements to capture new knowledge, or (2) entirely new schemes. • Coding systems range from those based on standard measures to custom-made systems. • Recognize: it's impossible to cover the whole range of existing schemes.

  14. 8.3.1 Common Coding Systems & Categories • A number of coding systems for oral & written data include: • T-units • Suppliance in obligatory contexts (SOC) counts • CHAT • Turns • Utterances • Sentences • Communication units • Tone units • Analysis of speech units • Idea units • Clauses • S-nodes per sentence • Type-token ratios • Targetlike usage counts

  15. 8.3.1.1 T-Units • Definition: “one main clause with all subordinate clauses attached to it.” • Originally used to measure syntactic development in children's L1 writing • Later used as a common measurement in second language research as well.

  16. Example of a T-unit: “After she had eaten [subordinate clause], Sally went to the park [main clause].” This T-unit is error-free: it contains no nontargetlike language. • An alternative example: “After eat, Peter go to bed” → a T-unit whose main clause contains errors.

  17. To code using T-units: (1) go through an essay or a transcription, and (2) count the total number of T-units. • From this number, the researcher can count the number of T-units not containing any errors & then present a ratio. • T-units have been used as: (1) a measurement of linguistic complexity, and (2) a measurement of accuracy.
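
Steps (1)–(2) and the error-free ratio can be sketched as follows, assuming the T-units have already been segmented and judged for errors by a human coder (the sentences are invented examples):

```python
# Accuracy as the ratio of error-free T-units to total T-units.
# Each entry is (T-unit text, judged error-free?); the judgments are hypothetical.
t_units = [
    ("After she had eaten, Sally went to the park.", True),
    ("After eat, Peter go to bed.", False),
    ("He is singing right now.", True),
    ("She have two cat.", False),
]

error_free = sum(1 for _, ok in t_units if ok)
accuracy_ratio = error_free / len(t_units)
print(f"{error_free}/{len(t_units)} error-free T-units = {accuracy_ratio:.2f}")
```

The hard part — segmenting the text into T-units and judging targetlikeness — remains a manual coding decision; the arithmetic is the easy last step.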

  18. 8.3.1.2 Suppliance in obligatory contexts (SOC) counts • Learners' level of acquisition can be measured in terms of how often certain features are supplied where they are required. This is known as SOC. EX: “He is singing right now.” → the -ing is required, because this is a context in which the progressive form is obligatory.

  19. SOC was first used for grammatical morphemes in first language acquisition, and later applied in second language research. • Although SOC is useful for measuring morpheme use, Pica's (1984) study criticized it: it did not account for learners' use of morphemes in inappropriate contexts. • Pica instead used target-like usage (TLU), which takes into account both (1) appropriate and (2) inappropriate contexts.
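
The difference between SOC and TLU scoring can be shown with hypothetical counts for a single morpheme (say, progressive -ing). The formulas follow the descriptions above: TLU adds inappropriate uses to the denominator, which SOC ignores:

```python
# SOC vs. TLU for one morpheme, using invented counts.
obligatory_contexts = 10   # contexts where the morpheme is required
correct_suppliance = 8     # morpheme correctly supplied in those contexts
inappropriate_uses = 4     # morpheme supplied where it was NOT required

# SOC: suppliance in obligatory contexts only.
soc = correct_suppliance / obligatory_contexts

# TLU (after Pica, 1984): inappropriate uses also count against the learner.
tlu = correct_suppliance / (obligatory_contexts + inappropriate_uses)
```

With these counts SOC is 0.80 but TLU drops to about 0.57 — a learner who overuses a form looks less accurate under TLU, which is exactly the point of Pica's critique.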

  20. 8.3.1.3 CHAT • CHAT was developed as a tool for the study of first and second language acquisition as part of the Child Language Data Exchange System (CHILDES) database.

  21. CHAT has become a common system for the coding of conversational interactions, and it employs detailed conventions. • CHAT is particularly useful in qualitative research. • The goal for researchers: to ascertain how best to investigate one's own research questions. • In much second language research, preexisting coding systems and categories are the exception rather than the rule.

  22. 8.3.2 Custom-Made Coding Systems 8.3.2.1 Question Formation • The researchers needed a coding scheme that would allow them to identify how the learners' question formation changed over time. • To code the data, Mackey & Philp designated the questions produced by their child learners as belonging to one of six stages based on the Pienemann-Johnston hierarchy. The modified version is shown in Table 8.6.

  23. Coding for Questions: Tentative Stages for Question Formation Goal: to capture processing capabilities & developing linguistic complexity

  24. After the stages were assigned, the next step was to determine the highest-level stage for each learner. • This step of the coding involved the assignment of an overall stage to each learner, based on the two highest-level question forms asked in two different tests. • It was then possible to examine whether the learners had improved over time.

  25. Table 8.7 Coding for Question Stage

  ID | Pretest: Task 1, Task 2, Task 3, Final Stage | Immediate Posttest: Task 1, Task 2, Task 3, Final Stage | Delayed Posttest: Task 1, Task 2, Task 3, Final Stage
  AB | 3 3 2 3 | 3 3 3 3 | 3 3 2 3
  AA | 3 3 3 3 | 5 5 4 5 | 5 5 4 5
  AC | 3 4 3 3 | 2 2 3 2 | 3 3 3 3
  AD | 3 3 4 4 | 3 5 5 5 | 5 3 3 3

  • Learner AB continued throughout the study at the third stage. • Learner AA began the study at Stage 3 & continued through the posttests at Stage 5. • Once this sort of coding has been carried out, the researcher can make decisions about the analysis.

  26. 8.3.2.2 Negative Feedback • Oliver developed a hierarchical coding system for analysis that first divided all teacher-student and NS-NNS (Native Speaker – Nonnative Speaker) conversations into three parts: (1) the NNS's initial turn, (2) the response given by the teacher or NS partner, and (3) the NNS's reaction. → Each part was then subjected to further coding.

  27. Figure 8.1 Three-turn coding scheme • Initial Turn (NNS): rated as Correct / Non-target / Incomplete • NS Response: Ignore / Negative Feedback / Continue • NNS Response: Respond / Ignore / No Chance • As with many schemes, this one is top-down (hierarchical), & the categories are mutually exclusive → meaning that it is possible to code each piece of data in only one way.

  28. 8.3.2.3 Classroom Interaction • The next turn was examined to determine: (1) whether an error occurred, and (2) whether it was ignored. • If the error was corrected, the following turn was examined and coded according to (1) whether the learner produced uptake, or (2) whether the topic was continued. • Finally, the talk following uptake was examined with regard to (1) whether the uptake was reinforced, or (2) whether the topic was continued.

  29. 8.3.2.4 Second Language Writing Instruction • Two studies used coding categories: (1) Adams (2003) → investigated the effects of written error correction on learners' subsequent second language writing; (2) Sachs & Polio (2004) → compared three feedback conditions.

  30. The researchers used different coding schemes to fit their questions. • Sachs & Polio, to compare the feedback conditions with each other, coded: (1) original error(s) (+), (2) completely corrected (0), (3) completely unchanged (-), (4) not applicable (n/a). • Adams coded individual forms as: (1) more targetlike, (2) not more targetlike, (3) not attempted (avoided). • Sachs & Polio considered T-unit codings of “at least partially changed” (+) to be possible evidence of noticing, even when the forms were not completely more targetlike.

  31. Outline • Task planning • Coding Qualitative Data • Interrater Reliability • The mechanics of coding • Conclusion

  32. 8.3.2.5 Task planning • The effects of planning on task performance (fluency, accuracy, and complexity). • Yuan and Ellis (2003) operationalized these as: (1) Fluency: (a) number of syllables per minute, and (b) number of meaningful syllables per minute, where repeated or reformulated syllables were not counted. (2) Complexity: syntactic complexity (the ratio of clauses to T-units); syntactic variety (the total number of different grammatical verb forms used); and mean segmental type-token ratio. (3) Accuracy: the percentage of error-free clauses, and correct verb forms (the percentage of accurately used verb forms). • Benefit of such a coding system: it is similar enough to those used in previous studies that results are comparable, while also being finely grained enough to capture new information.
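
Several of the Yuan and Ellis style measures can be sketched from pre-coded counts. All numbers and tokens below are hypothetical, and the syllable and clause counting itself is assumed to have been done by hand beforehand:

```python
# Fluency, complexity, and accuracy measures from hand-coded counts
# for one learner's task performance (all values invented).
seconds = 120               # length of the speech sample
syllables = 260             # total syllables produced
clauses = 24                # total clauses
t_units = 15                # total T-units
error_free_clauses = 18     # clauses judged error-free
tokens = ["the", "boy", "ran", "to", "the", "park",
          "and", "the", "boy", "played"]  # toy word tokens

fluency = syllables / (seconds / 60)          # syllables per minute
complexity = clauses / t_units                # clauses per T-unit
type_token_ratio = len(set(tokens)) / len(tokens)
accuracy = error_free_clauses / clauses       # proportion of error-free clauses
```

Yuan and Ellis actually used a *mean segmental* type-token ratio (computed over fixed-length segments and averaged) to control for text length; the simple whole-text ratio above is a deliberate simplification.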

  33. 8.3.3 Coding qualitative data (1) • The schemes for qualitative coding generally emerge from the data (open coding). • The range of variation within individual categories can assist in the procedure of adapting and finalizing the coding system, with the goal of closely reflecting and representing the data. • Examine the data for emergent patterns and themes, looking for anything pertinent to the research question or problem. • New insights and observations that are not derived from the research question or literature review may be important.

  34. 8.3.3 Coding qualitative data (2) • Themes and topics should emerge from the first round of insights into the data, when the researcher begins to consider which chunks of data fit together, and which, if any, are independent categories. • Problem: when highly specific coding schemes are developed, it can be difficult to compare qualitative coding and results across studies and contexts. • Watson-Gegeo (1988): “Although it may not be possible to compare coding between settings on a surface level, it may still be possible to do so on an abstract level.”

  35. 8.4 Interrater reliability (1) • The reliability of a test or measurement is based on the degree of similarity of results obtained from different researchers using the same equipment and method. If interrater reliability is high, results will be very similar. • With only one coder and no intracoder reliability measures, the reader's confidence in the conclusions of the study may be undermined. • To increase confidence: (1) have more than one rater code the data wherever possible; (2) carefully select and train the raters. • Keep coders selectively blind about which part of the data or which group they are coding, in order to reduce the possibility of inadvertent coder biases.

  36. 8.4 Interrater reliability (2) • To increase rater reliability: schedule coding in rounds or trials to reduce boredom or drift. • How much data should be coded: as much as is feasible given the time and resources available for the study. • Consider the nature of the coding scheme in determining how much data should be coded by a second rater. • With highly objective, low-inference coding schemes, it is possible to establish confidence in rater reliability with as little as 10% of the data.

  37. 8.4.1.1 Simple percentage agreement • This is the ratio of all coding agreements over the total number of coding decisions made by the coders (appropriate for continuous data). • The drawback: it ignores the possibility that some of the agreement may have occurred by chance.
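
Simple percentage agreement is just agreements over total decisions. A minimal sketch with two hypothetical coders classifying eight forms as targetlike (T) or nontargetlike (N):

```python
# Simple percentage agreement between two coders.
# The codes below are invented for illustration.
coder_a = ["T", "T", "N", "T", "N", "N", "T", "T"]
coder_b = ["T", "N", "N", "T", "N", "T", "T", "T"]

agreements = sum(a == b for a, b in zip(coder_a, coder_b))
percent_agreement = agreements / len(coder_a)
```

Here the coders agree on 6 of 8 decisions (0.75) — but, as the slide notes, some of that agreement could have occurred by chance, which is why Cohen's kappa is often preferred.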

  38. 8.4.1.2 Cohen's kappa • This statistic represents the average rate of agreement for an entire set of scores, accounting for the frequency of both agreements and disagreements by category. • In a dichotomous coding scheme (e.g., targetlike vs. nontargetlike), each coder's judgments are tallied per category: (1) first coder: targetlike, nontargetlike; (2) second coder: targetlike, nontargetlike; (3) both coders: targetlike. • It also accounts for chance agreement.
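
A minimal sketch of Cohen's kappa for a dichotomous scheme, using hypothetical codes. Kappa subtracts the agreement expected by chance (from each coder's marginal category proportions) from the observed agreement:

```python
# Cohen's kappa for two raters over a dichotomous coding scheme
# (targetlike "T" vs. nontargetlike "N"); the codes are invented.

def cohens_kappa(r1, r2):
    n = len(r1)
    labels = set(r1) | set(r2)
    # Observed agreement: proportion of identical decisions.
    p_observed = sum(a == b for a, b in zip(r1, r2)) / n
    # Chance agreement: product of marginal proportions, summed over labels.
    p_chance = sum((r1.count(l) / n) * (r2.count(l) / n) for l in labels)
    return (p_observed - p_chance) / (1 - p_chance)

coder_a = ["T", "T", "N", "T", "N", "N", "T", "T"]
coder_b = ["T", "N", "N", "T", "N", "T", "T", "T"]
kappa = cohens_kappa(coder_a, coder_b)
```

For these codes the raw agreement is 0.75 but kappa is only about 0.47, showing how much of the raw agreement chance alone would produce.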

  39. 8.4.1.3. Additional measures of reliability • Pearson’s Product Moment or Spearman Rank Correlation Coefficients: are based on measures of correlation and reflect the degree of association between the ratings provided by two raters.

  40. 8.4.1.4 Good practice guidelines for interrater reliability • “There is no well-developed framework for choosing appropriate reliability measures.” (Rust & Cooil, 1994) • General good practice guidelines suggest that researchers should state: (1) which measure was used to calculate interrater reliability, (2) what the score was, and (3) briefly, why that particular measure was chosen.

  41. 8.4.1.5 How data are selected for interrater reliability tests • Semi-randomly select a portion of the data (say 25%) to be coded by a second rater. • To create a representative sample, randomly select that 25% from different parts of the main dataset. • If a pretest and three posttests are used, data from each of them should be included in the 25%. • Intrarater reliability refers to whether a rater will assign the same score after a set time period.
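
The stratified, semi-random 25% selection described above can be sketched as follows. The phase names, item labels, and fixed seed are hypothetical choices, not prescribed by the chapter:

```python
# Draw ~25% of the data for a second rater, stratified by test phase so that
# every phase (pretest, posttests) contributes to the reliability sample.
import random

random.seed(42)  # fixed seed so the selection can be reproduced and reported

dataset = {
    "pretest":   [f"pre-{i}" for i in range(20)],
    "posttest1": [f"post1-{i}" for i in range(20)],
    "posttest2": [f"post2-{i}" for i in range(20)],
}

sample = []
for phase, items in dataset.items():
    k = max(1, round(len(items) * 0.25))   # ~25% from each phase
    sample.extend(random.sample(items, k)) # semi-random: random within phases
```

Stratifying by phase guarantees the second rater sees pretest and posttest data alike, rather than a purely random draw that might cluster in one phase.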

  42. 8.4.1.6 When to carry out coding reliability checks • Use a sample dataset to train yourself and your other coders, and test out the coding scheme early on in the coding process. • Reporting on coding should include: (1) what measure was used, (2) the amount of data coded, (3) the number of raters employed, (4) the rationale for choosing the measurement used, (5) interrater reliability statistics, and (6) what happened to data about which there was disagreement. • Complete reporting will help the researcher provide a solid foundation for the claims made in the study, and will also facilitate the process of replicating studies.

  43. 8.5. The mechanics of coding • (1)Using highlighting pens, working directly on transcripts. • (2)Listening to tapes or watching videotapes without transcribing everything: May simply mark coding sheets, when the phenomena researchers are interested in occur. • (3)Using computer programs (CALL programs).

  44. 8.5.1. How much to code • (1)Consider and justify why they are not coding all their data. • (2)Determining how much of the data to code. ( data sampling or data segmentation) • (3)The data must be representative of the dataset as a whole and should also be appropriate for comparisons if these are being made. • (4)The research questions should ultimately drive the decisions made, and to specify principled reasons for selecting data to code.

  45. 8.5.2 When to make coding decisions • Decide how to code and how much to code prior to the data collection process. • Carry out an adequate pilot study: this will allow for piloting not only of materials and methods, but also of coding and analysis. • The most effective way to avoid potential problems: design coding sheets ahead of data collection and then test them out in a pilot study.

  46. 8.6 Conclusion • Many of the processes involved in data coding can be thought through ahead of time and then pilot tested. • These include the preparation of raw data for coding, transcription, the modification or creation of appropriate coding systems, and the plan for determining reliability.
