Learn about transcription and coding in this seminar, exploring the key concepts, differences between spoken and written language, and the structure of CHAT. Dive into the process and tools used for transcribing and coding language-based analysis methods.
Critical Issues in Information Systems BUSS 951 Seminar 10 Transcription & Coding
Transcribing & Coding • transcription and coding is a major requirement for language-based methods of analysis • transcription: conversion of speech to writing • coding: the addition of relevant information to the transcription • both are needed because spoken and written language are very different
Speech is not Writing: Differences in Spoken & Written Texts
Spoken texts • + interactive: 2 or more participants • + face-to-face: in the same place and time • + language as action: using language to accomplish some task • + spontaneous: without rehearsing what is going to be said • + casual: informal and everyday
Written texts • - interactive: one participant • - face-to-face: on his or her own • - language as action: using language to reflect • - spontaneous: planning, drafting and rewriting • - casual: formal and special occasions
Transcribing & Coding • [Diagram: a cycle through Seek to Lead-in Zone, Playback, Transcribe and Coding] • seek: cue the tape (rewind and fast forward) until you get to the part of the tape you are seeking • iterate until the text is transcribed and coded
CHAT • one of the best standards is CHAT (Codes for the Human Analysis of Transcripts) • it is a well-defined standard • even in the research literature, transcriptions are often ad hoc and idiosyncratic • formal standards are difficult to obtain
CHAT • developed with subsequent computer processing in mind • a suite of programs called CLAN is available to parse the text • excellent provision for creating transcripts even when the text is difficult to understand, for example when the speaker has an accent or a speech problem
CHAT • the standard is extensible; it provides a consistent way of adding new headers if necessary • developed by Brian MacWhinney and Jane Walter at the CHILDES (Child Language Data Exchange System) research centre, Department of Psychology, Carnegie Mellon University
CHAT Structure • CHAT has a basic structure common to all transcripts • a block of so-called Constant Headers at the top of the transcript, starting with @Begin • the body of the transcript, consisting of turns taken by speakers, called Mainlines, each followed by zero or more Dependent Tiers • a single command, @End, which signals the end of the transcript
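The three-part structure described above can be checked mechanically. The following is a minimal sketch in Python, not the CLAN parser: the function name check_chat_structure is hypothetical, and it only tests the rules stated on this slide (@Begin first, @End last, dependent tiers only after a mainline).

```python
def check_chat_structure(text):
    """Return a list of structural problems found in a CHAT-style transcript.

    Hypothetical helper for illustration; it checks only the outline
    described in the seminar, not the full CHAT standard.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    problems = []
    if not lines or lines[0] != "@Begin":
        problems.append("first line must be @Begin")
    if not lines or lines[-1] != "@End":
        problems.append("last line must be @End")
    seen_mainline = False
    for ln in lines:
        if ln.startswith("*"):
            # a mainline: a turn taken by a participant
            seen_mainline = True
        elif ln.startswith("%") and not seen_mainline:
            # a dependent tier must sit below the mainline it refers to
            problems.append("dependent tier before any mainline: " + ln)
    return problems
```

An empty list means none of these three rules was broken; it does not mean the transcript is valid CHAT in every other respect.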
CHAT Structure: Top of the Transcript (1) • the top of any transcript always has two compulsory commands:
@Begin
@Participants: MCL MicroLabs Assistant, STU Student
• @Begin indicates the start of the transcript. It must always be the first line of any CHAT transcript. It does not include any other information...
CHAT Structure: Top of the Transcript (2) • @Participants is a mandatory Constant Header (a command used only once per transcript) which lists the interactants in the transcript. The syntax, as with all transcripts, is critical. • the three-letter codes after the header each indicate a person who speaks or is otherwise involved with the text • the string after each three-letter code explains the role of that participant in the text
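As a rough illustration of that syntax, the sketch below splits an @Participants header into three-letter codes and roles. The function name parse_participants is hypothetical, and the comma-separated layout is assumed from the example on the previous slide.

```python
def parse_participants(header_line):
    """Map each three-letter participant code to its role description.

    Hypothetical helper; assumes entries are separated by commas, as in
    '@Participants: MCL MicroLabs Assistant, STU Student'.
    """
    prefix = "@Participants:"
    if not header_line.startswith(prefix):
        raise ValueError("not an @Participants header")
    participants = {}
    for entry in header_line[len(prefix):].split(","):
        parts = entry.split()
        if not parts:
            continue  # ignore empty entries such as a trailing comma
        code, role = parts[0], " ".join(parts[1:])
        if len(code) != 3:
            raise ValueError("participant code must be three letters: " + code)
        participants[code] = role
    return participants
```

For the seminar's own header this yields MCL mapped to "MicroLabs Assistant" and STU mapped to "Student".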
CHAT Structure: Top of the Transcript (3) • below @Begin and @Participants, other optional Constant Headers can be listed, including @Age of, @Sex of and @SES of:
@Age of MCL: 35
@SES of MCL: middle
@Sex of MCL: male
CHAT Structure: Top of the Transcript (4) • optional Constant Headers must follow the @Participants header because they need to refer to the three-letter participant identifier • whether you include them depends on whether they are significant: is the age of a participant important in the text? • a complete list follows...
Table 1: CHAT Constant Headers. Constant Headers that have proved to be useful in workplace language studies (Clarke 1996b, 1996c) are presented against a white background while less relevant Constant Headers are presented against a shaded background.
@Begin                 indicates the start of the CHAT file
@Participants:         list of actors in the file
@Age of XXX:           speaker's age in yymmdd format
@Birth of XXX:         date of birth of the speaker
@SES of XXX:           socio-economic status of the speaker
@Education of XXX:     speaker's education in years
@Sex of XXX:           indicates the gender of the speaker
@Filename:             name of the transcription data file
@Coding:               version of CHAT being used
@Warning:              relative completeness of the transcript
@End                   indicates the end of the CHAT file
CHAT Structure: Top of the Transcript (6) • the CHAT Constant Headers can also be represented using a syntax diagram, a notation also used for describing the syntax rules of computer languages like Pascal • a diagram follows...
Figure 3 : CHAT Constant Headers Syntax Diagram
CHAT Structure: Top of the Transcript (8) • Completed transcript so far...
@Begin
@Participants: MCL MicroLabs Assistant, STU Student
@Age of MCL: 35
@SES of MCL: middle
@Sex of MCL: male
@Age of STU: 18
@SES of STU: middle
@Sex of STU: male
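Because the optional Constant Headers refer back to a participant identifier, they can be gathered per participant. The sketch below is an illustration only: collect_attributes and the regular expression are hypothetical, and the "@Name of XXX: value" layout is assumed from the transcript above.

```python
import re

# Assumed layout of an optional Constant Header: "@Name of XXX: value"
OPTIONAL_HEADER = re.compile(r"^@(\w+) of (\w{3}):\s*(.+)$")

def collect_attributes(header_lines, participant_codes):
    """Group optional Constant Headers by the participant they describe."""
    attributes = {code: {} for code in participant_codes}
    for line in header_lines:
        match = OPTIONAL_HEADER.match(line.strip())
        if not match:
            continue  # not an "@... of XXX:" header, e.g. @Begin
        name, code, value = match.groups()
        if code not in attributes:
            # the header refers to a code that @Participants never declared
            raise ValueError(f"{line!r} refers to undeclared participant {code}")
        attributes[code][name] = value
    return attributes
```

Raising an error for an undeclared code mirrors the rule that these headers only make sense after @Participants has introduced the identifiers.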
CHAT Structure: Transcript Body (1) • most of the transcript body consists of mainlines, which indicate that a participant is taking a turn in the conversation • other features found in the transcript body include: • Dependent Tiers, which are used to add special coding for a given turn • Changeable or Repeating Headers
CHAT Structure: Mainlines (1) • a mainline is a turn taken by a participant, indicated by an * • who takes the turn is indicated by one of the participant identifiers listed in the @Participants constant header...
CHAT Structure: Mainlines (2) • the text comprising the speaker's turn is transcribed after the * and participant identifier • an example of a completed mainline:
*MCL what software do you want
CHAT Structure: Dependent Tiers (1) • Dependent Tiers are used to add extra detail • there are many different types of them • they always relate only to a specific turn and, when used, are only ever listed below the mainline to which they refer
CHAT Structure: Dependent Tiers (2) • dependent tiers are identified in a transcript by a % followed by the appropriate dependent tier code • the dependent tier code tells the reader what kind of information is being coded for the mainline above
CHAT Structure: Dependent Tiers (3) • an example showing a mainline and its two dependent tiers (%sit, %com) is provided below:
*MCL what software do you want
%sit STU and MCL are at the service desk
%com STU looks like he is lost
• a list of valid dependent tiers follows...
Table 4: CHAT Dependent Tiers. Dependent Tiers that have proved to be useful in workplace language studies (Clarke 1996b, 1996c) are presented against a white background while less relevant Dependent Tiers are presented against a shaded background.
%flo    simplified flowing original
%pho    phonetic and phonemic transcription
%par    paralinguistic features
%int    intonation and prosody
%lan    code shifting into secondary language
%act    actions
%fac    facial actions
%gpx    gestures and proxemics
%add    addressee
%sit    situational coding
%exp    explanation
%com    comments by investigator/transcriber
%alt    alternative utterance
%tim    time stamp coding
%spa    speech act coding
%mor    morphemic semantics
%phs    phrase structure notation
%err    error coding
%cod    general purpose coding
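The relationship between mainlines and their dependent tiers can be sketched in a few lines of Python. This is an illustrative sketch, not part of CLAN: parse_body is a hypothetical name, the set of tier codes is copied from Table 4, and the line layout (code, a space, then the text) is assumed from the example mainline shown earlier.

```python
# Tier codes as listed in Table 4
VALID_TIERS = {
    "flo", "pho", "par", "int", "lan", "act", "fac", "gpx", "add", "sit",
    "exp", "com", "alt", "tim", "spa", "mor", "phs", "err", "cod",
}

def parse_body(lines):
    """Group each mainline with the dependent tiers listed below it."""
    turns = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("*"):
            # mainline: '*' + participant identifier + the turn itself
            speaker, _, text = line[1:].partition(" ")
            turns.append({"speaker": speaker, "text": text, "tiers": []})
        elif line.startswith("%"):
            # dependent tier: '%' + tier code + the coded information
            code, _, content = line[1:].partition(" ")
            if code not in VALID_TIERS:
                raise ValueError("unknown dependent tier: %" + code)
            if not turns:
                raise ValueError("dependent tier without a preceding mainline")
            turns[-1]["tiers"].append((code, content))
    return turns
```

Attaching each tier to the most recent mainline reflects the rule that a dependent tier only ever codes the turn directly above it.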
CHAT Structure: Changeable/Repeating Headers (1) • Repeating Headers can be inserted repeatedly in a transcript, but they are only used when a significant condition has changed • once inserted in a transcript, a Repeating Header is valid for the remainder of the transcript, or until another Header of the same type overrides it
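The "valid until overridden" rule can be illustrated with a short sketch. Everything here is an assumption for illustration: annotate_with_headers is a hypothetical name, and the @Situation header used in the usage note is only an example name, since the list of valid Repeating Headers is not reproduced in this text.

```python
def annotate_with_headers(lines):
    """Pair each mainline with the Repeating Headers in force at that point."""
    active = {}      # header name -> most recent value
    annotated = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("@") and ":" in line:
            # a later header of the same type overrides the earlier one
            name, _, value = line[1:].partition(":")
            active[name.strip()] = value.strip()
        elif line.startswith("*"):
            # copy the current state so later overrides do not alter it
            annotated.append((line, dict(active)))
    return annotated
```

For instance, if a transcript body contained "@Situation: service desk", one turn, then "@Situation: computer lab" and another turn, the first turn would be paired with "service desk" and the second with "computer lab".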
CHAT Structure: Changeable/Repeating Headers (2) • a list of valid Changeable or Repeating Headers is provided on the next slide • just like the Constant Headers, Changeable or Repeating Headers can be described using a syntax diagram, which is on the slide following the list
CHAT Structure: Summary...so far! • so far we have described three separate types of structure that occur within the body of a CHAT transcript: • Mainlines (for transcribing turns) • Dependent Tiers (for coding turns) • Changeable or Repeating Headers
CHAT Structure: Special Mainline Codes (1) • sometimes it is important to add additional information into the mainline itself • NOTE the following about the body of the CHAT transcript: • an actual turn is shown in lower case on a mainline, and • there is normally no punctuation on mainlines
CHAT Structure: Special Mainline Codes (2) • this is because punctuation, when it is used, has the meaning defined by the CHAT Special Mainline Codes • Special Mainline Codes come in two types: • Utterance Junctures and Delimiters • Utterance Ambiguity Codes • we will describe both types in order...
CHAT Structure: Special Mainline Codes (3) • Utterance Junctures and Delimiters: • indicate either junctures or breaks in the turn (pauses etc.). These Special Mainline Codes are referred to as Utterance Internal Junctures • indicate how a turn was completed (as a question, the speaker was interrupted etc.). These Special Mainline Codes are referred to as Post Utterance Delimiters
CHAT Structure: Special Mainline Codes (4) • Utterance Junctures and Delimiters continued... • indicate how a turn was started, either by a participant taking up another's talk (called latching), or by completing another's talk (called completion). These Special Mainline Codes are referred to as Pre Utterance Delimiters • a list follows...
Utterance Junctures and Delimiters
(a) Utterance Internal Junctures
[#]         Short Pause
[#long]     Long Pause
[#ss.mm]    Timed Pause
,           Comma
(b) Post Utterance Delimiters
.           Period
?           Question
!           Exclamation
[...]       Trailing off
[\]         Interruption
(c) Pre Utterance Delimiters
[>]         Latching
[+]         Completion
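As a rough sketch of how these codes might be read by a program, the function below reports how a turn started and how it was completed. The name classify_turn is hypothetical, the code tables are copied from the list above, and it is assumed that a Pre Utterance Delimiter sits at the very start of the turn and a Post Utterance Delimiter at the very end.

```python
# Codes copied from the list of Utterance Junctures and Delimiters
PRE_UTTERANCE = [("[>]", "latching"), ("[+]", "completion")]
POST_UTTERANCE = [
    ("[...]", "trailing off"),
    ("[\\]", "interruption"),   # the code is [\]; the backslash is escaped
    (".", "period"),
    ("?", "question"),
    ("!", "exclamation"),
]

def classify_turn(text):
    """Return (how the turn started, how it was completed).

    Either element is None when no matching delimiter is present.
    """
    text = text.strip()
    start = next((label for code, label in PRE_UTTERANCE
                  if text.startswith(code)), None)
    end = next((label for code, label in POST_UTTERANCE
                if text.endswith(code)), None)
    return start, end
```

A turn with no delimiters at all, which the seminar notes is the normal case, simply yields (None, None).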
CHAT Structure: Special Mainline Codes (6) • Utterance Ambiguity Codes can also be inserted into a mainline • used when there has been: • a problem with the transcription process, or • an unusual condition (for example, when a gesture substitutes for a word) • in these cases special coding is required...
CHAT Structure: Special Mainline Codes (7) • Utterance Ambiguity Codes may also be moved to their own dependent tiers if the mainline is getting cluttered with coding • the table that follows shows the valid CHAT Utterance Ambiguity Codes ...
CHAT Structure: Bottom of the Transcript (1) • the only unique syntax for the bottom of the transcript is the mandatory @End Constant Header • it is needed to indicate when a transcript is finished • a relatively complete transcript extract showing required features follows. NOTE that : is not part of the CHAT standard...
Tool Support (1) • the CHAT system has a number of tools available for it • one tool, called CLAN, includes a parser for checking the syntax of CHAT transcripts • multimedia versions of CLAN are being developed; these are useful when meetings have been videotaped
Tool Support (2): Needed for Transcription NOT Coding • these tools are great for building elaborately coded transcripts • they are not so helpful when dealing with workplace language • coding is not the major problem; it's transcription that takes the greatest effort in workplace language studies
Tool Support (3): Transcription • there are of course a number of transcription systems which, when combined with CHAT and CLAN, could form a useful workplace language system • but the 'State-of-the-Art' is still not very good
Tool Support (4): Speech Recognition? • some manufacturers claim to get 95% accuracy in transcription, but this is only possible under very constrained conditions: • these systems cannot handle speech which is continuous and flowing; the software cannot find where words start and end • these systems cannot transcribe speech unless the system has been trained to understand each and every speaker
Tool Support (5) • in some circumstances the inability of current systems to recognise Flowing Speech may not be a great problem because workplace transcripts can be sparse • some excellent systems are becoming available, e.g. Dragon DICTATE for Windows
Tool Support (6) • but, it has taken the IS Discipline 20 years to come up with reasonable CASE tools to support traditional systems development activities • we may need another 20 years to provide the same level of support for semio-informatics!