Critical Issues in Information Systems BUSS 951 Seminar 10 Transcription & Coding
Transcribing & Coding
• transcription and coding are a major requirement for language-based methods of analysis
• transcription: the conversion of speech to writing
• coding: the addition of relevant information to the transcription
• both are needed because spoken and written language are very different
Speech is not Writing: Differences in Spoken & Written Texts
Spoken texts:
+ interactive: 2 or more participants
+ face-to-face: in the same place and time
+ language as action: using language to accomplish some task
+ spontaneous: without rehearsing what is going to be said
+ casual: informal and everyday
Written texts:
- interactive: one participant
- face-to-face: on his or her own
- language as action: using language to reflect
- spontaneous: planning, drafting and rewriting
- casual: formal and special occasions
Transcribing & Coding
[Diagram: a cycle with the steps Seek to Lead-in Zone, Playback, Transcribe and Coding]
• cue the tape (rewind and fast forward) until you get to the part of the tape you are seeking
• iterate until the text is transcribed and coded
CHAT
• one of the best standards is CHAT (Codes for the Human Analysis of Transcripts)
• it is a well-defined standard
• even in the research literature, transcriptions are often ad hoc and idiosyncratic
• formal standards are difficult to obtain
CHAT
• developed with subsequent computer processing in mind
• a suite of programs called CLAN is available to parse the text
• excellent provision for creating transcripts even when the text is difficult to understand, for example when the speaker has an accent or a speech problem
CHAT
• the standard is extensible; it provides a consistent way of adding new headers if necessary
• developed by Brian MacWhinney and Jane Walter at CHILDES (the Child Language Data Exchange System), Department of Psychology, Carnegie Mellon University
CHAT Structure
• CHAT has a basic structure common to all transcripts:
  • a block of so-called Constant Headers at the top of the transcript, starting with @Begin
  • the body of the transcript, consisting of turns taken by speakers called Mainlines, each followed by zero or more Dependent Tiers
  • a single command that signals the end of the transcript, @End
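The three-part layout above can be checked mechanically. The following is a minimal sketch in Python, not part of CHAT or CLAN; the function name check_basic_structure is hypothetical, and it assumes one header, mainline or dependent tier per line, as in the slide examples.

```python
def check_basic_structure(lines):
    """Return a list of problems with the overall layout (empty if none)."""
    # Ignore blank lines so trailing newlines do not hide a missing @End.
    content = [line.strip() for line in lines if line.strip()]
    problems = []
    if not content or content[0] != "@Begin":
        problems.append("first line must be @Begin")
    if not content or content[-1] != "@End":
        problems.append("last line must be @End")
    # Every line should be a header (@), a mainline (*) or a dependent tier (%).
    # Line numbers count non-blank lines only.
    for number, line in enumerate(content, start=1):
        if line[0] not in "@*%":
            problems.append(f"line {number}: unrecognised line type")
    return problems


transcript = [
    "@Begin",
    "@Participants: MCL MicroLabs Assistant, STU Student",
    "*MCL what software do you want",
    "%sit STU and MCL are at the service desk",
    "@End",
]
print(check_basic_structure(transcript))
```

A real checker such as the CLAN parser does far more than this; the sketch only illustrates the top, body and bottom structure.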
CHAT Structure: Top of the Transcript (1)
• the top of any transcript always has two compulsory commands:
    @Begin
    @Participants: MCL MicroLabs Assistant, STU Student
• @Begin indicates the start of the transcript. It must always be the first line of any CHAT transcript. It does not include any other information...
CHAT Structure: Top of the Transcript (2)
• @Participants is a mandatory Constant Header (a command used only once per transcript) which lists the interactants in the transcript. As with all transcripts, the syntax is critical.
• the three-letter codes after the header each indicate a person who speaks or is otherwise involved with the text
• the string after each three-letter code explains the role of that participant in the text
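As a rough illustration of this syntax, the sketch below splits an @Participants header into three-letter codes and roles. The function name parse_participants is hypothetical, and the comma-separated layout is assumed from the slide example rather than taken from the CHAT manual.

```python
def parse_participants(header):
    """Split an @Participants header into a {code: role} dictionary."""
    prefix = "@Participants:"
    if not header.startswith(prefix):
        raise ValueError("not an @Participants header")
    participants = {}
    # Assumption: entries are comma-separated, each "CODE role description".
    for entry in header[len(prefix):].split(","):
        code, _, role = entry.strip().partition(" ")
        if len(code) != 3 or not code.isalpha():
            raise ValueError(f"participant code must be three letters: {code!r}")
        participants[code] = role.strip()
    return participants


print(parse_participants("@Participants: MCL MicroLabs Assistant, STU Student"))
```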
CHAT Structure: Top of the Transcript (3)
• below @Begin and @Participants, other optional Constant Headers can be listed, including @Age of, @Sex of and @SES of:
    @Age of MCL: 35
    @SES of MCL: middle
    @Sex of MCL: male
CHAT Structure: Top of the Transcript (4)
• optional Constant Headers must follow the @Participants header because they need to refer to the three-letter participant identifier
• whether you include them depends on whether they are significant: for example, is the age of a participant important in the text?
• a complete list follows...
Table 1: CHAT Constant Headers. Constant Headers that have proved to be useful in workplace language studies (Clarke 1996b, 1996c) are presented against a white background while less relevant Constant Headers are presented against a shaded background.

Header              Meaning
@Begin              indicates the start of CHAT file
@Participants:      list of actors in file
@Age of XXX:        speaker's age in yymmdd format
@Birth of XXX:      date of birth of speaker
@SES of XXX:        socio-economic status of speaker
@Education of XXX:  speaker's education in years
@Sex of XXX:        indicates gender of the speaker
@Filename:          name of transcription data file
@Coding:            version of CHAT being used
@Warning:           relative completeness of the transcript
@End                indicates the end of CHAT file
CHAT Structure: Top of the Transcript (6)
• the CHAT Constant Headers can also be represented using a syntax diagram, a notation also used to describe the syntax rules of computer languages like Pascal
• a diagram follows...
Figure 3 : CHAT Constant Headers Syntax Diagram
CHAT Structure: Top of the Transcript (8)
• completed transcript so far...
    @Begin
    @Participants: MCL MicroLabs Assistant, STU Student
    @Age of MCL: 35
    @SES of MCL: middle
    @Sex of MCL: male
    @Age of STU: 18
    @SES of STU: middle
    @Sex of STU: male
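Because the optional Constant Headers refer back to the participant identifiers, they can be grouped per participant. The sketch below is one way to do that in Python; the function name collect_participant_headers and the regular expression are assumptions for illustration (in particular, it assumes upper-case three-letter codes as in the examples), not part of CLAN.

```python
import re

# Matches headers of the form "@Name of XXX: value", e.g. "@Age of MCL: 35".
HEADER = re.compile(r"^@(\w+) of ([A-Z]{3}):\s*(.*)$")


def collect_participant_headers(lines, known_codes):
    """Group '@... of XXX:' headers by participant code."""
    details = {}
    for line in lines:
        match = HEADER.match(line.strip())
        if not match:
            continue  # @Begin, @Participants and other lines are skipped here
        name, code, value = match.groups()
        if code not in known_codes:
            raise ValueError(f"{code} is not listed in @Participants")
        details.setdefault(code, {})[name] = value
    return details


top = [
    "@Begin",
    "@Participants: MCL MicroLabs Assistant, STU Student",
    "@Age of MCL: 35",
    "@SES of MCL: middle",
    "@Sex of MCL: male",
    "@Age of STU: 18",
    "@SES of STU: middle",
    "@Sex of STU: male",
]
print(collect_participant_headers(top, {"MCL", "STU"}))
```

Rejecting an unknown code reflects the rule that these headers must refer to an identifier declared in @Participants.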
CHAT Structure: Transcript Body (1)
• most of the transcript body consists of mainlines, which indicate that a participant is taking a turn in the conversation
• other features found in the transcript body include:
  • Dependent Tiers, which are used to add special coding for a given turn
  • Changeable or Repeating Headers
CHAT Structure: Mainlines (1)
• a mainline is a turn taken by a participant, indicated by an *
• who takes the turn is indicated by one of the participant identifiers listed in the @Participants constant header...
CHAT Structure: Mainlines (2)
• the text comprising the speaker's turn is transcribed after the * and participant identifier
• an example of a completed mainline:
    *MCL what software do you want
CHAT Structure: Dependent Tiers (1)
• Dependent Tiers are used to add extra detail
• there are many different types of them
• each always relates to one specific turn and, when used, is only ever listed below the mainline to which it refers
CHAT Structure: Dependent Tiers (2)
• dependent tiers are identified in a transcript by a % followed by the appropriate dependent tier code
• the dependent tier code tells the reader what kind of information is being coded for the mainline above
CHAT Structure: Dependent Tiers (3)
• an example showing a mainline and its two dependent tiers (%sit, %com) is provided below:
    *MCL what software do you want
    %sit STU and MCL are at the service desk
    %com STU looks like he is lost
• a list of valid dependent tiers follows...
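The relationship between a mainline and the dependent tiers beneath it can be sketched as a small parser. This is an illustrative Python sketch only: the function name parse_body and the dictionary layout are hypothetical, and it follows the notation used on these slides (identifier, space, text), tolerating a trailing colon after the identifier in case a transcript uses one.

```python
def parse_body(lines):
    """Group each mainline (*) with the dependent tiers (%) listed below it."""
    turns = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("*"):
            speaker, _, text = line[1:].partition(" ")
            # Assumption: strip an optional colon after the speaker code.
            turns.append({"speaker": speaker.rstrip(":"), "text": text, "tiers": {}})
        elif line.startswith("%"):
            if not turns:
                raise ValueError("dependent tier appears before any mainline")
            code, _, text = line[1:].partition(" ")
            # A tier always belongs to the most recent mainline.
            turns[-1]["tiers"][code.rstrip(":")] = text
    return turns


body = [
    "*MCL what software do you want",
    "%sit STU and MCL are at the service desk",
    "%com STU looks like he is lost",
]
print(parse_body(body))
```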
Table 4: CHAT Dependent Tiers. Dependent Tiers that have proved to be useful in workplace language studies (Clarke 1996b, 1996c) are presented against a white background while less relevant Dependent Tiers are presented against a shaded background.

Tier   Meaning
%flo   simplified flowing original
%pho   phonetic and phonemic transcription
%par   paralinguistic features
%int   intonation and prosody
%lan   code shifting into secondary language
%act   actions
%fac   facial actions
%gpx   gestures and proxemics
%add   addressee
%sit   situational coding
%exp   explanation
%com   comments by investigator/transcriber
%alt   alternative utterance
%tim   time stamp coding
%spa   speech act coding
%mor   morphemic semantics
%phs   phrase structure notation
%err   error coding
%cod   general purpose coding
CHAT Structure: Changeable/Repeating Headers (1)
• Repeating Headers can be inserted repeatedly in a transcript, but they are only used when a significant condition has changed
• once inserted in a transcript, a Repeating Header is valid for the remainder of the transcript, or until another Header of the same type overrides it
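The "valid until overridden" rule can be sketched by carrying the current header values forward to each turn. In the Python sketch below, the function name attach_repeating_headers is hypothetical, the header name @Situation is used purely for illustration, and the second turn is invented for the example.

```python
def attach_repeating_headers(body_lines):
    """Pair each mainline with the Repeating Header values in force at that point."""
    active = {}  # header name -> most recent value
    turns = []
    for raw in body_lines:
        line = raw.strip()
        if line.startswith("@") and ":" in line:
            name, _, value = line[1:].partition(":")
            # A later header of the same type overrides the earlier one.
            active[name.strip()] = value.strip()
        elif line.startswith("*"):
            # Copy the state so later overrides do not change earlier turns.
            turns.append((line, dict(active)))
    return turns


body = [
    "@Situation: at the service desk",
    "*MCL what software do you want",
    "@Situation: at the printer",        # invented for illustration
    "*STU where is my printout",         # invented for illustration
]
for mainline, headers in attach_repeating_headers(body):
    print(mainline, headers)
```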
CHAT Structure: Changeable/Repeating Headers (2)
• a list of valid Changeable or Repeating Headers is provided on the next slide
• just like the Constant Headers, Changeable or Repeating Headers can be described using a syntax diagram, which is on the slide following the list
CHAT Structure: Summary... so far!
• so far we have described three separate types of structure that occur within the body of a CHAT transcript:
  • Mainlines (for transcribing turns)
  • Dependent Tiers (for coding turns)
  • Changeable or Repeating Headers
CHAT Structure: Special Mainline Codes (1)
• sometimes it is important to add extra information to the mainline itself
• NOTE the following about the body of the CHAT transcript:
  • an actual turn is shown in lower case on a mainline, and
  • there is normally no punctuation on mainlines
CHAT Structure: Special Mainline Codes (2)
• this is because, when punctuation is used, it conforms to the CHAT Special Mainline Codes
• Special Mainline Codes come in two types:
  • Utterance Junctures and Delimiters
  • Utterance Ambiguity Codes
• we will describe both types in order...
CHAT Structure: Special Mainline Codes (3)
• Utterance Junctures and Delimiters:
  • indicate either junctures or breaks in the turn (pauses etc.). These Special Mainline Codes are referred to as Utterance Internal Junctures
  • indicate how a turn was completed (as a question, the speaker was interrupted etc.). These Special Mainline Codes are referred to as Post Utterance Delimiters
CHAT Structure: Special Mainline Codes (4)
• Utterance Junctures and Delimiters continued...
  • indicate how a turn was started, either by a participant taking up another's talk (called latching), or by completing another's talk (called completion). These Special Mainline Codes are referred to as Pre Utterance Delimiters
• a list follows...
Utterance Junctures and Delimiters
(a) Utterance Internal Junctures
    [#]        Short Pause
    [#long]    Long Pause
    [#ss.mm]   Timed Pause
    ,          Comma
(b) Post Utterance Delimiters
    .          Period
    ?          Question
    !          Exclamation
    [...]      Trailing off
    [\]        Interruption
(c) Pre Utterance Delimiters
    [>]        Latching
    [+]        Completion
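To show how these codes might be picked out of a mainline, here is a minimal Python sketch built directly from the list above. The function name classify_turn is hypothetical, the example utterance is invented, and the sketch assumes that pre-utterance codes appear at the very start of the turn text and post-utterance codes at the very end.

```python
import re

PRE = {"[>]": "latching", "[+]": "completion"}
POST = {
    ".": "period",
    "?": "question",
    "!": "exclamation",
    "[...]": "trailing off",
    "[\\]": "interruption",
}
# Matches [#], [#long] and timed pauses such as [#02.50].
PAUSE = re.compile(r"\[#(long|\d+\.\d+)?\]")


def classify_turn(text):
    """Report the pre/post utterance delimiters and pauses found in a turn."""
    text = text.strip()
    pre = next((label for code, label in PRE.items() if text.startswith(code)), None)
    post = next((label for code, label in POST.items() if text.endswith(code)), None)
    pauses = []
    for match in PAUSE.finditer(text):
        kind = match.group(1)
        # No qualifier inside the brackets means a short pause.
        pauses.append("short" if kind is None else kind)
    return {"pre": pre, "post": post, "pauses": pauses}


# Invented utterance, used only to exercise the codes.
print(classify_turn("[>] i need [#] the spreadsheet package [#long] i think [...]"))
```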
CHAT Structure: Special Mainline Codes (6)
• Utterance Ambiguity Codes can also be inserted into a mainline
• used when there has been:
  • a problem with the transcription process, or
  • an unusual condition (for example, when a gesture substitutes for a word)
• special coding is then required for the words used...
CHAT Structure: Special Mainline Codes (7)
• Utterance Ambiguity Codes may also be moved to their own dependent tiers if the mainline is getting cluttered with coding
• the table that follows shows the valid CHAT Utterance Ambiguity Codes...
CHAT Structure: Bottom of the Transcript (1)
• the only unique syntax for the bottom of the transcript is the mandatory @End Constant Header
• it is needed to indicate when a transcript is finished
• a relatively complete transcript extract showing required features follows. NOTE that : is not part of the CHAT standard...
Tool Support (1)
• the CHAT system has a number of tools available for it
• one tool, called CLAN, includes a parser for checking the syntax of CHAT transcripts
• multimedia versions of CLAN are being developed; these are useful when meetings have been videotaped
Tool Support (2): Needed for Transcription NOT Coding
• these tools are great for building elaborately coded transcripts
• they are not so helpful when dealing with workplace language
• coding is not the major problem: it is transcription that takes the greatest effort in workplace language studies
Tool Support (3): Transcription
• there are of course a number of transcription systems which, when combined with CHAT and CLAN, could form a useful workplace language system
• but the 'state of the art' is still not very good
Tool Support (4): Speech Recognition?
• some manufacturers claim to get 95% accuracy in transcription, but this is only possible under very constrained conditions:
  • these systems cannot handle speech which is continuous and flowing, because the software cannot find where words start and end
  • these systems cannot transcribe speech unless they have been trained to understand each and every speaker
Tool Support (5)
• in some circumstances the inability of current systems to recognise Flowing Speech may not be a great problem, because workplace transcripts can be sparse
• some excellent systems are becoming available, e.g. Dragon DICTATE for Windows
Tool Support (6)
• but it has taken the IS discipline 20 years to come up with reasonable CASE tools to support traditional systems development activities
• we may need another 20 years to provide the same level of support for semio-informatics!