SEL3053: Analyzing Geordie
Lecture 6. The TLS / DECTE phonetic transcriptions
We shall be creating data from the Tyneside Linguistic Survey phonetic transcriptions contained in DECTE. This lecture introduces those transcriptions.
1. The nature of phonetic transcription
1.1 Speech
Speech is the physical medium by which meaningful linguistic expressions are transmitted among humans:
A speaker wishes to communicate a linguistic expression and encodes that expression as a stream of sounds generated by mouth, throat, and nose.
A listener receives the sounds in his ears, decodes them to recover the expression, and then extracts the speaker's meaning from it.
The sounds of speech are complex vibrations set up in the air around us by the articulatory organs. The complexity of these vibrations can be shown by using an oscilloscope, which gives a visual representation of them.
Example of visualization of speech using an oscilloscope: http://www.youtube.com/watch?v=-BRX7T5VPmk
We generate and understand speech so effortlessly that we don't have to think about it. Even so, two problems have long been the subject of intensive research and are still imperfectly understood: the scientific problem of how the brain encodes and decodes linguistic expressions as sound, and the engineering problem of designing machines for speech synthesis and speech understanding. The difficulty is that the speech signal is both
Noisy: it is shaped by the characteristics of each individual's speech production, and by ambient acoustics and sounds
Ambiguous: the signal does not contain enough information to support reliable decoding
Human brains have circuitry that compensates for the noise and resolves the ambiguity, but we don't yet know how that circuitry works.
1.2 Transcription
Some subdisciplines of linguistics, such as sociolinguistics and dialectology, are interested in studying the phonetics of language. For this purpose, analysis of the speech signal is necessary but not sufficient, on account of its noisy and ambiguous character. Specifically, for detailed study of phonetics, what is linguistically significant in the speech signal has to be identified and abstracted from the noise. At present, the only way to do this with acceptable reliability is to use the human speech processing mechanism: a human has to listen to the speech of interest and represent its linguistically significant aspects in some way.
There is a standard way of representing speech symbolically. The International Phonetic Association was established in 1886 to promote the study and application of phonetics. An important part of this promotion was to provide an international standard for the representation of linguistically significant phonetic aspects of speech. This is known as the International Phonetic Alphabet, or IPA. The IPA has been through a series of developments over the years; the latest version dates from 1996.
The basic idea is to use a set of symbols to represent speech, each of which stands for a linguistically significant phonetic segment; the full symbol set is given in the IPA chart published by the International Phonetic Association.
[Figure: oscilloscope trace of the spoken word 'phonetician' together with its IPA transcription]
2. The TLS / DECTE transcriptions
The TLS project collected its speech samples and made its transcriptions with specific research aims in mind: to identify interesting regularities in phonetic variation among informants in the corpus, and any correlations between such variation and associated social factors.
To this end they developed a methodology which was radical at the time and remains so today. In the then-universal and still-dominant theory-driven approach, social and linguistic factors are selected by the analyst on the basis of some combination of an independently specified theoretical framework, existing case studies, and personal experience of the domain of enquiry. In contrast, the TLS proposed a fundamentally empirical approach in which salient factors are extracted from the data itself and then serve as the basis for model construction.
2.1 The TLS transcription scheme
The TLS wanted a very detailed phonetic transcription of its speech material, and IPA wasn't detailed enough. The project therefore invented an extension of the IPA scheme.
[Figure: sample page from the TLS encoding manual]
The full transcription scheme is available at the DECTE website.
There are four columns in the encoding table: the first three give transcription symbols, and the fourth gives examples of the speech sounds that the symbols represent. The first three columns give symbols for increasingly fine-grained transcription:
OU: phonological segments
PDV: IPA-level phonetic segments
State: the TLS's elaboration of the IPA scheme
The TLS analysis was based on the finest-grained State transcription. But, because a computer was to be used for the analysis, the rather elaborate graphical symbols at the State level had to be encoded for convenient input and processing. That encoding worked as follows:
Each PDV segment was assigned a unique 4-digit code.
A fifth digit was added to PDV codes to give a 5-digit State code.
For example:
The PDV code for [IPA symbol] is 0002. State codes for this PDV are created by adding digits 1, 2, 3... so that the State codes are 00021, 00022, 00023, and so on.
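The encoding rule can be sketched in a few lines of Python. This is only an illustration, not the TLS's own software: it assumes that the fifth digit is appended to the right of the 4-digit PDV code and runs from 1 to 9, and the function names `state_codes` and `split_state_code` are hypothetical.

```python
def state_codes(pdv_code: str, n_states: int) -> list[str]:
    """Build the 5-digit State codes for one 4-digit PDV code.

    Assumption: the State digit is appended on the right and
    takes the values 1, 2, 3, ... up to at most 9.
    """
    if len(pdv_code) != 4 or not (pdv_code.isascii() and pdv_code.isdigit()):
        raise ValueError("a PDV code must consist of exactly 4 digits")
    if not 1 <= n_states <= 9:
        raise ValueError("a single digit allows between 1 and 9 states")
    return [f"{pdv_code}{digit}" for digit in range(1, n_states + 1)]


def split_state_code(state_code: str) -> tuple[str, int]:
    """Split a 5-digit State code into its PDV code and State digit."""
    if len(state_code) != 5 or not (state_code.isascii() and state_code.isdigit()):
        raise ValueError("a State code must consist of exactly 5 digits")
    return state_code[:4], int(state_code[4])


print(state_codes("0002", 3))     # ['00021', '00022', '00023']
print(split_state_code("00023"))  # ('0002', 3)
```

The codes are kept as strings rather than integers so that the leading zeros, which are part of the code, are not lost.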
Using this encoding, a TLS State transcription file was just a sequence of 5-digit codes. Thus, for the following section of speech / orthographic transcription:
Speech: [sound clip]