Identification of voices in disguised speech Jessica Clark* & Paul Foulkes** * University of York ** University of York & JP French Associates pf11@york.ac.uk IAFPA, Göteborg 2006
0.1 outline • experiment to test ability of lay listeners to identify disguised familiar voices • voices have been disguised artificially, as with commercially available voice changers • pitch modified
0.2 structure • introduction • rationale for experiment • experimental design • speakers • listeners • Control condition • Experimental conditions • results • discussion & conclusion
1. Introduction • technical speaker identification is the most frequent task for the forensic phonetician • lay identification is also common in legal cases • many previous studies have thus examined lay listeners’ ability to identify voices and the factors which affect their ability
1.1 previous studies • identification is not automatic or flawless • listeners can make errors even with highly familiar voices • Ladefoged did not recognise his mother from a short sample (Ladefoged & Ladefoged 1980) • flatmates scored only 68% with 10 second samples (Foulkes & Barron 2000)
1.1 previous studies • identification may be affected by (Bull & Clifford 1984): • type of exposure (active/passive) • length of sample • nature of sample (phone, direct, shouting etc) • delay between exposure and test • age of listener • hearing ability • sightedness • natural variability across individual listeners • specific features of voice • degree of familiarity • nature and extent of any disguise
1.2 degree of familiarity • all things equal, more familiar voices are easier to identify • e.g. Hollien, Majewski & Doherty (1982) • listening tests with 10 male voices
1.3 disguise • all things equal, disguised voices are harder to identify • e.g. Hollien, Majewski & Doherty (1982) • various forms of disguise used
1.3 disguise • previous studies have examined various types of disguise • whisper, pencils between teeth, hypernasality, dialect change, rate change, professional mimics • but little if any work on voice changers • hardware based • software based • easily available
www.crimebusters911.com www.blazeaudio.com www.maplin.co.uk
1.3 disguise • in our study we chose not to use real voice changers, in favour of total control over effects • pitch shift chosen as a universal function
2.1 design outline • simple design • listeners asked to identify samples of familiar voices • Control condition: unmodified stimuli • 4 Experimental conditions: modified stimuli
2.1 design outline • degree of familiarity known to affect rate of successful identification • thus we trained listeners to identify a group of speakers • controls degree of familiarity • all listeners had exactly the same exposure in terms of length & quality of samples • identification task carried out under same conditions
2.2 speakers • 4 male speakers • 16-18 years old • taken from IViE corpus (Grabe, Post & Nolan 2001) • Leeds dialect (nearest to York) • reading text of Cinderella story
2.2 speakers • training materials created for each speaker • c. 90 seconds of Cinderella (302 words) • edited out disfluencies, non-speech sounds, long pauses • samples normalised for amplitude with Audacity 1.2.5
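The slides do not say which Audacity settings were used for amplitude normalisation. As a rough illustration only, the sketch below shows generic peak normalisation, where every sample is scaled so that the loudest one reaches a chosen level. The function name `peak_normalise` and the target level of 0.5 are assumptions made for this example, not details from the study.

```python
def peak_normalise(samples, target_peak=0.5):
    """Scale samples so the largest absolute value equals target_peak.

    `samples` is a sequence of floats in the range -1.0 .. 1.0.
    The default target of 0.5 is an arbitrary illustrative choice.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence (or empty input): nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


# Two made-up "recordings" at different levels end up with the same peak.
quiet = [0.05, -0.10, 0.02]
loud = [0.40, -0.80, 0.16]
print(peak_normalise(quiet))
print(peak_normalise(loud))
```

Normalising all training and test samples in this way removes overall loudness as a cue, so listeners cannot tell speakers apart simply by recording level.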
2.3 listeners • 36 listeners • variety of regional/social backgrounds • York residents • age range 19-55 • 10 male, 26 female
2.4 Control condition • all 36 listeners 1. training phase 2. break 3. listening test • 4 voices × 90 seconds = c. 6 minutes • presented by PowerPoint with speakers’ names • Toshiba laptop • Aiwa A170 headphones • individually in quiet room
2.4 Control condition • all 36 listeners 1. training phase 2. break 3. listening test • 10 minutes
2.4 Control condition • all 36 listeners 1. training phase 2. break 3. listening test • 8 stimuli (2 per speaker) • duration c. 10 seconds • 5 second gap between • extracts from other parts of Cinderella story • normalised for amplitude with Audacity 1.2.5 • answer sheet with names
2.5 Experimental conditions • 4 Experimental conditions • listening tests same format as Control condition • but stimuli modified for pitch • Sound Forge 8.0 • pitch shift effect • accuracy setting ‘high’ • speech 1 mode • preserved durations
2.5 Experimental conditions • (i) +8 semitones • (ii) +4 semitones • (iii) -4 semitones • (iv) -8 semitones • pitch shifts of more than 8 semitones sounded unnatural and partly incomprehensible
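To give a sense of scale for these shifts, the sketch below converts semitones to frequency ratios using the standard equal-temperament relation, ratio = 2^(n/12). The 120 Hz starting value is a hypothetical fundamental frequency picked purely for illustration; it is not a measurement from the speakers in the study.

```python
def semitone_ratio(semitones: float) -> float:
    """Frequency ratio for a shift of `semitones` (12 semitones = 1 octave)."""
    return 2.0 ** (semitones / 12.0)


# The four experimental shifts listed on the slide.
SHIFTS = [+8, +4, -4, -8]

# Hypothetical starting F0, for illustration only.
EXAMPLE_F0_HZ = 120.0

for shift in SHIFTS:
    ratio = semitone_ratio(shift)
    print(f"{shift:+d} semitones: ratio {ratio:.3f}, "
          f"{EXAMPLE_F0_HZ:.0f} Hz -> {EXAMPLE_F0_HZ * ratio:.1f} Hz")
```

A shift of +8 semitones raises frequencies by roughly 59%, while -8 semitones lowers them by roughly 37%, so the two extreme conditions are a full octave plus four semitones apart.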
2.5 Experimental conditions • listening test 16-92 days after Control test • no clear effects for length of delay • same training as in Control condition • 10 minute break • 2 stimuli for familiarisation • 8 experimental stimuli per condition • consecutive runs for + and - stimuli • order reversed for half of each group, but no effect
3.1 Control condition • average correct identification = 4.8/8 (60%)
3.1 Control condition • individual scores ranged from 0 to 8 • 29/36 performed better than chance
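The slides do not state how the chance level was calculated. A minimal sketch, assuming a listener who guesses independently and uniformly among the 4 speakers on each of the 8 stimuli, gives an expected score of 2/8 and lets us ask how likely a given score would be by guessing alone. The function name `prob_at_least` is introduced here for illustration.

```python
from math import comb

N_TRIALS = 8      # stimuli per listening test (from the slides)
P_GUESS = 1 / 4   # four candidate speakers, ASSUMING uniform random guessing


def prob_at_least(k: int, n: int = N_TRIALS, p: float = P_GUESS) -> float:
    """Binomial probability of scoring k or more out of n by pure guessing."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


expected = N_TRIALS * P_GUESS
print(f"expected score by guessing: {expected:.1f}/{N_TRIALS}")
for k in range(3, N_TRIALS + 1):
    print(f"P(score >= {k}) = {prob_at_least(k):.4f}")
```

Under these assumptions the mean Control score of 4.8/8 is well above the guessing expectation of 2/8, and a single listener scoring 5 or more would do so by guessing less than 3% of the time.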
3.2 Experimental conditions [figure: identification scores by condition; all four Experimental conditions marked **] • ** sig. lower than in Control (p < .005, Wilcoxon) • trend (n.s.) for higher scores in + conditions
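The slides name the test only as "Wilcoxon". Assuming this means the paired signed-rank test (each listener's Control score compared with the same listener's score in an Experimental condition), the rank sums can be sketched as below. The scores are placeholder values for six imaginary listeners, not data from the study, and the handling of zeros and ties is one common convention among several.

```python
def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus), the signed-rank sums for paired samples.

    Pairs with zero difference are dropped; tied absolute differences
    share the average of the ranks they would otherwise occupy.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Extend j across the run of equal absolute differences.
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus


# PLACEHOLDER scores out of 8 for six imaginary listeners (not study data).
control = [6, 5, 7, 4, 8, 5]
shifted = [3, 4, 4, 4, 5, 2]
w_plus, w_minus = wilcoxon_signed_rank(control, shifted)
print(f"W+ = {w_plus}, W- = {w_minus}")
```

The smaller of the two rank sums is the test statistic; a value near zero, as in this invented example, means nearly every listener scored lower in the shifted condition. Converting the statistic to a p-value requires tables or a statistics package and is omitted here.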
• variability in listener performance, esp. at ±4 semitones • majority perform above chance in all conditions except -8
3.3 variation by listener sex • women sig. better in Control (p = .008, Mann-Whitney) • trend (n.s.) maintained in Experimental tests • same pattern reported by Bull & Clifford (1984)
3.4 summary • as predicted, identification rates were lower with disguised voices • lowest scores with most extreme form of disguise (±8 semitones) • identification rates slightly better when pitch shifted up than down • trend for women to perform better than men • variability across listeners
4. discussion & conclusion • tests reported here were not forensically realistic • results may be affected by e.g. • degree of familiarity with voice • content of sample (vocabulary, syntax etc) • conditions of exposure (stress etc) • specific form of artificial disguise • software, hardware system • combination of effects
4. discussion & conclusion • considerable variation in listeners’ scores • courts should not assume all witnesses are equally good at such tasks • supports broader principle that lay witnesses should be tested in their ability to identify a voice
4. discussion & conclusion • but even marked disguise was not catastrophic for listeners • a broadly positive conclusion for lay speaker identification • a reasonable chance of identifying familiar voices
4. discussion & conclusion • but a less positive conclusion with respect to the use of voice changers as a means of protecting vulnerable witnesses giving evidence • more extreme forms of modification may affect intelligibility & naturalness • less extreme forms of modification may render witness’s voice recognisable • different modifications for different voices?
4. discussion & conclusion • as ever… • more work is needed
thanks / tack • thanks to Peter French, Phil Harrison, Robin How
References
Bull, R. & Clifford, B. (1984) Earwitness voice recognition accuracy. In G. Wells & E. Loftus (eds.) Eyewitness Testimony: Psychological Perspectives. Cambridge: CUP. pp. 92-123.
Foulkes, P. & Barron, A. (2000) Telephone speaker recognition amongst members of a close social network. Forensic Linguistics 7: 181-198.
Grabe, E., Post, B. & Nolan, F. (2001) English intonation in the British Isles: the IViE corpus. Final report to UK ESRC R000 237145. www.phon.ox.ac.uk/IViE
Hollien, H., Majewski, W. & Doherty, E. (1982) Perceptual identification of voices under normal, stress and disguise speaking conditions. Journal of Phonetics 10: 139-148.
Ladefoged, P. & Ladefoged, J. (1980) The ability of listeners to identify voices. UCLA Working Papers in Phonetics 49: 43-51.