Reference population in BatVox - exploring the use of the population optimizer
D.L. van der Vloed, D. Meuwly, R. Haraksim, J.F.M. Vermeulen
Agnitio BatVox [1]
The NFI is in the process of validating Agnitio's BatVox software. BatVox is speaker comparison software: it compares a test recording (trace, disputed sample) with a suspect recording (training file, known sample).
• The suspect recordings are used to model the within variation, under the hypothesis that test and suspect recordings come from the same speaker (HSS).
• A reference population is used to model the between variation, under the hypothesis that they come from different speakers (HDS).
Agnitio BatVox
The output has the form of a likelihood ratio (LR) [2]:
LR = P(E | HSS) / P(E | HDS)
In words: the probability of observing the distance between test and suspect if it is part of the within variation, divided by the probability of observing that distance if it is part of the between variation.
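To make the ratio concrete, here is a minimal sketch that evaluates an LR for a single score. It assumes, purely for illustration, that the within and between variation are each modelled by a Gaussian; the function names and all numbers are hypothetical and do not describe how BatVox models these distributions internally.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def likelihood_ratio(score, within_model, between_model):
    """LR = P(E | HSS) / P(E | HDS), each model given as (mean, std)."""
    p_hss = gaussian_pdf(score, *within_model)    # numerator: within variation
    p_hds = gaussian_pdf(score, *between_model)   # denominator: between variation
    return p_hss / p_hds


# Hypothetical models: same-speaker scores centre on 2.0, different-speaker on 0.0.
within_model = (2.0, 1.0)
between_model = (0.0, 1.0)

lr_midpoint = likelihood_ratio(1.0, within_model, between_model)
lr_high = likelihood_ratio(2.0, within_model, between_model)
lr_low = likelihood_ratio(0.0, within_model, between_model)
```

A score halfway between the two models supports neither hypothesis (LR of 1), a score near the within model gives an LR above 1, and a score near the between model gives an LR below 1.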
Agnitio BatVox – reference population
How is the reference population used in BatVox?
• First, it is used to evaluate the denominator of the LR: P(E|HDS).
• Second, it is used for normalization of the scores.
Agnitio BatVox – normalization
A score is calculated representing the distance between the test sample and the suspect sample. This score is influenced not only by speaker features, but also by other features, such as channel and language. The score needs to be normalized before it can be used in an LR calculation.
Agnitio BatVox – normalization
The normalization is done using the variation in the reference population. For this to be sensible, the non-speaker information (such as channel and language) in the reference population recordings needs to be similar, both among those recordings and to the suspect model. If this is the case, the normalization ensures that only speaker information is used in the comparison.
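The idea of normalizing against the variation in the reference population can be sketched as follows. This is a generic z-score style normalization, swapped in here as an assumption for illustration; the slides do not state which normalization BatVox actually applies, and the function name and numbers are hypothetical.

```python
import statistics


def normalize_score(raw_score, reference_scores):
    """Centre and scale a raw score using scores obtained against the reference population.

    reference_scores: scores of the suspect model against each reference speaker
    (hypothetical input; at least two values are needed for a standard deviation).
    """
    mu = statistics.mean(reference_scores)
    sigma = statistics.stdev(reference_scores)  # sample standard deviation
    if sigma == 0:
        raise ValueError("reference scores show no variation; cannot normalize")
    return (raw_score - mu) / sigma


# Hypothetical scores of the suspect model against five reference speakers.
reference_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
normalized = normalize_score(7.0, reference_scores)
```

If the reference recordings share channel and language with the suspect model, those effects shift the raw score and the reference scores alike, so they largely cancel in the subtraction.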
Agnitio BatVox – population optimizer
To ensure that the reference population is suitable for this task, a population optimizer has been built in. It chooses the reference speakers closest to the suspect recording. For this selection a biometric distance measure is used, similar to the distance measures used in the actual comparisons. One can set BatVox to choose a certain number of reference population speakers out of a larger set.
In short: out of a total set of speakers, a subset is chosen, which is then used as the reference population.
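The selection step described above can be sketched as a nearest-neighbour pick. The distance values, speaker labels, and function name below are hypothetical; the actual biometric distance measure used by BatVox is not specified in these slides.

```python
def select_reference_population(distances, subset_size):
    """Return the subset_size speakers with the smallest distance to the suspect.

    distances: dict mapping a speaker id to its distance from the suspect
    recording (assumed to be precomputed by some distance measure).
    """
    if subset_size > len(distances):
        raise ValueError("subset size exceeds the total number of speakers")
    ranked = sorted(distances, key=distances.get)  # closest speakers first
    return ranked[:subset_size]


# Hypothetical total set of four speakers; choose the two closest.
distances = {"spk_a": 0.9, "spk_b": 0.2, "spk_c": 0.5, "spk_d": 0.7}
subset = select_reference_population(distances, 2)
```

With a small subset drawn from a large total set, only the speakers most similar to the suspect survive the selection.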
BatVox is only a tool in the hands of the user
The reference population and the population optimizer are important concepts in the system when obtaining LRs. Therefore, the NFI devised an experiment to better understand how the population optimizer works.
Validation NFI – speech databases
Swiss-French Polyphone IPSC database (16 male speakers) [6], GSM data and PSTN data.
Used to provide ‘test’ and ‘suspect’ recordings:
• 1 GSM suspect recording per speaker
• 6 PSTN suspect recordings per speaker
• 1 GSM test recording per speaker
• 5 PSTN test recordings per speaker
Validation NFI – speech databases
Swiss-French Polyphone database (1995 male speakers) [5], 1 recording per speaker (PSTN, landline).
This database was used as the reference population in this study.
Conditions
Tests were conducted in three population conditions, letting BatVox choose:
• 35 out of 45 (P1)
• 35 out of 1995 (P2)
• 1400 out of 1995 (P3)
The tests were done in two channel conditions:
• GSM
• PSTN
Cllr [4] values

Channel  Condition  Cllr  Minimal Cllr
PSTN     P1         0.24  0.23
PSTN     P2         0.39  0.27
PSTN     P3         0.25  0.23
GSM      P1         0.23  0.11
GSM      P2         0.15  0.09
GSM      P3         0.29  0.15
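For readers unfamiliar with the metric, the sketch below computes Cllr, the log-likelihood-ratio cost, from two lists of LRs following its standard definition in [4]. Lower is better, and a system that always outputs LR = 1 scores exactly 1. The input values are hypothetical and are not the data behind the figures above; the minimal Cllr (the Cllr after optimal recalibration of the same scores) is not computed here.

```python
import math


def cllr(same_speaker_lrs, different_speaker_lrs):
    """Log-likelihood-ratio cost.

    same_speaker_lrs: LRs from comparisons where test and suspect are the same speaker.
    different_speaker_lrs: LRs from comparisons where they are different speakers.
    """
    # Same-speaker trials are penalized for low LRs.
    ss_cost = sum(math.log2(1.0 + 1.0 / lr) for lr in same_speaker_lrs)
    ss_cost /= len(same_speaker_lrs)
    # Different-speaker trials are penalized for high LRs.
    ds_cost = sum(math.log2(1.0 + lr) for lr in different_speaker_lrs)
    ds_cost /= len(different_speaker_lrs)
    return 0.5 * (ss_cost + ds_cost)


# Hypothetical LR sets.
uninformative = cllr([1.0, 1.0], [1.0, 1.0])   # system that never commits
strong = cllr([100.0, 100.0], [0.01, 0.01])    # strong, correct LRs
misleading = cllr([0.01], [100.0])             # strong, wrong LRs
```

Because misleading LRs are penalized heavily, Cllr reflects both discrimination and calibration, which is why it is reported alongside the minimal Cllr.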
Observations
• Condition P2 ‘35 out of 1995’ yields lower LRs throughout the data.
• Conditions P1 ‘35 out of 45’ and P3 ‘1400 out of 1995’ are more or less the same within each of the channel conditions.
• The GSM condition yields higher LRs and a lower minimal Cllr than PSTN, even though the reference population consists of PSTN recordings.
Observations
The ratio of the sizes of subset and total population appears to be more important than the absolute size of the subset. In P2 ‘35 out of 1995’ two factors come into play:
• The reference population will be more like the suspect.
• The reference population will be more homogeneous.
The first factor makes P(E|HDS) larger, hence lowering the LR. The second factor lowers the variation within the reference population, so the population is ‘narrower’. It is harder to fit in a narrower population, hence P(E|HDS) will become smaller, which raises the LR. Since LRs are lower in P2, the first factor is apparently the more important one.
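The ratio argument can be checked with a few lines of arithmetic on the three conditions. The subset and total sizes are taken from the conditions listed earlier; the variable names are only illustrative.

```python
# (subset size, total population size) for each condition.
conditions = {"P1": (35, 45), "P2": (35, 1995), "P3": (1400, 1995)}

# Fraction of the total set that the population optimizer keeps.
ratios = {name: subset / total for name, (subset, total) in conditions.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.3f}")
# P1 is about 0.78, P2 about 0.018, P3 about 0.70
```

P1 and P2 share the same absolute subset size (35) but differ strongly in ratio, whereas P1 and P3 have similar ratios despite very different subset sizes, which matches the grouping seen in the results.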
Conclusion
The ratio of the size of the subset to the size of the total population is the important factor, rather than the absolute number of speakers in the reference population used.
References
[1] www.agnitio.es
[2] Meuwly D (2006). Forensic individualization from biometric data. Science and Justice, 46, 4, 205-213.
[3] Ramos D (2007). Forensic evaluation of the evidence using automatic speaker recognition systems. Ph.D. thesis, Universidad Autonoma de Madrid, Madrid, Spain.
[4] van Leeuwen D and Brümmer N (2007). An Introduction to Application-Independent Evaluation of Speaker Recognition Systems. Speaker Classification I, 343, 330-353.
[5] http://catalog.elra.info/product_info.php?products_id=708
[6] Meuwly D, Alexander A, Drygajlo A, and Botti F (2003). Polyphone-IPSC: A shared speakers database for evaluation of forensic-automatic speaker recognition systems. In Forensic Science International, vol. 136, p. 367, Istanbul, Turkey, Elsevier.