On TFSR (semi)automatic systems supportability: novel instruments for analysis and compensation
Francesco Borchi, Monica Carfagni, Matteo Nunziati
Outline
• Main goal
• TFSR systems
• LogR estimation
• Common test procedures for TFSR systems
• System behaviour classification
• Supportability evaluation tools
• Score compensation tools
• Quality assessment logics
• Conclusion
Main goal
Our goal is to propose a general-purpose set of tools for system compensation and quality assessment.
Specific goals:
• to build a generic framework for system analysis
• to develop a novel generic tool for system compensation
• to assess system quality on the basis of the amount of compensation the system itself requires
TFSR systems
[Diagram: voice sample 1 and voice sample 2 enter the TFSR system, which outputs a LogR score.]
We define a TFSR system as a black box which receives two or more recordings as inputs and produces one or more scores (LogR) as outputs.
LogR estimation 1/2
LogR = log10[ P(E | H0) / P(E | H1) ]
The log-likelihood ratio identifies the most supportable hypothesis.
Hypothesis 0: the two samples belong to the same speaker
Hypothesis 1: the two samples belong to different speakers
• If LogR > 0, support goes to H0
• If LogR < 0, support goes to H1
• If LogR = 0, no support is provided
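As a minimal illustration (not code from the presentation), the sketch below computes a base-10 log-likelihood ratio from two assumed likelihood values and maps its sign onto the supported hypothesis; the probability values in the example are arbitrary.

```python
import math

def log_lr(p_e_given_h0: float, p_e_given_h1: float) -> float:
    """Base-10 log-likelihood ratio: LogR = log10(P(E|H0) / P(E|H1))."""
    return math.log10(p_e_given_h0 / p_e_given_h1)

def support(logr: float) -> str:
    """Map the sign of LogR onto the supported hypothesis."""
    if logr > 0:
        return "support for H0 (same speaker)"
    if logr < 0:
        return "support for H1 (different speakers)"
    return "no support for either hypothesis"

# Illustrative values: the evidence is 100 times more likely under H0 than under H1.
logr = log_lr(0.02, 0.0002)
print(logr, support(logr))  # 2.0 support for H0 (same speaker)
```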
LogR estimation 2/2
The real LogR value is unknown; we can only estimate it through approximations, so our systems are error-prone. System goodness depends on a number of factors:
• the way the voice samples were retrieved
• the kind of parameters employed in the recognition
• the algorithms used for parameter extraction
• the mathematical model used to estimate LogR
Experimentation is the best way to assess system behaviour.
Common test procedures for TFSR systems 1/2
The system is tested against a set of recordings having known origin.
[Diagram: Speaker 1 … Speaker N, each with two or more recordings.]
Common test procedures for TFSR systems 2/2
Recordings are mixed up and grouped in pairs:
• Same-speaker pairs (SS): test system behaviour when H0 is true. Is LogR > 0?
• Different-speaker pairs (DS): test system behaviour when H1 is true. Is LogR < 0?
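A sketch of how such a test set could be assembled, assuming each recording is labelled with its known speaker; the toy data and variable names are hypothetical, not taken from the presentation.

```python
from itertools import combinations

# Hypothetical labelled test set: (speaker_id, recording_id) pairs of known origin.
recordings = [("spk1", "a"), ("spk1", "b"), ("spk2", "a"), ("spk2", "b")]

ss_pairs = []  # same-speaker pairs: H0 is true, we expect LogR > 0
ds_pairs = []  # different-speaker pairs: H1 is true, we expect LogR < 0
for (spk_x, rec_x), (spk_y, rec_y) in combinations(recordings, 2):
    pair = ((spk_x, rec_x), (spk_y, rec_y))
    (ss_pairs if spk_x == spk_y else ds_pairs).append(pair)

print(len(ss_pairs), len(ds_pairs))  # 2 same-speaker pairs, 4 different-speaker pairs
```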
System behaviour classification 1/3
Tippett plot: a common method to show system behaviour.
[Plot: cumulative percentages of SS and DS scores against LogR; the SS curve beyond zero marks false negatives, the DS curve beyond zero marks false positives.]
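One possible way to draw a Tippett plot from the SS and DS score lists, using numpy and matplotlib; the function name and plotting choices are ours, not taken from the slides.

```python
import numpy as np
import matplotlib.pyplot as plt

def tippett_plot(ss_scores, ds_scores):
    """Plot the cumulative proportion of SS and DS LogR scores above each threshold."""
    ss = np.asarray(ss_scores)
    ds = np.asarray(ds_scores)
    grid = np.linspace(min(ss.min(), ds.min()), max(ss.max(), ds.max()), 500)
    ss_curve = [(ss > x).mean() for x in grid]  # SS scores still above x
    ds_curve = [(ds > x).mean() for x in grid]  # DS scores still above x
    plt.plot(grid, ss_curve, label="SS (H0 true)")
    plt.plot(grid, ds_curve, label="DS (H1 true)")
    plt.axvline(0.0, linestyle="--")  # the decision boundary LogR = 0
    plt.xlabel("LogR")
    plt.ylabel("proportion of scores > LogR")
    plt.legend()
    plt.show()

# Example call with made-up scores:
# tippett_plot(ss_scores=[1.2, 2.5, -0.3, 3.1], ds_scores=[-1.8, -0.4, 0.6, -2.2])
```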
System behaviour classification 2/3
[Plot: regions containing only false scores (red boxes), i.e. ranges of LogR that give wrong support.]
Goal: provide a solution to eliminate the "false score only" areas.
System behaviour classification 3/3
[Plot: an isoperforming system compared with an underperforming one.]
Goal: provide a solution to reduce the amount of false scores.
Supportability evaluation tools 1/3
A quantitative evaluation of false scores has been proposed by P. Rose et al. (2003):
LRtest = P(LogR > 0 | H0) / P(LogR > 0 | H1)
i.e. the percentage of true positives over the percentage of false positives.
• Interpretable via the Evett table
• No information is provided about false negatives
• No information about the distribution of false scores: do they affect a narrow range of scores, or do they widely perturb the system response?
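A direct estimate of the LRtest index from SS and DS test scores might look like the sketch below; the guard against a zero false-positive rate is our own addition.

```python
import numpy as np

def lr_test(ss_scores, ds_scores):
    """Rose's LRtest: P(LogR > 0 | H0) / P(LogR > 0 | H1), estimated on test data."""
    true_pos = (np.asarray(ss_scores) > 0).mean()   # fraction of SS pairs supported correctly
    false_pos = (np.asarray(ds_scores) > 0).mean()  # fraction of DS pairs supported wrongly
    return float("inf") if false_pos == 0 else true_pos / false_pos
```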
Supportability evaluation tools 2/3
We propose to generalize the LRtest index using a new tool, the "Supportability of System" function (SoS):
SoS(x) = P(LogR > x | H0) / P(LogR > x | H1) if x > 0
SoS(x) = [1 - P(LogR > x | H1)] / [1 - P(LogR > x | H0)] if x < 0
• Interpretable via the Evett table
• Defined for both false positives and false negatives
• Unambiguously quantifies the amount of false scores for each LogR
• Provides the accuracy of each score
We know how much we can rely on our system, score by score!
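An estimate of SoS from the SS and DS test scores could be coded as below; the handling of x = 0, where the slides leave SoS undefined, is our assumption.

```python
import numpy as np

def sos(x, ss_scores, ds_scores):
    """Supportability of System at score x, estimated from SS/DS test scores."""
    ss = np.asarray(ss_scores)
    ds = np.asarray(ds_scores)
    if x > 0:
        # correct support among SS pairs vs wrong support among DS pairs above x
        return (ss > x).mean() / (ds > x).mean()
    if x < 0:
        # correct support among DS pairs vs wrong support among SS pairs at or below x
        return (ds <= x).mean() / (ss <= x).mean()
    return float("nan")  # SoS is not defined at x = 0 on the slides
```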
Supportability evaluation tools 3/3
[Plot: worked example at LogR = -13, where 90% of scores give correct support and 20% give false support, so SoS = 90/20 = 4.5.]
Score compensation tools 1/3
Preliminary operation: eliminate the "false score only" areas by increasing or reducing all scores.
[Plot: the original score distribution and a version translated by DX along the LogR axis.]
Score compensation tools 2/3
New LogR = LogR * tanh( log10(SoS) )
[Plot: compensated score as a function of SoS for LogR = 1, 2, 3 and 4.]
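The compensation formula on this slide, applied to the worked example from the previous slides (SoS = 4.5 at LogR = -13); the figures in the comment are approximate.

```python
import math

def compensate(logr, sos_value):
    """New LogR = LogR * tanh(log10(SoS)): scores with weak supportability shrink the most."""
    return logr * math.tanh(math.log10(sos_value))

# With SoS = 4.5, tanh(log10(4.5)) is roughly 0.57, so a score of -13 compresses to about -7.5.
print(compensate(-13, 4.5))
```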
Score compensation tools 3/3
Compress all scores by a value defined by the SoS function:
• reduced amount of false scores
• decreased values for true scores
The amount of false scores is reduced at the cost of a lower discriminative power.
[Plot: original vs compressed score distributions.]
Quality assessment logics 1/3
• Score compensation reduces the system's discriminative power
• Score compensation is required to prevent unbalanced responses
• Compensation increases for decreasing values of SoS
• Compensation is intrinsic to the system
• A good system must have a strong SoS for each LogR value
Quality assessment logics 2/3
DMTI procedure:
• Step 1: test the system against a dataset (LogR)
• Step 2: calculate the supportability (SoS)
• Step 3: calculate the compensated scores (new LogR)
• Step 4: calculate the percentage P of new LogR values with a "strong" SoS score (the threshold is fixed by our standards)
• Step 5: evaluate the Degree of Supportability (DoS): DoS = atanh(2P - 1)
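A sketch of steps 4 and 5, assuming the compensated scores and an SoS estimator are already available; the "strong" SoS threshold used here is illustrative, since the slides only say it is fixed by one's standards.

```python
import math

def degree_of_supportability(new_logr_scores, sos_of, strong_sos=100.0):
    """DoS = atanh(2P - 1), with P the fraction of compensated scores whose
    SoS reaches the 'strong' threshold (threshold value is illustrative)."""
    p = sum(1 for s in new_logr_scores if sos_of(s) >= strong_sos) / len(new_logr_scores)
    # atanh diverges as P approaches 0 or 1; clip slightly to keep DoS finite.
    p = min(max(p, 1e-6), 1 - 1e-6)
    return math.atanh(2 * p - 1)
```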
Quality assessment logics 3/3
Regardless of the specific procedure, our DoS score is equivalent to a LogR score!
Conclusion
• A general-purpose tool has been developed to score system supportability
• An additional mathematical tool has been developed to compensate unbalanced systems
• The tools are system independent and theoretically motivated rather than empirically built
• The tools are useful to reduce both false positives and false negatives
• False score reduction produces a decrement in discriminative power
• Such a decrement is intrinsic to the system response and can be used unambiguously for system quality assessment
• The proposed procedure for system quality assessment (degree of supportability) uses the well-known Evett scale to score system supportability