Development of protocols
WP4 – T4.2
Torino, March 9th-10th 2006

Presentation plan
• Planning and partners
• Definition
• Test material: what is needed for the evaluation test
• Evaluation criteria
• To do: define the protocols for both platforms

Calendar
[Timeline figure, 2005 to 2007. Date markers: Dec.05, June06, Sept.06, Nov.06. Project months: m18, m24, m27, m29]
• T4.2: Development of protocols
• T4.3&4: Evaluation on the fixed and mobile platforms
• M4.1, D4.2: Specification of evaluation protocols
• M3.2: Functional integration on both platforms completed
• Partners: TRT (leader, 3 m*m), Loquendo (2), TUC (2), UGR (1), Loria (1), THAV (1)

Definition
• Evaluation protocol
  • Defines precisely what must be evaluated, in which environment, what criteria are used and how to proceed.
  • ex: wine tasting protocols
• How: “Define the measures that will be applied during experiments in order to assess the performances of the vocal interaction system as well on a quantitative basis or on a more context dependent, qualitative basis.”
• What:
  • The performance of the Hiwire recognition systems
  • The integration quality on the fixed and mobile platforms

Test material (1/2)
• Test grammar
  • One for each platform
  • Vocabulary
  • Number of commands
• Speech input
  • Live speakers
    • Who? (professional pilots, mechanics)
    • Type of microphone (close-talking / multi-microphone array)
    • Simulation of real conditions (hangar noise added through loudspeakers)
  • Recorded speech
    • Hiwire database
    • Sampling rate / quantization
    • Mixed cockpit noise

Test material (2/2)
• Location
  • A simulation room
    • PDA
    • Microphone + push-to-talk (PtoT)
  • A cockpit simulator
    • Graphical interface
    • Microphone + VAD
• Panel
  • Professional pilots, mechanics, … (both platforms)
  • Hiwire database (fixed platform)
• Scenario
  • A list of commands
  • Definition of the interaction (synthetic voice, vocal feedback)

Evaluation criteria (1/3)
• Objective measures
  • WAC, word accuracy [0-100] %
  • SAC, sentence accuracy [0-100] %
  • CAC, command accuracy [0-100] %
  • Response time (s)
    • Time between the end of speech and the system response
  • Task completion rate, TCR (+ timeout): % of completed tasks
• Analyzer plugged inside the system

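The objective measures above can be sketched in a few lines of Python. This is only an illustration, not the project's analyzer: the `Trial` record and its field names are hypothetical, word accuracy is computed from a word-level edit distance and clamped at 0 so that it stays in the [0-100] range given on the slide, and CAC is left out because it needs a grammar-specific mapping from sentences to commands.

```python
from dataclasses import dataclass
from typing import List, Optional


def word_errors(reference: List[str], hypothesis: List[str]) -> int:
    """Word-level edit distance: substitutions + deletions + insertions."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


@dataclass
class Trial:
    """One spoken command in a test session (hypothetical record layout)."""
    reference: str               # what the speaker was asked to say
    hypothesis: str              # what the recognizer returned
    speech_end_s: float          # time at which the speaker stopped talking
    response_s: Optional[float]  # time of the system response, None on timeout
    completed: bool              # True if the task was completed


def wac(trials: List[Trial]) -> float:
    """Word accuracy in %, clamped at 0 (assumption, to match the [0-100] range)."""
    total = sum(len(t.reference.split()) for t in trials)
    errors = sum(word_errors(t.reference.split(), t.hypothesis.split())
                 for t in trials)
    return max(0.0, 100.0 * (total - errors) / total)


def sac(trials: List[Trial]) -> float:
    """Sentence accuracy in %: share of sentences recognized without any error."""
    correct = sum(t.reference.split() == t.hypothesis.split() for t in trials)
    return 100.0 * correct / len(trials)


def mean_response_time(trials: List[Trial]) -> float:
    """Mean delay in seconds between end of speech and response, timeouts excluded."""
    delays = [t.response_s - t.speech_end_s
              for t in trials if t.response_s is not None]
    return sum(delays) / len(delays)


def tcr(trials: List[Trial]) -> float:
    """Task completion rate in %: timeouts count as not completed."""
    return 100.0 * sum(t.completed for t in trials) / len(trials)


session = [
    Trial("gear down", "gear down", 1.0, 1.5, True),
    Trial("flaps two", "flaps too", 2.0, 3.0, False),
    Trial("select radio one", "select radio one", 0.0, None, False),
]
print(wac(session), sac(session), mean_response_time(session), tcr(session))
```

The example commands are made up; in a real session the trials would come from the analyzer log and the test grammar.
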
Evaluation criteria (2/3)
• Subjective measures
  • Usability
    • Learning time* (s)
    • Memorisation effort* [1-5]
    • Ease of use* [1-5]
  • Workload
    • Number of added tasks correctly achieved (#)
  • Naturalness of the interaction [1-5]
  • Acceptance level [1-5]
• A form with subjective scales, to fill in at the end of the test session
• Sensors
  • Heart rate
  • EEG
  • Eye movement

Evaluation criteria (3/3)
• Results analysis
  • Gathering objective data
  • Transforming subjective data into a numerical form
    • Subjective scales
  • Comparison with Wizard of Oz (WoOz)
  • Comparison with non-vocal text input
  • Statistical features
    • Average, standard deviation
  • Classification

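As a rough sketch of the two central steps above (turning subjective answers into numbers, then computing average and standard deviation), the Python below maps form answers onto a 1-5 scale. The answer labels in `SCALE` and the sample answers are assumptions made for illustration; the slides only state that the scales run from 1 to 5.

```python
from statistics import mean, stdev
from typing import List, Tuple

# Hypothetical wording of the form answers, mapped onto the 1-5 scale.
SCALE = {"very low": 1, "low": 2, "medium": 3, "high": 4, "very high": 5}


def summarize(answers: List[str]) -> Tuple[float, float]:
    """Return (average, sample standard deviation) of the numeric scores."""
    scores = [SCALE[a] for a in answers]
    return mean(scores), stdev(scores)


# One answer per test subject for a single criterion, e.g. naturalness.
naturalness = ["low", "medium", "high"]
average, deviation = summarize(naturalness)
print(average, deviation)
```

Running the same summary on the answers collected for the text-input or Wizard of Oz condition would give the figures needed for the comparisons listed above.
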
Summary: List of the protocol definition features
• Fixed platform
  • Material
    • Grammar
      • THAV grammar (provided at the end of April)
    • Speech input
      • Colleagues
      • ~20 non-native speakers (accent from bad to good)
    • Location
      • The THAV cockpit simulator
      • Multi-speaker noise diffusion system
      • MM array
    • A test scenario
      • Depends on the grammar
• Mobile platform
  • Material
    • Grammar
      • Extended version
    • Panel / the users
      • Colleagues
      • 10 to 20
    • Location
      • An equipped room, noise diffusion
      • Factory noise, hangar noise (ask Airbus…)
      • Different levels (from clean to ? dB, at the microphone capsule level)
    • A test scenario
      • The maintenance of aircraft

Summary: List of the protocol definition features
• Fixed platform
  • Criteria
    • Objective measures
      • SAC (average and statistics across speakers)
      • Response time
    • Subjective measures
      • … no pilot
    • Comparison with the Hiwire baseline
  • Results analysis
    • Statistics across speakers
• Mobile platform
  • Criteria
    • Objective measures
      • Response time
      • SAC
      • TCR
    • Subjective measures
      • Ease of use
      • Naturalness of interaction
  • Results analysis
    • Comparison with text input / pen input system
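The "SAC (average and statistics across speakers)" item for the fixed platform could be computed along the following lines. This is a minimal sketch under assumptions: the input is taken to be a list of `(speaker_id, sentence_correct)` pairs, which is a hypothetical log format, and the speaker identifiers and results are invented.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, Iterable, Tuple


def sac_by_speaker(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Sentence accuracy in % for each speaker."""
    counts = defaultdict(lambda: [0, 0])  # speaker -> [correct, total]
    for speaker, correct in results:
        counts[speaker][0] += int(correct)
        counts[speaker][1] += 1
    return {s: 100.0 * c / n for s, (c, n) in counts.items()}


def across_speakers(per_speaker: Dict[str, float]) -> Tuple[float, float]:
    """Average and sample standard deviation of SAC over the speakers."""
    values = list(per_speaker.values())
    return mean(values), stdev(values)


# Invented results: (speaker id, was the sentence recognized correctly?)
log = [
    ("spk1", True), ("spk1", True), ("spk1", False), ("spk1", False),
    ("spk2", True), ("spk2", True), ("spk2", True), ("spk2", True),
]
per_speaker = sac_by_speaker(log)
print(per_speaker, across_speakers(per_speaker))
```

Applying the same two functions to the output of the baseline recognizer on the same recordings would give comparable figures for the baseline comparison.
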