Data collection and experimentation
Why should we talk about data collection?
• It is a central part of most, if not all, aspects of current speech technology
• The higher grades (A, B; as tested in the home exam assignments and the project) require a measure of data collection
What is data collection?
• In speech technology, the gathering of human communicative behaviours that can be used for implementation of e.g. spoken dialogue systems
• What do we gather?
• Speech
• Text
• Voices
• Gestures
• Patterns!
All vs one?
• Recognition: we want to have seen all possibilities
• Synthesis: we want one consistent behaviour
Group exercise
• Same groups as before
• Design one or more data collection(s) that will become the basis for a spoken dialogue system intended to inform users of the television program
• Take note of why you make your design choices
• We’ll talk about it here in 30 minutes
Application
• Remote control
• Select programme
• Menu options - tree
• TV guide
• More free speech
• But connected to GUI options (e.g. for lists)
• Data
• Room environment
• Age recognition data
• Recognize age
• Recognize identity of a specific mother
• Usage probabilities
• Asking people - ratings
• Language? Programmes are English, Swedish
• Read TV guide
• But people speak differently (“trean”)
• Monitor corpus (updated)
• “Beta” version – iterative process (h/h, WoZ, beta)
• Demography: adults, elderly, kids?
• Keywords
• Cloud
• Times
• Some commands
What is a corpus?
• Wikipedia:
• A collection of written or spoken material in machine-readable form, assembled for the purpose of studying linguistic structures, frequencies, etc.
Multimodal corpus work: manual annotation, validation and computer driven analysis. Jens Edlund, 2012-09-01-05
Why collect a corpus?
• ”[...] for the purpose of studying linguistic structures, frequencies, etc.”
• Sample - cannot analyze all
• Training data for duplicating behaviours
• Analysis of how humans do things
• Generalisability, representativeness
• Same results in different corpora
• Use constraints, standards, theories to form the corpus
• If the findings are as expected, they corroborate the theory, and we're better off
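The points about frequencies and about getting the same results in different corpora can be illustrated with a minimal sketch. It assumes a corpus is simply a list of transcribed utterances; the two samples below are invented for illustration and are not taken from any real corpus, and the helper name `relative_frequencies` is hypothetical.

```python
from collections import Counter


def relative_frequencies(utterances):
    """Return each word's share of all word tokens in the sample."""
    tokens = [word for utt in utterances for word in utt.lower().split()]
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


# Two tiny, invented samples standing in for two different corpora.
corpus_a = ["what is on channel three tonight", "show me the news"]
corpus_b = ["what is on tonight", "is the news on channel three"]

freq_a = relative_frequencies(corpus_a)
freq_b = relative_frequencies(corpus_b)

# Compare the words both samples share: similar shares would suggest
# that a finding generalises beyond a single sample.
for word in sorted(set(freq_a) & set(freq_b)):
    print(f"{word:10s} {freq_a[word]:.2f} {freq_b[word]:.2f}")
```

With samples this small the comparison means little; the sketch only shows the shape of the analysis.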
How is a corpus collected?
• Often high formal demands:
• Structure
• Balance
• Audio, visual, audiovisual - choice of modalities
• Requires equipment
• Silent lab
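One way to read the demand for balance is that every combination of the design factors should be covered by enough recordings. The following is a minimal sketch under that assumption; the field names `age_group` and `room`, their values, and the helper `underfilled_cells` are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import product


def underfilled_cells(speakers, keys, minimum):
    """Return the factor combinations with fewer than `minimum` speakers."""
    counts = Counter(tuple(s[k] for k in keys) for s in speakers)
    # Enumerate every combination of observed values, so that
    # combinations with no speakers at all are reported too.
    values = [sorted({s[k] for s in speakers}) for k in keys]
    return {cell: counts[cell] for cell in product(*values)
            if counts[cell] < minimum}


# Invented speaker metadata, one record per recorded speaker.
speakers = [
    {"age_group": "adult", "room": "quiet"},
    {"age_group": "adult", "room": "noisy"},
    {"age_group": "elderly", "room": "quiet"},
    {"age_group": "kid", "room": "quiet"},
    {"age_group": "kid", "room": "noisy"},
]

missing = underfilled_cells(speakers, ("age_group", "room"), minimum=1)
print(missing)
```

A check like this can be run during collection to see which groups still need recordings.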
Where are corpora collected?
When are corpora collected?
• Often collected once, then static
• But monitor corpora exist
• And the web is, as always, changing things
Examples of corpora?
Thank you! Questions?