What Works, What Doesn’t -- And What Needs to Work
Lynette Hirschman
Information Technology Center, The MITRE Corporation, USA
What Works...
• If we believe Eric Brill, we should just collect and annotate data...
• Since data collection seems to work better than looking for new algorithms
• What this really means is that data collection is more cost-effective than funding research
• Similarly, we might conclude that waiting for the next chip is more cost-effective than creating faster algorithms
So we should all stop doing research and look for data and wait...
Conversational Interaction: A Case Study
• Speech researchers have always said, there’s no data like more data
• Many speech problems are, by definition, data-constrained:
• Conversational interfaces require real(istic) data on what people will say to machines in the context of a specific application
• Such application-specific data tends to be difficult (expensive) to collect
• It requires simulation of interaction with a system, or a running system to collect data to build the system…
• How do we collect millions of sentences of application-specific data?
How to Collect Real Data Cheaply
• Lesson from Victor Zue’s MIT Jupiter system:
• Put something out there that people want to use: on-line weather information
• This can be done by bootstrapping from a primitive system, using the collected data
• MIT has been very successful in collecting data from real users; methodology now used by the DARPA Communicator program
So to collect data to build a system, we need a system that works well enough for people to use it
Real Systems To Collect Real Data...
• Building a usable system requires integration of multiple technologies:
• We need ways to interface to real data sources
• We need language understanding
• We need intelligible generation and synthesis
• We need dialogue management
• We need ways to apply the techniques to a different problem domain (application portability) because otherwise, we have to do all this again for the next application
• So collection of real data raises basic research issues
Error Rate Over Time in ATIS (Air Travel Info System)
• Understanding easier than transcription
• Limiting factor: understanding, not word error
[Figure: error rate (log scale) versus time (months); legend: Sentence Transcription, SL Error, NL Error, Word Error]
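The slide does not define the metrics it plots. As a rough sketch only, assuming the conventional definitions (word error as word-level edit distance divided by reference length, sentence error as the fraction of utterances not transcribed exactly), the two transcription measures could be computed as below. The function names and the example utterance are hypothetical and are not taken from the ATIS evaluations.

```python
from typing import Iterable, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumes the conventional definition: substitutions, deletions, and
    insertions each cost 1. Tokenization is plain whitespace splitting.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def sentence_error_rate(pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (reference, hypothesis) pairs whose word sequences differ."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one utterance")
    wrong = sum(1 for ref, hyp in pairs if ref.split() != hyp.split())
    return wrong / len(pairs)


if __name__ == "__main__":
    # Hypothetical utterance: one substituted word out of five.
    print(word_error_rate("show me flights to boston",
                          "show me flight to boston"))
```

A single wrong word makes the whole sentence count as an error, so sentence error is always at least as pessimistic as word error on the same data. Understanding error requires a reference answer for each query and is not covered by this sketch.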
Conclusion: What Needs to Work
• So we can’t just wait for data -- we need to collect it
• And to collect data, we need systems that work so that real users will use them; they must be:
• Scalable to handle large amounts of data
• Robust so they keep working
• Fast, so people can stand to use them
• Interactive and engaging, so people want to use them
• And while we are at it, it would be nice if the systems not only supported data collection, but were able to learn interactively…