Theory and Practice • Nothing so challenging as a practical problem • Nothing so practical as a good theory
Two Themes for Talk • Speech Perception as Pattern Recognition • Perception is Multimodal • People as Optimal Parallel Processors • Relationship between theory and practice • Apply theory & technology before it is perfected • Valuable findings in application and evaluation • No shortage of new challenges
Baldi and Language Tutoring • For Hearing-Impaired • Second Language Learning • For Reading Disabled • First Language Learning
Two Principles of Perception • Multimodal Synergy • more sources, better performance • Optimal Parallel Processing • best use of the sources
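One way to make "more sources, better performance" concrete is to treat the auditory and visual sources as independent evidence, multiply their support for each alternative, and normalize. The sketch below illustrates that idea only; the multiplicative rule, the function name `integrate`, the /ba/ versus /da/ alternatives, and the support values are assumptions for illustration and do not come from the slides.

```python
def integrate(auditory, visual):
    """Combine per-alternative support from two sources.

    Each argument maps a response alternative to a support value in [0, 1].
    Assumption: the sources are independent, so support is multiplied and
    then normalized across the alternatives.
    """
    if auditory.keys() != visual.keys():
        raise ValueError("both sources must rate the same alternatives")
    combined = {alt: auditory[alt] * visual[alt] for alt in auditory}
    total = sum(combined.values())
    if total == 0:
        raise ValueError("no alternative has any support")
    return {alt: value / total for alt, value in combined.items()}


# Hypothetical support values for a syllable heard and seen at the same time.
auditory = {"ba": 0.7, "da": 0.3}
visual = {"ba": 0.8, "da": 0.2}

bimodal = integrate(auditory, visual)
print(bimodal)
```

With these made-up numbers, the combined support for "ba" is higher than either source gives alone, which is the synergy the slide describes.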
Two Principles of Learning • Time on Task • more time, better performance • Massed versus Spaced Practice • spaced better than massed
Language Training • Speech Production Deficits • Ear instructs the tongue • Can eye instruct the tongue? • Advantages of Talking Head
Baldi as Language Tutor: Advantages • Computers are popular with kids • One agent for each Student • Perpetual Agent • Extreme Patience • No Intimidation • Can Highlight Critical Organs • Can Hide Noncritical Components • Can Reveal Normally-Hidden Parts
CSLU Toolkit Is: • Authoring tools for building and using interactive language systems • Research tools for developing core language technologies and studying human communication • Learning tools supporting all areas of human language technology (25 tutorials) • Available free of charge from the CSLU Web site for research and education
CSLU Speech Toolkit: Free for Educational Use • http://cslu.cse.ogi.edu/toolkit • http://mambo.ucsc.edu/psl/tools
Toolkit Computer Requirements • Minimum Requirements • Intel Pentium machine • 200 MHz processor • 64 MB RAM (128 MB preferred) • ~75 MB free disk space • Windows 98 or NT 4.0* • Microphone / Speakers (Sound Blaster compatible) *
Main Toolkit Components A programming environment integrating: • Speech Recognition • Text-to-Speech Synthesis • Facial Animation • Natural Language Understanding • Speech Analysis & corpus development • Rapid Application Development
Dialogue Modeling • Rapid Application Developer (RAD) • Graphical drag-and-drop interface • Dialogues are constructed by connecting states into flowcharts • A scripting language (Tcl/Tk) provides flexibility
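The idea of building a dialogue by connecting states into a flowchart can be sketched as a small state machine. This is a plain Python illustration, not RAD or its Tcl/Tk scripting interface; the `State` class, the `run_dialogue` function, and the example prompts are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class State:
    """One node of the flowchart: a prompt plus outgoing edges."""
    prompt: str
    transitions: Dict[str, str] = field(default_factory=dict)  # recognized word -> next state
    default: Optional[str] = None  # where to go when the word is not recognized


def run_dialogue(states, start, answers):
    """Walk the flowchart, consuming one recognized answer per non-terminal state."""
    transcript = []
    remaining = iter(answers)
    current = start
    while current is not None:
        state = states[current]
        transcript.append(state.prompt)
        if not state.transitions:
            break  # terminal state: nothing more to ask
        heard = next(remaining, None)
        if heard is None:
            break  # no more input available
        current = state.transitions.get(heard, state.default)
    return transcript


# Hypothetical three-state dialogue: ask, then confirm one of two words.
states = {
    "ask": State("Say 'cat' or 'dog'.", {"cat": "cat", "dog": "dog"}, default="ask"),
    "cat": State("You said cat. Well done!"),
    "dog": State("You said dog. Well done!"),
}

transcript = run_dialogue(states, "ask", ["cow", "dog"])
print("\n".join(transcript))
```

An unrecognized answer ("cow") follows the default edge back to the asking state, so the prompt is repeated before the dialogue moves on.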
Speech Recognition • Word Spotting • vocabulary- and speaker-independent keyword spotter • Alpha-digit and digit recognizers • Rejection of out-of-vocabulary (OOV) words using a garbage model • Optional grammars
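Rejection with a garbage model can be pictured as a comparison: a keyword is accepted only if it scores better than a catch-all model that stands in for everything outside the vocabulary. The sketch below shows that decision rule in isolation. It is not the toolkit's recognizer; the function `spot_keyword`, the `margin` parameter, and the log-likelihood scores are hypothetical.

```python
def spot_keyword(keyword_scores, garbage_score, margin=0.0):
    """Return the best-scoring keyword, or None if the garbage model wins.

    keyword_scores: dict mapping each keyword to a log-likelihood score.
    garbage_score:  log-likelihood of the catch-all (garbage) model.
    margin:         how much a keyword must beat the garbage model by.
    """
    if not keyword_scores:
        return None
    best_word = max(keyword_scores, key=keyword_scores.get)
    if keyword_scores[best_word] - garbage_score > margin:
        return best_word
    return None  # treat the utterance as out of vocabulary


# Hypothetical scores for two utterances (higher means a better match).
accepted = spot_keyword({"yes": -12.0, "no": -20.5}, garbage_score=-15.0)
rejected = spot_keyword({"yes": -18.0, "no": -19.0}, garbage_score=-15.0)
print(accepted, rejected)
```

Raising `margin` makes the spotter more conservative: fewer false acceptances of out-of-vocabulary words, at the cost of rejecting more genuine keywords.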
Applying the Technology • Limitations of Current Technology • Speech Recognition by Machine • Collecting a database of kids' auditory/visual speech • Retrain recognizer
Applying the Technology • Limitations of Current Technology • Speech Synthesis • Sounds Robotic • Teachers asked for natural voice
Implications of Research Findings • Research Findings: Ease and Efficiency of Multimodal Perception • Research Hypothesis: Production Training should be Multimodal
Important Questions for Language Training • Can the eye instruct the tongue? • Can the eye/ear in combination do better? • Is valid assessment possible?
Second Language Learning • Urgent Need • Shortage of Instruction • Value of Visible Speech • Individually Guided Instruction • Potential for Interactive Dialog
Learning to Read • Importance of Spoken Language for reading • Flexibility of Synthetic Speech • Written Language Presentation with Spoken Language • Segmenting and Highlighting Written Patterns • Electronic Textbooks and Books
Learning a First Language • feedback in the crib • virtual caregiver • storybook reader • parent simulator • perpetual companion
Paralinguistic Synthesis • Nonspeech Segments • Breath Noise, Cough, Clear Throat, Laugh, Lip Smack, Sneeze, Tongue Click, Burp
Paralinguistic Synthesis • Suprasegmental Visual Information • Eye and head movements for reference • Eyebrow movements with pitch • Eye widening with pitch • Eye blinks at word onsets
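As a rough illustration of tying visual cues to the speech signal, the sketch below turns a list of timed words into animation events: eyebrow height follows pitch, and blinks are placed on word onsets. Everything here is assumed for illustration: the function `visual_prosody`, the event dictionaries, the linear pitch-to-eyebrow mapping, and the choice to blink on every third word are not taken from the slides or from Baldi's implementation.

```python
def visual_prosody(words, base_pitch_hz, span_hz=100.0, blink_every=3):
    """Build animation events from (text, onset_seconds, pitch_hz) tuples.

    Eyebrow raise is 0.0 at or below base_pitch_hz and grows linearly to 1.0
    at base_pitch_hz + span_hz. A blink is scheduled at the onset of every
    `blink_every`-th word, starting with the first.
    """
    events = []
    for index, (text, onset, pitch) in enumerate(words):
        if index % blink_every == 0:
            events.append({"time": onset, "action": "blink"})
        amount = min(1.0, max(0.0, (pitch - base_pitch_hz) / span_hz))
        events.append(
            {"time": onset, "action": "eyebrow_raise", "amount": amount, "word": text}
        )
    return events


# Hypothetical utterance with rising pitch.
words = [("hello", 0.0, 120.0), ("there", 0.4, 170.0), ("friend", 0.8, 260.0)]
events = visual_prosody(words, base_pitch_hz=120.0)
for event in events:
    print(event)
```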
Happy • Angry • Surprise • Fear • Sad • Disgust
GUI for Text Markup Language • Prosody • Emotion • Gesture