Speech enabling apps for Windows Phone 8

F Avery Bishop, Senior Program Manager, Microsoft (3-050)

Agenda
• New speech functionality for Apollo apps
• Scenarios for speech in apps
• Using voice commands to launch your app
• Adding recognition and synthesis to your app
• Questions

New for Apollo
• Bringing the power of speech to your apps.
• Voice commands: launch an app to a page and execute a command with one utterance.
• API for in-app dialog:
  • Rich speech recognition API with UI
  • Text-to-speech for feedback and notification

Scenarios for speech in apps
Launching an app with a voice command
• Driving to work, Mandy remembers that she needs to check a memo stored in her Magic Memo app. When she has a free minute she picks up her phone and searches until she finds the app. She pokes around the menu to find the “View Memos” page. When she finally finishes she mutters to herself, “I shoulda used a piece of paper,” in frustration.
• Later, Mandy upgrades to the Magic Memo app on Apollo and learns how to enter and view memos by voice. Tonight she presses the Start button and says, “Magic Memo, show all memos.” The app starts and puts her right on the memo list page.
• To be continued …

Scenarios for speech in apps
Dialog in an app: speech recognition and synthesis
• Before she upgraded, Mandy had to enter new memos in Magic Memo by hand. She certainly couldn’t do it while driving, and even at home she had to stop what she was doing and punch at the keyboard, correcting mistakes and getting frustrated.
• With the Windows Phone 8 version of Magic Memo, Mandy can enter, view, and select memos by voice. From the main page she taps the mic icon and says, “Book table for two at Daniel’s Broiler.” The app displays her new memo, and Mandy says, “Save.” This whole experience has been so much easier that she exclaims, “I LOVE this APP!”

Voice commands overview
Launch app and execute command in one utterance
• Anywhere on the phone, the user says the app name and a command:
  • “Magic Memo, add new memo”
  • “Magic Memo, show memo number five”
• The app gets the command name, recognition text, and other info in the query string.
• Includes built-in UI for feedback, discoverability, and disambiguation.

Voice command overview
Launch app and execute command in one utterance
• Commands are contained in a VoiceCommandDefinition (VCD) file:
  • Uses a simple XML format
  • Allows custom, dynamic parameter lists
  • Supports commands in multiple languages
• The app initializes the VCD once, on first run
• The app can update parameters dynamically

Steps to enabling voice commands
1. Specify commands in a VoiceCommandDefinition file
2. Register the VCD file once (e.g., on first run)
3. Handle the command in the app when launched
4. Optionally update PhraseLists (parameters) dynamically

Voice command programming
Step 1: create VoiceCommandDefinition file

    <?xml version="1.0" encoding="utf-8"?>
    <VoiceCommands xmlns="http://schemas.microsoft.com/voicecommands/1.0">
      <CommandSet xml:lang="en-us" Name="MagicMemoEnu">
        <CommandPrefix>Magic Memo</CommandPrefix>
        <Command Name="showMemos">
          <Example>Show memo number 2</Example>
          <ListenFor>Show [me] memo [number] {num}</ListenFor>
          <ListenFor>Display memo [number] {num}</ListenFor>
          <Feedback>Showing memo {num}</Feedback>
          <Navigate Target="/ViewMemos.xaml"/>
        </Command>
        <!-- The PhraseList belongs inside the CommandSet that references it -->
        <PhraseList Label="num">
          <Item>1</Item>
          <Item>2</Item>
          <Item>3</Item>
        </PhraseList>
      </CommandSet>
      <CommandSet xml:lang="ja-JP" Name="MagicMemoJp">
        <!-- Add CommandSets for other languages -->
      </CommandSet>
    </VoiceCommands>

Recognizes:
• Magic Memo, show me memo number three.
• Magic Memo, show memo one.
• Magic Memo, display memo number two.
• Magic Memo, display memo two.

Voice command programming
Step 2: include code to initialize VCD

    using Windows.Phone.Speech.VoiceCommands;
    // ...
    // Load the VoiceCommandDefinition file, usually on first run in App.xaml.cs
    private async void Application_Launching(object sender, LaunchingEventArgs e)
    {
        try
        {
            // Path to Voice Command Definition (VCD) file in application install path
            Uri uri = new Uri("ms-appx:///MagicMemoVCD.xml");
            await VoiceCommandService.InstallCommandSetsFromFileAsync(uri);
        }
        catch (Exception ex)
        {
            // Handle Exception
        }
    }
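The slide notes that the VCD is usually registered once, on first run. A minimal sketch of one way to guard that, assuming a hypothetical settings key ("VcdInstalled") kept in IsolatedStorageSettings; the helper class and method names are illustrative and not part of the speech API:

```csharp
using System;
using System.IO.IsolatedStorage;
using System.Threading.Tasks;
using Windows.Phone.Speech.VoiceCommands;

public static class VcdInstaller
{
    // Hypothetical settings key used only for this sketch
    private const string InstalledKey = "VcdInstalled";

    public static async Task EnsureInstalledAsync()
    {
        IsolatedStorageSettings settings = IsolatedStorageSettings.ApplicationSettings;
        if (settings.Contains(InstalledKey))
        {
            return; // Already registered on an earlier run
        }

        // Same VCD path as in the slide
        await VoiceCommandService.InstallCommandSetsFromFileAsync(
            new Uri("ms-appx:///MagicMemoVCD.xml"));

        // Remember that registration succeeded
        settings[InstalledKey] = true;
        settings.Save();
    }
}
```

An app that ships a changed VCD file in an update would presumably need to clear or version this flag so the new commands get registered.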

Voice command programming
Step 3: include code to handle navigation and execute commands

    private void ViewMemosPage_Loaded(object sender, RoutedEventArgs e)
    {
        // Other code omitted ...

        // Was the page launched by voice commands?
        if (this.NavigationContext.QueryString.ContainsKey("voiceCommandName"))
        {
            string voiceCommandName = this.NavigationContext.QueryString["voiceCommandName"];
            switch (voiceCommandName)
            {
                case "showMemos":
                    string memoNumber = this.NavigationContext.QueryString["num"];
                    // Display requested memo
                    break;
                // cases for other commands
                default:
                    // No match
                    break;
            }
        }
    }
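The overview mentions that the app also receives the recognition text in the query string. A minimal sketch of reading it, assuming the text arrives under the "reco" key (which is how the platform behaves as far as I know) and using OnNavigatedTo in place of the Loaded handler shown above; the page class name is illustrative:

```csharp
using System.Diagnostics;
using System.Windows.Navigation;
using Microsoft.Phone.Controls;

public partial class ViewMemosPage : PhoneApplicationPage
{
    protected override void OnNavigatedTo(NavigationEventArgs e)
    {
        base.OnNavigatedTo(e);

        // Act only on a fresh navigation, not when the user returns to the page
        if (e.NavigationMode != NavigationMode.New)
        {
            return;
        }

        string commandName;
        if (NavigationContext.QueryString.TryGetValue("voiceCommandName", out commandName))
        {
            // "reco" is assumed to carry the full recognized utterance
            string spokenText;
            NavigationContext.QueryString.TryGetValue("reco", out spokenText);
            Debug.WriteLine(string.Format(
                "Command '{0}' from utterance '{1}'", commandName, spokenText));
        }
    }
}
```

Checking NavigationMode avoids re-running the command when the user navigates back to the page.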

Voice command programming
Step 4: (optional) include code to update phrase lists

    // Get VoiceCommandSet object
    VoiceCommandSet memosVcs = VoiceCommandService.InstalledCommandSets["MagicMemoEnu"];

    // Update PhraseList for use in commands (any time)
    await memosVcs.UpdatePhraseListAsync("num", new string[] { "1", "2", "3", "4", "5" });

Voice commands UI: what can I say

Voice command UI: listening and confirmation

Using voice commands to launch the Magic Memo App • Demo

Overview of API for in-app dialog
Speech recognition and synthesis in a running app
• Flexible, versatile API
  • Built-in and default functionality for non-specialists
  • Advanced features for complex scenarios
• Speech synthesis (text-to-speech) supports:
  • Speaking plain text
  • Speaking synthesis markup for richer scenarios
• Speech recognition supports:
  • Predefined and custom grammars
  • UI for feedback and disambiguation

Speech APIs for speech synthesis*
Speaking plain text in two lines:

    private async void ButtonTTS_Click(object sender, RoutedEventArgs e)
    {
        SpeechSynthesizer synth = new SpeechSynthesizer();
        await synth.SpeakTextAsync("You have a meeting with Peter in 15 minutes.");
    }

*Also called “Text to Speech” (TTS)

Other TTS features
• Speak SSML*
• Events:
  • SpeechStarted
  • BookmarkReached
• API to select which installed voice to use

*SSML: Speech Synthesis Markup Language, a W3C standard format for richer scenarios.
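These three features can be combined in one call. The following is a sketch only: the helper class, the SSML text, and the bookmark name "time" are illustrative, and it assumes an en-US voice may or may not be installed.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;
using Windows.Phone.Speech.Synthesis;

public static class MemoSpeaker
{
    public static async Task SpeakReminderAsync()
    {
        SpeechSynthesizer synth = new SpeechSynthesizer();

        // Select an installed en-US voice if there is one; otherwise keep the default
        VoiceInformation voice = InstalledVoices.All
            .FirstOrDefault(v => v.Language == "en-US");
        if (voice != null)
        {
            synth.SetVoice(voice);
        }

        // Raised when the synthesizer reaches a <mark> element in the SSML
        synth.BookmarkReached += (sender, args) =>
            Debug.WriteLine("Reached bookmark: " + args.Bookmark);

        string ssml =
            "<speak version=\"1.0\" " +
            "xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">" +
            "You have a meeting with Peter <mark name=\"time\"/>" +
            "<emphasis>in 15 minutes</emphasis>." +
            "</speak>";

        await synth.SpeakSsmlAsync(ssml);
    }
}
```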

Speech recognition features (1/2)
• Recognition with and without default UI
• Three grammar formats (more later)
• Events for:
  • AudioProblemOccurred
  • AudioCaptureStateChanged
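As a rough illustration of the events, an app that builds its own UI might log audio problems so it can prompt the user to speak up or move somewhere quieter. This sketch assumes the event arguments expose the problem through a Problem property; the helper class is hypothetical.

```csharp
using System.Diagnostics;
using Windows.Phone.Speech.Recognition;

public static class RecognizerSetup
{
    public static SpeechRecognizer CreateWithDiagnostics()
    {
        SpeechRecognizer reco = new SpeechRecognizer();

        // Report input issues such as speech that is too quiet or too noisy
        reco.AudioProblemOccurred += (sender, args) =>
            Debug.WriteLine("Audio problem: " + args.Problem);

        return reco;
    }
}
```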

Speech recognition features (2/2)
• Recognition with and without default UI
• Settings for customization of UI:
  • Example text
  • Listen text
• Speech recognition result object with:
  • Alternates
  • Confidence
  • Semantics
  • Etc…
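A sketch of how the result object might be used to decide whether to trust a recognition. The thresholds and the helper class are illustrative choices, not a recommended policy.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;
using Windows.Phone.Speech.Recognition;

public static class MemoDictation
{
    // Returns the recognized text, or null when the result should not be trusted
    public static async Task<string> ListenAsync(SpeechRecognizer reco)
    {
        SpeechRecognitionResult result = await reco.RecognizeAsync();

        if (result.TextConfidence == SpeechRecognitionConfidence.Rejected)
        {
            return null;
        }

        if (result.TextConfidence == SpeechRecognitionConfidence.Low)
        {
            // Log up to three alternates; a real app might offer them to the user
            foreach (SpeechRecognitionResult alternate in result.GetAlternates(3))
            {
                Debug.WriteLine("Alternate: " + alternate.Text);
            }
            return null;
        }

        return result.Text;
    }
}
```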

Speech recognition in three lines:

    private async void ButtonSR_Click(object sender, RoutedEventArgs e)
    {
        SpeechRecognizer reco = new SpeechRecognizer();

        // Use the default short message dictation grammar
        SpeechRecognitionResult recoResult = await reco.RecognizeAsync();

        // Do something with the recognition result
        MessageBox.Show(string.Format("You said {0}.", recoResult.Text));
    }

Introduction to speech grammars
• Grammar: rules specifying the word combinations allowed in a language.
• Speech recognition uses special grammars to improve accuracy.
• Accuracy declines and latency increases as grammar size and complexity increase.
• Specialized grammars limit the search space (fewer word combinations to search), which increases accuracy.

Specifying a speech grammar
• Predefined grammars: recognition uses a remote speech service
• Two predefined grammars for Windows Phone 8:
  • Short message dictation (SMD), the default
  • WebSearch
• Easy to specify and use:
    myReco.Grammars.AddGrammarFromPredefinedType("mySearch", SpeechPredefinedGrammar.WebSearch);
• Future releases may include other predefined grammars
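Put together with the built-in UI, the WebSearch grammar might be used as in the sketch below. The helper class, the grammar key "webSearch", and the prompt text are all illustrative. Because predefined grammars rely on the remote service, this path presumably needs a network connection.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Phone.Speech.Recognition;

public static class VoiceSearch
{
    // Returns the spoken query, or null if recognition did not succeed
    public static async Task<string> GetQueryAsync()
    {
        SpeechRecognizerUI recoUI = new SpeechRecognizerUI();

        // Replace the default dictation grammar with the web search grammar
        recoUI.Recognizer.Grammars.AddGrammarFromPredefinedType(
            "webSearch", SpeechPredefinedGrammar.WebSearch);
        recoUI.Settings.ListenText = "Search for what?";

        SpeechRecognitionUIResult result = await recoUI.RecognizeWithUIAsync();
        return result.ResultStatus == SpeechRecognitionUIStatus.Succeeded
            ? result.RecognitionResult.Text
            : null;
    }
}
```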

Specifying a speech grammar
• Custom grammars: recognition is on device
• Simple list grammar
  • App specifies all the phrases to listen for
  • Useful for many simple scenarios
• Grammar in W3C format
  • Uses the W3C Speech Recognition Grammar Specification (SRGS) format
  • Provides many features and flexibility
  • Authoring and testing require a time commitment
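For the simple list case, a sketch might look like the following. The phrases, the grammar key "memoCommands", and the helper class are made up for the Magic Memo scenario.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Phone.Speech.Recognition;

public static class MemoCommands
{
    public static async Task<string> ListenForCommandAsync()
    {
        SpeechRecognizer reco = new SpeechRecognizer();

        // The recognizer listens only for these phrases, on the device
        string[] phrases = { "save", "cancel", "delete memo", "show all memos" };
        reco.Grammars.AddGrammarFromList("memoCommands", phrases);

        SpeechRecognitionResult result = await reco.RecognizeAsync();
        return result.Text;
    }
}
```

A short list like this keeps the search space small, which is the accuracy benefit the grammar introduction describes.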

Speech recognition with UI and custom grammar

    private async void InitializeSpeech()
    {
        commandReco = new SpeechRecognizerUI(); // Instantiate speech recognizer
        commandReco.Settings.ListenText = "Say a button name or select number"; // Prompt displayed to user
        commandReco.Settings.ExampleText = "Ex. 'Clear three' or 'Select two'"; // Displayed as examples

        // SRGS grammar to recognize button names and numbers
        Uri grammarFileUri = new Uri("ms-appx:///ViewMemos.grxml");

        // Add grammar for later loading
        commandReco.Recognizer.Grammars.AddGrammarFromUri("srgsCommands", grammarFileUri);
        await commandReco.Recognizer.PreloadGrammarsAsync(); // Preload grammar to reduce latency (optional)
    }

    private async void MicImage_Tap(object sender, GestureEventArgs e)
    {
        var commandResult = await commandReco.RecognizeWithUIAsync(); // Start speech recognition
        if (commandResult.ResultStatus == SpeechRecognitionUIStatus.Succeeded)
        {
            // Execute command spoken by user
        }
    }

Speech API classes
• Use SpeechRecognizerUI to get the built-in UI.
• Use SpeechRecognizer if you roll your own UI.

Built-in speech recognition UI (1/2) • UI drops down from the top, obscuring app

Built-in speech recognition UI (2/2) • Recognition puts up disambiguation or confirmation screen

In-app dialog using the speech API with default UI • Demo

What you can do
• Get the SDK and try it out.
• Wow your users with Voice Commands to launch and control your app in one utterance.
• Use the TTS and Recognition APIs to increase usability and convenience of your apps. Your users will love it!

Resources
• November and December MSDN Magazine articles on speech in Windows Phone
• Start at: http://msdn.microsoft.com/en-us/magazine/default.aspx
