Integrating Nuance and Trindikit David Hjelm 2003-03-20
Nuance • Speech recognition, voice authentication and text-to-speech engines • APIs for creating speech-recognition and text-to-speech clients in Java, C++ and C
Trindikit • Framework for building dialogue systems • Written in SICStus Prolog • Contains predefined modules for input, output, interpretation, etc.
Trindikit text input/output modules • input_simpletext reads text input from the terminal and stores it in the input variable. • output_simpletext reads the contents of the output variable and prints them on the screen. • To use Nuance speech recognition and speech synthesis instead, the input and output modules must communicate with a Nuance process, since Nuance provides no SICStus Prolog API.
Solution: OAA • OAA (the Open Agent Architecture) enables communication between Java and SICStus processes. • SICStus and Java processes register as agents with the same OAA facilitator. Each agent declares a set of solvables to the facilitator, using Prolog-like syntax. • Agents can pose queries to the OAA community by calling solve(Query). The facilitator tries to find an agent that has declared a solvable matching Query; if it finds one, it delegates Query to that agent, which then tries to solve it.
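The delegation step can be illustrated with a small in-memory model. The sketch below is hypothetical and does not use the real OAA Java library: the classes `Term`, `Agent` and `Facilitator` are invented for illustration, and a solvable is taken to match a query when functor name and arity agree, whereas OAA itself performs Prolog-style unification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical model of OAA-style delegation, not the real OAA library API.
// Real OAA unifies Prolog terms; this sketch only compares functor name and arity.
final class OaaDelegationSketch {

    /** A Prolog-like term: a functor name plus its arguments (kept as strings). */
    record Term(String name, List<String> args) {
        static Term of(String name, String... arguments) {
            return new Term(name, List.of(arguments));
        }

        int arity() {
            return args.size();
        }
    }

    /** An agent: the solvables it declared plus the code that answers them. */
    record Agent(String id, List<Term> solvables, Function<Term, String> solver) {
        boolean canSolve(Term query) {
            return solvables.stream().anyMatch(s ->
                    s.name().equals(query.name()) && s.arity() == query.arity());
        }
    }

    /** The facilitator keeps the registry and delegates solve(Query) requests. */
    static final class Facilitator {
        private final List<Agent> agents = new ArrayList<>();

        void register(Agent agent) {
            agents.add(agent);
        }

        /** Delegates the query to the first registered agent with a matching solvable. */
        Optional<String> solve(Term query) {
            for (Agent agent : agents) {
                if (agent.canSolve(query)) {
                    return Optional.of(agent.solver().apply(query));
                }
            }
            return Optional.empty(); // no agent declared a matching solvable
        }
    }

    public static void main(String[] args) {
        Facilitator facilitator = new Facilitator();
        // e.g. a Java agent declaring one two-argument solvable
        facilitator.register(new Agent("speech",
                List.of(Term.of("nscPlayAndRecognize", "Grammar", "RecResult")),
                q -> "recognized using " + q.args().get(0)));
        // e.g. a Prolog agent posing queries through the facilitator
        System.out.println(facilitator.solve(Term.of("nscPlayAndRecognize", ".Top", "R")));
        System.out.println(facilitator.solve(Term.of("unknownGoal")));
    }
}
```

Running `main` prints `Optional[recognized using .Top]` followed by `Optional.empty`, since no agent declared `unknownGoal`.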
OAA Nuance Agents • The following OAA agents are provided in the latest distribution of Trindikit: • OAANuanceSpeechChannel – an OAA Java agent which provides NuanceSpeechChannel (Nuance Java API) functionality to the OAA community • oaa_recserver – an OAA Prolog agent which can control a Nuance recognition server • oaa_vocalizer – an OAA Prolog agent which can control a Nuance TTS server
Trindikit Java OAA agents • To simplify writing new OAA agents, a base class for OAA agents, OAAAgent, is used; it is extended by the classes that implement each agent. • An OAAAgent can be in one of a number of states, and a set of solvables is defined for each state. If the facilitator delegates a solve(Query) request to the agent, the agent iterates through the solvables defined for its current state to find one that unifies with Query. • The code that handles a solve(Query) request is implemented in a wrapper class, OAASolver, which defines the method solve. Each OAASolver defines a specific solvable. • OAASolvers are added to the agent via the addSolver method, which also specifies the pre-state(s) and post-state(s) of the OAASolver.
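A minimal sketch of this state-dependent dispatch, assuming a solver reports the state it moves to. `SketchAgent`, `SketchSolver` and this `addSolver(preStates, postStates, solver)` signature are hypothetical stand-ins for OAAAgent, OAASolver and addSolver, not their actual Trindikit signatures; matching here compares only the functor name rather than unifying terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical analogue of Trindikit's OAAAgent/OAASolver design; the class
// and method names are illustrative, not the real Trindikit signatures.
final class StatefulAgentSketch {

    /** Handles one solvable, identified here by its functor name. */
    interface SketchSolver {
        String solvable();

        /** Returns the state to move to, or -1 if the query could not be solved. */
        int solve(String query, int currentState);
    }

    /** A solver registered together with its allowed pre- and post-states. */
    record Entry(Set<Integer> preStates, Set<Integer> postStates, SketchSolver solver) {}

    static class SketchAgent {
        private final List<Entry> entries = new ArrayList<>();
        private int state;

        SketchAgent(int initialState) {
            this.state = initialState;
        }

        int state() {
            return state;
        }

        void addSolver(Set<Integer> preStates, Set<Integer> postStates, SketchSolver solver) {
            entries.add(new Entry(preStates, postStates, solver));
        }

        /** Called when the facilitator delegates solve(query) to this agent. */
        boolean solve(String query) {
            int paren = query.indexOf('(');
            String functor = paren < 0 ? query : query.substring(0, paren);
            for (Entry e : entries) {
                // Only solvers whose pre-states include the current state are considered.
                if (e.preStates().contains(state) && e.solver().solvable().equals(functor)) {
                    int next = e.solver().solve(query, state);
                    if (next >= 0 && e.postStates().contains(next)) {
                        state = next;
                        return true;
                    }
                    return false; // solver failed or returned an undeclared post-state
                }
            }
            return false; // nothing solvable in the current state matches
        }
    }

    /** Builds a toy solver that always succeeds and moves to a fixed state. */
    static SketchSolver solver(String name, int target) {
        return new SketchSolver() {
            @Override
            public String solvable() {
                return name;
            }

            @Override
            public int solve(String query, int currentState) {
                return target;
            }
        };
    }

    public static void main(String[] args) {
        final int stopped = 0;
        final int running = 1;
        SketchAgent agent = new SketchAgent(stopped);
        agent.addSolver(Set.of(stopped), Set.of(running), solver("start", running));
        agent.addSolver(Set.of(running), Set.of(stopped), solver("stop", stopped));
        System.out.println(agent.solve("stop"));  // false: stop is not solvable while stopped
        System.out.println(agent.solve("start")); // true
        System.out.println(agent.state());        // 1 (running)
    }
}
```

Declaring a set of post-states rather than a single one leaves room for solvables such as nscCreate, whose resulting state depends on its arguments.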
OAANuanceSpeechChannel • OAANuanceSpeechChannel is a Java OAA agent which extends OAAAgent. • Another agent implemented this way is OAAVcr (used in the ILT project), a software VCR agent which can record TV programs captured with a TV card.
OAANuanceSpeechChannel states • NuanceSpeechChannel offers different functionality depending on its configuration. For example, if it uses a telephony-based audio provider, a call has to be answered before recognition can take place. This is mirrored by the four states of OAANuanceSpeechChannel, represented as int constants:
0 - STOPPED: There is no speech channel yet.
1 - TEL_IDLE: A speech channel using a telephony audio provider has been created. Currently not in a call.
2 - TEL_RUNNING: A speech channel using a telephony audio provider has been created. Currently in a call.
3 - NATIVE_RUNNING: A speech channel using the native audio provider has been created.
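The four states could be written as int constants as below. The constant values are the ones listed on the slide; `stateAfterCreate`, `canRecognize` and `name` are hypothetical helpers, and the boolean argument of `stateAfterCreate` merely stands in for whichever setting in nscCreate's Parameters selects the audio provider.

```java
// Sketch of OAANuanceSpeechChannel's four states as int constants. The values
// come from the slide; the helper methods are illustrative assumptions.
final class SpeechChannelStates {
    static final int STOPPED = 0;
    static final int TEL_IDLE = 1;
    static final int TEL_RUNNING = 2;
    static final int NATIVE_RUNNING = 3;

    /**
     * State after a successful nscCreate. The boolean is a hypothetical stand-in
     * for whatever in Parameters selects a telephony or the native audio provider.
     */
    static int stateAfterCreate(boolean telephonyAudioProvider) {
        return telephonyAudioProvider ? TEL_IDLE : NATIVE_RUNNING;
    }

    /** Recognition needs a running channel: with telephony, a call must be active. */
    static boolean canRecognize(int state) {
        return state == TEL_RUNNING || state == NATIVE_RUNNING;
    }

    static String name(int state) {
        return switch (state) {
            case STOPPED -> "STOPPED";
            case TEL_IDLE -> "TEL_IDLE";
            case TEL_RUNNING -> "TEL_RUNNING";
            case NATIVE_RUNNING -> "NATIVE_RUNNING";
            default -> throw new IllegalArgumentException("unknown state " + state);
        };
    }

    public static void main(String[] args) {
        for (boolean telephony : new boolean[] {true, false}) {
            int state = stateAfterCreate(telephony);
            System.out.println(name(state) + " canRecognize=" + canRecognize(state));
        }
        // prints "TEL_IDLE canRecognize=false", then "NATIVE_RUNNING canRecognize=true"
    }
}
```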
OAANuanceSpeechChannel solvables • The solvables of OAANuanceSpeechChannel are:
• nscCreate(+Package,+Parameters): creates a new SpeechChannel. Pre-state STOPPED; post-state TEL_IDLE or NATIVE_RUNNING (depending on Parameters).
• nscClose: closes the SpeechChannel. Pre-state TEL_IDLE, TEL_RUNNING or NATIVE_RUNNING; post-state STOPPED.
• nscPlayAndRecognize(+Grammar,?RecResult): pre-state TEL_RUNNING or NATIVE_RUNNING; post-state same as pre-state.
• nscRecognizeFile(+Filename,+Grammar,?RecResult): pre-state TEL_RUNNING or NATIVE_RUNNING; post-state same as pre-state.
• nscAppendTTS(+Text): pre-state TEL_RUNNING or NATIVE_RUNNING; post-state same as pre-state.
OAANuanceSpeechChannel solvables (continued)
• nscPlay(+Bool): pre-state TEL_RUNNING or NATIVE_RUNNING; post-state same as pre-state.
• nscStartPlay: pre-state TEL_RUNNING or NATIVE_RUNNING; post-state same as pre-state.
• nscSetParameter(+Name,+Value): pre-state TEL_IDLE, TEL_RUNNING or NATIVE_RUNNING; post-state same as pre-state.
• nscGetParameter(+Name,?Value): pre-state TEL_IDLE, TEL_RUNNING or NATIVE_RUNNING; post-state same as pre-state.
• nscGetAllGrammars(?Grammars): pre-state TEL_IDLE, TEL_RUNNING or NATIVE_RUNNING; post-state same as pre-state.
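The pre-states from the two solvables slides can be collected into a lookup table, for example to answer which solvables are available in a given state. This is an illustrative sketch; `PRE_STATES`, `allowedIn` and `solvablesIn` are hypothetical names, not part of OAANuanceSpeechChannel.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Pre-states transcribed from the two solvable slides; the lookup helpers are
// an illustrative sketch, not code from OAANuanceSpeechChannel itself.
final class SolvablePreStates {
    static final int STOPPED = 0, TEL_IDLE = 1, TEL_RUNNING = 2, NATIVE_RUNNING = 3;

    private static final Set<Integer> RUNNING = Set.of(TEL_RUNNING, NATIVE_RUNNING);
    private static final Set<Integer> ANY_CHANNEL = Set.of(TEL_IDLE, TEL_RUNNING, NATIVE_RUNNING);

    /** Solvable name -> allowed pre-states; a TreeMap keeps listings sorted by name. */
    static final Map<String, Set<Integer>> PRE_STATES = new TreeMap<>(Map.of(
            "nscCreate", Set.of(STOPPED),
            "nscClose", ANY_CHANNEL,
            "nscPlayAndRecognize", RUNNING,
            "nscRecognizeFile", RUNNING,
            "nscAppendTTS", RUNNING,
            "nscPlay", RUNNING,
            "nscStartPlay", RUNNING,
            "nscSetParameter", ANY_CHANNEL,
            "nscGetParameter", ANY_CHANNEL,
            "nscGetAllGrammars", ANY_CHANNEL));

    /** True if the named solvable may be called in the given state. */
    static boolean allowedIn(String solvable, int state) {
        Set<Integer> pre = PRE_STATES.get(solvable);
        return pre != null && pre.contains(state);
    }

    /** All solvables whose pre-states include the given state, sorted by name. */
    static List<String> solvablesIn(int state) {
        return PRE_STATES.entrySet().stream()
                .filter(e -> e.getValue().contains(state))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(solvablesIn(TEL_IDLE));
        System.out.println(allowedIn("nscAppendTTS", TEL_IDLE));
    }
}
```

Running `main` prints `[nscClose, nscGetAllGrammars, nscGetParameter, nscSetParameter]` and then `false`: in TEL_IDLE no call is active, so nscAppendTTS is not yet available.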
SpeechChannel events • Some NuanceSpeechChannel methods generate events, e.g. when the user starts speaking. • When such an event occurs, OAANuanceSpeechChannel posts a query to the OAA community that transcribes the Java event as closely as possible, with an 'nsc' prefix added to its name. • Other agents can declare these queries as solvables and implement code that handles the events:
nscStartOfSpeechEvent(SafeOffsetSecs,ActualOffsetSecs)
nscEndOfSpeechEvent(SafeOffsetSecs,ActualOffsetSecs)
nscPartialResultEvent(RecResult)
nscPlaybackStartedEvent
nscPlaybackStoppedEvent(Reason,Tones)
nscTerminationEvent(Reason)
nscCallConnectedEvent (todo)
nscDTMFEvent(Tones) (todo)
nscHungupEvent(Side,Reason) (todo)
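One way to produce such a query from a Java event is to prefix the event class name with `nsc` and write its fields as Prolog arguments. The sketch below builds the query as a string purely for illustration; `toQuery` and `toPrologArg` are hypothetical helpers, the real agent may well construct OAA term objects instead, and the `"example reason"` value is made up, since the actual Reason values are not given here.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Illustrative sketch of turning a Java event into an 'nsc'-prefixed query
// string. The real agent presumably builds OAA term objects instead of text.
final class EventQuerySketch {

    /** e.g. ("EndOfSpeechEvent", 1.5, 1.62) -> nscEndOfSpeechEvent(1.5,1.62) */
    static String toQuery(String javaEventName, Object... fields) {
        String functor = "nsc" + javaEventName;
        if (fields.length == 0) {
            return functor; // events without fields become atoms
        }
        String argList = Arrays.stream(fields)
                .map(EventQuerySketch::toPrologArg)
                .collect(Collectors.joining(","));
        return functor + "(" + argList + ")";
    }

    /** Numbers are written as-is; everything else becomes a quoted Prolog atom. */
    static String toPrologArg(Object value) {
        if (value instanceof Number) {
            return value.toString();
        }
        String escaped = String.valueOf(value)
                .replace("\\", "\\\\") // backslash is the escape character in quoted atoms
                .replace("'", "''");   // a doubled quote stands for one quote
        return "'" + escaped + "'";
    }

    public static void main(String[] args) {
        System.out.println(toQuery("StartOfSpeechEvent", 0.25, 0.31)); // nscStartOfSpeechEvent(0.25,0.31)
        System.out.println(toQuery("PlaybackStartedEvent"));           // nscPlaybackStartedEvent
        System.out.println(toQuery("TerminationEvent", "example reason"));
    }
}
```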
oaa_recserver • oaa_recserver is a Prolog OAA agent which controls a Nuance recognition server (recserver) process. Its solvables are:
• nrsStart(+Packages,+Params): starts a recserver process using the packages Packages and the parameters Params. The format of Packages and Params is described below.
• nrsStop: stops the recserver process.
• nrsGetPackages(?Packages): returns the currently loaded recognition packages.
• nrsGetState(?State): returns the current state (stopped or running).
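The contract of the four recserver solvables can be modelled as a small state holder. This is a hypothetical sketch in Java rather than the Prolog of the real agent: it only does the bookkeeping (no process is launched), ignores Params, and assumes, without this being stated on the slide, that nrsStart is refused while a recserver is already running and that no packages are reported once it has been stopped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical bookkeeping model of oaa_recserver's solvables; it launches no
// process, and the refuse-if-running rule is an assumption.
final class RecserverStateSketch {
    enum State { STOPPED, RUNNING }

    private State state = State.STOPPED;
    private final List<String> packages = new ArrayList<>();

    /** nrsStart(+Packages,+Params); Params is accepted but ignored here. */
    boolean start(List<String> pkgs, List<String> params) {
        if (state == State.RUNNING) {
            return false; // assumption: the running recserver must be stopped first
        }
        packages.clear();
        packages.addAll(pkgs);
        state = State.RUNNING;
        return true;
    }

    /** nrsStop */
    void stop() {
        packages.clear(); // assumption: nothing is loaded once stopped
        state = State.STOPPED;
    }

    /** nrsGetPackages(?Packages) */
    List<String> getPackages() {
        return List.copyOf(packages);
    }

    /** nrsGetState(?State): "stopped" or "running", as on the slide. */
    String getState() {
        return state.name().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        RecserverStateSketch rs = new RecserverStateSketch();
        System.out.println(rs.getState());                          // stopped
        rs.start(List.of("top"), List.of());
        System.out.println(rs.getState() + " " + rs.getPackages()); // running [top]
        rs.stop();
        System.out.println(rs.getState() + " " + rs.getPackages()); // stopped []
    }
}
```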
oaa_vocalizer • oaa_vocalizer is a Prolog OAA agent which controls a Nuance Vocalizer process. Its solvables are:
• nvocStart(+Params): starts a Vocalizer process; Params can be any command-line arguments.
• nvocStop: stops the Vocalizer process.
• nvocGetState(?State): returns the current state (stopped or running).
Integrating it into Trindikit • Trindikit provides a specific OAA resource, oaag, which can be used to make queries to the OAA community. • Input and output modules specific to OAA and Nuance, which make use of oaag, have been written. • A speech recognition grammar resource type, asr_grammar, keeps track of which speech recognition grammar Nuance should try to load.
input_nuance_basic_oaa • Calls an OAA agent which performs speech recognition, and also communicates with a Nuance recserver OAA agent. • Assumes that if a Nuance grammar contains the top-level symbol '.Top', it has been compiled into a recognition package named 'top'. • To perform recognition using package 'top', a Trindikit resource of type asr_grammar should be selected in the configuration file. • For all selected resources of type asr_grammar, the corresponding packages are loaded onto a recserver. The recclients are created at runtime.
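The naming convention and the loading rule can be sketched as follows. Deriving a package name by dropping the leading dot and lowercasing is an assumption generalized from the single '.Top' to 'top' example, `.Yes_No` is a made-up grammar symbol, and representing the selected asr_grammar resources as a plain list of top-level symbols is a simplification; `packageFor` and `packagesToLoad` are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of mapping grammar top-level symbols to package names
// and collecting the packages to load onto the recserver.
final class GrammarPackageSketch {

    /** '.Top' -> 'top': drop the leading dot and lowercase (assumed general rule). */
    static String packageFor(String topLevelSymbol) {
        String name = topLevelSymbol.startsWith(".") ? topLevelSymbol.substring(1) : topLevelSymbol;
        if (name.isEmpty()) {
            throw new IllegalArgumentException("empty grammar symbol");
        }
        return name.toLowerCase(Locale.ROOT);
    }

    /** Packages to load for all selected asr_grammar resources, each once, in order. */
    static List<String> packagesToLoad(List<String> selectedTopSymbols) {
        Set<String> pkgs = new LinkedHashSet<>();
        for (String symbol : selectedTopSymbols) {
            pkgs.add(packageFor(symbol));
        }
        return List.copyOf(pkgs);
    }

    public static void main(String[] args) {
        System.out.println(packageFor(".Top"));                                  // top
        System.out.println(packagesToLoad(List.of(".Top", ".Yes_No", ".Top")));  // [top, yes_no]
    }
}
```

Collecting into a `LinkedHashSet` loads each package once even if several selected resources share it, while keeping the order in which they were selected.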
output_nuance_basic_oaa • Calls an OAA agent which performs TTS synthesis, and also communicates with a Vocalizer OAA agent.
Future work • Real ASR grammars in asr_grammar resources • Trindikit integration with Regulus for converting feature-structure grammars to Nuance grammars • Dynamic grammar compilation, so that Nuance grammars do not have to be written and compiled in advance • Integration with asynchronous Trindikit • Intelligent barge-in • etc.