Speech recognition grammars as TRINDIKIT resources
David Hjelm, 2003-12-12
TRINDIKIT
• Framework for building dialogue systems
• Written in SICStus Prolog
• Contains predefined modules for input, output, interpretation, etc.
• Total Information State (TIS) holds information accessible by modules
• As long as different modules behave similarly with respect to the TIS, they are interchangeable
Nuance
• Speech recognition, voice authentication and text-to-speech engines
• APIs for creating speech-recognition/text-to-speech clients in Java, C++ and C
• Clients can read and write audio in several ways:
  • native sound card
  • telephony card
  • IP telephony
  • audio files
Speech recognition basics
[Diagram: speech → feature extraction → acoustic features → Viterbi search using the acoustic model (N-gram) → phoneme or word lattice → Viterbi search or parsing using the language model (N-gram or PCFG) → word lattice or n-best list of sentences]
Nuance SR models
• Acoustic models (master packages)
  • One or several for each language, plus some multilingual ones
• Language models
  • Written in Nuance's Grammar Specification Language (GSL)
  • PCFGs, but statistical language models (SLMs) can be used as categories; the SLMs are trained separately from corpus data
  • Compiled with a specific master package into a recognition package (acoustic + language model)
Nuance GSL
• EBNF variant augmented with
  • optional probabilities
  • optional rudimentary slot-filling semantics
  • many other special constructs, e.g.
    • SLM inclusion
    • external grammar references
    • external rule references
    • special words for e.g. pauses and telephony touch-tones
• Must not be left-recursive
Example Nuance grammars
• Without probabilities or semantics, a grammar can look like this:
    .Top [ Cmd Q ]
    Cmd ( [ stop play pause ] ?it )
    Q ( is [ (the vcr) it ] [ stopped playing paused ] )
• Start symbols are preceded by '.'
• Nonterminals start with an uppercase letter
• Terminals are lowercase
More example Nuance grammars
• Probabilistic grammar:
    .Top [ Cmd~0.6 Q~0.4 ]
    Cmd ( [ stop~0.2 play~0.4 pause~0.3 ] ?it~0.3 )
    Q ( is [ (the vcr)~0.3 it~0.7 ] [ stopped playing paused ] )
• Slot-filling grammar:
    .Top [ Cmd {<cmd $return>} Q {<q $return>} ]
    Cmd ( [ stop {return(stop)} play {return(play)} pause {return(pause)} ] ?it )
    Q ( is [ (the vcr) it ] [ stopped {return(stop)} playing {return(play)} paused {return(pause)} ] )
• Of course, they can be combined…
Static or dynamic grammar compilation
• Nuance's recognize function takes one argument, which is one of the following:
  • a start symbol in the current statically compiled recognition package. Recognition is then performed using the grammar specified.
  • a GSL expression. The expression is then compiled dynamically, on the fly.
• The GSL expression cannot contain recursive rules, but it can point to a precompiled 'grammar object' that does.
Current TRINDIKIT – Nuance interface
• TRINDIKIT modules exist for Nuance speech input and Nuance speech output.
• OAA (Open Agent Architecture) is used for communication between TRINDIKIT (Prolog) and the Nuance client (Java).
• Each OAA agent connects to a facilitator and declares a set of capabilities. Agents can then pose queries to the facilitator, which delegates each query to the appropriate agent(s) and returns an answer to the requesting agent.
Current TRINDIKIT – Nuance interface
[Architecture diagram. Components: TRINDIKIT, OAA gateway, OAA facilitator, Nuance Java client, ASR server, TTS server. Audio sources: IP telephony, telephony card, native sound card.]
Current TRINDIKIT – Nuance interface
• Nuance Java client
  • provides (partial) access to the Nuance Java API via OAA
  • loads a recognition package at startup
  • performs SR using one of its top-level grammars
• TRINDIKIT input module
  • checks the dummy resource $asr_grammar for the name of the top-level grammar
  • calls the OAA solvable nscPlayAndRecognize(+Grammar,?Result)
• Major disadvantages:
  • The recognition package must be compiled before the system is used, and specified when the Java application is started
  • The actual ASR grammar is not part of TRINDIKIT, so modules cannot modify it or check it for coverage
Upcoming TRINDIKIT – Nuance interface
• Nuance Java client
  • provides (partial) OAA access to the Nuance Java API
  • loads an empty recognition package at startup
  • can compile GSL into a Nuance Grammar Object (NGO) via OAA
  • performs SR using a GSL expression which points at an NGO
• TRINDIKIT input module
  • checks the resource $sr_grammar for the actual speech recognition grammar
  • makes sure $sr_grammar is compiled into an NGO at start-up
  • calls the OAA solvable nscPlayAndRecognize(+GSL,?Result), where GSL = '<file:/path/to/ngo>'
Upcoming TRINDIKIT – Nuance interface
[Architecture diagram. Components: TRINDIKIT, OAA gateway, OAA facilitator, Nuance Java client, compilation server, ASR server, TTS server. Audio sources: IP telephony, telephony card, native sound card.]
Different ways of implementing the sr_grammar resource
• Keep the GSL expression making up the Nuance grammar as a Prolog string or atom
  • Easy for the Nuance input module
  • Really hard for other modules trying to reason about the SR grammar
• Define the EBNF rules as Prolog terms
  • Quite easy for the Nuance input module (convert EBNF to GSL)
  • Enables reasoning about rules and categories by other modules
  • Hard to find a workable Prolog notation for EBNF
Different ways of implementing the sr_grammar resource
• Define the grammar as a set of context-free grammar rules (chosen method)
  • Some computation by the Nuance input module (needs to convert CFG to BNF to GSL)
  • Enables reasoning about rules and categories by other modules
  • Enables efficient parsing (if needed)
  • Easy to find a Prolog notation
  • Portable: the same grammar can be converted to many different speech recognizer grammar formats, as long as they are CFG-equivalent
CFG resource definition
• Resource relations:
  • start_symbol(S), where S is a nonterminal
  • rule(LHS,RHS), where LHS is a nonterminal and RHS is a list of nonterminals/terminals
  • rules(Rules), where Rules is the set of rules in the resource
• Resource operations (not yet implemented):
  • add_rule(rule(LHS,RHS))
  • delete_rule(rule(LHS,RHS))
  • add_rules(Rules)
  • delete_rules(Rules)
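The relations and operations listed above could be sketched as follows. This is only an illustration under stated assumptions: rules are stored as dynamic facts, terminals are written as atoms rather than strings, and rules/1 returns a plain list. None of this is taken from the actual TRINDIKIT resource code.

```prolog
% Assumption: the resource keeps its rules as dynamic facts.
:- dynamic rule/2, start_symbol/1.

start_symbol(nonterminal(np)).

rule(nonterminal(np),  [nonterminal(det), nonterminal(n)]).
rule(nonterminal(det), [terminal(a)]).
rule(nonterminal(n),   [terminal(car)]).

% rules(-Rules): all rules currently in the resource, as a list.
rules(Rules) :-
    findall(rule(LHS, RHS), rule(LHS, RHS), Rules).

% add_rule(+Rule): add a ground rule unless it is already present.
add_rule(rule(LHS, RHS)) :-
    (   rule(LHS, RHS)
    ->  true
    ;   assertz(rule(LHS, RHS))
    ).

% delete_rule(+Rule): remove a rule if present; succeeds either way.
delete_rule(rule(LHS, RHS)) :-
    (   retract(rule(LHS, RHS))
    ->  true
    ;   true
    ).

% add_rules(+Rules) / delete_rules(+Rules): apply the operation to each rule.
add_rules([]).
add_rules([R|Rs]) :- add_rule(R), add_rules(Rs).

delete_rules([]).
delete_rules([R|Rs]) :- delete_rule(R), delete_rules(Rs).
```

Making add_rule/1 idempotent keeps the rule collection set-like, which matches the description of rules(Rules) as a set.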
CFG rule format
• Example rules:
    rule( nonterminal(np), [ nonterminal(det), nonterminal(n) ] ).
    rule( nonterminal(det), [ terminal("a") ] ).
    rule( nonterminal(n), [ terminal("car") ] ).
• Convenient when reasoning about the rules of a grammar, but not very convenient when writing grammars
• Solution:
  • write rules in an EBNF-ish notation using operators
  • convert the EBNF-ish rules to CFG rules
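As an illustration of why this format is convenient to reason about, here is a minimal top-down recognizer over a list of rule/2 terms. The predicate names recognize/3 and derive/4 are hypothetical, terminals are assumed to be atoms rather than strings, and the search will not terminate on left-recursive grammars.

```prolog
:- use_module(library(lists)).   % member/2

% recognize(+Rules, +Start, ?Words): Words can be derived from Start.
recognize(Rules, Start, Words) :-
    derive(Rules, [Start], Words, []).

% derive(+Rules, +Symbols, ?Words0, ?Words):
% the symbol sequence Symbols covers the difference between Words0 and Words.
derive(_, [], Ws, Ws).
derive(Rules, [terminal(W)|Syms], [W|Ws0], Ws) :-
    derive(Rules, Syms, Ws0, Ws).
derive(Rules, [nonterminal(N)|Syms], Ws0, Ws) :-
    member(rule(nonterminal(N), RHS), Rules),
    derive(Rules, RHS, Ws0, Ws1),
    derive(Rules, Syms, Ws1, Ws).
```

With the three example rules above (terminals as atoms a and car), recognize(Rules, nonterminal(np), [a, car]) succeeds and recognize(Rules, nonterminal(np), [car, a]) fails. Calling it with Words unbound enumerates sentences, which is only safe for grammars that generate a finite language.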
'blockworld' - example CFG resource
• ebnf2cfg:assert_rules/0 converts EBNF rules to CFG rules and asserts them
    :- module( blockworld, [rules/1,rule/2,start_symbol/1] ).
    :- ensure_loaded( ebnf2cfg ).
    top( np ).
    np => det, adj* , n, loc? .
    adj => colour | size.
    colour => "blue" | "red" | "green".
    size => "big" | "small".
    det => "a".
    n => "sphere" | "cube" | "pyramid".
    loc => prep , np.
    prep => "in" | "on" | "under" | "above".
    :- assert_rules.
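The slides do not show how ebnf2cfg works internally. One possible way to expand the EBNF constructs into plain CFG rules is sketched below. To avoid operator declarations, the sketch assumes a term encoding of EBNF (seq/2, alt/2, opt/1, star/1, nt/1, t/1) instead of the operator notation above; the predicate names and the generated nonterminals gen1, gen2, … are all hypothetical. Optional parts are expanded into alternatives, and X* becomes an optional fresh nonterminal with right-recursive rules, so no left recursion is introduced.

```prolog
:- use_module(library(lists)).   % append/3, member/2

% ebnf_rule(+LHS, +Expr, -Rules): CFG rules for the EBNF rule LHS => Expr.
ebnf_rule(LHS, Expr, Rules) :-
    ebnf_expand(Expr, Alts, 0, _, Extra),
    findall(rule(nonterminal(LHS), RHS), member(RHS, Alts), Main),
    append(Main, Extra, Rules).

% ebnf_expand(+Expr, -Alts, +N0, -N, -Extra)
% Alts: alternative symbol lists for Expr. Extra: rules for generated
% nonterminals. N0/N: counter used to create fresh nonterminal names.
ebnf_expand(t(W),  [[terminal(W)]],    N, N, []).
ebnf_expand(nt(X), [[nonterminal(X)]], N, N, []).
ebnf_expand(opt(E), [[]|Alts], N0, N, Extra) :-
    ebnf_expand(E, Alts, N0, N, Extra).
ebnf_expand(alt(A, B), Alts, N0, N, Extra) :-
    ebnf_expand(A, As, N0, N1, Ex1),
    ebnf_expand(B, Bs, N1, N, Ex2),
    append(As, Bs, Alts),
    append(Ex1, Ex2, Extra).
ebnf_expand(seq(A, B), Alts, N0, N, Extra) :-
    ebnf_expand(A, As, N0, N1, Ex1),
    ebnf_expand(B, Bs, N1, N, Ex2),
    % every alternative of A followed by every alternative of B
    findall(RHS, ( member(X, As), member(Y, Bs), append(X, Y, RHS) ), Alts),
    append(Ex1, Ex2, Extra).
ebnf_expand(star(E), [[], [nonterminal(P)]], N0, N, Extra) :-
    ebnf_fresh(N0, P, N1),
    ebnf_expand(E, Alts, N1, N, Ex),
    % P => E | E P   (one or more repetitions, right-recursive)
    findall(rule(nonterminal(P), RHS),
            ( member(A, Alts), A \== [],
              ( RHS = A ; append(A, [nonterminal(P)], RHS) ) ),
            PRules),
    append(PRules, Ex, Extra).

% ebnf_fresh(+N0, -Name, -N): Name is a new atom gen<N>.
ebnf_fresh(N0, Name, N) :-
    N is N0 + 1,
    number_codes(N, Cs),
    atom_codes(Suffix, Cs),
    atom_concat(gen, Suffix, Name).
```

For the np rule of the blockworld grammar, encoded as seq(nt(det), seq(star(nt(adj)), seq(nt(n), opt(nt(loc))))), this produces four np rules (with and without the adjective sequence and the location) plus two rules for the generated nonterminal gen1.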
Using CFG resource with Nuance input module
    % At start-up: make sure the grammar is compiled into an NGO.
    input:init :-
        check_condition( $sr_grammar::start_symbol(Start) ),
        check_condition( $sr_grammar::rules(set(Rules)) ),
        cfg2gsl(dynamic,Start,Rules,GSL),
        oaag:solve(nscCurrentMasterPackage(Package)),
        (   oaag:solve(nscGslCompiledToNGO(GSL,Package,Path))
        ->  true
        ;   oaag:solve(nscCompileGslToNGO(GSL,Package,Path))
        ),
        !.

    % On each input: recognize against the compiled NGO.
    input:input :-
        check_condition( $sr_grammar::start_symbol(Start) ),
        check_condition( $sr_grammar::rules(set(Rules)) ),
        cfg2gsl(dynamic,Start,Rules,GSL),
        oaag:solve(nscCurrentMasterPackage(Package)),
        oaag:solve(nscGslCompiledToNGO(GSL,Package,Path)),
        % GSL expression that points at the NGO file
        join_atoms(['<file:/',Path,'>'],NGOGSL),
        recognize_score(NGOGSL,String,Score),
        apply_update( set( input, String ) ),
        apply_update( score := Score ).
What must be done before the CFG resource can be used with Nuance?
• Write the actual code of the input module (some parts are missing)
• Implement the nscGetMasterPackage(?Pkg) solvable
• Make sure that all nonterminals are uppercase and all terminals are lowercase in GSL
• Write a real CFG resource (use an existing Nuance grammar)
• Testing, testing and testing…
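The case requirement above is something a CFG-to-GSL converter could enforce while generating the grammar text. The following is a rough sketch, not the cfg2gsl/4 used by the input module: cfg_to_gsl/3 and its helpers are hypothetical names, symbols are assumed to be non-empty atoms, every rule is assumed to have a non-empty right-hand side, and only ASCII letters are case-converted. Each nonterminal becomes one GSL rule whose alternatives are parenthesised sequences.

```prolog
:- use_module(library(lists)).   % member/2

% cfg_to_gsl(+Start, +Rules, -Lines): Lines is a list of atoms,
% one GSL rule per nonterminal; the start symbol is prefixed with '.'.
cfg_to_gsl(Start, Rules, Lines) :-
    findall(L, member(rule(L, _), Rules), All),
    gsl_dedup(All, LHSs),
    gsl_rules(LHSs, Start, Rules, Lines).

gsl_rules([], _, _, []).
gsl_rules([LHS|LHSs], Start, Rules, [Line|Lines]) :-
    findall(Alt, ( member(rule(LHS, RHS), Rules), gsl_seq(RHS, Alt) ), Alts),
    gsl_name(LHS, Name0),
    (   LHS == Start -> atom_concat('.', Name0, Name) ; Name = Name0 ),
    gsl_join(Alts, ' ', Body),
    gsl_join([Name, ' [ ', Body, ' ]'], '', Line),
    gsl_rules(LHSs, Start, Rules, Lines).

% gsl_seq(+RHS, -Alt): a right-hand side as a GSL sequence "(x y z)".
gsl_seq(RHS, Alt) :-
    gsl_names(RHS, Names),
    gsl_join(Names, ' ', Inner),
    gsl_join(['(', Inner, ')'], '', Alt).

gsl_names([], []).
gsl_names([S|Ss], [N|Ns]) :- gsl_name(S, N), gsl_names(Ss, Ns).

% Nonterminals get an uppercase first letter, terminals are lowercased.
gsl_name(nonterminal(A), Name) :-
    atom_codes(A, [C|Cs]),
    gsl_upper(C, U),
    atom_codes(Name, [U|Cs]).
gsl_name(terminal(A), Name) :-
    atom_codes(A, Cs),
    gsl_lower_all(Cs, Ls),
    atom_codes(Name, Ls).

gsl_upper(C, U) :- ( C >= 97, C =< 122 -> U is C - 32 ; U = C ).  % a-z
gsl_lower(C, L) :- ( C >= 65, C =< 90  -> L is C + 32 ; L = C ).  % A-Z

gsl_lower_all([], []).
gsl_lower_all([C|Cs], [L|Ls]) :- gsl_lower(C, L), gsl_lower_all(Cs, Ls).

% gsl_join(+Atoms, +Separator, -Atom)
gsl_join([], _, '').
gsl_join([A], _, A) :- !.
gsl_join([A|As], Sep, Atom) :-
    gsl_join(As, Sep, Rest),
    atom_concat(A, Sep, A1),
    atom_concat(A1, Rest, Atom).

% gsl_dedup(+List, -Distinct): remove duplicates, keeping first occurrences.
gsl_dedup([], []).
gsl_dedup([X|Xs], [X|Ys]) :- gsl_without(Xs, X, Zs), gsl_dedup(Zs, Ys).

gsl_without([], _, []).
gsl_without([Y|Ys], X, Zs) :-
    (   Y == X -> Zs = Zs1 ; Zs = [Y|Zs1] ),
    gsl_without(Ys, X, Zs1).
```

For the rules np → det n, det → a, n → car, n → cube with start symbol np, this yields the lines '.Np [ (Det N) ]', 'Det [ (a) ]' and 'N [ (car) (cube) ]'. Whether single-element sequences and disjunctions are accepted as written should be checked against the GSL documentation.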
What should be done?
• Documentation of the Java and Prolog code
• TRINDIKIT manual
• Eliminate left-recursion
• Convert to Chomsky Normal Form (?)
• Parser/generator for testing CFGs inside Prolog
• Multilingual Nuance input module
• Batch scripts for running the system with ease
• Asynchronous input algorithm
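Before left recursion can be eliminated it has to be found. The sketch below only detects it, for a list of rule/2 terms in the CFG rule format. The predicate names are hypothetical, and the check assumes that no rule has an empty right-hand side, so left recursion hidden behind nullable symbols is not found.

```prolog
:- use_module(library(lists)).   % member/2

% left_corner(+Rules, ?A, ?B): some rule for A starts with nonterminal B.
left_corner(Rules, A, B) :-
    member(rule(nonterminal(A), [nonterminal(B)|_]), Rules).

% left_recursive(+Rules, ?A): A can reach itself through a chain of
% leftmost nonterminals (direct or indirect left recursion).
% May succeed more than once for the same A.
left_recursive(Rules, A) :-
    left_corner(Rules, A, B),
    reaches(Rules, B, A, [B]).

% reaches(+Rules, +From, +To, +Visited): To is reachable from From via
% left corners. Visited prevents looping on cycles that do not contain To.
reaches(_, A, A, _).
reaches(Rules, B, A, Visited) :-
    left_corner(Rules, B, C),
    \+ member(C, Visited),
    reaches(Rules, C, A, [C|Visited]).
```

A grammar containing np → np pp makes left_recursive(Rules, np) succeed, as does the indirect case a → b x, b → a y. A rule set like the blockworld grammar, where np only recurs at the end of loc → prep np, is not reported.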
What can be done?
• PCFG resource
  • if the EBNF format is used, how should weights be calculated when converting to PCFG? (Nuance has a solution for this, but is it a proper one?)
• SLM resource
  • would probably not store the entire model in memory
• Nuance semantics + CFG/PCFG
  • can GoDiS semantics be expressed?
• Convert typed unification grammars to CFG resources
  • DCG with typed features (Regulus), SKVATT(?), HPSG
• Grammatical Framework CFG approximation
  • e.g. by limiting sentence length or letting the grammar overgenerate
  • problem: any interesting grammar will overgenerate a lot
What can be done?
• Write modules for Java Speech API, ViaVoice, etc. using the same CFG resource…
• Use several recognition grammars in sequence (one after the other on the same input)
• Dynamically generate the recognition grammar based on IS contents and/or system expectations
• Let the system learn new words: "How do you spell that?"