SPEECH APIs
Provide access to a vendor's
• speech synthesis
• speech recognition
  • command-and-control speech recognition: the programmer defines a restricted grammar/vocabulary himself
  • dictation speech recognition: the programmer uses the general (statistical) built-in grammar of the recogniser (optimised for a topic/domain)
Available Speech APIs: 1. SAPI
• SAPI by Microsoft and vendors (IBM etc.)
• cross-vendor API
• Platform: Windows 95/98 or Windows NT 4.0 (or later)
• Microsoft Visual C++ 4.0 or later
• NB: Try it out with MS Whisper (free!)
Available Speech APIs: 2. JSAPI
• JSAPI by Sun Microsystems and vendors (Apple Computer, Inc., AT&T, Dragon Systems, IBM, Novell, Inc., Philips, Texas Instruments Incorporated)
• cross-vendor API
• cross-platform API
• programming via Java
• NB: Try it out with ViaVoice for Linux (free!)
Available Speech APIs: 3. VOCAPI
• VOCAPI by Philips, Bosch, Siemens, Opel, Sony, Volkswagen, ...
• cross-vendor, small-sized API
• cross-platform API, intended for PDAs, hands-free operation in cars, etc.
• programming via C
JSAPI & JSGF
• JSGF: Java Speech Grammar Format
• central for controlling speech recognition in JSAPI
• platform-independent, vendor-independent
• language-independent (… largely!)
• corresponds to/enhances the “CFG format” defined in SAPI
• enhancements: Java-style notations (see below)
JSGF Programming Issues
• loading, creating and deleting grammars in a speech recognizer; activating grammars for recognition, etc.
• loading grammars via URLs, e.g. from a web site
• mechanisms for receiving the results of recognition for a grammar and for processing those results
• vocabulary management, including the handling of token pronunciations
JSGF for speech recognition
• To be used for “rule grammars” (also called “command-and-control grammars” or “regular grammars”)
  • non-statistical*, small vocabulary, low perplexity, domain/application-dependent, for spoken dialogues
• Not to be used for “dictation grammars”
  • statistical, large vocabulary, high perplexity, domain/application-independent (may be optimised for a “topic”), for dictation; presupposes adaptation
* (However: see the slide about weights)
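The contrast between “low perplexity” and “high perplexity” can be made concrete. Perplexity is the inverse geometric mean of the probabilities a model assigns to the words of a sequence, PP = 2^(-(1/N) Σ log2 p(w_i)), roughly the average number of equally likely choices per word. The minimal sketch below computes it from per-word probabilities; the class and method names are our own (not part of JSAPI), and the 20,000-word vocabulary is an arbitrary illustrative figure.

```java
import java.util.List;

/** Sketch: perplexity of a word sequence from the per-word probabilities a model assigned. */
class Perplexity {
    static double of(List<Double> wordProbabilities) {
        if (wordProbabilities.isEmpty()) {
            throw new IllegalArgumentException("need at least one word");
        }
        double sumLog2 = 0.0;
        for (double p : wordProbabilities) {
            sumLog2 += Math.log(p) / Math.log(2); // log base 2
        }
        // 2 ^ (-(1/N) * sum of log2 p)
        return Math.pow(2.0, -sumLog2 / wordProbabilities.size());
    }

    public static void main(String[] args) {
        // Rule-grammar-like: 3 equally likely choices at each of 2 positions.
        System.out.println(of(List.of(1.0 / 3, 1.0 / 3)));          // close to 3
        // Dictation-like: each word one of 20,000 equally likely words.
        System.out.println(of(List.of(1.0 / 20000, 1.0 / 20000)));  // close to 20000
    }
}
```

With uniform choices the perplexity equals the branching factor, which is why a small command-and-control grammar is so much easier to recognise than open dictation.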
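JSGF notation 1 (4.1 ff.)
• BNF-equivalent, traditional style:
  • non-terminals (“rule names”) enclosed in <>
  • terminals (“tokens”, “words”) in Unicode characters
  • operators for ‘or’ (|), ‘iteration’ (+, *), ‘optional’ ([ ]), etc.
E.g.
<firstname> = John | Peter | Mary+;
<firstnames> = (John | Peter | Mary)+;
Unary operators such as + bind more tightly than |, so <firstname> accepts John, Peter, or one or more Marys, whereas <firstnames> accepts any non-empty sequence of the three names.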
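The difference between the two rules can be checked by hand-translating them into Java regular expressions over space-separated tokens. This is only an analogy chosen for illustration (a recognizer does not necessarily process JSGF this way), and the class name is our own.

```java
import java.util.regex.Pattern;

/** Sketch: the two JSGF rules as regexes over space-separated tokens. */
class FirstNames {
    // <firstname>  = John | Peter | Mary+;   ("+" applies to Mary only)
    static final Pattern FIRSTNAME = Pattern.compile("John|Peter|Mary( Mary)*");
    // <firstnames> = (John | Peter | Mary)+; ("+" applies to the whole group)
    static final Pattern FIRSTNAMES =
            Pattern.compile("(John|Peter|Mary)( (John|Peter|Mary))*");

    static boolean firstname(String utterance) {
        return FIRSTNAME.matcher(utterance).matches(); // whole utterance must match
    }

    static boolean firstnames(String utterance) {
        return FIRSTNAMES.matcher(utterance).matches();
    }

    public static void main(String[] args) {
        for (String u : new String[] {"John", "Mary Mary", "John Peter", "John John"}) {
            System.out.println(u + ": <firstname>=" + firstname(u)
                    + " <firstnames>=" + firstnames(u));
        }
    }
}
```

“John” and “Mary Mary” are accepted by both rules, while “John Peter” and “John John” are accepted only by <firstnames>.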
JSGF notation 2 (3.1 ff. + 4.9)
• Java-adapted style:
  • JSGF header: grammar name and imports, e.g.
    grammar dk.mydomain.emailapplication.mailBrowser;
    import <dk.mydomain.ReusableGrammars.date>;
    or
    import <dk.mydomain.ReusableGrammars.Danish.*>;
  • documentation comments /** ... */ (pp. 9 + 22, 4.9)
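In JSGF, the last dot-separated component of a fully qualified rule name is the rule name and everything before it is the full grammar name; a `*` in that position imports all public rules of the grammar. (A complete grammar file also starts with the self-identifying header `#JSGF V1.0;` before the `grammar` declaration.) A minimal sketch of that split, using a hypothetical class `JsgfImport` that is not part of JSAPI:

```java
/** Sketch: splitting a JSGF import target into grammar name and rule name. */
class JsgfImport {
    final String grammarName;
    final String ruleName; // "*" means: all public rules of the grammar

    JsgfImport(String grammarName, String ruleName) {
        this.grammarName = grammarName;
        this.ruleName = ruleName;
    }

    /** Parses an import statement such as "import <a.b.rule>;". */
    static JsgfImport parse(String statement) {
        String s = statement.trim();
        if (!s.startsWith("import") || !s.endsWith(";")) {
            throw new IllegalArgumentException("not an import statement: " + statement);
        }
        int open = s.indexOf('<');
        int close = s.lastIndexOf('>');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("missing <...>: " + statement);
        }
        String target = s.substring(open + 1, close).trim();
        int lastDot = target.lastIndexOf('.');
        if (lastDot <= 0 || lastDot == target.length() - 1) {
            throw new IllegalArgumentException("expected <grammarName.ruleName>: " + target);
        }
        // everything before the last dot = grammar, after it = rule (or "*")
        return new JsgfImport(target.substring(0, lastDot), target.substring(lastDot + 1));
    }

    boolean isWildcard() {
        return ruleName.equals("*");
    }

    public static void main(String[] args) {
        JsgfImport one = parse("import <dk.mydomain.ReusableGrammars.date>;");
        JsgfImport all = parse("import <dk.mydomain.ReusableGrammars.Danish.*>;");
        System.out.println(one.grammarName + " / " + one.ruleName);
        System.out.println(all.grammarName + " / " + all.isWildcard());
    }
}
```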
JSGF notation 3 (4.1 ff.)
• Java-adapted style (cont.):
  • public rules vs. non-public (“private”) rules
  • the rule name of a public rule is (one of) the start symbol(s) of the grammar and can be activated:
    public <s> = <np> <vp>;
    <np> = <det> <n>;
    <n> = man | woman | bird;
  • public rules can be imported into other grammars
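Since only public rules can be activated or imported, a tool processing a grammar has to find them. The sketch below (our own regex-based helper, handling one rule definition per string and only the `public` modifier) collects the start symbols from the slide's definitions. Like the slide, it leaves `<vp>` and `<det>` undefined, which is harmless here because only the left-hand sides are inspected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: collecting the public rule names (start symbols) of a grammar. */
class PublicRules {
    // optional "public", then <ruleName>, then "="
    private static final Pattern DEFINITION =
            Pattern.compile("^\\s*(public\\s+)?<([^>\\s]+)>\\s*=");

    static List<String> publicRuleNames(List<String> definitions) {
        List<String> result = new ArrayList<>();
        for (String def : definitions) {
            Matcher m = DEFINITION.matcher(def);
            // group(1) is null when the optional "public" did not match
            if (m.find() && m.group(1) != null) {
                result.add(m.group(2));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> grammar = List.of(
                "public <s> = <np> <vp>;",
                "<np> = <det> <n>;",
                "<n> = man | woman | bird;");
        System.out.println(publicRuleNames(grammar)); // [s]
    }
}
```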
JSGF Weights (4.2.3)
• Weights enable the representation of probabilistic grammars (e.g. bigrams, trigrams) in JSGF:
  <size> = /10/ small | /2/ medium | /1/ large;
  The weights are relative, so this rule corresponds to the probabilities 10/13 (small), 2/13 (medium) and 1/13 (large).
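Because the weights are relative, turning them into probabilities only requires dividing each weight by the sum over all alternatives of the rule. A minimal sketch (class and method names are our own):

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: normalising the relative weights of a rule's alternatives to probabilities. */
class Weights {
    static Map<String, Double> toProbabilities(Map<String, Double> weights) {
        double total = 0.0;
        for (double w : weights.values()) {
            if (w < 0) throw new IllegalArgumentException("weights must be non-negative");
            total += w;
        }
        if (total == 0.0) throw new IllegalArgumentException("weights must not all be zero");
        Map<String, Double> probabilities = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            probabilities.put(e.getKey(), e.getValue() / total); // weight / sum of weights
        }
        return probabilities;
    }

    public static void main(String[] args) {
        // <size> = /10/ small | /2/ medium | /1/ large;
        Map<String, Double> size = new LinkedHashMap<>();
        size.put("small", 10.0);
        size.put("medium", 2.0);
        size.put("large", 1.0);
        System.out.println(toProbabilities(size)); // small 10/13, medium 2/13, large 1/13
    }
}
```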
JSGF Weights (4.2.3)
• Example: a bigram implemented in JSGF
  • one rule per word (including a pseudo-word BOS, “beginning of sentence”)
  • a rule's expansion defines the successors of the word associated with the rule, e.g.
    <successors_of_a> = /5/ man <successors_of_man> | /4/ woman <successors_of_woman> | …etc;
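The construction can be automated from bigram counts. In the sketch below (hypothetical class `BigramToJsgf`), each alternative is “/count/ next <successors_of_next>”, so every generated rule is right-recursive. The pseudo-word `EOS` for the sentence end is our own addition; it is mapped to JSGF's special rule `<NULL>`, which is matched without any word being spoken, so the recursion can stop. Apart from 5 and 4 for a → man/woman, the counts in `main` are invented.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

/** Sketch: one weighted, right-recursive JSGF rule per word from bigram counts. */
class BigramToJsgf {
    static String ruleFor(String word, Map<String, Integer> successorCounts) {
        StringJoiner alternatives = new StringJoiner(" | ");
        alternatives.setEmptyValue("<NULL>"); // word without successors: just end
        // TreeMap gives a deterministic (sorted) order of alternatives
        for (Map.Entry<String, Integer> e : new TreeMap<>(successorCounts).entrySet()) {
            String next = e.getKey();
            String expansion = next.equals("EOS")
                    ? "<NULL>"                                  // end of sentence
                    : next + " <successors_of_" + next + ">";   // emit word, continue
            alternatives.add("/" + e.getValue() + "/ " + expansion);
        }
        // BOS is the entry point, so its rule is public (can be activated)
        String modifier = word.equals("BOS") ? "public " : "";
        return modifier + "<successors_of_" + word + "> = " + alternatives + ";";
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> bigrams = new TreeMap<>();
        bigrams.put("BOS", Map.of("a", 9));
        bigrams.put("a", Map.of("man", 5, "woman", 4));
        bigrams.put("man", Map.of("EOS", 5));
        bigrams.put("woman", Map.of("EOS", 4));
        for (Map.Entry<String, Map<String, Integer>> e : bigrams.entrySet()) {
            System.out.println(ruleFor(e.getKey(), e.getValue()));
        }
    }
}
```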
JSGF Tags (4.5)
• Enable primitive “parsing” along with recognition:
  • handling synonymy:
    <country> = Australia {Oz} | (United States) {USA} | America {USA} | (U S of A) {USA};
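From the application's point of view, such a rule acts as a lookup table from spoken phrases to canonical values. The sketch below is our own minimal parser (not part of JSAPI) for a flat expansion of the form “phrase {tag} | (multi-word phrase) {tag} | ...”; it does not handle nesting, weights, rule references, or more than one tag per alternative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: turning a flat tagged expansion into a phrase -> tag table. */
class TaggedAlternatives {
    static Map<String, String> parse(String expansion) {
        Map<String, String> table = new LinkedHashMap<>();
        for (String alternative : expansion.split("\\|")) {
            String alt = alternative.trim();
            int open = alt.lastIndexOf('{');
            int close = alt.lastIndexOf('}');
            if (open < 0 || close < open) {
                throw new IllegalArgumentException("alternative without tag: " + alt);
            }
            String tag = alt.substring(open + 1, close).trim();
            String phrase = alt.substring(0, open).trim()
                    .replaceAll("^\\(|\\)$", "")   // drop grouping parentheses
                    .trim();
            table.put(phrase, tag);
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, String> country = parse(
                "Australia {Oz} | (United States) {USA} | America {USA} | (U S of A) {USA}");
        System.out.println(country.get("U S of A")); // USA
    }
}
```

Three different phrases end up with the same tag USA, so the application never needs to know which synonym was spoken.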
JSGF Tags (4.5.1)
• separating language-specific issues (the actual phrase) from “universal meanings” (“hi”), e.g. with one <greeting> rule in each language-specific grammar:
  <greeting> = (howdy | good morning) {hi};
  <greeting> = (ohayo | ohayogozaimasu) {hi};
  <greeting> = (guten tag) {hi};
  <greeting> = (bon jour) {hi};
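The benefit is that application code can branch on the tag alone. In the sketch below, the per-language phrase tables are hard-coded stand-ins for what the four language-specific grammars would let a recognizer report, and `GREET_USER` is a made-up action name; both are hypothetical.

```java
import java.util.List;
import java.util.Map;

/** Sketch: application logic that depends on the tag, not on the language. */
class GreetingHandler {
    // language -> (phrase -> tag), mirroring the four <greeting> rules
    static final Map<String, Map<String, String>> GREETINGS = Map.of(
            "en", Map.of("howdy", "hi", "good morning", "hi"),
            "ja", Map.of("ohayo", "hi", "ohayogozaimasu", "hi"),
            "de", Map.of("guten tag", "hi"),
            "fr", Map.of("bon jour", "hi"));

    /** The only language-independent part: react to the tag. */
    static String respond(String tag) {
        switch (tag) {
            case "hi":
                return "GREET_USER";
            default:
                return "UNKNOWN";
        }
    }

    static String handle(String language, String phrase) {
        String tag = GREETINGS.getOrDefault(language, Map.of()).get(phrase);
        return tag == null ? "UNKNOWN" : respond(tag);
    }

    public static void main(String[] args) {
        for (List<String> in : List.of(List.of("en", "howdy"), List.of("ja", "ohayo"),
                List.of("fr", "bon jour"))) {
            System.out.println(in + " -> " + handle(in.get(0), in.get(1)));
        }
    }
}
```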
JSGF Recursions (4.7)
• Left recursion: not allowed
  • (could be rewritten as iteration in a regular grammar)
• Embedded recursion: not allowed
  • for Chomsky a very serious restriction!
• Right recursion: allowed
  • (can be rewritten as iteration in a regular grammar)
• Likely explanation: speech recognition presupposes regular (finite-state) grammars
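A grammar compiler has to detect and reject left recursion. The minimal sketch below uses our own simplified representation (rule name mapped to a list of alternatives, each a list of symbols, with `<x>` marking a rule reference) rather than any JSAPI data structure. It follows the leftmost rule reference of each alternative and reports whether it leads back to the starting rule, directly or indirectly. It ignores rules that can match nothing (such as `<NULL>` or optional items), and the example rules `<list>` and `<expr>` are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch: detecting direct or indirect left recursion. */
class LeftRecursion {
    static boolean isLeftRecursive(Map<String, List<List<String>>> grammar, String rule) {
        return reaches(grammar, rule, rule, new HashSet<>());
    }

    private static boolean reaches(Map<String, List<List<String>>> grammar,
                                   String current, String target, Set<String> visited) {
        if (!visited.add(current)) return false; // already explored, avoid looping
        for (List<String> alternative : grammar.getOrDefault(current, List.of())) {
            if (alternative.isEmpty()) continue;
            String first = alternative.get(0);
            if (!first.startsWith("<") || !first.endsWith(">")) continue; // a token
            String referenced = first.substring(1, first.length() - 1);
            if (referenced.equals(target)) return true;
            if (reaches(grammar, referenced, target, visited)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<List<String>>> g = Map.of(
                // <list> = item <list> | item;     right recursion: allowed
                "list", List.of(List.of("item", "<list>"), List.of("item")),
                // <expr> = <expr> plus one | one;  left recursion: rejected
                "expr", List.of(List.of("<expr>", "plus", "one"), List.of("one")));
        System.out.println(isLeftRecursive(g, "list")); // false
        System.out.println(isLeftRecursive(g, "expr")); // true
    }
}
```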
JSAPI Recognition Results
Interfaces javax.speech.recognition.FinalRuleResult and javax.speech.recognition.Result
• 1-best list/n-best list; for each item in the list:
  • list of tokens (“words”)
  • list of tags
  • name of the grammar accepting the input
  • name of the public rule accepting the input
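To show how an application might use such an n-best list, the information above can be modelled with plain Java classes, for instance to prefer the highest-ranked entry accepted by a rule the dialogue currently expects. The class names and the grammar name `hello` are hypothetical; this is not the JSAPI interfaces named above, just a self-contained stand-in.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Sketch: an n-best list of recognition hypotheses (hypothetical types). */
class NBestResults {
    /** One entry of the n-best list. */
    static final class Hypothesis {
        final List<String> tokens;
        final List<String> tags;
        final String grammarName;
        final String ruleName; // public rule that accepted the input

        Hypothesis(List<String> tokens, List<String> tags, String grammarName, String ruleName) {
            this.tokens = tokens;
            this.tags = tags;
            this.grammarName = grammarName;
            this.ruleName = ruleName;
        }

        @Override
        public String toString() {
            return grammarName + "." + ruleName + " " + tokens + " tags=" + tags;
        }
    }

    /** 1-best: the first entry of a list assumed to be sorted best-first. */
    static Hypothesis best(List<Hypothesis> nBest) {
        if (nBest.isEmpty()) throw new IllegalArgumentException("empty n-best list");
        return nBest.get(0);
    }

    /** Highest-ranked entry accepted by one of the currently expected rules. */
    static Optional<Hypothesis> bestExpected(List<Hypothesis> nBest, Set<String> expectedRules) {
        for (Hypothesis h : nBest) {
            if (expectedRules.contains(h.ruleName)) return Optional.of(h);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Hypothesis> nBest = List.of(
                new Hypothesis(List.of("So", "long"), List.of("bye"), "hello", "bye"),
                new Hypothesis(List.of("My", "name", "is", "Stuart", "Hunt"),
                        List.of("name", "Stuart", "Hunt"), "hello", "nameis"));
        System.out.println(best(nBest));
        System.out.println(bestExpected(nBest, Set.of("nameis")).orElse(null));
    }
}
```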
Hello World (IBM's JSAPI)
[cf. grammar on next slide]
U: My name is Bruce Adams (rule: nameis, tags: name Bruce Adams)
S: Hello Bruce Adams
U: Repeat after me (rule: begin, tags: begin)
S: I am listening (activates dictation grammar + stop rule)
U/S: [S repeats/synthesises sentences dictated by U]
U: That's all (rule: stop, tags: stop)
S: OK (deactivates dictation grammar)
U: Good bye (rule: bye, tags: bye)
Hello World (IBM's JSAPI)
<first> = Bruce {Bruce} | Andrew {Andrew} | Stuart {Stuart};
<last> = Lucas {Lucas} | Hunt {Hunt} | Adams {Adams};
<name> = <first> <last>;
public <nameis> = My name is {name} <name>;
public <begin> = Repeat after me {begin};
public <stop> = That's all {stop};
public <bye> = Good bye {bye} | So long {bye};
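The dialogue logic needs only the rule name and the tags of each result. The sketch below simulates it without a recognizer: inputs are hard-coded, `dictation` is a made-up marker for results coming from the dictation grammar, and comments mark where a real JSAPI program would enable or disable grammars. The `nameis` tags begin with `name` because the `{name}` tag is attached to the token “is” in `<nameis>`; the sketch simply filters it out.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Sketch: Hello World dialogue driven by rule names and tags (simulated input). */
class HelloWorldDialogue {
    private boolean dictationActive = false;
    private boolean finished = false;

    /** Returns the system's reply to one recognized utterance ("" = no reply). */
    String onResult(String rule, List<String> tags, String spokenText) {
        switch (rule) {
            case "nameis":
                // tags are [name, <first>, <last>]; "name" only marks the rule
                return "Hello " + tags.stream()
                        .filter(t -> !t.equals("name"))
                        .collect(Collectors.joining(" "));
            case "begin":
                dictationActive = true;  // real system: enable dictation grammar + <stop>
                return "I am listening";
            case "dictation":
                return dictationActive ? spokenText : ""; // repeat the dictated sentence
            case "stop":
                dictationActive = false; // real system: disable dictation grammar
                return "OK";
            case "bye":
                finished = true;
                return "";
            default:
                return "";
        }
    }

    boolean isFinished() {
        return finished;
    }

    public static void main(String[] args) {
        HelloWorldDialogue d = new HelloWorldDialogue();
        System.out.println(d.onResult("nameis", List.of("name", "Bruce", "Adams"), "My name is Bruce Adams"));
        System.out.println(d.onResult("begin", List.of("begin"), "Repeat after me"));
        System.out.println(d.onResult("dictation", List.of(), "the weather is fine"));
        System.out.println(d.onResult("stop", List.of("stop"), "That's all"));
        d.onResult("bye", List.of("bye"), "Good bye");
        System.out.println(d.isFinished());
    }
}
```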
Exercise
• Try to “review” JSGF
  • weak points/strong points
  • to what extent can it be used for “parsing” (retrieving useful semantics)?
    • resolving lexical ambiguity
    • resolving structural ambiguity