A Unified Language Model Architecture for Web-based Speech Recognition Grammars

Wesley Holland, Daniel May, Julie Baca, Georgios Lazarou, Joseph Picone
Center for Advanced Vehicular Systems, Mississippi State University

Speech Recognition
• Acoustic Model
  • Maps audio data to words or phonemes
• Language Model
  • Specifies the order in which a sequence of words or phonemes is likely to occur
  • Described using a grammar

Grammar Specifications
• Backus-Naur Form (BNF)
• Augmented BNF (ABNF)
• JSpeech Grammar Format (JSGF)
• Speech Recognition Grammar Specification (SRGS)
• ISIP Hierarchical Digraph (IHD)
Example (a followed by zero or more b) in each format:
BNF: <A>::=aB  <B>::=bB  <B>::=ε
ABNF: <A>::=ab*
JSGF: <A>=a(b)*;
XML-SRGS: a <item repeat="0-"> b </item>
IHD: [digraph figure not preserved in the transcript]

Conversion Design
• Goals
  • JSGF ↔ IHD
  • XML-SRGS ↔ IHD
  • Determination of equivalence
  • Grammar minimization
• Final Architecture
  • [Figure: conversion diagram linking XML-SRGS, JSGF, ABNF, BNF, and IHD]

JSGF/XML-SRGS → ABNF
• JSGF → ABNF
  • Trivial
  • Similar in syntax and structure to ABNF
• XML-SRGS → ABNF
  • Harder than JSGF
  • Different in syntax and structure from ABNF
  • Requires enumeration of certain repeat attributes
Examples:
JSGF: <A>=ab*;  →  ABNF: <A>::=ab*
XML-SRGS: <item repeat="1-2"> a b </item>  →  ABNF: <S>::=(ab)|(abab)
XML-SRGS: <item repeat="2-"> a b </item>  →  ABNF: <S>::=abab(ab)*
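The enumeration of repeat attributes can be sketched as follows. This is a minimal illustration, not the ISIP converter's code: the function name expand_repeat and the choice to treat the repeated body as an already-flattened string are assumptions made for the example. It covers the three repeat forms used by XML-SRGS ("n", "m-n", and "m-").

```python
def expand_repeat(body: str, repeat: str) -> str:
    """Expand an SRGS repeat attribute into an ABNF-style expression.

    body   -- the repeated expansion, already flattened to a string (assumption)
    repeat -- "n", "m-n", or "m-" as written in the repeat attribute
    """
    low_text, dash, high_text = repeat.partition("-")
    low = int(low_text)
    if not dash:
        # Exact count: repeat the body n times (zero repetitions is the empty string).
        return body * low or "ε"
    if high_text == "":
        # Open-ended range "m-": m mandatory copies followed by a Kleene star.
        return body * low + f"({body})*"
    high = int(high_text)
    if high < low:
        raise ValueError(f"invalid repeat range: {repeat}")
    # Bounded range "m-n": enumerate every allowed count as an alternative.
    return "|".join(f"({body * k})" if k else "ε" for k in range(low, high + 1))


if __name__ == "__main__":
    print(expand_repeat("ab", "1-2"))  # bounded range from the slide
    print(expand_repeat("ab", "2-"))   # open-ended range from the slide
```

Bounded ranges grow linearly with the upper limit, which is one reason this direction is harder than JSGF → ABNF.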

JSGF/XML-SRGS → ABNF (continued)
• XML-SRGS → ABNF
  • Different weighting mechanisms (weight and repeat-prob attributes)
Example:
a <item repeat="0-" repeat-prob=".45"> b </item>
<one-of> <item weight=".4">c</item> <item weight=".6">d</item> </one-of>
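One way to read the two weighting mechanisms is sketched below. Both interpretations are assumptions made for illustration: weights inside a one-of are normalized to sum to one, and repeat-prob is treated as the probability of taking one more repetition, which gives a geometric distribution over the repetition count for repeat="0-". The slides do not say how the converter maps these values, and the function names are hypothetical.

```python
def one_of_probabilities(weights):
    """Normalize <item weight="..."> values so the alternatives sum to 1 (assumed reading)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [w / total for w in weights]


def repetition_probability(k, repeat_prob):
    """Probability of exactly k repetitions for repeat="0-" (assumed geometric model).

    Each additional repetition is taken with probability repeat_prob,
    and the loop is exited with probability 1 - repeat_prob.
    """
    return (repeat_prob ** k) * (1.0 - repeat_prob)


if __name__ == "__main__":
    print(one_of_probabilities([0.4, 0.6]))
    print([round(repetition_probability(k, 0.45), 4) for k in range(4)])
```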

ABNF → BNF
• ABNF → BNF
  • Complicated
  • Accomplished using a recursive algorithm that extracts sets of normalized BNF rules from a set of ABNF rules
  • Break each rule into multiple rules at each top-level alternation. Recurse on each rule.
  • For each concatenation, Kleene star, or Kleene plus, extract a set of left symbols and a set of right symbols.
  • For n left symbols and m right symbols, create n × m connecting rules.
• Normalized BNF
  • Consists of rules of the following formats:
    • (RULE_NAME)::=(TERMINAL),(NON_TERMINAL)
    • (RULE_NAME)::=(NON_TERMINAL)
    • (RULE_NAME)::=ε
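To make the target format concrete, here is a small recursive sketch that turns an ABNF-like expression into rules of the three normalized formats. It is not the left-symbol/right-symbol algorithm described on the slide; it swaps in a simpler construction that threads a source and a destination rule name through the expression, and it introduces more helper rules than a careful implementation would. The tuple-based expression tree, the generated rule names (N1, N2, END), and both function names are assumptions made for illustration.

```python
from itertools import count


def normalize(expr, start="S"):
    """Return normalized rules as (rule_name, terminal, next_rule) triples.

    (R, t, N)       means R ::= t, N
    (R, None, N)    means R ::= N
    (R, None, None) means R ::= ε
    expr is a tuple tree: ("term", "a"), ("seq", [...]), ("alt", [...]),
    ("star", e), or ("plus", e).
    """
    rules = []
    ids = count(1)

    def fresh():
        return f"N{next(ids)}"

    def emit(e, src, dst):
        kind = e[0]
        if kind == "term":
            rules.append((src, e[1], dst))
        elif kind == "seq":
            items = e[1]
            if not items:
                rules.append((src, None, dst))
                return
            cur = src
            for item in items[:-1]:
                nxt = fresh()
                emit(item, cur, nxt)
                cur = nxt
            emit(items[-1], cur, dst)
        elif kind == "alt":
            # Each alternative becomes its own set of rules from src to dst.
            for item in e[1]:
                emit(item, src, dst)
        elif kind == "star":
            loop = fresh()
            rules.append((src, None, loop))   # enter the loop
            emit(e[1], loop, loop)            # one repetition returns to the loop
            rules.append((loop, None, dst))   # leave the loop
        elif kind == "plus":
            emit(("seq", [e[1], ("star", e[1])]), src, dst)
        else:
            raise ValueError(f"unknown node kind: {kind}")

    emit(expr, start, "END")
    rules.append(("END", None, None))
    return rules


def accepts(rules, start, tokens):
    """Check whether the normalized rules derive the token sequence."""
    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            s = stack.pop()
            for lhs, term, nxt in rules:
                if lhs == s and term is None and nxt is not None and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    current = closure({start})
    for tok in tokens:
        current = closure({n for lhs, t, n in rules if lhs in current and t == tok})
    return any(lhs in current and t is None and n is None for lhs, t, n in rules)


if __name__ == "__main__":
    # <A>::=ab* from the earlier slides
    for rule in normalize(("seq", [("term", "a"), ("star", ("term", "b"))]), start="A"):
        print(rule)
```

The accepts helper is only there to check that the normalized rules describe the same language as the original expression.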

BNF ↔ IHD
• Each arc translates to a normalized BNF rule
• Terminals correspond to nodes; concatenations correspond to arcs

BNF → JSGF/XML-SRGS
• Rule-by-rule
• Trivial
Example:
BNF: <A>::=aB  <B>::=bB  <B>::=ε
JSGF: <A>=aB; <B>=b|bB;
XML-SRGS:
<rule id="a"> a <ruleref uri="#b"/> </rule>
<rule id="b"> <one-of> <item> b <ruleref uri="#b"/> </item> <item> <ruleref special="NULL"/> </item> </one-of> </rule>
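The rule-by-rule mapping to XML-SRGS can be sketched with Python's standard xml.etree.ElementTree module. The triple representation of normalized rules and the function name bnf_to_srgs are assumptions made for the example, and the output is a fragment: a complete SRGS document also needs the grammar namespace declaration, which is left out here.

```python
import xml.etree.ElementTree as ET


def bnf_to_srgs(rules, root):
    """Build an SRGS-style element tree from normalized BNF rules.

    rules -- (rule_name, terminal_or_None, next_rule_or_None) triples (assumed format)
    root  -- id of the root rule
    """
    grouped = {}
    for lhs, term, nxt in rules:
        grouped.setdefault(lhs, []).append((term, nxt))

    grammar = ET.Element("grammar", {"version": "1.0", "root": root})
    for lhs, alternatives in grouped.items():
        rule = ET.SubElement(grammar, "rule", {"id": lhs})
        single = len(alternatives) == 1
        # Several BNF rules with the same name become one <one-of> block.
        parent = rule if single else ET.SubElement(rule, "one-of")
        for term, nxt in alternatives:
            holder = parent if single else ET.SubElement(parent, "item")
            if term is not None:
                holder.text = term
            if nxt is not None:
                ET.SubElement(holder, "ruleref", {"uri": f"#{nxt}"})
            elif term is None:
                # An ε rule maps to the special NULL rule reference.
                ET.SubElement(holder, "ruleref", {"special": "NULL"})
    return grammar


if __name__ == "__main__":
    # <A>::=aB, <B>::=bB, <B>::=ε, using the lowercase ids from the slide
    example = [("a", "a", "b"), ("b", "b", "b"), ("b", None, None)]
    print(ET.tostring(bnf_to_srgs(example, "a"), encoding="unicode"))
```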

Software Tools
• ISIP Network Converter
  • Console tool to perform conversions to and from arbitrary grammar formats
• ISIP Network Builder
  • Java-based graphical tool to design grammars as finite state machines
  • Can export grammars to JSGF, XML-SRGS, ABNF, BNF, and IHD
• ISIP Language Model Tester
  • Console tool for testing grammars
  • Can generate valid sentences in a given grammar
  • Can parse sentences and determine whether they are accepted by a given grammar
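The sentence-generation idea can be illustrated with a random walk over normalized BNF rules. This is only a sketch of the concept and says nothing about how the ISIP Language Model Tester is implemented; the rule triples, the function name generate, and the max_steps cutoff are all assumptions made for the example.

```python
import random


def generate(rules, start, rng, max_steps=50):
    """Randomly derive one sentence from normalized BNF rules.

    rules -- (rule_name, terminal_or_None, next_rule_or_None) triples (assumed format)
    Returns a list of words, or None if no ε rule was reached within max_steps.
    """
    words, state = [], start
    for _ in range(max_steps):
        options = [(term, nxt) for lhs, term, nxt in rules if lhs == state]
        if not options:
            raise ValueError(f"no rule defined for {state}")
        term, nxt = rng.choice(options)
        if term is not None:
            words.append(term)
        if nxt is None:
            return words  # reached an ε rule (or a rule with no successor)
        state = nxt
    return None


if __name__ == "__main__":
    # <A>::=aB, <B>::=bB, <B>::=ε
    grammar = [("A", "a", "B"), ("B", "b", "B"), ("B", None, None)]
    rng = random.Random(0)
    for _ in range(5):
        print(generate(grammar, "A", rng))
```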

Summary
• Future Work
  • Web-based front-end to speech recognition software
  • Mobile speech recognition
• Public Domain Toolkit
  • Contains language model conversion tools
  • Public domain – available for download
