
Overview of Language Model Classes and Release Progress

This overview discusses the Language Model Classes, including IHD, BNF, ABNF, XML, and JSGF, and provides explanations, examples, and conversion algorithms for each. It also covers the progress of the release, outstanding issues, and future plans.

reinhardt

Presentation Transcript


  1. Overview of Language Model Classes and Release Progress • [Diagram labels: min, XML, ABNF, IHD, BNF, JSGF] • Daniel May • Intelligent Electronic Systems • Human and Systems Engineering • Department of Electrical and Computer Engineering

  2. Overview • Language Model Classes • LanguageModelIHD: Explanation of IHD→BNF and BNF→IHD conversions. • LanguageModelABNF: Explanation and example of the ABNF→BNF conversion algorithm. • LanguageModelBNF: Explanation of the graph minimization algorithm. • LanguageModelXML and LanguageModelJSGF • Network Utilities: isip_network_builder, isip_network_converter • Release Progress • Outstanding Issues • Plan • Deadline

  3. Class: LanguageModelIHD • What is Normalized BNF? • Normalized BNF consists only of the following three rule forms: • 1. (RULE_NAME) → (TERMINAL),(NON_TERMINAL) • 2. (RULE_NAME) → (NON_TERMINAL) • 3. (RULE_NAME) → (EPSILON) • IHD→BNF • Straightforward conversion process • Each IHD arc is converted to a normalized BNF rule • Example: [figure: an IHD graph and its BNF rules]
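The arc-by-arc conversion can be sketched as follows. This is only an illustration under stated assumptions, not the LanguageModelIHD code: it assumes that an arc from node u to node v becomes the rule R_u → symbol(u), R_v, that arcs leaving the start node become form 2 rules, and that the end node gets a single epsilon rule. The `Arc` and `Rule` types, the `R_` naming scheme, and the node ids are all hypothetical.

```python
from typing import NamedTuple, Optional


class Arc(NamedTuple):
    src: str       # source node id
    dst: str       # destination node id
    weight: float  # arc weight


class Rule(NamedTuple):
    name: str                # (RULE_NAME)
    terminal: Optional[str]  # (TERMINAL); None for rule forms 2 and 3
    weight: Optional[float]  # weight on the concatenation token (form 1 only)
    target: Optional[str]    # (NON_TERMINAL); None for the epsilon rule


def ihd_to_bnf(symbols, arcs, start="S", end="T"):
    """Emit one normalized BNF rule per arc, plus an epsilon rule for the end node.

    Assumption: the start and end nodes carry no symbol of their own.
    """
    rules = []
    for arc in arcs:
        target = f"R_{arc.dst}"
        if arc.src == start:
            # Form 2: (RULE_NAME) -> (NON_TERMINAL).
            # There is no concatenation token here, so this sketch drops the weight.
            rules.append(Rule(f"R_{start}", None, None, target))
        else:
            # Form 1: (RULE_NAME) -> (TERMINAL),(NON_TERMINAL)
            rules.append(Rule(f"R_{arc.src}", symbols[arc.src], arc.weight, target))
    # Form 3: (RULE_NAME) -> (EPSILON)
    rules.append(Rule(f"R_{end}", None, None, None))
    return rules


# Hypothetical two-node graph: S -> n1(A) -> n2(B) -> T, with a self loop on n2.
symbols = {"n1": "A", "n2": "B"}
arcs = [
    Arc("S", "n1", 0.0),
    Arc("n1", "n2", -1.5),
    Arc("n2", "n2", -0.5),
    Arc("n2", "T", 0.0),
]
rules = ihd_to_bnf(symbols, arcs)
for rule in rules:
    print(rule)
```

Under this reading, several rules may share the same (RULE_NAME) and (TERMINAL): each one stands for a different arc leaving the same node.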

  4. Class: LanguageModelIHD • BNF→IHD • Straightforward conversion process • Simply the reverse of the IHD→BNF process • Unique nodes are identified by unique instances of (RULE_NAME)→(TERMINAL) • Concatenation tokens (“,”) correspond to arcs and are weighted • Example: [figure: BNF rules and the equivalent IHD graph]
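A minimal sketch of the reverse direction, assuming each form 1 rule is available as a hypothetical `(rule_name, terminal, weight, next_rule_name)` tuple. Nodes are the distinct (rule name, terminal) pairs, and each concatenation token yields a weighted arc to every node whose rule name matches the referenced non-terminal. The `END` marker for rules that reference the epsilon rule is an assumption of this sketch.

```python
END = "END"  # hypothetical marker for the exit node (target of the epsilon rule)


def bnf_to_ihd(rules):
    """rules: list of (rule_name, terminal, weight, next_rule_name) tuples."""
    # One node per distinct (rule_name, terminal) pair, in first-seen order.
    nodes = list(dict.fromkeys((name, term) for name, term, _, _ in rules))
    arcs = []
    for name, term, weight, nxt in rules:
        # Every node whose rule name is the referenced non-terminal is a successor.
        targets = [node for node in nodes if node[0] == nxt]
        if not targets:
            # No form 1 rule carries that name: treat it as the epsilon (exit) rule.
            targets = [END]
        for target in targets:
            arcs.append(((name, term), target, weight))
    return nodes, arcs


example_rules = [
    ("R1", "A", -1.5, "R2"),
    ("R2", "B", -0.5, "R2"),
    ("R2", "B", 0.0, "RT"),
]
nodes, arcs = bnf_to_ihd(example_rules)
print(nodes)
print(arcs)
```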

  5. Class: LanguageModelABNF • ABNF→BNF • Complicated! • Accomplished using a recursive algorithm that extracts sets of ‘right symbols’ and ‘left symbols’ and builds a set of normalized BNF rules. • A set of right and left symbols is found when a concatenation, Kleene star (‘*’), or Kleene plus (‘+’) is encountered. • If n left symbols and m right symbols are found, n × m BNF rules are created. • ABNF rules are processed one at a time. • We iterate over the tokens in each rule from left to right and look for concatenation, Kleene star, and Kleene plus tokens. • When one of these tokens is encountered, the recursive methods findLeftSymbols() and findRightSymbols() are called. Each returns a set of symbols.
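The idea of collecting left and right symbols around a concatenation can be sketched on a parsed expression tree. The slides describe a scan over the token stream, so this is a simplified stand-in rather than the actual findLeftSymbols()/findRightSymbols() code. The tuple-based tree, the helper names, and the treatment of the non-terminal RT as an ordinary symbol are all assumptions.

```python
def nullable(node):
    """True if the expression can match the empty sequence."""
    kind = node[0]
    if kind == "sym":
        return False
    if kind == "star":
        return True
    if kind == "plus":
        return nullable(node[1])
    if kind == "alt":
        return any(nullable(child) for child in node[1])
    if kind == "cat":
        return all(nullable(child) for child in node[1])
    raise ValueError(kind)


def first(node):
    """Symbols that can begin a match of `node`."""
    kind = node[0]
    if kind == "sym":
        return {node[1]}
    if kind in ("star", "plus"):
        return first(node[1])
    if kind == "alt":
        return set().union(*(first(child) for child in node[1]))
    if kind == "cat":
        return find_right_symbols(node[1])
    raise ValueError(kind)


def find_right_symbols(items):
    """Symbols that can begin the sequence `items`, looking past skippable items."""
    out = set()
    for item in items:
        out |= first(item)
        if not nullable(item):
            break
    return out


def find_left_symbols(node):
    """Symbols that can end a match of `node`."""
    kind = node[0]
    if kind == "sym":
        return {node[1]}
    if kind in ("star", "plus"):
        return find_left_symbols(node[1])
    if kind == "alt":
        return set().union(*(find_left_symbols(child) for child in node[1]))
    if kind == "cat":
        out = set()
        for item in reversed(node[1]):
            out |= find_left_symbols(item)
            if not nullable(item):
                break
        return out
    raise ValueError(kind)


def sym(name):
    return ("sym", name)


# The right-hand side A,*(B|+(C,D)|E),F,RT as a tree.
items = [
    sym("A"),
    ("star", ("alt", [sym("B"), ("plus", ("cat", [sym("C"), sym("D")])), sym("E")])),
    sym("F"),
    sym("RT"),
]

# First concatenation: left of it is A, right of it is everything that may follow.
print(find_left_symbols(items[0]), find_right_symbols(items[1:]))
# Concatenation before F: left is the starred group, right is F.
print(find_left_symbols(items[1]), find_right_symbols(items[2:]))
```

Because the starred group can be skipped, the right symbols of the first concatenation include F as well as the symbols that open the group.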

  6. Class: LanguageModelABNF • Example • We must first construct a set of nodes using unique combinations of (RULE_NAME)→(TERMINAL) • [Figure: IHD graph, ABNF rules, and the resulting nodes]

  7. Class: LanguageModelABNF • Example • Current rule: RS→R0 • Left symbols: (none) • Right symbols: (none) • This rule contains no tokens of interest, so we move on to the next rule.

  8. Class: LanguageModelABNF • Example • Current rule: R0->A,*(B|+(C,D)|E),F,RT • Left symbols: A • Right symbols: (none) • As we iterate from left to right, we encounter a concatenation token. The findLeftSymbols method returns ‘A’.

  9. Class: LanguageModelABNF • Example • Current rule: R0->A,*(B|+(C,D)|E),F,RT • Left symbols: A • Right symbols: (none) • When findRightSymbols is called, we encounter a Kleene star.

  10. Class: LanguageModelABNF • Example • Current rule: R0->A,*(B|+(C,D)|E),F,RT • Left symbols: A • Right symbols: F • The findRightSymbols method must be called on the token following the next concatenation at this nesting level.

  11. Class: LanguageModelABNF • Example • Current rule: R0->A,*(B|+(C,D)|E),F,RT • Left symbols: A • Right symbols: F • Next, findRightSymbols is called on the token following the Kleene star. In this case, it’s an opening parenthesis.

  12. Class: LanguageModelABNF • Example • Current rule: R0->A,*(B|+(C,D)|E),F,RT • Left symbols: A • Right symbols: F, B • For an opening parenthesis, we call findRightSymbols on the token following it.

  13. Class: LanguageModelABNF • Example • Current rule: R0->A,*(B|+(C,D)|E),F,RT • Left symbols: A • Right symbols: F, B, E • We also look for alternation tokens, and call findRightSymbols on the tokens following them.

  14. Class: LanguageModelABNF • Example • Current rule: R0->A,*(B|+(C,D)|E),F,RT • Left symbols: A • Right symbols: F, B, E, C • The Kleene plus is ignored since it isn’t currently relevant, and findRightSymbols is called on the open parenthesis.

  15. Class: LanguageModelABNF • Example • Current rule: R0->A,*(B|+(C,D)|E),F,RT • Left symbols: A • Right symbols: F, B, E, C • Now we can construct a set of BNF rules from the right and left symbols.
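Building rules from a pair of symbol sets amounts to a cross product: n left symbols and m right symbols give n × m rules. The sketch below assumes each symbol appears once in the grammar, so a hypothetical rule name `R_<symbol>` is enough to identify its node. The tuple layout and the default weight are also assumptions.

```python
from itertools import product


def build_rules(left_symbols, right_symbols, weight=0.0):
    """Return one (rule_name, terminal, weight, next_rule_name) tuple per pair."""
    return [
        (f"R_{left}", left, weight, f"R_{right}")
        for left, right in product(sorted(left_symbols), sorted(right_symbols))
    ]


# Symbol sets taken from the walkthrough: 1 x 4 rules, then 3 x 1 rules.
first_batch = build_rules({"A"}, {"F", "B", "E", "C"})
second_batch = build_rules({"E", "D", "B"}, {"F"})
for rule in first_batch + second_batch:
    print(rule)
```

A grammar in which the same terminal occurs at several positions would need position-specific rule names instead of `R_<symbol>`.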

  16. Class: LanguageModelABNF • Example • Current rule: R0->A,*(B|+(C,D)|E),F,RT • Left symbols: (none) • Right symbols: (none) • The next token of interest is a Kleene star. For these, we want a self loop on all rule segments following.

  17. Class: LanguageModelABNF • Example • Current rule: R0->A,*(B|+(C,D)|E),F,RT • Left symbols: (none) • Right symbols: (none) • Since the following token is an open parenthesis, we find all rule segments separated by alternation tokens.

  18. Class: LanguageModelABNF • Example • Current rule: R0->A,*(B|+(C,D)|E),F,RT • Left symbols: (none) • Right symbols: (none) • A different set of rules is created for each segment.

  19. Class: LanguageModelABNF • Example • Current rule: R0->A,*(B|+(C,D)|E),F,RT • Left symbols: (none) • Right symbols: (none) • findRightSymbols is called on the first token of each segment, and findLeftSymbols is called on the last.

  20. Class: LanguageModelABNF • Example • Current rule: R0->A,*(B|+(C,D)|E),F,RT • Left symbols: B • Right symbols: B • findRightSymbols is called on the first token of each segment, and findLeftSymbols is called on the last.

  21. Class: LanguageModelABNF • Example • Current rule: R0->A,*(B|+(C,D)|E),F,RT • Left symbols: D • Right symbols: C • findRightSymbols is called on the first token of each segment, and findLeftSymbols is called on the last.

  22. Class: LanguageModelABNF • Example • Current rule: R0->A,*(B|+(C,D)|E),F,RT • Left symbols: E • Right symbols: E • findRightSymbols is called on the first token of each segment, and findLeftSymbols is called on the last.

  23. Class: LanguageModelABNF • Example • Current rule: R0->A,*(B|+(C,D)|E),F,RT • Left symbols: C • Right symbols: D • The next token of interest is another concatenation. Again, we find a set of right and left symbols and build rules.

  24. Class: LanguageModelABNF • Example • Current rule: R0->A,*(B|+(C,D)|E),F,RT • Left symbols: E, D, B • Right symbols: F • The next token of interest is another concatenation. Again, we find a set of right and left symbols and build rules.

  25. Class: LanguageModelABNF • Example • Current rule: R0->A,*(B|+(C,D)|E),F,RT • Left symbols: F • Right symbols: (none) • The next token of interest is another concatenation, but this time, the right symbol is a non terminal.

  26. Class: LanguageModelABNF • Example • Current rule: R0->A,*(B|+(C,D)|E),F,RT • Left symbols: F • Right symbols: ε • When findRightSymbols is called on a non terminal, findRightSymbols is called on the first token of the rule referenced.

  27. Class: LanguageModelABNF • Example • Current rule: RS→R0 • Left symbols: Start • Right symbols: A • BNF start rules are found by calling findRightSymbols on the first token of the ABNF start rules.

  28. Class: LanguageModelABNF • Weights • ABNF does not have a mechanism for defining weights on arcs because ABNF has no knowledge of arcs. Arcs are just implied by the grammar representation. • When converting from IHD to any other format that uses ABNF as an intermediate, weights are included on the open parenthesis tokens preceding non terminal and terminal symbols. • In some cases, the ABNF rules must be restructured to support weights. This will only be the case if the source of the grammar is not ISIP internal. • Testing • The ABNF→BNF algorithm has been thoroughly tested on ABNF grammars derived from XML, but more testing needs to be done on arbitrary ABNF grammars.

  29. Class: LanguageModelBNF • Graph Minimization • Converting from XML introduces redundancy. Although the resulting graphs are equivalent to the originals, they’re much larger and nearly impossible to interpret visually. • The minimize method in LanguageModelBNF can be used to remove redundancy once the language model is in BNF representation. • The algorithm iterates over all rule pairs and determines whether or not the rules can be merged into a single rule. • Two rules can be merged if their non terminals reference the same terminal and if the weights on the concatenation tokens are the same. When two rules are merged, all other rules that reference them must be updated. • Example: [figure]
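One possible reading of the pairwise merge can be sketched as follows. It is not the LanguageModelBNF minimize method. It assumes every rule name maps to a hypothetical `(terminal, successors)` pair, where `successors` is a set of `(weight, next_rule_name)` entries, and it treats two rule names as mergeable only when both parts are identical.

```python
def minimize(nodes):
    """nodes: dict of rule_name -> (terminal, frozenset of (weight, next_rule_name)).

    Repeatedly merges two rule names with the same terminal and the same weighted
    successors, then redirects every reference to the removed name.
    """
    nodes = dict(nodes)
    changed = True
    while changed:
        changed = False
        names = sorted(nodes)
        for i, keep in enumerate(names):
            for drop in names[i + 1:]:
                if nodes[keep] == nodes[drop]:
                    del nodes[drop]
                    # Update all remaining rules that referenced the dropped name.
                    nodes = {
                        name: (
                            term,
                            frozenset((w, keep if s == drop else s) for w, s in succ),
                        )
                        for name, (term, succ) in nodes.items()
                    }
                    changed = True
                    break
            if changed:
                break
    return nodes


# Hypothetical redundant grammar: R1/R2 and R3/R4 are duplicates of each other.
redundant = {
    "R1": ("A", frozenset({(0.0, "R3")})),
    "R2": ("A", frozenset({(0.0, "R4")})),
    "R3": ("B", frozenset({(0.0, "RT")})),
    "R4": ("B", frozenset({(0.0, "RT")})),
    "RT": (None, frozenset()),  # epsilon rule
}
minimized = minimize(redundant)
print(sorted(minimized))
```

Merging R3 and R4 first makes R1 and R2 identical, which is why the loop restarts after every merge. The comparison is conservative: two equivalent nodes that each loop back to themselves are not detected as duplicates.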

  30. Class: LanguageModelBNF • Example: [figure] • Testing • Currently, this minimization algorithm has been tested by visually inspecting the original graph and resulting graph and verifying that they are equivalent. • The isip_lm_tester tool will be able to test it more thoroughly once the language model parsing capability is complete.

  31. Class: LanguageModelXML and LanguageModelJSGF • LanguageModelXML • Wesley has completed this class and checked it in. Minor changes are made every once in a while, but overall, the conversions from BNF to XML and XML to ABNF are working fine. • LanguageModelJSGF • This class will be implemented similarly to LanguageModelXML. • The underlying JSGF representation is ABNF. • JSGF parsing algorithms already exist, but currently, the JSGF tokens are converted directly to IHD. • This was supposed to be finished several weeks ago, but issues regarding ABNF to BNF conversion and graph minimization have caused delays.

  32. Other Language Model Related Utilities • isip_network_converter • Changes have been made to incorporate XML, BNF, and ABNF. • A minimize option has been added that invokes the minimization routine when the language model is in BNF representation. • isip_network_builder • The changes to allow network_builder to save in other formats are pending. • isip_lm_tester • Won is in the process of adding parsing capability to this tool. Currently, the tool can only generate random transcriptions. • Soon, it will be able to parse transcriptions and verify that they are valid given a particular language model.
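What generating and validating a transcription against a language model involves can be sketched on a small word graph. This does not show how isip_lm_tester works. The arcs are those implied by the left and right symbol sets traced for R0->A,*(B|+(C,D)|E),F,RT, and the `START`/`END` markers, function names, and dictionary layout are assumptions. Each word appears once in this graph, so a single current state is enough.

```python
import random

# Successor sets read off the traced symbol pairs (left symbol -> right symbols).
ARCS = {
    "START": {"A"},
    "A": {"B", "C", "E", "F"},
    "B": {"B", "F"},
    "C": {"D"},
    "D": {"C", "F"},
    "E": {"E", "F"},
    "F": {"END"},
}


def is_valid(words):
    """True if the word sequence follows arcs from START and can reach END."""
    state = "START"
    for word in words:
        if word not in ARCS.get(state, ()):
            return False
        state = word
    return "END" in ARCS.get(state, ())


def random_transcription(rng):
    """Random walk from START until END is chosen; returns the words visited."""
    state, words = "START", []
    while True:
        nxt = rng.choice(sorted(ARCS[state]))
        if nxt == "END":
            return words
        words.append(nxt)
        state = nxt


print(is_valid(["A", "F"]))
print(is_valid(["A", "C", "F"]))
print(random_transcription(random.Random(0)))
```

A graph in which the same word labels several nodes would require tracking a set of possible states instead of one.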

  33. Release Progress • Outstanding Issues • LanguageModelJSGF (Daniel) • Diagnose methods and documentation (Daniel, Seungchan, Ted) • isip_lm_tester parsing capability (Won) • isip_transform and isip_transform_builder (Sridhar) • Varmint backlog (Everyone) • Schedule/Deadline • March 10: All code and documentation will be completed, tested, and checked in (code freeze). • After March 10, we will begin running regression and code integrity tests. • March 31: Release Date
