XML Output for Sphinx
• Motivation: applications may be able to use richer information from Sphinx, including n-best lists, the word lattice, and other features.
• An XML format defined by a DTD would be standard and easy to parse, express, and modify.
Proposed DTD
• http://www.cs.cmu.edu/~tkharris/usi/utterance-0.1.dtd
• Sphinx produces utterances; each utterance is an XML document that conforms to the DTD
• An utterance is an n-best list, a word lattice, or both
• An n-best list is a list of lists of words
• Each word list and each word in it may have features
• The DTD desperately needs review
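To make the n-best structure concrete, here is a minimal sketch of what a client might do with one utterance document. The element and attribute names (`utterance`, `nbest`, `hypothesis`, `word`, `rank`, `score`, `start`, `end`) and all sample values are assumptions made up for illustration; they are not taken from utterance-0.1.dtd, which may name and nest things differently. The sketch uses only Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical utterance document. Element names, attribute names, and
# values are illustrative assumptions, not taken from utterance-0.1.dtd.
SAMPLE = """\
<utterance id="utt-001">
  <nbest>
    <hypothesis rank="1" score="-1234.5">
      <word start="0.10" end="0.42">go</word>
      <word start="0.42" end="0.80">forward</word>
    </hypothesis>
    <hypothesis rank="2" score="-1301.2">
      <word start="0.10" end="0.42">go</word>
      <word start="0.42" end="0.80">for</word>
      <word start="0.80" end="0.95">word</word>
    </hypothesis>
  </nbest>
</utterance>
"""


def parse_nbest(xml_text):
    """Return the n-best list as a list of (features, words) pairs.

    features is a dict of the hypothesis attributes; words is a list of
    (text, word_features) tuples, one per word in the hypothesis.
    """
    root = ET.fromstring(xml_text)
    hypotheses = []
    for hyp in root.findall("./nbest/hypothesis"):
        features = dict(hyp.attrib)
        words = [(w.text, dict(w.attrib)) for w in hyp.findall("word")]
        hypotheses.append((features, words))
    return hypotheses


if __name__ == "__main__":
    # Print each hypothesis rank followed by its word sequence.
    for features, words in parse_nbest(SAMPLE):
        print(features["rank"], " ".join(text for text, _ in words))
```

Keeping features as plain attribute dictionaries means a client can ignore features it does not understand, which fits the "list of lists of words, each optionally carrying features" description above.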
Issues
• Is the motivation justified?
• Is the computational or network overhead too high?
• APIs are needed to parse the XML
• Need to gather requirements and observations from Sphinx customers