Comparison of the SPHINX and HTK Frameworks Processing the AN4 Corpus Arthur Kunkle ECE 5526 Fall 2008
Framework Introduction • CMU Sphinx • Developed by Carnegie Mellon University • Has been supported by organizations such as DARPA, IBM, and Sun Microsystems • Notable applications that use Sphinx include Roomline, a conference room reservation system at CMU, and Let’s Go, a spoken dialog system in use at Pittsburgh’s transit system • HTK • Originally developed in 1989 by the Speech Vision and Robotics Group of Cambridge University • Purchased by Entropic Laboratories in 1993, then acquired by Microsoft when it bought Entropic in 1999 • The HTK source code was then licensed back to Cambridge University for continued development • Source code publicly available since then
Phase 1: Performance-Based Areas of Comparison • Training and decoding using the AN4 corpus • Same procedure used in Homework #5 • Provides the following metrics • Decoder time to completion • Decoder accuracy at the sentence level • Decoder accuracy at the word level • Types and quantities of errors encountered during decoding • Notable error trends • Memory requirements of the recognizer at runtime
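The accuracy metrics listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the scoring code used for this comparison: the function names (`align_counts`, `score_corpus`) and the sample utterances are hypothetical. The formulas follow the usual definitions, percent correct = (N - D - S) / N and accuracy = (N - D - S - I) / N, where N is the number of reference words and S, D, I are substitutions, deletions, and insertions.

```python
def align_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) for a minimum
    edit-distance alignment of two word lists."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = (total_errors, subs, dels, ins) for ref[:i] vs hyp[:j]
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)          # only deletions
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)          # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, d, a = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (t, s, d, a)         # match, no new error
            else:
                diag = (t + 1, s + 1, d, a)  # substitution
            t, s, d, a = cost[i - 1][j]
            dele = (t + 1, s, d + 1, a)     # reference word missing
            t, s, d, a = cost[i][j - 1]
            inse = (t + 1, s, d, a + 1)     # extra hypothesis word
            cost[i][j] = min(diag, dele, inse)
    return cost[n][m][1:]


def score_corpus(pairs):
    """pairs: iterable of (reference, hypothesis) transcript strings."""
    n_words = subs = dels = ins = 0
    n_sents = sents_ok = 0
    for ref_text, hyp_text in pairs:
        ref, hyp = ref_text.split(), hyp_text.split()
        s, d, i = align_counts(ref, hyp)
        subs, dels, ins = subs + s, dels + d, ins + i
        n_words += len(ref)
        n_sents += 1
        sents_ok += ref == hyp
    return {
        "substitutions": subs,
        "deletions": dels,
        "insertions": ins,
        "word_correct": (n_words - dels - subs) / n_words,
        "word_accuracy": (n_words - dels - subs - ins) / n_words,
        "sentence_accuracy": sents_ok / n_sents,
    }


# Hypothetical utterances, for illustration only.
results = score_corpus([
    ("one two three", "one two three"),
    ("four five six", "four nine six seven"),
])
```

Keeping the three error types separate, rather than reporting a single error rate, is what makes it possible to look for the error trends mentioned on this slide.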
Phase 2: Other Notable Areas of Comparison • Coded data feature format support • Language Modeling support • Overall ease of training and decoding corpora • Notable features of the Software Baseline for each toolkit • Operating System support • Available documentation and community support • Licensing and usage rights • Future Toolkit development plans
Training Procedure Developed • In a “tutorial” format: • HTKTrainingDecoding_tutorial.doc • A fully worked example of the tutorial directory, including results, is also included on the CD • htktut
Training Results Comparison • 8 Gaussians per HMM state • Context-dependent triphone state models • Tied states • Finite State Grammar Language Model
Front-End Data Feature Support • Sphinx provides wave2feat for limited conversion to MFCC (used in a previous homework). However, “Sphinx trainer and decoder are compatible with many other data formats”; more research is needed into which ones specifically • HTK provides HCopy, which performs many different conversions
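As a rough sketch of what an HCopy conversion involves, the Python below only assembles the three inputs HCopy expects: a configuration file, a script file of source/target pairs, and the command line. The coding parameters mirror the MFCC example in the HTK Book tutorial; `SOURCEFORMAT = NIST` assumes the AN4 audio is in NIST SPHERE format, and the file paths and helper names are hypothetical. Nothing here is taken from the tutorial document on the CD.

```python
def hcopy_config(source_format="NIST"):
    """Build the text of an HCopy configuration file for MFCC coding.
    Values follow the HTK Book tutorial; source_format is an assumption."""
    params = [
        ("SOURCEFORMAT", source_format),  # assumed input audio format
        ("TARGETKIND", "MFCC_0"),         # MFCCs plus the 0th cepstral coefficient
        ("TARGETRATE", "100000.0"),       # 10 ms frame shift (units of 100 ns)
        ("WINDOWSIZE", "250000.0"),       # 25 ms analysis window
        ("USEHAMMING", "T"),
        ("PREEMCOEF", "0.97"),
        ("NUMCHANS", "26"),               # filterbank channels
        ("CEPLIFTER", "22"),
        ("NUMCEPS", "12"),
    ]
    return "".join(f"{name} = {value}\n" for name, value in params)


def hcopy_script(pairs):
    """One 'source target' pair per line, as read by the -S option."""
    return "".join(f"{src} {dst}\n" for src, dst in pairs)


def hcopy_command(config_path, script_path):
    """Argument list for running HCopy with basic tracing enabled."""
    return ["HCopy", "-T", "1", "-C", config_path, "-S", script_path]


# Hypothetical file names, for illustration only.
config_text = hcopy_config()
script_text = hcopy_script([("an4/wav/utt001.sph", "an4/mfc/utt001.mfc")])
command = hcopy_command("config", "codetr.scp")
```

Changing `TARGETKIND` (for example to include delta and acceleration coefficients) is how the same tool is pointed at a different output feature type.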
Language Modeling • Both frameworks support statistical N-gram language models as well as fixed, context-free grammars (defined by BNF-type networks). • HTK includes two separate modules, HLMLib and HLMTools, that provide N-gram language model training, class-based models, and perplexity calculations. • NOTE: The HTK Book also includes a thorough tutorial on building and training such a model using phrases from Sherlock Holmes • Sphinx relies on other tools for LM generation (see the CMU Statistical Language Model toolkit).
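To make the N-gram training and perplexity terms concrete, here is a minimal Python sketch of a bigram model with add-one smoothing. It is a deliberately simplified stand-in, not how HLMTools or the CMU toolkit estimate their models (those use more refined discounting and back-off schemes). The function names and the training sentences are hypothetical.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their histories over whitespace-tokenized text."""
    history, bigrams = Counter(), Counter()
    vocab = {"</s>"}                      # words the model can predict
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(words[1:])
        history.update(words[:-1])        # every word that precedes another
        bigrams.update(zip(words[:-1], words[1:]))
    return history, bigrams, vocab


def bigram_prob(model, prev, word):
    """Add-one smoothed P(word | prev). Words outside the training
    vocabulary simply receive the smallest smoothed probability."""
    history, bigrams, vocab = model
    return (bigrams[(prev, word)] + 1) / (history[prev] + len(vocab))


def perplexity(model, sentences):
    """2 ** (average negative log2 probability per predicted word)."""
    log_sum, count = 0.0, 0
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(words[:-1], words[1:]):
            log_sum += math.log2(bigram_prob(model, prev, word))
            count += 1
    return 2 ** (-log_sum / count)


# Hypothetical training text, for illustration only.
model = train_bigram(["enter six two", "enter nine", "repeat six two"])
seen = perplexity(model, ["enter six two"])
unseen = perplexity(model, ["two nine repeat"])
```

A lower perplexity means the model finds the text more predictable, so text resembling the training data (`seen`) scores lower than an unfamiliar word order (`unseen`).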
Notable Software Baseline Characteristics • Sphinx • Organized across three components • Huge amount of code • Uses Unix-style directory organization • Source files averaged 1200 LOC • Includes automated tests • HTK • All in one distribution • Organized into HTKLib, HTKTools, HLMLib, and HLMTools • Source files averaged 1400 LOC • Only one level of dependency between *Tools and *Lib
Documentation • HTK has an excellent wealth of information available through the HTKBook. • The first part of the book gives enough background theory to equip relatively unversed individuals with enough knowledge to understand the mechanics of the toolkit. • Section two of the book provides extensive details about the core architecture of HTK through the major phases of model training and testing. • Section three provides an in-depth look into the language modeling features that HTK provides as a part of its framework. • Section four provides a detailed reference to each application that is provided with the framework. • No comparably detailed information exists for Sphinx. (Does have automatically maintained Doxygen and JavaDoc, however).
Licensing (IMPORTANT!) • MAJOR difference in the restrictions. • HTK – “The Licensed Software either in whole or in part can not be distributed or sub-licensed to any third party in any form.” • This makes the intended application a very important consideration when deciding whether to use HTK. • Sphinx is licensed by CMU and may be redistributed.
Recent Release Activity and Future Plans • Sphinx • Last release of a major Sphinx component (Sphinx3) was in 06/2007 • PocketSphinx, an embedded decoder • Sphinx-4, a pure Java implementation • HTK • Last release of HTK3 was in 12/2006 • Lack of public announcements
Comparison Matrix • Developed to summarize results across many areas of comparison • comparison_matrix.xls
References • Main HTK Website -- http://htk.eng.cam.ac.uk/ • Sourceforge Sphinx -- http://cmusphinx.sourceforge.net/html/cmusphinx.php • Brief Sphinx/HTK Comparison -- http://lima.lti.cs.cmu.edu/moinmoin/SphinxHTK • HTKBook -- http://htk.eng.cam.ac.uk/prot-docs/htk_book.shtml • ASR System Review -- http://www.cis.hut.fi/Opinnot/T-61.6040/pellom-2004/lecture-09.pdf • Arthur Chan Sphinx Presentation -- http://www.cs.cmu.edu/~archan/sphinxPresentation.html • Sphinx-3 Decoder Wiki -- http://cmusphinx.sourceforge.net/sphinx3/doc/s3_description.html#lm_dumpfile