NetTalk Project: Speech Generation Using a Neural Network • Michael J Euhardy
The Speech Generation Idea • Input: a specific letter whose sound is to be generated • Input: three letters on each side of it for a total of seven letters input • Output: the sound that should be generated based on the input letter and the surrounding letters
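A minimal Python sketch of the seven-letter window described on this slide. The function name `letter_windows` and the `_` padding character used near word boundaries are assumptions for illustration; the slides do not say how word edges were handled.

```python
def letter_windows(word, half_width=3, pad="_"):
    """Yield (window, centre_letter) pairs, one pair per letter of `word`.

    Each window holds the centre letter plus `half_width` letters on each
    side. The pad character at word edges is an assumption of this sketch.
    """
    word = word.lower()
    padded = pad * half_width + word + pad * half_width
    width = 2 * half_width + 1
    for i, letter in enumerate(word):
        # The letter at position i of the word sits in the middle of the slice.
        yield padded[i:i + width], letter


windows = list(letter_windows("cat"))
for window, letter in windows:
    print(window, "->", letter)
```

Each window would then be paired with the sound for its centre letter to form one training example.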
The Strategy • 26 possible letters • 7 input positions • Map each letter in each position to a unique input: 7*26 = 182 total inputs
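The mapping to 182 inputs can be sketched as a one-hot encoding, with one block of 26 inputs per window position. This is an illustration in Python, not the original Matlab code; `encode_window` is a hypothetical name, and leaving a position all zeros for any non-letter character (such as padding) is an assumption.

```python
import string

ALPHABET = string.ascii_lowercase  # the 26 possible letters
WINDOW = 7                         # the 7 input positions


def encode_window(window):
    """Return a list of 7 * 26 = 182 values, with one 1.0 per letter present."""
    if len(window) != WINDOW:
        raise ValueError(f"expected {WINDOW} characters, got {len(window)}")
    vec = [0.0] * (WINDOW * len(ALPHABET))
    for pos, ch in enumerate(window.lower()):
        idx = ALPHABET.find(ch)
        if idx >= 0:
            # Input number = position block offset + letter index.
            vec[pos * len(ALPHABET) + idx] = 1.0
        # Assumption: non-letters (e.g. padding) leave their block all zeros.
    return vec


x = encode_window("__cat__")
print(len(x), int(sum(x)))
```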
The Strategy • 57 possible sounds generated • Map to 57 output labels
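A sketch of how 57 sounds might map to 57 output labels. The slides do not list the sounds, so the names `s00` to `s56` below are placeholders, and `encode_sound` and `decode_output` are hypothetical helpers.

```python
NUM_SOUNDS = 57

# Placeholder sound names; the real sound inventory is not given in the slides.
SOUNDS = [f"s{i:02d}" for i in range(NUM_SOUNDS)]
SOUND_INDEX = {sound: i for i, sound in enumerate(SOUNDS)}


def encode_sound(sound):
    """Return a 57-element label with a single 1.0 at the sound's index."""
    label = [0.0] * NUM_SOUNDS
    label[SOUND_INDEX[sound]] = 1.0
    return label


def decode_output(outputs):
    """Return the sound whose output unit has the largest value."""
    best = max(range(len(outputs)), key=outputs.__getitem__)
    return SOUNDS[best]


label = encode_sound("s05")
print(decode_output(label))
```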
The Resulting ANN • A fully connected single-layer perceptron with 182 inputs and 57 outputs
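A minimal pure-Python sketch of a fully connected single-layer network with 182 inputs and 57 outputs. The slides do not state the activation function or training rule, so the multiclass perceptron update and the zero initial weights used here are assumptions, as is the class name `SingleLayerPerceptron`.

```python
class SingleLayerPerceptron:
    """One weight row and one bias per output unit; every input feeds every output."""

    def __init__(self, n_inputs=182, n_outputs=57):
        self.weights = [[0.0] * n_inputs for _ in range(n_outputs)]
        self.biases = [0.0] * n_outputs

    def scores(self, x):
        """Weighted sum of the inputs plus bias, for each output unit."""
        return [b + sum(w_i * x_i for w_i, x_i in zip(w, x))
                for w, b in zip(self.weights, self.biases)]

    def predict(self, x):
        """Index of the output unit with the highest score."""
        s = self.scores(x)
        return max(range(len(s)), key=s.__getitem__)

    def train_step(self, x, target, lr=1.0):
        """Assumed rule: on a mistake, reinforce the target unit and penalise the guess."""
        guess = self.predict(x)
        if guess != target:
            for i, x_i in enumerate(x):
                if x_i:
                    self.weights[target][i] += lr * x_i
                    self.weights[guess][i] -= lr * x_i
            self.biases[target] += lr
            self.biases[guess] -= lr
        return guess == target


# Toy problem (4 inputs, 3 outputs) so the example runs instantly.
data = [([1, 0, 0, 0], 0), ([0, 1, 0, 0], 1), ([0, 0, 1, 0], 2), ([0, 0, 0, 1], 0)]
net = SingleLayerPerceptron(n_inputs=4, n_outputs=3)
for _ in range(5):
    for x, target in data:
        net.train_step(x, target)
print([net.predict(x) for x, _ in data])
```

The toy data only demonstrates the mechanics; the full-size network is created with the default arguments and holds 57 * 182 weights plus 57 biases.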
The Findings • The trained neural network performs very well; the larger the training set and the longer the training, the better it performs • Training can be an extremely long process if a high classification rate is desired and the training set is large
Problems • Time • Space
Time • You can’t rush training the network. Even on a dual PIII-733 with 512MB, it still took a really long time to train on any data set of significant size. And just converting all of the characters in the data file into the matrices used as inputs and labels took hours.
Space • 20000 words of data with maybe 7 letters on average. That’s a 140000x239 matrix (182 inputs plus 57 labels per row). In double precision in Matlab, at 8 bytes per value, that’s about 268 MB, which is a lot of memory
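The arithmetic behind that estimate, worked in Python. The dense figure follows directly from the numbers on the slide at 8 bytes per double. The compact figure is only an assumed alternative for comparison (storing the 7 letter indices and 1 label index per row as single bytes), not something the project did.

```python
WORDS = 20_000
AVG_LETTERS = 7            # "maybe 7 letters on average"
COLUMNS = 182 + 57         # input columns plus label columns = 239
BYTES_PER_DOUBLE = 8

rows = WORDS * AVG_LETTERS                      # one row per letter
dense_bytes = rows * COLUMNS * BYTES_PER_DOUBLE
dense_mib = dense_bytes / 2**20

# Assumed alternative: each row has at most 7 active inputs and 1 active
# label, so 8 one-byte indices per row carry the same information.
compact_bytes = rows * (7 + 1)

print(f"rows: {rows}")
print(f"dense: {dense_bytes} bytes ({dense_mib:.1f} MiB)")
print(f"compact: {compact_bytes} bytes")
```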
Workarounds • Smaller data set, only 1000 words • Lower standards of training, only train to 80% classification
Next Time • C++ • Matlab is way too slow and way too memory intensive • Start Earlier, it’s a long process • Multi-Layer Perceptron
Conclusion • I give up! • I don’t know how Microsoft’s Narrator does it, but I bet it doesn’t do it this way.