SpeakEasy The Computer Speaks
Welcome to the Epcot Center First -- some background for you.
SpeakEasy What is it? • SpeakEasy is the name of the computer program that enables the computer to speak with the inflection and timing we expect to hear in a human speaker. • Why do computers have to sound lifeless and dull? Answer: They don’t!
SpeakEasy was developed by John Goldsmith, and is the joint product of The University of Chicago’s Department of Linguistics and Microsoft Research.
THE UNIVERSITY OF CHICAGO and Microsoft Corporation
The University of Chicago
• One of the leading research institutes in the world
• A private university established by John D. Rockefeller in 1892
Microsoft Research in Redmond, WA
• The research arm of the Microsoft Corporation
• Research and development with applications of linguistics to real-world problems
SpeakEasy was designed to mesh with two of Microsoft’s language projects...
• NLPWin: a robust parser of written English...
• Whistler: a synthetic voice which the computer can use to speak
To make the computer’s voice vivid and life-like, what we need to give it is: Prosody.
Prosody: • Intonation (what many people call “inflection”) , and • Timing and pausing.
Speech without a prosody system?! • This is what a computer sounds like without -- and with -- prosody:
Let’s hear that again ... SpeakEasy First, with prosody and then without prosody
Compare a different computer voice, using only a rudimentary prosodic system, with SpeakEasy’s rendition.
What really happens to make a sentence come to life? First, we enter the sentence. Then it goes to the parser, NLPWin. NLPWin analyzes the sentence and sends the analysis back to SpeakEasy. SpeakEasy designs the prosody... And sends all of that to the Backend for synthesis.
[Diagram: “Welcome to the Epcot Center” goes to SpeakEasy, which passes it to NLPWin. NLPWin sends a grammatical analysis back to SpeakEasy. SpeakEasy sends the full specification of the utterance to the Whistler backend synthesizer.]
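The flow above (parse, design the prosody, synthesize) can be sketched in a few lines of Python. This is only an illustration of the three stages, not the actual NLPWin, SpeakEasy, or Whistler code: the function names, the toy lexicon, and the "accent content words" rule are all assumptions made up for this sketch, and the "synthesizer" just prints accented words in capitals.

```python
from dataclasses import dataclass

# Toy lexicon standing in for the parser's analysis (hypothetical, not NLPWin).
TOY_LEXICON = {
    "this": "determiner", "sentence": "noun", "has": "auxiliary",
    "been": "auxiliary", "pronounced": "verb", "for": "preposition",
    "you": "pronoun", "by": "preposition", "speakeasy": "noun",
}

# Assumed rule for this sketch: only these categories receive an accent.
CONTENT_CATEGORIES = {"noun", "verb", "adjective", "adverb"}


@dataclass
class Word:
    text: str
    category: str
    accented: bool = False


def parse(sentence):
    """Stand-in for the parser: label each word with a grammatical category."""
    words = sentence.rstrip(".?!").split()
    return [Word(w, TOY_LEXICON.get(w.lower(), "unknown")) for w in words]


def design_prosody(words):
    """Stand-in for the prosody step: accent content words only."""
    for w in words:
        w.accented = w.category in CONTENT_CATEGORIES
    return words


def synthesize(words):
    """Stand-in for the backend: show accented words in capitals."""
    return " ".join(w.text.upper() if w.accented else w.text.lower() for w in words)


def speak(sentence):
    """Run the three stages in order: parse, design prosody, synthesize."""
    return synthesize(design_prosody(parse(sentence)))


print(speak("This sentence has been pronounced for you by SpeakEasy."))
```

The point of the sketch is the division of labor: the prosody step never looks at raw text, only at the parser's analysis, and the backend never makes linguistic decisions.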
Let’s look at that again. Here’s what happens when we want the computer to speak a sentence out loud. Suppose it’s this: “This sentence has been pronounced for you by SpeakEasy.”
“This sentence has been pronounced for you by SpeakEasy.”
NLPWin Parser: “This” is a determiner. “sentence” is a noun. “has” is an auxiliary verb. …. “by” is a preposition. “SpeakEasy” is a noun.
SpeakEasy computes the intonation… and Whistler provides the voice.
Can a computer have a funny bone? Read to you by SpeakEasy
Would you like to learn more about the ideas that went into the design of SpeakEasy?
Here’s some of what the computer sees:
• Here is the sentence
• Here are the tones used
• And here is the pitch!
Prosody is computed in two steps: • First, we establish the right tones for the sentence; • Then we translate that into pitches that the synthesizer can understand. First, the tones:
We can go see the Epcot Center today. Now the pitches:
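As a rough illustration of the second step, here is one way a sequence of tones could be turned into pitch targets. Everything specific in it is an assumption for the sake of the example: the two-tone inventory ("H" for high, "L" for low), the pitch range in Hz, and the small downward drift applied to successive high tones. The slides do not say how SpeakEasy actually performs this translation.

```python
# Assumed pitch range for the voice, in Hz (illustrative values only).
LOW_HZ = 90.0
HIGH_HZ = 160.0
# Assumed gradual lowering of high targets across the utterance.
DECLINATION_PER_TONE = 5.0


def tones_to_pitches(tones):
    """Translate abstract tones ('H' or 'L') into pitch targets in Hz."""
    pitches = []
    for i, tone in enumerate(tones):
        if tone not in ("H", "L"):
            raise ValueError(f"unknown tone: {tone!r}")
        if tone == "H":
            # High targets drift down a little as the sentence goes on.
            pitches.append(HIGH_HZ - DECLINATION_PER_TONE * i)
        else:
            # Low targets stay at the bottom of the range.
            pitches.append(LOW_HZ)
    return pitches


print(tones_to_pitches(["H", "L", "H", "L"]))  # [160.0, 90.0, 150.0, 90.0]
```

Keeping the two steps separate means the tone choice can change (a statement versus a question, say) without touching the numbers that the synthesizer needs.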
Now, maybe that’s not exactly what we meant to say. SpeakEasy is not, unfortunately, a mind-reader. Maybe you meant to say this:
Do you hear the difference? Here they are again.
Whistler lets the user (a human user, or another program) control which intonation should be used in cases like that.
Questions are very tough for the computer to get right. So much depends on exactly what it is that you mean to ask -- and how you mean to put it.
Yes/no questions normally rise at the end: But who-what-questions don’t ….
Questions based on the wh-word (who, what, where, when, how, why) don’t rise at the end -- did you ever notice that? Where do you want to go today?
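The rule of thumb for questions can be written down as a tiny sketch. It is deliberately simplified and is not SpeakEasy's method: it assumes the wh-word is the first word of the sentence, uses only the six wh-words listed above, and ignores everything that depends on what the speaker means to ask. The function name is made up for this example.

```python
# The wh-words named on the slide.
WH_WORDS = {"who", "what", "where", "when", "how", "why"}


def final_contour(sentence):
    """Return 'rise' or 'fall' for the end of the sentence (simplified rule)."""
    text = sentence.strip()
    if not text.endswith("?"):
        # Statements fall at the end.
        return "fall"
    first_word = text.split()[0].lower().strip("\"'")
    # Wh-questions fall; other questions are treated as yes/no and rise.
    return "fall" if first_word in WH_WORDS else "rise"


print(final_contour("Do you hear the difference?"))     # rise
print(final_contour("Where do you want to go today?"))  # fall
```

A sentence such as “You want to go where?” would defeat this sketch, which is part of why questions are so hard to get right.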
Here’s something you probably never thought of. When you use a noun for the second time in a sentence, you usually say it without its normal degree of stress. If we don’t teach the computer to do that too, we get a funny sentence. Listen: It was the best of times, it was the worst of times. That’s not right! Here’s how it should be said: It was the best of times, it was the worst of times.
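A minimal sketch of that idea: stress a noun the first time it appears and leave it unstressed when it comes back. The word categories are supplied by hand here (in the real system they would come from the parser), and the rule that only nouns and adjectives carry stress is an assumption made to keep the example short.

```python
# Assumed for this sketch: only these categories are candidates for stress.
STRESSABLE = {"noun", "adjective"}


def mark_stress(tagged_words):
    """Given (word, category) pairs, return (word, stressed) pairs.

    A noun is stressed on first mention and destressed when repeated.
    """
    seen_nouns = set()
    result = []
    for word, category in tagged_words:
        key = word.lower()
        repeated_noun = category == "noun" and key in seen_nouns
        result.append((word, category in STRESSABLE and not repeated_noun))
        if category == "noun":
            seen_nouns.add(key)
    return result


def render(marked):
    """Show stressed words in capitals."""
    return " ".join(w.upper() if stressed else w for w, stressed in marked)


sentence = [
    ("It", "pronoun"), ("was", "copula"), ("the", "determiner"),
    ("best", "adjective"), ("of", "preposition"), ("times", "noun"),
    ("it", "pronoun"), ("was", "copula"), ("the", "determiner"),
    ("worst", "adjective"), ("of", "preposition"), ("times", "noun"),
]
print(render(mark_stress(sentence)))
# It was the BEST of TIMES it was the WORST of times
```

With the repeated “times” left unstressed, the emphasis lands on “worst”, where a human reader would put it.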
What else can SpeakEasy do?
SpeakEasy helped WBEZ, the National Public Radio Station in Chicago, with its fund-raising this spring. • Don’t forget to call this number: 1 888 YOUR NPR
SpeakEasy can read the Berenstain Bears…. SpeakEasy could read a story to a child -- or provide the voice for an interactive computer game.
SpeakEasy Well, there you have it. Thanks for stopping by, and thanks for listening. NLP and Whistler go together well. Don’t be surprised if you hear from me again before too long… Tal vez en español, ou français, oder Deutsch. Sayo:nara -- or, as Americans say, Sayonara!