
Natural Language Understanding


Presentation Transcript


  1. Natural Language Understanding Raivydas Simenas

  2. Overview • History • Speech Recognition • Natural Language Understanding • statistical methods to resolve ambiguities • Current situation

  3. History • Roots in teaching the deaf to speak using “visible speech” • 1874: Alexander Graham Bell’s invention of the harmonic telegraph • Different frequency harmonics from an electrical signal could be separated • Multiple messages could be sent over the same wire at the same time • 1940s: separating the speech signal into different frequency components using the spectrogram • 1950s: the beginning of computer use for automatic speech recognition

  4. The Nature of Speech • Phoneme – a basic sound unit, e.g. a vowel • The human vocal apparatus produces about 18 phonemes per second • Speech viewed as a sound wave • Identifying sounds: decomposing the sound wave into its frequency components

  5. The Spectrogram I • A visual representation of speech which contains all the salient information • Plots the amount of energy at different frequencies against time • Discontinuous speech (making a pause after each word) – easier to recognize on the spectrogram
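The energy-by-frequency-by-time picture described above is computed with a short-time Fourier transform: slice the signal into overlapping windows and measure the energy in each frequency component of each window. A minimal sketch in plain NumPy; the 25 ms window and 10 ms hop are common illustrative choices, not values from the slides.

```python
import numpy as np

def spectrogram(signal, sample_rate, frame_len=400, hop=160):
    """Short-time Fourier transform: energy at each frequency over time.

    frame_len=400 and hop=160 are 25 ms windows every 10 ms at a
    16 kHz sample rate (typical choices, assumed here for illustration).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # energy per frequency bin
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    times = np.arange(n_frames) * hop / sample_rate
    return times, freqs, power

# A pure 440 Hz tone shows up as a single horizontal band of energy.
rate = 16000
t = np.arange(rate) / rate
times, freqs, power = spectrogram(np.sin(2 * np.pi * 440 * t), rate)
```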

  6. The Spectrogram II • The same word uttered twice (especially by different speakers – the speaker independence problem) might look radically different on a spectrogram • The need to recognize invariant features in a spectrogram • Formants: resonant frequencies sustained for a short time period in pronouncing a vowel • Normalization: distinguishing between relevant and irrelevant information • Nonlinear time compression: taking care of the changing speed of speech • Matching a spoken word to a template
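The nonlinear time compression mentioned above is classically realized with dynamic time warping (DTW), which also covers matching a spoken word against stored templates. A sketch, assuming each utterance is given as an array of per-frame feature vectors; the `templates` dictionary is hypothetical.

```python
import numpy as np

def dtw_distance(x, y):
    """Dynamic time warping: cost of the best nonlinear alignment of two
    feature sequences, absorbing differences in speaking rate."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])  # local frame distance
            # Best of: stretch x, stretch y, or advance both sequences.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def recognize(utterance, templates):
    """Match a spoken word to a template: return the closest stored word."""
    return min(templates, key=lambda w: dtw_distance(utterance, templates[w]))
```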

  7. Robust Speech Recognition • Need to maintain accuracy when the quality of the input speech is degraded or when the speech characteristics differ due to changes in environment or speaker • Dynamic parameter adaptation: alter either the input signal or the internally stored representations • Optimal parameter estimation: based on a statistical model characterizing the differences between training and test sets • Empirical feature comparison: based on comparing high-quality speech with the same speech recorded under degraded conditions

  8. Stochastic Methods in Speech Recognition • Generating the sequence of word hypotheses for an acoustic signal is most often done using statistics • The process: • The acoustic signal is represented as a sequence of feature vectors • These vectors are used to build acoustic word models, which assign probabilities to the vector sequences that can represent a word • Acoustic word models are based on Markov chains
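As a sketch of such an acoustic word model (all probabilities below are invented for illustration), a word can be modeled as a small left-to-right Markov chain whose states emit vector-quantized acoustic symbols; the forward algorithm then gives the probability that the model produced an observed sequence.

```python
import numpy as np

# Toy left-to-right word model: 3 states, 3 quantized acoustic symbols.
trans = np.array([[0.6, 0.4, 0.0],   # P(next state | current state)
                  [0.0, 0.7, 0.3],
                  [0.0, 0.0, 1.0]])
emit = np.array([[0.7, 0.2, 0.1],    # P(symbol | state)
                 [0.1, 0.8, 0.1],
                 [0.2, 0.1, 0.7]])
start = np.array([1.0, 0.0, 0.0])    # always begin in the first state

def likelihood(symbols):
    """Forward algorithm: probability that this word model generated
    the observed sequence of quantized acoustic symbols."""
    alpha = start * emit[:, symbols[0]]
    for s in symbols[1:]:
        alpha = (alpha @ trans) * emit[:, s]
    return alpha.sum()

# In recognition, the word whose model scores highest wins.
print(likelihood([0, 1, 1, 2]))
```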

  9. Representing Sentences • Syntactic form: indicates the way the words are related to each other in a sentence • Logical form: identifying the semantic relationships between words based solely on the knowledge of the language (independently of the situation) • Final meaning representation: mapping the information from the syntactic and logical form into knowledge representation • System uses knowledge representation to represent and reason about its application domain

  10. Parsing a Sentence • Parsing – determining the structure of the sentence according to the grammar • Tree representation of a sentence • Transition network grammars • Start with initial node • Can traverse an arc only if it is labeled with an appropriate category
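A toy transition network grammar in code; the network, final state, and lexicon below are invented for illustration.

```python
# Nodes with arcs labeled by lexical categories; a sentence is accepted
# if its category sequence traces a path from the start node to a final
# node. Grammar and lexicon are hypothetical.
network = {
    "S0": [("ART", "S1"), ("N", "S2")],   # optional article, then noun
    "S1": [("N", "S2")],
    "S2": [("V", "S3")],                  # verb completes the sentence
}
final_states = {"S3"}
lexicon = {"the": "ART", "dog": "N", "cat": "N", "barks": "V"}

def accepts(words, node="S0"):
    """Traverse an arc only if it is labeled with the next word's category."""
    if not words:
        return node in final_states
    category = lexicon.get(words[0])
    return any(accepts(words[1:], dest)
               for label, dest in network.get(node, [])
               if label == category)

print(accepts("the dog barks".split()))   # True
print(accepts("barks the dog".split()))   # False
```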

  11. Stochastic Methods for Ambiguity Resolution I • Some sentences can be parsed in many different ways, e.g. time flies like an arrow • The most popular approach to resolving such ambiguities is statistical • Some facts from probability theory • The concept of a random variable, e.g. the lexical category of “flies” • Probability function • assigns a probability to every possible value of the random variable, e.g. 0.3 for “flies” being a noun, 0.7 for it being a verb • conditional probability functions (Pr(A|B)), e.g. the probability of a verb occurring given that a noun has already occurred
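To make these notions concrete, the sketch below estimates the probability function for the lexical category of “flies” and a conditional probability from a tiny hand-tagged corpus; the corpus and the resulting numbers are invented.

```python
from collections import Counter

# Tiny hand-tagged corpus (hypothetical), mirroring the slide's example
# of "flies" occurring both as a noun and as a verb.
tagged = [("time", "N"), ("flies", "V"), ("like", "P"), ("an", "ART"),
          ("arrow", "N"), ("fruit", "N"), ("flies", "N"), ("like", "V"),
          ("a", "ART"), ("banana", "N")]

# Probability function of the random variable "lexical category of 'flies'".
flies_tags = Counter(tag for word, tag in tagged if word == "flies")
total = sum(flies_tags.values())
print({tag: count / total for tag, count in flies_tags.items()})

# Conditional probability Pr(V | previous category was N), from tag pairs.
tags = [tag for _, tag in tagged]
after_n = [b for a, b in zip(tags, tags[1:]) if a == "N"]
print(after_n.count("V") / len(after_n))
```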

  12. Stochastic Methods for Ambiguity Resolution II • Probabilities are used to predict future events given some data about the past • Maximum likelihood estimator (MLE) • Probability of X happening in the future = (number of past occurrences of X) / (total number of past events) • Works well only if X occurred often; not very useful for low-frequency events • Expected likelihood estimator (ELE) • Probability of X = f(count(X)) / Sum over all events Y of f(count(Y)), where typically f(c) = c + 0.5 so that rare or unseen events keep some probability, e.g. if X occurred 4 times and Y occurred 6 times, ELE(X) = (4 + 0.5)/((4 + 0.5) + (6 + 0.5)) ≈ 0.41, whereas MLE(X) = 0.4 • MLE is the special case of ELE with f(c) = c • Given a large amount of tagged text, one can use MLE or ELE to determine the most likely lexical category of an ambiguous word, e.g. the word flies
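A sketch of both estimators, assuming f(c) = c + 0.5 for ELE; the counts are hypothetical.

```python
from collections import Counter

def mle(counts):
    """Maximum likelihood: observed count divided by the total count."""
    total = sum(counts.values())
    return {x: c / total for x, c in counts.items()}

def ele(counts, add=0.5):
    """Expected likelihood: add 0.5 to every count before normalizing,
    so low-frequency outcomes keep some probability mass."""
    total = sum(counts.values()) + add * len(counts)
    return {x: (c + add) / total for x, c in counts.items()}

# Hypothetical counts for the lexical category of "flies".
counts = Counter({"N": 4, "V": 6})
print(mle(counts))   # {'N': 0.4, 'V': 0.6}
print(ele(counts))   # {'N': 0.409..., 'V': 0.590...}
```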

  13. Stochastic Methods for Ambiguity Resolution III • Always choosing the interpretation that occurs most frequently in the training set obtains about a 90% success rate on average, which is not good enough • Some of the local context should be used to determine the lexical category of a word • Ideally, for a sequence of words w1, w2, …, wn we want the lexical category sequence c1, c2, …, cn that maximizes Pr(c1, …, cn | w1, …, wn) • In practice, approximations of such probabilities are used

  14. Stochastic Methods for Ambiguity Resolution IV • n-gram models • Look at the probability of a lexical category Ci following the sequence of lexical categories Ci-1, Ci-2, …, Ci-n+1 • The probability of c1, c2, …, ck occurring is approximately the product of the n-gram probabilities at each position, e.g. the probability of the sequence ART, N, V is 0.71 * 1.0 * 0.43 = 0.3053 • In practice, bigram or trigram models are used most often • Models of this kind are known as hidden Markov models (HMMs)
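A bigram version of this computation; the transition probabilities are chosen only to reproduce the ART, N, V example above.

```python
# Bigram model over lexical categories: the probability of a category
# sequence is approximated by the product of Pr(category | previous).
bigram = {
    ("<s>", "ART"): 0.71,   # Pr(ART at the start of a sentence)
    ("ART", "N"): 1.0,      # Pr(N | ART)
    ("N", "V"): 0.43,       # Pr(V | N)
}

def sequence_probability(categories, probs):
    p = 1.0
    prev = "<s>"            # sentence-start marker
    for c in categories:
        p *= probs.get((prev, c), 0.0)   # unseen bigrams get probability 0
        prev = c
    return p

print(sequence_probability(["ART", "N", "V"], bigram))  # 0.71 * 1.0 * 0.43 = 0.3053
```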

  15. Stochastic Methods for Ambiguity Resolution V • In order to determine the most likely interpretation of a given sequence of n words, we want to maximize the product over i of Pr(Ci | Ci-1) * Pr(wi | Ci) • The Viterbi algorithm • Given k lexical categories, the total number of possibilities to consider for a sequence of n words is k^n • The Viterbi algorithm reduces this to on the order of n * k^2 steps
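A compact Viterbi implementation for bigram category tagging. The dynamic-programming table keeps, for each word position and category, only the best-scoring path so far, which is what brings the cost down from k^n to about n * k^2; the toy probabilities below are invented for illustration.

```python
import numpy as np

def viterbi(words, categories, start_p, trans_p, emit_p):
    """Return the category sequence maximizing the product of
    Pr(c_i | c_{i-1}) * Pr(w_i | c_i)."""
    k, n = len(categories), len(words)
    score = np.zeros((n, k))            # best score ending in category j at word i
    back = np.zeros((n, k), dtype=int)  # back-pointers to recover the path
    for j in range(k):
        score[0, j] = start_p[j] * emit_p[j].get(words[0], 0.0)
    for i in range(1, n):
        for j in range(k):
            prev = score[i - 1] * trans_p[:, j]
            back[i, j] = int(np.argmax(prev))
            score[i, j] = prev[back[i, j]] * emit_p[j].get(words[i], 0.0)
    path = [int(np.argmax(score[-1]))]
    for i in range(n - 1, 0, -1):
        path.append(int(back[i, path[-1]]))
    return [categories[j] for j in reversed(path)]

# Toy parameters (invented): tag "time flies" as N then V.
cats = ["N", "V"]
start = np.array([0.8, 0.2])
trans = np.array([[0.4, 0.6],    # Pr(next category | N)
                  [0.7, 0.3]])   # Pr(next category | V)
emit = [{"time": 0.6, "flies": 0.4},   # Pr(word | N)
        {"time": 0.1, "flies": 0.9}]   # Pr(word | V)
print(viterbi(["time", "flies"], cats, start, trans, emit))  # ['N', 'V']
```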

  16. Logical Form • Although interpreting a sentence often requires knowledge of the context, some interpretation can be done independently of it • basic semantic properties of a word, its different senses, etc. • Ontology • each word has one or more senses in which it can be used, e.g. go has about 40 senses • the different senses of all the words of a natural language are organized into classes of objects, such as events, actions, etc. • the set of such classes is called an ontology • The logical form of an utterance can be viewed as a function that maps the current discourse situation into a new one resulting from the occurrence of the utterance

  17. Current Situation • Inexpensive software for speech recognition is available • The remaining issues: large vocabularies, continuous speech, and speaker independence • Automated speech recognition works for restricted domains • The speed of serial processing in a computer vs. the massive parallelism of the human brain

  18. References • Ronald A. Cole (ed.). Survey of the State of the Art in Human Language Technology, 1996 • James Allen. Natural Language Understanding, 1995 • Raymond Kurzweil. “When Will HAL Understand What We Are Saying? Computer Speech Recognition and Understanding.” In HAL’s Legacy, 1996
