Statistical Natural Language Processing
What is NLP? • Natural Language Processing (NLP), also known as Computational Linguistics, is concerned with theoretical and practical issues in the design and implementation of computer systems for processing human languages. • It is an interdisciplinary field that draws on other areas of study such as computer science, artificial intelligence, linguistics, and logic.
Applications of NLP • natural language interfaces to databases • programs for classifying and retrieving documents by content • explanation generation for expert systems • machine translation • advanced word-processing tools
What makes NLP a computational challenge? • The ambiguous nature of natural language • The wide variety of applications for language technology • The difficulty of representing linguistic knowledge • The many different levels of information encoded in language
What is statistical NLP? • Statistical NLP applies statistical inference to problems in NLP • Statistical inference consists of taking data generated according to some unknown probability distribution and making inferences about that distribution
Motivations for Statistical NLP • Cognitive modeling of human language processing has not reached a stage where we can construct a complete mapping between the language signal and its information content. • A complete mapping is not always required. • The statistical approach provides the flexibility required to model a language more accurately.
Idea behind Statistical NLP • View language processing as information transmission over a noisy channel. • The approach requires a model that characterizes the transmission by giving, for every message, the probability of the observed output, as in the decision rule below.
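In symbols (the notation m for the intended message and o for the observed output is mine, not the slides'): the noisy-channel model recovers the most probable message by combining a prior over messages with the channel's output probabilities.

```latex
\hat{m} = \arg\max_{m} P(m \mid o)
        = \arg\max_{m} \frac{P(m)\,P(o \mid m)}{P(o)}
        = \arg\max_{m} P(m)\,P(o \mid m)
```

The denominator P(o) is constant across candidate messages, so it can be dropped from the maximization.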
Statistical Modeling and Classification • Primitive acoustic features • Quantization • Maximum likelihood and related rules • Class conditional density functions • Hidden Markov model methodology
Details… Primitive acoustic features are used to estimate the speech spectrum on the basis of its statistical properties. By means of quantization, a typical speech signal can be represented as a sequence of symbols: feature vectors in a multidimensional acoustic feature space are assigned to discrete classes using statistical decision rules, as sketched below.
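A minimal sketch of this quantization step, assuming a pre-trained codebook of centroid vectors (the codebook and feature values below are invented toy numbers, not from the slides): each acoustic feature vector is replaced by the index of its nearest codebook entry.

```python
import numpy as np

# Hypothetical codebook of centroids in a 2-D acoustic feature space
# (real systems use higher-dimensional spectral features).
codebook = np.array([
    [0.0, 0.0],   # symbol 0
    [1.0, 1.0],   # symbol 1
    [2.0, 0.5],   # symbol 2
])

def quantize(feature_vectors):
    """Map each feature vector to the index of its nearest centroid."""
    symbols = []
    for v in feature_vectors:
        distances = np.linalg.norm(codebook - v, axis=1)
        symbols.append(int(np.argmin(distances)))
    return symbols

# A toy "signal": a sequence of feature vectors becomes a symbol sequence.
signal = np.array([[0.1, -0.2], [0.9, 1.1], [1.8, 0.4]])
print(quantize(signal))  # [0, 1, 2]
```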
Maximum Likelihood Although there is no direct method for computing the probability of a phonetic unit given its acoustic features, we can use Bayes' rule to estimate the probability of a phonetic class given its features from the likelihood of the features given the class. This method leads to the maximum likelihood classifier, which assigns an unknown vector to the class whose probability density function, conditioned on the class, has the maximum value. Another variant of the maximum likelihood methodology is clustering.
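In symbols (the notation c for class and x for feature vector is mine, not the slides'): Bayes' rule gives the class posterior from the class-conditional likelihood, and the maximum likelihood classifier picks the class with the largest conditional density, which coincides with the most probable class when the priors P(c) are equal.

```latex
P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}, \qquad
\hat{c} = \arg\max_{c} \; P(x \mid c)
```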
Hidden Markov Models A Hidden Markov Model is a set of states (lexical categories in our case) with directed edges labeled with transition probabilities, which indicate the probability of moving to the state at the end of an edge given that one is now in the state at its start. Each state is also labeled with a function giving the probabilities of outputting different symbols from that state (while in a state, one outputs a single symbol before moving to the next state). In our case, the symbol output from a state/lexical category is a word belonging to that lexical category.
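A minimal sketch of this generative view, with toy tags, words, and probabilities that are illustrative inventions rather than anything from the slides: the joint probability of a tag sequence and a word sequence is the product of transition probabilities between states and emission probabilities of words from states.

```python
# Toy HMM: states are lexical categories, emissions are words.
# All probabilities below are invented for illustration.
transitions = {            # P(next_state | state); "<s>" is the start state
    "<s>": {"N": 0.8, "V": 0.2},
    "N":   {"N": 0.3, "V": 0.7},
    "V":   {"N": 0.9, "V": 0.1},
}
emissions = {              # P(word | state)
    "N": {"John": 0.6, "walks": 0.4},
    "V": {"John": 0.0, "walks": 1.0},
}

def joint_probability(tags, words):
    """P(tags, words) = product of transition and emission probabilities."""
    prob = 1.0
    prev = "<s>"
    for tag, word in zip(tags, words):
        prob *= transitions[prev][tag] * emissions[tag].get(word, 0.0)
        prev = tag
    return prob

words = ["John", "walks"]
print(joint_probability(["N", "V"], words))  # 0.8 * 0.6 * 0.7 * 1.0 = 0.336
print(joint_probability(["N", "N"], words))  # 0.8 * 0.6 * 0.3 * 0.4 = 0.0576
```

Maximizing this joint probability over all tag sequences yields the most likely tagging of the word sequence.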
Class Conditional Density Functions All statistical methods of speech recognition depend on class conditional density functions. These, in turn, depend on the existence of a sufficiently large, correctly labeled training set and on well-understood statistical estimation techniques, as sketched below.
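A minimal sketch of this estimation-plus-classification pipeline, assuming (my assumption, not the slides') that each class-conditional density is a one-dimensional Gaussian fit by maximum likelihood to labeled training features; the training values are invented toy numbers.

```python
import math
from collections import defaultdict

# Toy labeled training set: (feature, class) pairs.
training = [(1.0, "a"), (1.2, "a"), (0.8, "a"),
            (3.0, "b"), (3.3, "b"), (2.7, "b")]

# Maximum likelihood estimates: per-class mean and variance.
by_class = defaultdict(list)
for x, c in training:
    by_class[c].append(x)
params = {}
for c, xs in by_class.items():
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    params[c] = (mean, var)

def density(x, mean, var):
    """Gaussian class-conditional density p(x | c)."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def classify(x):
    """Assign x to the class whose conditional density is largest."""
    return max(params, key=lambda c: density(x, *params[c]))

print(classify(1.1))  # "a"
print(classify(2.9))  # "b"
```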
How does statistics help? • Disambiguation may be achieved by using stochastic context-free grammars • It helps in providing degrees of grammaticality • Naturalness • Structural preference • Error tolerance
Example using a stochastic CFG Consider the sentence “John Walks”. The grammar is as follows:
1 S -> NP V 0.7
2 S -> NP 0.3
3 NP -> N 0.8
4 NP -> N N 0.2
5 N -> John 0.6
6 N -> Walks 0.4
7 V -> Walks 1.0
The numbers on the right are the weights of the rules. The weight of an analysis is the product of the weights of the rules used in its derivation, and the preferred reading of the perceived sentence is the analysis with the highest weight, as computed below.
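A minimal sketch of the weight computation for the two possible derivations of “John Walks” under this grammar (rule numbers refer to the list above):

```python
# Rule weights from the grammar above, keyed by rule number.
weights = {1: 0.7, 2: 0.3, 3: 0.8, 4: 0.2, 5: 0.6, 6: 0.4, 7: 1.0}

# Derivation A: S -> NP V, NP -> N, N -> John, V -> Walks  (rules 1, 3, 5, 7)
parse_a = weights[1] * weights[3] * weights[5] * weights[7]   # 0.336

# Derivation B: S -> NP, NP -> N N, N -> John, N -> Walks   (rules 2, 4, 5, 6)
parse_b = weights[2] * weights[4] * weights[5] * weights[6]   # 0.0144

print(parse_a, parse_b)
```

Derivation A, which treats “Walks” as a verb, has the much higher weight, so the stochastic CFG selects the intuitively correct parse.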
Degrees of grammaticality • Traditional approaches to NLP do not accommodate gradations of grammaticality: a sentence is either grammatical or it is not. • In practice, acceptability may vary with the structure and context of the sentence.
Structural Preference Consider the sentence “The emergency crews hate most is domestic violence.” A reader is tempted to parse “emergency crews” as a compound noun, but the correct interpretation is “The emergency [that] crews hate most is domestic violence.” These preferences are structural preferences rather than parsing preferences, and statistical approaches can handle such structural preferences easily.
Error Tolerance • A remarkable property of human language comprehension is its tolerance of errors. • Many sentences that traditional approaches classify as ungrammatical can still be interpreted by statistical NLP techniques.
Conclusions • Free and commercial software is now available that provides many NLP features (e.g., Microsoft Windows XP includes speech recognition software that lets users control menus and execute commands). • A lot of research is going into developing new applications and into techniques and approaches that will make statistical NLP even more practical in the near future.