The Simplest NL Applications: Text Searching and Pattern Matching Read J & M Chapter 2
Searching for a Single String Using a Nondeterministic FSM [Figure: FSM with states 1 to 8 and transitions labeled c, o, c, o, n, u, t]
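A minimal sketch of how such a machine can be simulated, assuming (the figure itself is not reproduced here) that state 1 loops on every input character and that the last state is accepting. The function name `nfa_search` is hypothetical.

```python
def nfa_search(pattern, text):
    """Return the index just past the first match of pattern in text, or -1.

    States are numbered 1..len(pattern)+1; state 1 is the start state.
    Assumption: state 1 has a self-loop on every character, which is what
    makes the machine nondeterministic.
    """
    accept = len(pattern) + 1
    if accept == 1:          # the empty pattern matches immediately
        return 0
    active = {1}             # set of states the machine could be in
    for i, ch in enumerate(text):
        nxt = {1}            # the self-loop keeps state 1 alive
        for s in active:
            # State s advances to s + 1 on the s-th pattern character.
            if pattern[s - 1] == ch:
                nxt.add(s + 1)
        if accept in nxt:
            return i + 1
        active = nxt
    return -1
```

Tracking a set of active states avoids backtracking: each input character is read once, at the cost of updating up to len(pattern) states per character.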
Searching for a Single String Using the Boyer Moore Algorithm
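The slide gives only the algorithm's name. As a rough illustration, here is the Horspool simplification of Boyer-Moore, which keeps only a bad-character shift table and omits the good-suffix rule of the full algorithm. The function name `horspool_search` is hypothetical.

```python
def horspool_search(pattern, text):
    """Return the start index of the first match of pattern in text, or -1.

    This is the Horspool variant (bad-character shifts only), not the
    full Boyer-Moore algorithm.
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1
    # For each character except the last one, the distance from its
    # rightmost occurrence to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        j = m - 1
        # Compare right to left, as Boyer-Moore does.
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            return pos
        # Shift by the table entry for the text character under the
        # pattern's last position; unseen characters allow a full shift.
        pos += shift.get(text[pos + m - 1], m)
    return -1
```

Because comparison starts at the right end of the window, a mismatch on a character that does not occur in the pattern lets the search skip a whole pattern length at once.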
Searching for Multiple Strings [Figure: FSM with two labeled paths, l o c o s and c o c o n u t] Example: lococonut
Converting to a Deterministic FSM [Figure: deterministic version of the FSM with paths l o c o s and c o c o n u t]
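One way to picture the conversion is a lazy subset construction: each deterministic state is a set of nondeterministic states, built the first time it is needed. The sketch below assumes the two search strings are "locos" and "coconut" (read off the figure labels, so treat them as an assumption) and that patterns are non-empty. The name `build_dfa_searcher` is hypothetical.

```python
def build_dfa_searcher(patterns):
    """Return a search function backed by a lazily built deterministic FSM.

    An NFA state is a pair (pattern index, characters matched so far).
    A DFA state is a frozenset of NFA states. Patterns must be non-empty.
    """
    start = frozenset((k, 0) for k in range(len(patterns)))
    table = {}  # (dfa_state, character) -> dfa_state

    def step(state, ch):
        key = (state, ch)
        if key not in table:
            nxt = set(start)  # start states loop on every character
            for k, i in state:
                if i < len(patterns[k]) and patterns[k][i] == ch:
                    nxt.add((k, i + 1))
            table[key] = frozenset(nxt)
        return table[key]

    def search(text):
        """Return (start index, pattern) for every match, in text order."""
        hits = []
        state = start
        for pos, ch in enumerate(text):
            state = step(state, ch)
            for k, i in sorted(state):
                if i == len(patterns[k]):
                    hits.append((pos + 1 - i, patterns[k]))
        return hits

    return search
```

Once a transition has been cached in `table`, following it is a single dictionary lookup, which is the payoff of determinizing.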
Regular Expressions
• Two different (but related) uses of the term:
  • Expressions that define all and only the regular languages
    • (aa ∪ ab ∪ ba ∪ bb)*
  • Expressions in a useful pattern language
    Matching IP addresses: s!<emphasis>([0-9]+(\.[0-9]+){3})</emphasis>!<inet>$1</inet>!
    Finding doubled words: \<([A-Za-z]+)\s+\1\>
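The two pattern-language examples are written in Perl-style syntax. A rough Python equivalent using the standard `re` module might look like the following; the function names are hypothetical, `\b` stands in for the `\<` and `\>` word-boundary anchors, and the inner group is made non-capturing since only the whole address is reused.

```python
import re

# An IP address wrapped in <emphasis> tags: digits, then three ".digits" groups.
IP_IN_EMPHASIS = re.compile(r"<emphasis>([0-9]+(?:\.[0-9]+){3})</emphasis>")

# A word, whitespace, then the same word again (\1 is a backreference).
DOUBLED_WORD = re.compile(r"\b([A-Za-z]+)\s+\1\b")


def tag_ip_addresses(text):
    """Rewrite <emphasis>IP</emphasis> as <inet>IP</inet>."""
    return IP_IN_EMPHASIS.sub(r"<inet>\1</inet>", text)


def find_doubled_words(text):
    """Return each word that is immediately repeated (case-sensitive)."""
    return [m.group(1) for m in DOUBLED_WORD.finditer(text)]
```

Note that the backreference `\1` takes the doubled-word pattern beyond the regular languages in the first sense of the term, which is one reason the two uses are only related, not identical.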
REs: Syntax and Semantics
Syntax
The regular expressions over an alphabet Σ are all strings over the alphabet Σ ∪ {(, ), ∅, ∪, *} that can be obtained as follows:
1. ∅ and each member of Σ is a regular expression.
2. If α, β are regular expressions, then so is αβ.
3. If α, β are regular expressions, then so is α ∪ β.
4. If α is a regular expression, then so is α*.
5. If α is a regular expression, then so is (α).
6. Nothing else is a regular expression.
REs: Syntax and Semantics
Regular expressions define languages via a semantic interpretation function we'll call L:
1. L(∅) = ∅ and L(a) = {a} for each a ∈ Σ
2. If α, β are regular expressions, then L(αβ) = L(α) L(β) = all strings that can be formed by concatenating to some string from L(α) some string from L(β).
3. If α, β are regular expressions, then L(α ∪ β) = L(α) ∪ L(β)
4. If α is a regular expression, then L(α*) = L(α)*
5. If (α) is a regular expression, then L((α)) = L(α)
A language is regular if and only if it can be described by a regular expression.
Note: L is compositional.
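Compositionality means the language of an expression is computed from the languages of its parts, one rule per syntactic form. The sketch below mirrors the rules as a recursive function over a nested-tuple encoding of expressions (the encoding and the names `lang` and `EVEN_LENGTH` are assumptions made for illustration). Since L(α*) is usually infinite, only strings up to `max_len` are enumerated, and parentheses are represented by the tuple nesting itself.

```python
def lang(expr, max_len):
    """Return the strings of length <= max_len in the language of expr."""
    kind = expr[0]
    if kind == "empty":   # the empty-set expression denotes the empty language
        return set()
    if kind == "sym":     # L(a) = {a}
        return {expr[1]} if len(expr[1]) <= max_len else set()
    if kind == "cat":     # concatenation: pair every left string with every right string
        left, right = lang(expr[1], max_len), lang(expr[2], max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if kind == "union":   # union of the two sub-languages
        return lang(expr[1], max_len) | lang(expr[2], max_len)
    if kind == "star":    # zero or more concatenations of the sub-language
        base = lang(expr[1], max_len)
        result = {""}
        frontier = {""}
        while frontier:
            # Extend the newest strings by one more piece; stop when nothing new appears.
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown expression kind: {kind}")


# The expression (aa U ab U ba U bb)* in the tuple encoding.
A, B = ("sym", "a"), ("sym", "b")
PAIRS = ("union",
         ("union", ("cat", A, A), ("cat", A, B)),
         ("union", ("cat", B, A), ("cat", B, B)))
EVEN_LENGTH = ("star", PAIRS)
```

For instance, `lang(EVEN_LENGTH, 2)` contains the empty string and the four two-letter strings over {a, b}, each obtained purely from the meanings of the subexpressions.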
The Importance of Compositionality What is the meaning of: Mary cooked the yujutes. Mary tyroked the yujutes.
Morphological Analysis • Read J & M Chapter 3 • Recognize words • Parse words
Morphological Parsing Goal: to represent the facts declaratively so that a single representation can be used for both recognition and generation. Note: ^ marks morpheme boundaries. # marks word boundaries.
From Lexical to Intermediate Note: All the transducers in the book are described as lexical:intermediate, but they can also be run in the other direction.
From Intermediate to Surface For text, we need spelling rules. ε → e / {x, s, z} ^ ___ s # Read this as “Replace ε with e in the context after the /.”
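As a quick way to see the e-insertion rule in action, the sketch below applies it with a regular-expression substitution rather than a finite-state transducer. It assumes the input is a single intermediate-level word such as fox^s#, and that the surface form is obtained by deleting the ^ and # markers afterwards; the function name is hypothetical.

```python
import re


def intermediate_to_surface(intermediate):
    """Apply e-insertion, then strip the boundary markers.

    Rule: insert e after x, s, or z at a morpheme boundary (^)
    when the next morpheme is s followed by the word boundary (#).
    """
    with_e = re.sub(r"([xsz])\^(?=s#)", r"\1^e", intermediate)
    return with_e.replace("^", "").replace("#", "")
```

For example, `intermediate_to_surface("fox^s#")` gives `foxes`, while `cat^s#` is left without an inserted e and comes out as `cats`.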
Turning the Rule into a Transducer [Figure: transducer for the rule] Examples: foxes, xerox, fox#sat
Disambiguation - Local Local ambiguities: # s# asses luxury
Disambiguation - Harder Sometimes additional knowledge is necessary: foxes: fox +N +PL or fox +V +SG Can we think of nouns that cannot also be verbs?
Search
• For FSMs, we can build a deterministic machine.
• In other cases, we will have to search:
  • Depth-first
  • Breadth-first: chart parsing
[Figure: two parse trees, built from S, VP, NP, and PP nodes over the word tags, for the sentence: I hit the boy with a bat.]
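To make the chart idea concrete, here is a small CKY-style sketch that counts the parses of the example sentence. The grammar and lexicon are assumptions chosen for illustration (the slide shows only the trees), and the pronoun is entered directly as an NP to keep every rule binary.

```python
from collections import defaultdict

# Hypothetical toy lexicon and binary grammar, not taken from the slides.
LEXICON = {
    "I": ["NP"], "hit": ["V"], "the": ["DET"], "a": ["DET"],
    "boy": ["N"], "bat": ["N"], "with": ["PREP"],
}
RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),    # PP attaches to the verb phrase
    ("NP", "DET", "N"),
    ("NP", "NP", "PP"),    # PP attaches to the noun phrase
    ("PP", "PREP", "NP"),
]


def count_parses(words, lexicon=LEXICON, rules=RULES, goal="S"):
    """Count distinct parse trees for words using a chart of spans."""
    n = len(words)
    chart = defaultdict(int)  # (start, end, category) -> number of subtrees
    for i, word in enumerate(words):
        for cat in lexicon[word]:
            chart[i, i + 1, cat] += 1
    # Fill the chart from short spans to long ones, so every sub-span
    # is complete before it is combined.
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for parent, left, right in rules:
                    chart[start, end, parent] += (
                        chart[start, mid, left] * chart[mid, end, right]
                    )
    return chart[0, n, goal]
```

Under this toy grammar, `count_parses("I hit the boy with a bat".split())` finds two analyses, one for each attachment of the prepositional phrase, without ever re-deriving a shared sub-constituent.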