
NL Grammar Hierarchies Regular Expressions, Finite State Automata, Markov Algorithms


Presentation Transcript


  1. NL Grammar Hierarchies Regular Expressions, Finite State Automata, Markov Algorithms Instructor: Nick Cercone - 3050 CSEB - nick@cse.yorku.ca

  2. Regular Expressions Regular expressions consist of constants and operators that denote sets of strings and operations over these sets, respectively. The following definition is standard and appears in most textbooks on formal language theory. Given a finite alphabet Σ, the following constants are defined:
     • (empty set) ∅ denoting the set ∅.
     • (empty string) ε denoting the set containing only the "empty" string, which has no characters at all.
     • (literal character) a in Σ denoting the set containing only the character a.

  3. Regular Expressions The following operations are defined:
     • (concatenation) RS denoting the set { αβ | α in R and β in S }. For example, {"ab", "c"}{"d", "ef"} = {"abd", "abef", "cd", "cef"}.
     • (alternation) R | S denoting the set union of R and S. For example, {"ab", "c"}|{"ab", "d", "ef"} = {"ab", "c", "d", "ef"}.

  4. Regular Expressions
     • (Kleene star) R* denoting the smallest superset of R that contains ε and is closed under string concatenation. This is the set of all strings that can be made by concatenating zero or more strings in R. For example, {"ab", "c"}* = {ε, "ab", "c", "abab", "abc", "cab", "cc", "ababab", "abcab", ... }.

  5. Regular Expressions (summary)
     Constants:
     • (empty set) ∅ denoting the set ∅.
     • (empty string) ε denoting the set containing only the "empty" string, which has no characters at all.
     • (literal character) a in Σ denoting the set containing only the character a.
     The following operations are defined:
     • (concatenation) RS denoting the set { αβ | α in R and β in S }. For example, {"ab", "c"}{"d", "ef"} = {"abd", "abef", "cd", "cef"}.
     • (alternation) R | S denoting the set union of R and S. For example, {"ab", "c"}|{"ab", "d", "ef"} = {"ab", "c", "d", "ef"}.
     • (Kleene star) R* denoting the smallest superset of R that contains ε and is closed under string concatenation. This is the set of all strings that can be made by concatenating any finite number (including zero) of strings from R. For example, {"0", "1"}* is the set of all finite binary strings (including the empty string), and {"ab", "c"}* = {ε, "ab", "c", "abab", "abc", "cab", "cc", "ababab", "abcab", ... }.
     Nothing else is a regular expression over Σ.
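
The three operations in the summary can be tried out directly on finite sets of strings. The following is a minimal Python sketch, not part of the original slides. The function names are made up for illustration, and because R* is usually infinite, `star_up_to` is an assumed truncation that only concatenates at most n strings from R.

```python
from itertools import product

def concat(R, S):
    """Concatenation RS = { ab | a in R and b in S }."""
    return {a + b for a, b in product(R, S)}

def alternate(R, S):
    """Alternation R | S is plain set union."""
    return R | S

def star_up_to(R, n):
    """Finite slice of R*: all strings made from at most n strings of R."""
    result = {""}      # zero concatenations give the empty string
    layer = {""}
    for _ in range(n):
        layer = concat(layer, R)   # strings made from one more element of R
        result |= layer
    return result

print(sorted(concat({"ab", "c"}, {"d", "ef"})))
print(sorted(alternate({"ab", "c"}, {"ab", "d", "ef"})))
print(sorted(star_up_to({"ab", "c"}, 2), key=lambda w: (len(w), w)))
```

Raising n in `star_up_to` produces longer members of {"ab", "c"}* such as "ababab" and "abcab".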

  6. Finite State Automata Automata are models of computation: they compute languages. A finite-state automaton is a five-tuple ⟨Q, q0, Σ, δ, F⟩, where Σ is a finite set of alphabet symbols, Q is a finite set of states, q0 ∈ Q is the initial state, F ⊆ Q is a set of final (accepting) states and δ ⊆ Q × Σ × Q is a relation from states and alphabet symbols to states.

  7. Finite State Automata • Example: Finite-state automaton • Q = {q0, q1, q2, q3} • Σ = {c, a, t, r} • F = {q3} • δ = {<q0, c, q1>, <q1, a, q2>, <q2, t, q3>, <q2, r, q3>}

  8. Finite State Automata • The reflexive transitive extension of the transition relation δ is a new relation, δ̂, defined as follows: • for every state q ∈ Q, (q, ε, q) ∈ δ̂ • for every string w ∈ Σ∗ and letter a ∈ Σ, if (q, w, q′) ∈ δ̂ and (q′, a, q′′) ∈ δ then (q, w · a, q′′) ∈ δ̂.

  9. Finite State Automata • Example: Paths For the finite-state automaton of the previous example, δ̂ is the following set of triples: <q0, ε, q0>, <q1, ε, q1>, <q2, ε, q2>, <q3, ε, q3>, <q0, c, q1>, <q1, a, q2>, <q2, t, q3>, <q2, r, q3>, <q0, ca, q2>, <q1, at, q3>, <q1, ar, q3>, <q0, cat, q3>, <q0, car, q3>
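
The construction of the reflexive transitive extension can be checked mechanically. The sketch below is an illustration rather than part of the slides: the names `Q`, `DELTA`, `START`, `FINAL`, `delta_hat` and `accepts` are assumptions, and the `max_len` bound is added because the extension is an infinite set whenever the automaton contains a cycle.

```python
# The cat/car automaton, with q0 as the initial state.
Q = {"q0", "q1", "q2", "q3"}
DELTA = {("q0", "c", "q1"), ("q1", "a", "q2"),
         ("q2", "t", "q3"), ("q2", "r", "q3")}
START = "q0"
FINAL = {"q3"}

def delta_hat(states, delta, max_len):
    """All triples (q, w, q') of the extended relation with len(w) <= max_len."""
    paths = {(q, "", q) for q in states}      # base case: (q, ε, q)
    frontier = set(paths)
    for _ in range(max_len):
        # inductive case: extend every known path by one transition
        frontier = {(q, w + a, q2)
                    for (q, w, q1) in frontier
                    for (p, a, q2) in delta if p == q1}
        paths |= frontier
    return paths

def accepts(word):
    """A word is accepted if it labels a path from START to a final state."""
    paths = delta_hat(Q, DELTA, len(word))
    return any((START, word, f) in paths for f in FINAL)

for triple in sorted(delta_hat(Q, DELTA, 3), key=lambda t: (len(t[1]), t)):
    print(triple)
print(accepts("cat"), accepts("car"), accepts("ca"))
```

Acceptance falls out of the extended relation: a word w is in the language exactly when (q0, w, f) is in it for some final state f.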

  10. Finite State Automata An extension: ε-moves. • The transition relation δ is extended to: δ ⊆ Q × (Σ ∪ {ε}) × Q Example: Automata with ε-moves - an automaton accepting the language {do, undo, done, undone}: [figure not included in the transcript]
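
The automaton for this example does not survive in the transcript, so the sketch below uses a hypothetical one that accepts the same four words. The state numbers 0 to 6, the transition set and the use of the empty Python string as the ε label are all assumptions made for illustration. The sketch shows the usual way to run an automaton with ε-moves: follow ε-transitions for free before and after each input symbol.

```python
EPS = ""   # label used for ε-moves in this sketch

# Hypothetical automaton: an optional "un", then "do", then an optional "ne".
DELTA = {
    (0, "u", 1), (1, "n", 2), (0, EPS, 2),   # skip "un" with an ε-move
    (2, "d", 3), (3, "o", 4),
    (4, "n", 5), (5, "e", 6), (4, EPS, 6),   # skip "ne" with an ε-move
}
START, FINAL = 0, {6}

def eps_closure(states):
    """All states reachable from `states` using ε-moves only."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for (p, a, r) in DELTA:
            if p == q and a == EPS and r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

def accepts(word):
    current = eps_closure({START})
    for ch in word:
        step = {r for (p, a, r) in DELTA if p in current and a == ch}
        current = eps_closure(step)
    return bool(current & FINAL)

for w in ["do", "undo", "done", "undone", "un", "don"]:
    print(w, accepts(w))
```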

  11. Formal language theory – definitions If L is a language then the reversal of L, denoted L^R, is the language {w | w^R ∈ L}. If L1 and L2 are languages, then L1 · L2 = {w1 · w2 | w1 ∈ L1 and w2 ∈ L2}. Example: Language operations Let L1 = {i, you, he, she, it, we, they}, L2 = {smile, sleep}. Then L1^R = {i, uoy, eh, ehs, ti, ew, yeht} and L1 · L2 = {ismile, yousmile, hesmile, shesmile, itsmile, wesmile, theysmile, isleep, yousleep, hesleep, shesleep, itsleep, wesleep, theysleep}.

  12. Formal language theory – definitions If L is a language then L^0 = {ε}. Then, for i > 0, L^i = L · L^(i−1). Example: Language exponentiation Let L be the set of words {bau, haus, hof, frau}. Then L^0 = {ε}, L^1 = L and L^2 = {baubau, bauhaus, bauhof, baufrau, hausbau, haushaus, haushof, hausfrau, hofbau, hofhaus, hofhof, hoffrau, fraubau, frauhaus, frauhof, fraufrau}.
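
Reversal, concatenation and exponentiation of finite languages translate almost word for word into set comprehensions. The following Python sketch is only an illustration; the function names `reverse`, `concat` and `power` are chosen here and do not come from the slides.

```python
def reverse(L):
    """Reversal: reverse every word of L."""
    return {w[::-1] for w in L}

def concat(L1, L2):
    """L1 · L2 = { w1 w2 | w1 in L1 and w2 in L2 }."""
    return {w1 + w2 for w1 in L1 for w2 in L2}

def power(L, i):
    """Exponentiation: the 0th power is {ε}, the i-th is L · (i-1)-th power."""
    result = {""}                 # the language containing only ε
    for _ in range(i):
        result = concat(L, result)
    return result

L1 = {"i", "you", "he", "she", "it", "we", "they"}
L2 = {"smile", "sleep"}
L = {"bau", "haus", "hof", "frau"}

print(sorted(reverse(L1)))
print(sorted(concat(L1, L2)))
print(sorted(power(L, 2)))
```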

  13. Formal language theory – definitions The Kleene closure of L is denoted L∗ and is defined as L∗ = ⋃_{i=0}^{∞} L^i. Similarly, L+ = ⋃_{i=1}^{∞} L^i. Example: Kleene closure Let L = {dog, cat}. Observe that L^0 = {ε}, L^1 = {dog, cat}, L^2 = {catcat, catdog, dogcat, dogdog}, etc. Thus L∗ contains, among its infinite set of strings, the strings ε, cat, dog, catcat, catdog, dogcat, dogdog, catcatcat, catdogcat, dogcatcat, dogdogcat, etc. The notation Σ∗ should now become clear: it is simply a special case of L∗, where L = Σ.
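
Since the Kleene closure is in general infinite, a program can only enumerate it lazily. The sketch below is one possible way to do so in Python, listing the powers of L one after another. The generator name `kleene` and the alphabetical order within each power are choices made for this illustration.

```python
from itertools import islice

def kleene(L):
    """Yield the strings of the Kleene closure of L, one power of L at a time."""
    seen = set()
    layer = {""}                          # the 0th power: {ε}
    while True:
        new = sorted(layer - seen)
        if not new:                       # nothing new: the closure is exhausted
            return
        seen.update(new)
        yield from new
        layer = {u + v for u in layer for v in L}   # the next power of L

print(list(islice(kleene({"dog", "cat"}), 7)))
```

Dropping the first element ε gives the start of L+ for this L, because ε is not itself a member of L.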

  14. Markov Algorithms A Markov Algorithm is a finite sequence P1, P2, ..., Pn of Markov productions to be applied to strings in a given alphabet according to the following rules. Let S be a given string. The sequence is searched to find the first production Pi whose antecedent occurs in S. If no such production exists, the operation of the algorithm halts without change in S. If there is a production in the algorithm whose antecedent occurs in S, the first such production is applied to S. If this is a conclusive production, the operation of the algorithm halts without further change in S. If this is a simple production, a new search is conducted using the string S' into which S has been transformed. If the operation of the algorithm ultimately ceases with a string S*, we say that S* is the result of applying the algorithm to S.

  15. Markov Algorithms Example: Take the alphabet to be {a, b, c, d}. The algorithm is given below.
     Algorithm M1
     M11: [conclusive] a d → d c
     M12: [simple] b a → W
     M13: [simple] a → b c
     M14: [simple] b c → b b a
     M15: [simple] W → a
     Here W stands for the empty string, which occurs at the start of every string. Taking S = "dcb", none of M11 to M14 applies, so we apply the algorithm as follows:
     by M15, dcb becomes adcb
     by M11, adcb becomes dccb and the algorithm halts.
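
The search-and-rewrite loop of a Markov algorithm is short enough to sketch in Python. This is an illustration under stated assumptions, not the instructor's code: W is read as the empty string (the reading under which M15 turns dcb into adcb), a production rewrites the leftmost occurrence of its antecedent (the usual convention, which the slides do not spell out), and the names `run_markov`, `M1` and `max_steps` are made up here.

```python
def run_markov(productions, s, max_steps=1000):
    """Apply a Markov algorithm to s; return the result and every string seen.

    productions is an ordered list of (antecedent, consequent, conclusive).
    """
    trace = [s]
    for _ in range(max_steps):
        for lhs, rhs, conclusive in productions:
            if lhs in s:                      # "" occurs in every string
                s = s.replace(lhs, rhs, 1)    # rewrite the leftmost occurrence
                trace.append(s)
                break
        else:
            return s, trace                   # no antecedent occurs: halt
        if conclusive:
            return s, trace                   # conclusive production: halt
    raise RuntimeError("step limit reached without halting")

# Algorithm M1, with W written as the empty string.
M1 = [
    ("ad", "dc", True),    # M11 [conclusive]
    ("ba", "", False),     # M12 [simple]
    ("a", "bc", False),    # M13 [simple]
    ("bc", "bba", False),  # M14 [simple]
    ("", "a", False),      # M15 [simple]
]

result, trace = run_markov(M1, "dcb")
print(" -> ".join(trace))
```

The `max_steps` guard is there because a Markov algorithm need not halt on every input.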

  16. Markov Algorithms Example: Let α be a marker not in the alphabet. If S is a string in the alphabet, the result of applying algorithm M3 to S is the string SA.
     Algorithm M3
     M31: [interchange] αξ → ξα, ξ a member of the alphabet
     M32: [conclusive] α → A
     M33: W → α (W is the empty string)
     Since S initially does not contain α, the third production applies first and places α at the front of S. The first production is then used to move α past the symbols in S. If S contains n occurrences of symbols, then after n further steps we obtain the string Sα. At this point the first production no longer applies, and the second production produces SA. Since this production is conclusive, the string SA is then the result.

  17. Markov Algorithms In the preceding example, we have introduced a new notation. Namely, in the first production we have used the variable ξ, which ranges over the symbols in the alphabet. Thus the first line is not really a production, but rather a production schema, denoting all the productions which can be obtained by substituting symbols of the alphabet for ξ. Because of the manner in which the Markov algorithms are used, the order in which the productions are written is vital. If the first two lines of algorithm M3 were interchanged, the result would be to transform S into AS, rather than into SA, and the productions represented by αξ → ξα would never be used.

  18. Markov Algorithms Example: Another procedure which is quite common is that of reversing a string of characters. We do this by moving the first character to the end as before, then moving the next character down to the position just preceding the first character, and so on.
     Markers: α, β
     Algorithm M10
     M101: [conclusive] αβ → W
     M102: αξβ → βξ
     M103: αξf → fαξ
     M104: α → β
     M105: W → α
     where ξ, f are members of the alphabet and W is the empty string.

  19. Markov Algorithms Illustrating this algorithm on the string "ABCD" we have
     by M105 => α A B C D
     by M103 => B α A C D
     by M103 => B C α A D
     by M103 => B C D α A
     by M104 => B C D β A
     by M105 => α B C D β A
     by M103 => C α B D β A
     by M103 => C D α B β A
     by M102 => C D β B A
     by M105 => α C D β B A
     by M103 => D α C β B A
     by M102 => D β C B A
     by M105 => α D β C B A
     by M102 => β D C B A
     by M105 => α β D C B A
     by M101 => D C B A
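
The reversal run can be reproduced with a small interpreter that first expands each production schema into concrete productions. The Python sketch below rests on several assumptions that are not confirmed by the slides: the markers are written α and β, the productions are read as αβ → W (conclusive), αξβ → βξ, αξf → fαξ, α → β and W → α with W the empty string, and the leftmost occurrence of an antecedent is the one rewritten. The names `expand` and `run` are illustrative.

```python
A, B = "α", "β"   # the two markers, assumed not to be in the alphabet

def expand(alphabet):
    """Expand the schemas of M10 into (name, antecedent, consequent, conclusive)."""
    rules = [("M101", A + B, "", True)]
    rules += [("M102", A + x + B, B + x, False) for x in alphabet]
    rules += [("M103", A + x + f, f + A + x, False)
              for x in alphabet for f in alphabet]
    rules += [("M104", A, B, False), ("M105", "", A, False)]
    return rules

def run(rules, s, max_steps=1000):
    """Run the algorithm; return the result and the names of the rules applied."""
    applied = []
    for _ in range(max_steps):
        for name, lhs, rhs, conclusive in rules:
            if lhs in s:                      # "" occurs in every string
                s = s.replace(lhs, rhs, 1)    # rewrite the leftmost occurrence
                applied.append(name)
                print(f"by {name} => {s}")
                break
        else:
            return s, applied                 # no production applies
        if conclusive:
            return s, applied
    raise RuntimeError("step limit reached without halting")

result, applied = run(expand("ABCD"), "ABCD")
```

With these readings the sequence of applied productions matches the run shown above, which is some evidence that the reconstruction of the productions is the intended one.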

  20. Other Concluding Remarks
     A PSYCHOLOGICAL TIP
     Whenever you're called on to make up your mind,
     and you're hampered by not having any,
     the best way to solve the dilemma, you'll find,
     is simply by spinning a penny.
     No -- not so that chance shall decide the affair
     while you're passively standing there moping;
     but the moment the penny is up in the air,
     you suddenly know what you're hoping.
