300 likes | 442 Views
Fall 2010. The Chinese University of Hong Kong. CSCI 3130: Automata theory and formal languages. Text search and closure properties. Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130. Text search. The program grep. grep -E regexp file.
E N D
Fall 2010 The Chinese University of Hong Kong CSCI 3130: Automata theory and formal languages Text search and closure properties Andrej Bogdanov http://www.cse.cuhk.edu.hk/~andrejb/csc3130
The program grep grep -E regexpfile Searches for the occurrence of patterns matching a regular expression cat|12 {cat, 12} union [abc] {a, b, c} shorthand for a|b|c [ab][12] {a1, a2, b1, b2} concatenation (ab)* {e, ab, abab, ...} star [ab]? {e, a, b} zero or one (cat)+{cat, catcat, ...}one or more [ab]{2}{aa, ab, ba, bb}{n} copies
Searching with grep Words containing savor or savour Words with 5 consecutive a or b grep –E `[ab]{5}` words grep –E `savou?r` words grabbable savorsome savory savour unsavored unsavoredly unsavoredness unsavorily unsavoriness unsavory outsavor savor savored savorer savorily savoriness savoringly savorless savorous grep –E `zo+zo+` words zoozoo
More grep commands . any symbol 5 a [a-z] anything in a range n f a 1 \< beginning of line s u f f e r 2 $ end of line l r e g u l a r 3 1 If a DFA is too hard, I do an ... e n 2CSCI 3130 homework makes me ... s t a r 4 3If a DFA likes it, it is ... 4$10000000 =$10? 5I study 3130 hard because it will make me ... grep –E `\<.ff.u..t` words
how do you look for... Words that start in cat and have another cat grep –E `\<cat.*cat` words Words with at least ten vowels? grep –E `([aeiouy].*){10}` words Words without any vowels? grep –E `\<[^AEIOUYaeiouy]*$` words Words with exactly ten vowels? [^R] does not contain R grep –E `\<[^AEIOUYaeiouy]* ([aeiouy][^AEIOUYaeiouy]*){10}$` words
How grep (could) work regularexpression NFA NFAwithout e DFA input text file differences in class in grep not allowed allowed [ab]?, a+, (cat){3} matches whole looks for pattern input handling accept/reject finds pattern output
Implementation of grep • How do you handle expressions like: [ab]? zero or one → ()|[ab] R? → e|R (cat)+ one or more → (cat)(cat)* R+ → RR* a{3}{n} copies → aaa ? R{n} → RR...R [^aeiouy] not containing any n times
Example • The language L of strings that end in 101 is regular • How about the language L of strings that do not end in 101? (0+1)*101
Example • Hint:w does not end in 101if and only if it ends in:or it has length 0, 1, or 2 • So L can be described by the regular expression 000, 001, 010, 011, 100, 110 or 111 (0+1)*(000+001+010+010+100+110+111) + e + (0 + 1) + (0 + 1)(0 + 1)
Complement • The complementL of a language L is the set of all strings that are not in L • Examples (S = {0, 1}) • L1 = all strings that end in 101 • L1= all strings that do not end in 101= all strings end in 000, …, 111 or have length 0, 1, or 2 • L2 = 1* = {e, 1, 11, 111, …} • L2 = all strings that contain at least one 0 = (0 + 1)*0(0 + 1)*
Example • The language L of strings that contain101 is regular • How about the language L of strings that do not contain 101? (0+1)*101(0+1)* You can write a regular expression, but it is a lot of work!
Closure under complement • To argue this, we can use any of the equivalent definitions for regular languages: • The DFA definition will be most convenient • We assume L has a DFA, and show L also has a DFA If L is a regular language, so is L. regularexpression NFA DFA
Arguing closure under complement • Suppose L is regular, then it has a DFA M • Now consider the DFA M’ with the accepting and rejecting states of Mreversed accepts L accepts strings not in L this is exactly L
Food for thought NO! • Can we do the same thing with an NFA? 0, 1 1 0 (0+1)*10 q0 q1 q2 0, 1 (0+1)* 1 0 q0 q1 q2
Intersection • The intersectionL L’ is the set of strings that are in both L and L’ • Examples: • If L, L’ are regular, is L L’ also regular? L L’ = L = (0 + 1)*11 L’ = 1* 1*11 L L’ = ∅ L = (0 + 1)*10 L’ = 1*
Closure under intersection If L and L’ are regular languages, so is L L’. • To argue this, we can use any of the equivalent definitions for regular languages: regularexpression NFA DFA Suppose L and L’ have DFAs, call them M and M’ Goal: Construct a DFA (or NFA) for L L’
An example M 1 1 0 r1 r0 0 0 0 L = even number of 0s L’ = odd number of 1s M’ 1 s1 s0 1 1 r0, s0 r0, s1 1 0 0 0 0 1 r1, s0 r1, s1 1 L L’ = even number of 0s and odd number of 1s
Closure under intersection M and M’ DFA for L L’ states Q = {r1, ..., rn} F for M Q × Q’ = {(r1, s1), (r1, s2), ..., (r2, s1), ..., (rn, sn’)} F’ for M’ Q’ = {s1, ..., sn’} start state ri for M sj for M’ (ri,sj) accepting states F × F’ = {(ri, sj): ri ∈F, sj ∈F’} Whenever M is in state ri and M’ is in state sj, the DFA for L L’ will be in state (ri,sj)
Closure under intersection M and M’ DFA for L L’ a transitions in M ri,si’ rj,sj’ in M’ a a si’ ri sj’ rj
Reversal • The reversalwR of a string w is w written backwards • The reversalLRof a languageL is the language obtained by reversing all its strings w = cave wR = evac L = {cat, dog} LR = {tac, god}
Reversal of regular languages • L = all strings that end in 101 is regular • How about LR? • This is the language of all strings beginning in 101 • Yes, because it is represented by (0+1)*101 101(0+1)*
Closure under reversal • How do we argue? If L is a regular language, so is LR. regularexpression NFA DFA
Arguing closure under reversal • Take a regular expression E for L • We will show how to reverse E • A regular expression can be of the following types: • The special symbols and e • Alphabet symbols like a, b • The union, concatenation, or star of simpler expressions
Proof of closure under reversal regular expression E reversal ER Æ Æ e e a (alphabet symbol) a E1 + E2 E1R + E2R E1E2 E2RE1R E1* (E1R)*
A question L = {cat, dog} LDUP = {ww: w L} Ex. LDUP = {catcat, dogdog} If L is regular, is LDUP also regular? ? regularexpression NFA DFA
A question • Let’s try with regular expression: • Let’s try with NFA: L = {a, b} q0 q1 NFA for L NFA for L LDUP = LL LDUP = {aa, bb} LL = {aa, ab, ba, bb}
An example • Let’s try to design an NFA for LDUP L = 0*1 is regular LDUP = {1, 01, 001, 0001, ...} LDUP = {11, 0101, 001001, 00010001, ...} = {0n10n1: n ≥ 0}
An example 0 0 0 0 1 1 1 1 01 1 001 0001 LDUP = {11, 0101, 001001, 00010001, ...} = {0n10n1: n ≥ 0}