170 likes | 327 Views
CSE 303 Concepts and Tools for Software Development. Richard C. Davis UW CSE – 10/6/2006 Lecture 5 – Regular Expressions. Administravia. Homework 1 is due!. Tip of the Day. Viewing Multiple buffers in Emacs C-x 2 : Make two buffers visible C-x o : Switch cursor to “other” buffer
E N D
CSE 303Concepts and Tools for Software Development Richard C. DavisUW CSE – 10/6/2006 Lecture 5 – Regular Expressions
Administravia • Homework 1 is due! CSE 303 Lecture 5
Tip of the Day • Viewing Multiple buffers in Emacs • C-x 2 : Make two buffers visible • C-x o : Switch cursor to “other” buffer • Careful! This also switches from mini-buffer! • C-x 1 : Make current buffer the only buffer CSE 303 Lecture 5
Where are we? • Tons of info on the shell • Pseudo-programming language • Files, users, and processes • You know just enough to get around • Coming up • More powerful ideas CSE 303 Lecture 5
Today • Regular Expressions • Find strings matching a pattern • Recurring theme in CS theory • Crisp, elegant, and useful idea • Powerful tool for computer science • Less educated hackers don’t tend to know • Other Utilities • find, diff, wc CSE 303 Lecture 5
Types of Regular Expressions • “Globbing” = filename expansion • “Regular Expressions” = pattern matching • In CSE 322 (finite automata) • In grep • In egrep • Others… CSE 303 Lecture 5
What are Regular Expressions? • Recursive definition: p = … • a, b, c : matches a single character • p1p2:matches ifp1comes just before p2 • (p1|p2):matches eitherp1or p2 • p*:matches 0 or more instances of p • Example • Match abcbdgef against bd(e|g)* CSE 303 Lecture 5
Why This Language? • Amazing Facts • Such patterns are very easy to match • Space/time scale nicely with length of input and regexp • Know space needed before matching! • Can tell if two regexps match same strings • For more of this, see CSE 322 CSE 303 Lecture 5
More Regular Expressions • Syntactic Conveniences • p+=pp*(one or more p’s) • p?=(|p) (zero or one p) • [zd-h] =z | d | e | f | g | h • [^A-Z] = … (any not in set) • . = … (any character) • p{n}= … (n instances of p) • p{n,}= … (n or more p’s) • p{n,m}= … (from n to n+m instances of p) CSE 303 Lecture 5
More Regular Expressions • Other Conveniences • [:alnum:] = Letters and Numbers • [:alpha:] = Letters • [:digit:] = Numbers • [:lower:] = Lowercase letters • [:print:] = Printable characters • [:punct:] = Punctuation Marks • [:space:] = Whitespace CSE 303 Lecture 5
grep finds patterns in files • grep p [file] • Reads each line from stdin (or file) • If match, print line on stdout • Technically, matches against .*p.* • Get rid of 1st.* with ^ • Get rid of 2nd.* with $ • Example: ^p$ matches whole line exactly CSE 303 Lecture 5
Regular Expression Gotchas • Check syntax of each program • grep uses \| \+ \? \{ \} • egrep uses | + ? { } • Use quotes to avoid shell substitutions • Escape special characters with \ • \. and . are very different! • Rules different inside [ ] (\ is literal) CSE 303 Lecture 5
Reusing Previous Matches • Can refer to matches of previous ( ) • \n refers to match of nth( ) • Example: Double-words ^([a-zA-Z]+)\1$ • Very useful when doing substitution • We’ll see this next time CSE 303 Lecture 5
Other Utilities • Put these in your arsenal • find : search a directory tree for anything • Linux Pocket Guide gives nice summary • find . –type f – name “foo.*” – exec grep “regexp” “{}” \; - print • diff : compare two files or directories • wc : count the bytes, words, lines in a file CSE 303 Lecture 5
Summary • Regular Expressions • Powerful and useful idea • Powerful tool for computer science • Less educated hackers don’t tend to know • Other Utilities • find • diff • wc CSE 303 Lecture 5
Reading • Linux Pocket Guide • p71-74 (grep) • p66-68 (find) • p86-88 (diff) • p58 (wc) CSE 303 Lecture 5
Next Time Other scripting tools • sed : For changing text files • awk : More powerful text editing • General-purpose scripting languages • perl • python • ruby CSE 303 Lecture 5