Software I: Utilities and Internals Lecture 4 – Regular Expressions, grep and sed * Modified from Dr. Robert Siegfried's original presentation
What Is A Regular Expression? • A regular expression is a pattern consisting of a sequence of characters that is matched against text. • Regular expressions give us a way of recognizing words, numbers and operators that appear as part of a larger text so the computer can process them in a meaningful and intelligent way.
What are Atoms? • Regular expressions consist of atoms and operators. • An atom specifies what text is to be matched and where it can be found. • There are five types of atoms that can be found in text: • Single characters • Dots • Classes • Anchors • Back references
Single Characters • The most basic atom is a single character; when a single character appears in a regular expression, that character must appear in the text for there to be a successful match. • Example (String is "Hello"; Regular Expression is "l") • The match is successful because "l" appears in "Hello" • If the regular expression had been "s", there would be no match.
Dot • A dot (".") matches any character except new line ('\n'). • Example • a. matches aa, ab, ac, ad, aA, aB, a3, etc. • . will match any character in HELLO, H. will match the HE in HELLO, h. matches nothing in HELLO.
Class • A class consists of a set of ASCII characters, any one of which can match a character in the text. • Classes are written with the set of characters contained within brackets. • Example • [ABL] matches either "L" in HELLO.
Ranges and Exceptions in Classes • A range of characters can be used in a class: • [a-d] or [A-Za-z] • Sometimes it is easier to specify which characters DON'T appear. This is done using exclusion (^). • Examples • [^aeiou] specifies any character except a lowercase vowel. • [^0-9] specifies any character except a digit.
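The atoms described so far (single characters, dots, classes, ranges and exclusions) can be tried out with a small sketch. The four sample lines and the helper name count are made up for illustration; grep -c prints how many lines match.

```shell
# Sample lines (hypothetical) to match against.
sample='HELLO
hello
a3
42'

# count PATTERN: number of sample lines that contain a match
count() {
    printf '%s\n' "$sample" | grep -c -- "$1"
}

single=$(count 'l')        # single character: only "hello" has a lowercase l
dot=$(count 'H.')          # dot: H followed by any character
class=$(count '[ABL]')     # class: any one of A, B or L
range=$(count '[0-9]')     # range: any digit
negated=$(count '[^0-9]')  # exclusion: at least one character that is not a digit

echo "$single $dot $class $range $negated"
```

Note that [^0-9] still matches a3, because the line contains one non-digit; only 42 has nothing but digits.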
Anchors • Anchors line up the pattern with a particular part of the string: • ^ Beginning of the line • $ End of the line • \< Beginning of a word • \> End of a word
Anchors - Examples • Sample text: One line of text\n • ^One Matches • text$ Matches • \<line Matches • \>line Does not match • line\> Matches • f\> Matches
PEPPER@panther:~/270$ grep 'line\>' vitest1
one line of text
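The anchor examples can be checked with a short loop. This is a sketch that assumes a grep supporting the \< and \> word anchors (GNU grep does); the helper name matches is hypothetical.

```shell
line='One line of text'

# matches PATTERN: succeed if the sample line matches the basic regular expression
matches() {
    printf '%s\n' "$line" | grep -q -- "$1"
}

for pattern in '^One' 'text$' '\<line' 'line\>' '\>line' '^line'; do
    if matches "$pattern"; then
        printf '%s: match\n' "$pattern"
    else
        printf '%s: no match\n' "$pattern"
    fi
done
```

The pattern \>line can never match: an end-of-word position cannot be followed directly by a word character such as l.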
What are Operators? • Operators provide us with a way to combine atoms to form larger and more powerful regular expressions. • Operators play the same role as mathematical operators play in algebraic expressions. • There are five types of operators that can be found in text: • Sequence • Alternation • Repetition • Group • Save
Sequence • No symbol is used for the sequence operator; all you need is to have two atoms appear in sequence. • We can match the string CHARACTER with the pattern ACT because we find the sequence ACT in our string.
Sequence - Examples • dog – matches the character sequence "dog" • a..b – matches a, any two characters, then b • [2-4][0-9] – matches a number between 20 and 49. • ^$ - matches an empty line • ^.$ - matches a line with only one character • [0-9]-[0-9] – matches two digits with a dash in between.
Alternation • The alternation operator (|) defines two or more alternatives, any one of which can appear in the string. • Examples • UNIX|unix matches either UNIX or unix • Ms|Mrs|Miss matches Ms, Mrs or Miss • FE|EL matches HELLO because one of the alternatives (EL) appears in it.
Repetition • Repetition refers to a definite or indefinite number of times that one or more characters can appear. • The most common forms of repetition use three "short form" repetition operators: • * - zero or more occurrences • + - one or more occurrences • ? - zero or one occurrences
* - Examples • BA* - B, BA, BAA, BAAA, BAAAA • B.* - B, BA, BB, BC, BD, …, BAA, BAB, BAC, … • .* - any sequence of zero or more characters
+ - Examples • BA+ - BA, BAA, BAAA, BAAAA, … • B.+ - BA, BB, BC, BD, …, BZ, BAA, BAB, … • .+ - any sequence of one or more characters
? - Examples • d? - zero or one d • [0-9]? – zero or one digit • [^A-Z]? – zero or one character except a capital letter • [A-Za-z]? – zero or one letter
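The short-form repetition operators and alternation can be compared side by side. The sketch below assumes grep -E (extended regular expressions, where +, ? and | are operators) and uses -x so that a word is printed only if the whole word matches; the word list and the helper name ere are made up.

```shell
words='B
BA
BAAA
CAB
unix
UNIX
Unix'

# ere PATTERN: print, on one line, the words matched in full by the pattern
ere() {
    printf '%s\n' "$words" | grep -E -x -- "$1" | tr '\n' ' '
}

star=$(ere 'BA*')        # zero or more A
plus=$(ere 'BA+')        # one or more A
opt=$(ere 'BA?')         # zero or one A
alt=$(ere 'UNIX|unix')   # either alternative, but not Unix

echo "BA*: $star"
echo "BA+: $plus"
echo "BA?: $opt"
echo "UNIX|unix: $alt"
```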
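General Cases of Repetition • Repetition can be stated in more general terms using a set of escaped braces containing two numbers separated by a comma (with no spaces) • Example • B\{2,5\} would match BB, BBB, BBBB, BBBBB • A single number gives an exact count, and the minimum or maximum value can be omitted: • CA\{5\} matches CAAAAA • CA\{2,\} matches CAA, CAAA, CAAAA, … • CA\{,5\} matches C, CA, CAA, CAAA, CAAAA, CAAAAA (omitting the minimum is not supported by every grep) • In basic regular expressions the braces are escaped so that they act as the repetition operator; unescaped braces are ordinary characters.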
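A quick way to check the general repetition forms is to feed a few candidate strings to grep -x. This sketch uses basic regular expressions with escaped braces; the helper name interval and the candidate strings are made up for illustration.

```shell
# interval PATTERN: print, on one line, the candidates matched in full
interval() {
    printf '%s\n' C CA CAA CAAAAA CAAAAAA | grep -x -- "$1" | tr '\n' ' '
}

exact=$(interval 'CA\{5\}')      # exactly five A
atleast=$(interval 'CA\{2,\}')   # two or more A
between=$(interval 'CA\{1,2\}')  # one or two A

echo "exactly 5:  $exact"
echo "at least 2: $atleast"
echo "1 to 2:     $between"
```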
Group Operator • The group operator is a pair of parentheses around a group of characters, causing the next operator to apply to the group, not just a single character: • Example • AB*C - matches AC, ABC, ABBC, ABBBC, … • \(AB\)*C – matches C, ABC, ABABC, ABABABC, … • In basic regular expressions the parentheses are escaped so that they act as the group operator; unescaped parentheses are ordinary characters.
Practice Regular Expressions • http://www.zytrax.com/tech/web/regex.htm#intro • Search for Regular Expression - Experiments and Testing • Tells you how many matches it finds • Back reference - use \1 to refer to the text matched by the first saved group, and \2 for the second
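Grouping and back references can also be tried from the command line. The following is a minimal sketch using basic regular expressions in grep and sed; the sample strings are invented.

```shell
# Grouping: the * applies to the whole group AB, not just to B.
grouped=$(printf '%s\n' C ABC ABABC ABBC | grep -x '\(AB\)*C' | tr '\n' ' ')

# Back reference in a pattern: \1 must repeat exactly what the first group matched.
doubled=$(printf '%s\n' 'bye bye' 'bye now' 'so so' | grep -c '^\(.*\) \1$')

# Back references in a sed replacement: \1 and \2 reuse the saved groups.
swapped=$(echo 'hello world' | sed 's/\(hello\) \(world\)/\2 \1/')

echo "grouped: $grouped"
echo "doubled lines: $doubled"
echo "swapped: $swapped"
```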
What is grep? • grep allows the user to print each line in a text file that contains a particular pattern.
What is grep? • The name grep comes from the ed command g/re/p (globally search for a regular expression and print the matching lines). • The general format is grep pattern filenames • The input can be from files or from stdin. • grep -n variable *.[ch] prints every line containing the string variable in every C source file or header file in the current directory, with its file name and line number.
Examples of grep • grep From $MAIL Print message headers in the mailbox • grep From $MAIL | grep -v mary Which ones are not from Mary • grep -i mary $HOME/lib/phone-book Find Mary's phone number • who | grep mary Is Mary logged in? • ls | grep -v temp List all the files without temp in their name
Options for grep • -i - ignore case – treat upper and lower case the same. • -n – provide line numbers • -v - reverse – print lines without the pattern. • -c – provide a count of the lines with the pattern, instead of displaying these lines.
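The four options can be seen on a tiny phone book. The entries and the function name book below are hypothetical; the options themselves are the ones listed above.

```shell
# book: print a made-up three-line phone book
book() {
    printf '%s\n' 'Mary 555-0100' 'mary 555-0101' 'Bob 555-0102'
}

ignore_case=$(book | grep -i mary)     # -i: both Mary and mary
numbered=$(book | grep -n Bob)         # -n: line number, a colon, then the line
inverted=$(book | grep -v -i mary)     # -v: lines WITHOUT the pattern
count=$(book | grep -c -i mary)        # -c: only the number of matching lines

printf '%s\n' "$ignore_case" "$numbered" "$inverted" "$count"
```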
grep Patterns • grep patterns can be more complicated; quote them so that the shell does not treat *, [ and ] as file-name wildcards: • grep 'c*' 0 or more occurrences of c (this matches every line, because zero occurrences also count) • grep 'sieg*' /etc/passwd Check the password file for sie, sieg, siegg, sieggg, etc. • grep '[abc]' Check for an occurrence of any of these three characters. • grep '[br]ob' /etc/passwd Look for bob or rob in the password file. • grep '[0-9][0-9]*' hithere.c Look for numbers in the program ([0-9]* alone would match every line).
^ And $ In A grep Pattern • The metacharacters ^ and $ anchor text to the beginning and end of lines, respectively: • grep From $MAIL Check mail for lines containing From • grep '^From' $MAIL Check mail for lines beginning with From • grep ';$' hello.c Display lines ending with ;
Other Pattern Metacharacters • A circumflex at the start of a bracketed class reverses its meaning: grep '[^0-9]' hithere.c Display the lines that contain at least one character that is not a digit • A period represents any single character: ls -l | grep '^d' List the subdirectories ls -l | grep '^.......rw' List files others can read and write (the seven dots skip the file type and the owner and group permissions)
* • x* - 0 or more xs • .* - 0 or more of any character • .*x – anything followed by an x. • xy* - x followed by zero or more ys. The * applies to only one character: x, xy, xyy, xyyy, etc., NOT xy, xyxy, xyxyxy, etc. • [a-zA-Z]* - 0 or more letters • [a-zA-Z][a-zA-Z]* - 1 or more letters
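Two consequences of "zero or more" are easy to miss, so here is a small sketch with made-up input lines. The patterns are quoted so that the shell passes them to grep unchanged.

```shell
lines='int x = 42;
/* no digits here */
return x;'

# [0-9]* also matches zero digits, so every line counts as a match.
every=$(printf '%s\n' "$lines" | grep -c '[0-9]*')

# Requiring one digit before the starred class finds only lines with a number.
numbers=$(printf '%s\n' "$lines" | grep -c '[0-9][0-9]*')

# The * binds to the single character before it: xy* is x, xy, xyy, ... but not xyxy.
xy=$(printf '%s\n' x xy xyy xyxy | grep -x 'xy*' | tr '\n' ' ')

echo "$every $numbers"
echo "$xy"
```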
grep – Some More Examples • grep '^[^:]*::' /etc/passwd Lists users without a password – it looks from the beginning of the line for non-colons followed by two consecutive colons. • w -h | grep days w without a heading – lists everyone who has been idle for more than 1 day. • w -h | grep days | cut -c1-8 cuts out some of the output (includes only columns 1 through 8) • grep -l float * lists only the names of the files in the current directory that contain the string float.
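The password-file pattern is worth tracing on sample data. The three entries below are invented (they only imitate the colon-separated /etc/passwd layout); the sketch shows that the ^[^:]* prefix restricts the empty field to the second position.

```shell
passwd_sample='root:x:0:0:root:/root:/bin/sh
guest::1001:1001:Guest:/home/guest:/bin/sh
mary:x:1002:1002::/home/mary:/bin/sh'

# Only guest has an empty second field. The mary entry contains :: later in the
# line, but [^:]* cannot cross the first colon, so it is not reported.
no_password=$(printf '%s\n' "$passwd_sample" | grep '^[^:]*::' | cut -d: -f1)

echo "$no_password"
```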
grep – Some More Examples
[SIEGFRIE@panther c]$ cat > memo
data is correct before we publish it.
I thought you would have known by now.
[SIEGFRIE@panther c]$ grep -w now memo
I thought you would have known by now.
[SIEGFRIE@panther c]$ cat >errors
0[0-9].e[0-9]*
[SIEGFRIE@panther c]$ cat >sketch
00.e8
9/12
[SIEGFRIE@panther c]$ grep -f errors sketch
00.e8
[SIEGFRIE@panther c]$
grep Family • The grep family includes 2 additional variations of grep: • fgrep – fixed-string ("fast") grep matches only literal sequences of characters, with no metacharacters, but works more quickly than grep (equivalent to grep -F). • egrep – extended grep handles a wider array of regular expressions (equivalent to grep -E).
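The difference between fixed-string and regular-expression matching shows up as soon as the pattern contains a metacharacter. This sketch uses grep -F, the option form of fgrep, on two invented lines.

```shell
# As a fixed string, a.c matches only the literal text a.c.
fixed=$(printf '%s\n' 'a.c' 'abc' | grep -F 'a.c' | tr '\n' ' ')

# As a regular expression, the dot matches any character, so abc matches too.
regex=$(printf '%s\n' 'a.c' 'abc' | grep 'a.c' | tr '\n' ' ')

echo "fixed: $fixed"
echo "regex: $regex"
```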
fgrep – Examples
SIEGFRIE@panther:~$ cat raven
Once upon a midnight dreary, while I pondered, weak and weary,
Over many a quaint and curious volume of forgotten lore.
While I nodded, nearly napping, suddenly there came a tapping,
As of some one gently rapping, rapping at my chamber door.
..Tis some visiter,. I muttered, .tapping at my chamber door.
Only this and nothing more..
… …
And the Raven, never flitting, still is sitting, still is sitting
On the pallid bust of Pallas just above my chamber door;
And his eyes have all the seeming of a demon.s that is dreaming,
And the lamp-light o.er him streaming throws his shadow on the floor;
And my soul from out that shadow that lies floating on the floor
Shall be lifted.nevermore!
SIEGFRIE@panther:~$
SIEGFRIE@panther:~$ fgrep Raven raven
In there stepped a stately Raven of the saintly days of yore;
Ghastly grim and ancient Raven wandering from the Nightly shore.
Quoth the Raven .Nevermore..
But the Raven, sitting lonely on the placid bust, spoke only
But the Raven still beguiling all my fancy into smiling,
Quoth the Raven .Nevermore..
Quoth the Raven .Nevermore..
Quoth the Raven .Nevermore..
Quoth the Raven .Nevermore..
And the Raven, never flitting, still is sitting, still is sitting
SIEGFRIE@panther:~$ fgrep -v Raven raven
Once upon a midnight dreary, while I pondered, weak and weary,
Over many a quaint and curious volume of forgotten lore.
While I nodded, nearly napping, suddenly there came a tapping,
As of some one gently rapping, rapping at my chamber door.
..Tis some visiter,. I muttered, .tapping at my chamber door.
Only this and nothing more..
… …
And my soul from out that shadow that lies floating on the floor
Shall be lifted.nevermore!
SIEGFRIE@panther:~$
egrep
[SIEGFRIE@panther c]$ cat alphvowels
^[^aeiou]*a[^aeiou]*e[^aeiou]*i[^aeiou]*o[^aeiou]*u[^aeiou]*$
[SIEGFRIE@panther c]$ egrep -f alphvowels dict | 3
abstemious abstemious abstentious
achelious acheirous acleistous
affectious annelidous arsenious
… …
• egrep extends the capabilities with three additional metacharacters: ? + | • r+ – 1 or more occurrences of r • r? – 0 or 1 occurrences of r • r1|r2 – either r1 or r2 • egrep 'cookie|donut' oreo
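The idea behind the alphvowels pattern (words that contain each of the five vowels exactly once, in alphabetical order) can be checked on a handful of words without a dictionary file. This is a sketch using grep -E; the word list is made up, and the pattern is written with all five vowels.

```shell
# Consonants, then a, consonants, then e, and so on through u, anchored at both ends.
pattern='^[^aeiou]*a[^aeiou]*e[^aeiou]*i[^aeiou]*o[^aeiou]*u[^aeiou]*$'

vowels_in_order=$(printf '%s\n' abstemious facetious education sequoia |
    grep -E "$pattern")

echo "$vowels_in_order"
```

The words education and sequoia contain all five vowels but not in alphabetical order, so the anchored pattern rejects them.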
Searching for File Content
SIEGFRIE@panther:~$ ls | grep flea
fleas
fleass
fleast
fleawrite
newfleas
SIEGFRIE@panther:~$ ls | grep 'fl*'
160L2Handout.pdf
160l4notes.pdf
270cl1.pdf
binfile.c
BlindOpportunities.pdf
filename
final
find
fl
fleas
fleass
fleast
fleawrite
myfile
mystuff
newfleas
test.f
under.f
yourstuff
SIEGFRIE@panther:~$
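The long listing produced by 'fl*' follows from the meaning of the star: the pattern is f followed by zero or more l, so any name containing an f matches. A small sketch with invented file names makes the counts explicit.

```shell
names='filename
fleas
myfile
newfleas
notes.txt'

with_flea=$(printf '%s\n' "$names" | grep -c 'flea')    # names containing flea
with_fl_star=$(printf '%s\n' "$names" | grep -c 'fl*')  # any name containing f
with_fl=$(printf '%s\n' "$names" | grep -c 'fll*')      # f followed by at least one l

echo "$with_flea $with_fl_star $with_fl"
```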
sed – The Stream Editor • The basic command is: sed 'list of editing commands' filename • sed does not alter the input file unless its output is redirected there. Writing sed '…' file >file is a really bad idea, because the shell empties file before sed reads it; it is much better to store the results in a temporary file (or use -i where sed supports it). • sed outputs each line automatically; use -n combined with the p (print) command to simulate grep. • To use extended regular expressions, use -r (or -E).
sed – The Stream Editor
[SIEGFRIE@panther ~]$ sed 's/knees/feet/g' <mystuff >yourstuff
[SIEGFRIE@panther ~]$ cat mystuff
This is a test of the emergency programming system.
If this were a real emergency, bend forward and kiss your knees goodbye
[SIEGFRIE@panther ~]$ cat yourstuff
This is a test of the emergency programming system.
If this were a real emergency, bend forward and kiss your feet goodbye
[SIEGFRIE@panther ~]$
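When the goal is to change a file in place, the temporary-file approach can be sketched as follows. The scratch directory and the name mystuff.tmp are assumptions made for this example; mktemp -d is widely available but not required by POSIX.

```shell
# Work in a scratch directory so that no real file is touched.
dir=$(mktemp -d)
printf '%s\n' 'kiss your knees goodbye' > "$dir/mystuff"

# Write the edited text to a temporary file first, then replace the
# original only if sed succeeded.
sed 's/knees/feet/g' "$dir/mystuff" > "$dir/mystuff.tmp" &&
    mv "$dir/mystuff.tmp" "$dir/mystuff"

result=$(cat "$dir/mystuff")
echo "$result"

rm -rf "$dir"
```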
sed patterns • sed patterns almost always need quotes because their metacharacters usually have a special meaning to the shell. • du – estimate disk usage • du -a – include files as well as directories.
du and sed – An Example
: du -a g*
4 greptest2.txt
4 greptest3.txt
4 greptest.c
4 greptest.mod
: du -a g* | sed 's;.*g;g;'
greptest2.txt
greptest3.txt
greptest.c
greptest.mod
:
: du -a g* | sed 's;[^.]*\.;\.;'
who And sed
PEPPER@panther:~/271$ who
corey pts/0 2014-08-28 08:38 (10.80.4.131)
PEPPER pts/1 2014-09-01 08:49 (pool-96-246)
PEPPER@panther:~/271$ who | sed 's/ .*/ /'
corey
PEPPER
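The du and who pipelines can be reproduced without the real commands by feeding sed canned input. The sample lines below imitate the output shown on the slides (a size, a tab and a file name for du; a user name followed by other columns for who), and the replacement in the who case drops the trailing space.

```shell
# du-style lines: size, tab, file name
du_sample() {
    printf '4\tgreptest2.txt\n4\tgreptest.c\n'
}

# .*g is greedy: everything up to the last g is replaced by g, leaving the name.
names=$(du_sample | sed 's;.*g;g;')

# Everything up to and including the first dot is replaced by a dot.
extensions=$(du_sample | sed 's;[^.]*\.;.;')

# who-style lines: delete from the first space to the end of the line.
users=$(printf '%s\n' 'corey pts/0 2014-08-28 08:38' 'PEPPER pts/1 2014-09-01 08:49' |
    sed 's/ .*//')

printf '%s\n' "$names" "$extensions" "$users"
```

The semicolon works as the delimiter of the s command just as / does; any character may be used as long as it appears consistently.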
What are scripts? • A script is a series of editing commands placed in a file that can be used in a sed command. • Example
SIEGFRIE@panther:~$ cat fleas
Great fleas have little fleas
upon their backs to bite 'em
And little fleas have lesser fleas
and so on ad infinitum.
And the great fleas themselves, in turn,
have greater fleas to go on;
While these again have greater still,
and great still and so on.
ind • We are going to implement another filter called ind, which starts each line of its input with A followed by a tab. • Our initial implementation is sed 's/^/A\t/' raven This also places A and a tab on lines that would otherwise be empty. • We can avoid this problem by writing sed '/^$/!s/^/A\t/' raven It substitutes on all lines EXCEPT those with no content (! is negation).
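A portable variant of the second version can be wrapped in a shell function. This is a sketch, not the slide's exact command: it inserts a literal tab obtained from printf, on the assumption that \t in a sed replacement is not understood by every sed.

```shell
# ind: prefix every non-empty line of standard input with A and a tab
ind() {
    tab=$(printf '\t')
    # /^$/! selects the lines that are NOT empty; the s command runs only on those.
    sed "/^\$/!s/^/A$tab/"
}

indented=$(printf 'one\n\ntwo\n' | ind)
printf '%s\n' "$indented"
```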
ind2
[SIEGFRIE@panther ~]$ cat > bin/ind2
sed 's/^/ /
3q'
[SIEGFRIE@panther ~]$ sed 3q fleas
Great fleas have little fleas
upon their backs to bite 'em
And little fleas have lesser fleas
[SIEGFRIE@panther ~]$ cat fleas | ind2
 Great fleas have little fleas
 upon their backs to bite 'em
 And little fleas have lesser fleas
[SIEGFRIE@panther ~]$
sed Commands from Files • sed commands can be taken from files by writing: sed -f cmdfile • Line-number selectors can also be used for printing, deleting and substituting.
sed -f – An Example
[SIEGFRIE@panther ~]$ cat fleawrite
s/fleas/flea/
/fleas/d
[SIEGFRIE@panther ~]$ sed -f fleawrite fleas
upon their backs to bite 'em
and so on ad infinitum.
And the great flea themselves, in turn,
have greater flea to go on;
While these again have greater still,
and great still and so on
Same as: sed -e s/fleas/flea/ -e /fleas/d fleas
sed -n -f – An Example • sed -n suppresses the automatic printing
PEPPER@panther:~/271/grept$ sed -n 's/on/ON/gp' fleas
upON their backs to bite 'em
and so ON ad infinitum.
have greater fleas to go ON;
and great still and so ON
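The pieces introduced on the last few slides (-n with p, line-number selectors, and several commands applied in order) can be exercised together. The sketch below uses only the first four lines of the fleas text, supplied by a hypothetical function so that no file is needed.

```shell
# fleas: print the first four lines of the sample text
fleas() {
    cat <<'EOF'
Great fleas have little fleas
upon their backs to bite 'em
And little fleas have lesser fleas
and so on ad infinitum.
EOF
}

like_grep=$(fleas | sed -n '/lesser/p')    # -n plus p prints only matching lines
first_two=$(fleas | sed -n '1,2p')         # a line-number range as the selector
changed=$(fleas | sed -n 's/on/ON/gp')     # print only lines where a substitution happened

# Two commands run in order on each line: the first fleas becomes flea,
# then any line that still contains fleas is deleted.
edited=$(fleas | sed -e 's/fleas/flea/' -e '/fleas/d')

printf '%s\n' "$like_grep" "$first_two" "$changed" "$edited"
```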