Perl Regular Expressions

Things Perl Can Do Easily with Regular Expressions
• Pattern matching
  • Find out whether a string contains a specific pattern.
• Substitution
  • Replace parts of a string with other strings.
• Translation
  • Replace one character with another character (the tr/// operator maps characters one to one rather than using regular expression syntax).
• Split function
  • Split a string into substrings.

Pattern Matching
• if ($string =~ m/pattern/)
  • The result is true if $string matches pattern (for a plain word, if $string contains it as a substring).
  • The result is false if $string does not match pattern.

    $message = "I like C, C++, Perl, Java, and Python.";
    if ($message =~ m/Perl/) {
        print "I see Perl\n";
    } else {
        print "I did not see Perl\n";
    }
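Because m/.../ takes a regular expression rather than a plain string, characters such as + have special meaning. A minimal sketch, reusing the message above (the variable names are illustrative), showing how to escape them, how \Q...\E quotes an interpolated variable, and how the /i modifier makes a match case-insensitive:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $message = "I like C, C++, Perl, Java, and Python.";

# A plain word is matched literally.
my $has_perl = ($message =~ m/Perl/) ? 1 : 0;

# '+' is a regex metacharacter, so escape it to match "C++" literally.
my $has_cpp = ($message =~ m/C\+\+/) ? 1 : 0;

# \Q...\E escapes every metacharacter in an interpolated variable.
my $lang     = "C++";
my $has_lang = ($message =~ m/\Q$lang\E/) ? 1 : 0;

# Matching is case-sensitive unless the /i modifier is added.
my $plain_lower = ($message =~ m/perl/)  ? 1 : 0;
my $i_lower     = ($message =~ m/perl/i) ? 1 : 0;

print "Perl: $has_perl, C++: $has_cpp, via \$lang: $has_lang\n";
print "m/perl/: $plain_lower, m/perl/i: $i_lower\n";
```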

Pattern Matching to Verify Email
• A rough check: treat an address as valid if it contains "@".

    @mail = ('peter@verhas.com', 'hab&cnn.com');
    foreach $address (@mail) {
        if ($address =~ m/@/) {
            print "$address seems to be a good email address\n";
        } else {
            print "$address is a bad address\n";
        }
    }

OUTPUT:
    peter@verhas.com seems to be a good email address
    hab&cnn.com is a bad address
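Checking only for "@" accepts strings such as "a@b" or "x@@y.com". A slightly stricter sketch anchors the pattern with ^ and $ so the whole string must fit; the helper name looks_like_email and its rule (one "@", no whitespace, a dot in the domain) are assumptions for illustration, far short of the full email address grammar:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper. Assumed rule: non-space, non-@ characters,
# exactly one "@", then a domain part that contains a dot.
sub looks_like_email {
    my ($address) = @_;
    return $address =~ m/^[^\@\s]+\@[^\@\s]+\.[^\@\s]+$/ ? 1 : 0;
}

foreach my $address ('peter@verhas.com', 'hab&cnn.com', 'a@b', 'x@@y.com') {
    my $verdict = looks_like_email($address) ? "looks plausible" : "rejected";
    print "$address: $verdict\n";
}
```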

Substitution
• String substitution
• In $string, replace every "Bill Clinton" with "Bush" (the /g modifier replaces every occurrence; without it, only the first one is replaced):
    $string =~ s/Bill Clinton/Bush/g;
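A small sketch of the difference /g makes, using a made-up sentence; it also relies on the fact that s/// returns the number of substitutions it performed:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $once = "Bill Clinton met Bill Clinton.";
my $all  = $once;

# s/// returns the number of substitutions made (false if none).
my $n_once = ($once =~ s/Bill Clinton/Bush/);    # first occurrence only
my $n_all  = ($all  =~ s/Bill Clinton/Bush/g);   # every occurrence

print "$once ($n_once)\n";
print "$all ($n_all)\n";
```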

Translation
• Replace one character with another character (tr/// maps characters one to one and does not use regular expression syntax, so square brackets and commas would be treated as literal characters)
• In $string, make all vowels (here including y) upper case:
    $string =~ tr/aeiouy/AEIOUY/;
• Change every letter in $string to upper case:
    $string =~ tr/a-z/A-Z/;
• Change every letter in $string to lower case:
    $string =~ tr/A-Z/a-z/;
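A short sketch applying these translations to the Task 1 sentence. It also uses the return value of tr///, which is the number of characters translated, and compares the result with Perl's built-in lc function:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "Hello There Nice to SEE Everyone";

# tr/// returns how many characters it translated.
my $vowels_up = $string;
my $changed   = ($vowels_up =~ tr/aeiouy/AEIOUY/);

my $lower = $string;
$lower =~ tr/A-Z/a-z/;

print "$vowels_up ($changed characters changed)\n";
print "$lower\n";
print lc($string), "\n";    # built-in lc gives the same result for ASCII text
```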

Split function
• Split a string into substrings
• Split on a character:
    $data = 'Becky Lincoln,25,female,South Bend';
    @values = split(/,/, $data);
• Split on a string:
    $data = 'Care Bears~~~10:30am~~~Saturday~~~CBS';
    @values = split(/~~~/, $data);
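A runnable sketch of both splits that prints the resulting fields; it also shows that the fields can be assigned directly to a list of named scalars (the names $name, $age, $gender and $city are illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $data   = 'Becky Lincoln,25,female,South Bend';
my @values = split(/,/, $data);

# The fields can also be unpacked straight into named scalars.
my ($name, $age, $gender, $city) = split(/,/, $data);

my $show   = 'Care Bears~~~10:30am~~~Saturday~~~CBS';
my @fields = split(/~~~/, $show);

print scalar(@values), " fields: ", join(" | ", @values), "\n";
print "$name is $age and lives in $city\n";
print scalar(@fields), " fields: ", join(" | ", @fields), "\n";
```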

Wildcard Character and Repetition
• Wildcard and character classes
    .    Match any character except newline
    \w   Match a "word" character (alphanumeric plus "_")
    \W   Match a non-word character
    \s   Match a whitespace character (space, tab, newline, and other whitespace)
    \d   Match a digit character
    \t   Match a tab
    \n   Match a newline
• Repetition
    +    Match 1 or more times
    *    Match 0 or more times
    {n}  Match exactly n times
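A compact sketch that exercises each entry in the table on a made-up test string. It also uses the anchors ^ (start of string) and $ (end of string), which are not in the table, so that some patterns must match the whole string:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# 1 = the pattern matches, 0 = it does not.
my %result = (
    'word chars only'   => ('abc_123' =~ m/^\w+$/)   ? 1 : 0,
    'has non-word char' => ('a-b'     =~ m/\W/)      ? 1 : 0,
    'tab is whitespace' => ("a\tb"    =~ m/\s/)      ? 1 : 0,
    'exactly 4 digits'  => ('2024'    =~ m/^\d{4}$/) ? 1 : 0,
    '3 digits vs {4}'   => ('123'     =~ m/^\d{4}$/) ? 1 : 0,
    '* allows zero'     => (''        =~ m/^\d*$/)   ? 1 : 0,
    '+ needs one'       => (''        =~ m/^\d+$/)   ? 1 : 0,
    'dot vs newline'    => ("\n"      =~ m/^.$/)     ? 1 : 0,
);

for my $name (sort keys %result) {
    printf "%-18s %d\n", $name, $result{$name};
}
```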

Task 1
• Write a Perl program that can
  • Convert the following string to an array of words
  • Convert upper case letters into lower case
  • Display each word on its own line
"Hello There Nice to SEE Everyone"

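Perl Program for Task 1
    #!/usr/bin/perl
    $data = "Hello There Nice to SEE Everyone";
    $data =~ tr/A-Z/a-z/;            # lower-case every letter
    @words = split(/\s+/, $data);    # split on runs of whitespace
    # @words = split(/ /, $data);    # also works when words are separated by single spaces
    foreach $word (@words) {
        print "$word\n";
    }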

Split function (continued)
• Split a string into words
  • How a string splits depends on how you define a word
  • Here, words are separated by runs of non-word characters (\W+)
  • "Hi there, Java---API!" is therefore treated as four words:
    $string = "Hi there, Java---API!";
    @words = split(/\W+/, $string);
  • @words now: ("Hi", "there", "Java", "API")
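One detail matters when splitting real text on \W+: split drops empty fields at the end, but keeps an empty field at the start when the string begins with a separator (for example, an opening quotation mark). A minimal sketch, with a made-up quoted string, that shows this and filters the empty field out with grep:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "Hi there, Java---API!";
my @words  = split(/\W+/, $string);    # trailing empty field after "!" is dropped

# A separator at the very start yields an empty first field.
my @quoted = split(/\W+/, '"Hi there!"');
my @clean  = grep { length } @quoted;  # keep only non-empty fields

print join(", ", @words), "\n";
print scalar(@quoted), " fields before grep, ", scalar(@clean), " after\n";
```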

Task 2
• Write a Perl program that can
  • Convert the following string to an array of words
  • Convert upper case letters into lower case
  • Display each word on its own line
"Hello There, (It is) Nice to SEE Everyone!"

Perl Program for Task 2
    #!/usr/bin/perl
    $data = "Hello There, (It is) Nice to SEE Everyone!";
    $data =~ tr/A-Z/a-z/;            # lower-case every letter
    @words = split(/\W+/, $data);    # split on runs of non-word characters
    foreach $word (@words) {
        print "$word\n";
    }
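Both task programs can also be written with use strict, lexical (my) variables, and the built-in lc function. A sketch under those choices; the subroutine name lower_words is hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: lower-case the text, split on runs of
# non-word characters, and drop a possible leading empty field.
sub lower_words {
    my ($text) = @_;
    return grep { length } split(/\W+/, lc $text);
}

for my $text ("Hello There Nice to SEE Everyone",
              "Hello There, (It is) Nice to SEE Everyone!") {
    my @words = lower_words($text);
    print "$_\n" for @words;
    print "--\n";
}
```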

Hash • Key Value • Key must be unique • Examples: • Student-ID  Student GPA • Bank_account_num  Balance • Name  Age • Name  Birthday Student_ID Student_GPA

Hash (student IDs are the keys, GPAs are the values)
• Fill the hash:
    %C151 = (1001 => 3.5, 4004 => 3.8, 3003 => 3.5);
• Copy a hash:
    %CSCI_C151 = %C151;
• Look up a value by its key:
    $my_gpa = $C151{4004};
• Update a value by its key:
    $C151{4004} = 4.0;
• Insert a new entry (key -> value):
    $C151{2002} = 3.0;
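The same operations as one runnable sketch. It confirms that the copy is independent of the original, and that a value assigned as 4.0 prints as 4:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %C151      = (1001 => 3.5, 4004 => 3.8, 3003 => 3.5);
my %CSCI_C151 = %C151;       # independent copy
my $my_gpa    = $C151{4004}; # lookup: 3.8

$C151{4004} = 4.0;           # update (prints as "4")
$C151{2002} = 3.0;           # insert

print "lookup: $my_gpa\n";
print "original now: $C151{4004}, copy still: $CSCI_C151{4004}\n";
print "entries: ", scalar(keys %C151), " vs ", scalar(keys %CSCI_C151), "\n";
```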

Operations on Hash
• keys HASH returns a list of only the keys in the hash, in no particular order.
• values HASH returns a list of only the values in the hash, in the same order as the keys returned by keys (provided the hash is not modified in between).
    %C151 = (1001 => 3.5, 4004 => 3.8, 3003 => 3.5);
    @k = keys %C151;      # @k now holds 1001, 4004, 3003 in some unspecified order
    @v = values %C151;    # @v now holds 3.5, 3.8, 3.5 in the matching order
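Because keys and values return their lists in matching order, the two arrays can be paired up by index. A minimal sketch of that correspondence:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %C151 = (1001 => 3.5, 4004 => 3.8, 3003 => 3.5);
my @k = keys %C151;      # some unspecified order
my @v = values %C151;    # the matching order, as long as %C151 is not modified

for my $i (0 .. $#k) {
    print "$k[$i] => $v[$i]\n";
}
```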

Hash operations
• Display all key-value pairs (in whatever order keys returns them):
    foreach $key (keys %hash) {
        print "$key ---> $hash{$key}\n";
    }

Hash operations
• Display all key-value pairs in ascending order of keys (sort compares keys as strings, character by character):
    foreach $key (sort keys %hash) {
        print "$key ---> $hash{$key}\n";
    }
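Because the default sort compares strings, numeric keys can come out in a surprising order. A small sketch with made-up keys 9, 10 and 100 compares the default with a numeric comparison using the <=> operator:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %hash = (9 => 'nine', 10 => 'ten', 100 => 'hundred');

my @string_order  = sort keys %hash;                 # string order: 10, 100, 9
my @numeric_order = sort { $a <=> $b } keys %hash;   # numeric order: 9, 10, 100

foreach my $key (@numeric_order) {
    print "$key ---> $hash{$key}\n";
}
```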

Task 3 (What Will Be Displayed?)
    #!/usr/bin/perl
    %C151 = (1001 => 3.5, 4004 => 3.8, 3003 => 3.5);
    print "the GPA of ID(3003) is $C151{3003}\n";
    $C151{4004} = 4.0;
    $C151{2002} = 3.0;
    foreach $ID (keys %C151) {
        print "$ID-->$C151{$ID}\n";
    }
    print "\nSort By ID:\n";
    foreach $ID (sort (keys %C151)) {
        print "$ID-->$C151{$ID}\n";
    }

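To check an answer to Task 3, the predictable part of its output can be reproduced. This sketch collects lines in an array instead of printing them directly, and skips the unsorted loop because the order of keys may differ from run to run. Note that 4.0 and 3.0 are stored as numbers and print as 4 and 3:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %C151 = (1001 => 3.5, 4004 => 3.8, 3003 => 3.5);
my @out;    # collect the lines so they can be checked as well as printed

push @out, "the GPA of ID(3003) is $C151{3003}";
$C151{4004} = 4.0;    # stored as the number 4, so it prints as "4"
$C151{2002} = 3.0;    # prints as "3"

# The unsorted loop is skipped: keys() order may differ between runs.
push @out, "Sort By ID:";
foreach my $ID (sort keys %C151) {
    push @out, "$ID-->$C151{$ID}";
}

print "$_\n" for @out;
```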

Mission Statement of IUSB • “Indiana University South Bend is the comprehensive undergraduate and graduate regional campus of Indiana University that is committed to serving north central Indiana and southwestern Michigan. Its mission is to create, disseminate, preserve, and apply knowledge. The campus is committed to excellence in teaching, learning, research, and creative activity; to strong liberal arts and sciences programs and professional disciplines; to acclaimed programs in the arts and nursing/health professions; and to diversity, civic engagement, and a global perspective. IU South Bend supports student learning, access and success for a diverse residential and non-residential student body that includes under-represented and international students. The campus fosters student-faculty collaboration in research and learning. Committed to the economic development of its region and state, Indiana University South Bend meets the changing educational and research needs of the community and serves as a vibrant cultural resource.” • (http://www.chancellor.iusb.edu/missionstatement.shtml)

Task 4
• Analyze the Mission Statement of IUSB and display
  • Number of words (ignore case differences)
  • Number of unique words (ignore case differences)
  • Count of each unique word in ascending order of words (alphabetically)

Script for Task 4
    #!/usr/bin/perl
    $statement = "….";    # replace …. with the mission statement
    print "$statement\n";
    $statement =~ tr/A-Z/a-z/;
    @words = split(/\W+/, $statement);
    # if the text starts with punctuation, $words[0] is an empty string
    # count the words; %wct is a hash (word -> count)
    foreach $word (@words) {
        $wct{$word} = $wct{$word} + 1;
    }
    @unique_word = keys %wct;

Script for Task 4 (continued)
    $num_words = $#words + 1;          # number of words ($#words is the last index)
    $uni_words = $#unique_word + 1;    # number of unique words
    print "The number of words is $num_words\n";
    print "The number of unique words is $uni_words\n";
    print "The occurrence of each word is listed below\n";
    print "****************************************\n";
    foreach $word (sort (keys %wct)) {
        print "$word\t$wct{$word}\n";
    }
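The two Task 4 slides combine into one self-contained program. This sketch uses only the opening clause of the mission statement as sample text so the counts are easy to verify by hand, adds use strict, and uses $wct{$_}++ as shorthand for the explicit addition:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample: the opening clause of the mission statement.
my $statement = "Indiana University South Bend is the comprehensive undergraduate "
              . "and graduate regional campus of Indiana University";

# lc + split on non-word runs; grep drops a possible leading empty field.
my @words = grep { length } split(/\W+/, lc $statement);

my %wct;
$wct{$_}++ for @words;    # same as $wct{$word} = $wct{$word} + 1

my $num_words = scalar @words;
my $uni_words = scalar keys %wct;

print "The number of words is $num_words\n";
print "The number of unique words is $uni_words\n";
foreach my $word (sort keys %wct) {
    print "$word\t$wct{$word}\n";
}
```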

In Previous Script
• Keys are sorted alphabetically in ascending order, which is the same as:
    foreach $word (sort {$a cmp $b} (keys %wct)) {
        print "$word\t$wct{$word}\n";
    }
• In descending order of words:
    foreach $word (sort {$b cmp $a} (keys %wct)) {
        print "$word\t$wct{$word}\n";
    }
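A quick sketch, on a made-up word list, confirming these equivalences. It also shows reverse sort as a second way to get descending order:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @w = qw(research arts nursing campus);

my @asc_default = sort @w;                 # default: string comparison, ascending
my @asc_cmp     = sort { $a cmp $b } @w;   # the same, written out
my @desc_cmp    = sort { $b cmp $a } @w;   # $a and $b swapped: descending
my @desc_rev    = reverse sort @w;         # another way to get descending order

print "@asc_cmp\n";
print "@desc_cmp\n";
```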

How to Sort Hash by Values
    foreach $word (sort {$wct{$a} <=> $wct{$b}} (keys %wct)) {
        print "$word\t$wct{$word}\n";
    }
• Note: swap $a and $b to change the sorting order from ascending to descending.
• <=> compares numerically, which is what counts need; cmp would compare them as strings.
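When several words share the same count, their relative order follows the unspecified order of keys, so it can change between runs. A common remedy, sketched here with made-up counts, adds a second comparison that is used only when the counts tie:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up counts for illustration; two words tie at 3.
my %wct = (the => 3, and => 3, campus => 1, arts => 2);

# Sort by count; when counts tie, fall back to alphabetical order.
my @ascending  = sort { $wct{$a} <=> $wct{$b} || $a cmp $b } keys %wct;
my @descending = sort { $wct{$b} <=> $wct{$a} || $a cmp $b } keys %wct;

foreach my $word (@descending) {
    print "$word\t$wct{$word}\n";
}
```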

Task 5
• Analyze the Mission Statement of IUSB and display
  • Number of words (ignore case differences)
  • Number of unique words (ignore case differences)
  • Count of each unique word in ascending order of counts

Script for Task 5
    #!/usr/bin/perl
    $statement = "….";    # replace …. with the mission statement
    print "$statement\n";
    $statement =~ tr/A-Z/a-z/;
    @words = split(/\W+/, $statement);
    # if the text starts with punctuation, $words[0] is an empty string
    # count the words; %wct is a hash (word -> count)
    foreach $word (@words) {
        $wct{$word} = $wct{$word} + 1;
    }
    @unique_word = keys %wct;

Script for Task 5 (continued)
    $num_words = $#words + 1;          # number of words ($#words is the last index)
    $uni_words = $#unique_word + 1;    # number of unique words
    print "The number of words is $num_words\n";
    print "The number of unique words is $uni_words\n";
    print "The occurrence of each word is listed below\n";
    print "****************************************\n";
    foreach $word (sort {$wct{$a} <=> $wct{$b}} (keys %wct)) {
        print "$word\t$wct{$word}\n";
    }

How about in Descending Order of Count?
• Change the sort function:
    foreach $word (sort {$wct{$b} <=> $wct{$a}} (keys %wct)) {
        print "$word\t$wct{$word}\n";
    }

Reading Assignment
• Chapter 11
