
An Introduction to Perl with Applications in Web Page Scraping


ekram

Presentation Transcript


  1. An Introduction to Perl with Applications in Web Page Scraping

  2. What is Perl?
     • Practical Extraction and Report Language
     • High level
     • General purpose
     • Interpreted, dynamic programming language
     • Borrows from Unix shell scripting languages
     • Ideal for “small” tasks which involve text processing

  3. What is going to be taught during this workshop?
     • Most of this presentation is taken from the introduction at www.perl.com
     • Perl language constructs
     • Variables
     • Flow control
     • String processing
     • File I/O
     • Subroutines
     • Object oriented Perl
     • Application: Web page scraping

  4. Hello World
     > perl -e 'print "hello world\n"'
     hello world
     > perl -e 'print "hello ", "world\n"'
     hello world
     > perl -e "print 'hello ', 'world\n'"
     hello world\n>
     • Single-quoted Perl strings are not interpolated, so the last command prints \n literally

  5. Scalars
     • Single things
     • Number
     • String

     $fruitCount = 5;
     $fruitType = 'apples';
     $countReport = "> There are $fruitCount $fruitType";
     print $countReport;
     > There are 5 apples

  6. Scalars continued
     $a = "8";
     $b = $a + "1";    # + treats both operands as numbers
     print "> $b\n";
     > 9
     $c = $a . "1";    # . concatenates both operands as strings
     print "> $c\n";
     > 81

  7. Even more scalar examples*
     *Shamelessly taken from http://www.perl.com/pub/a/2000/10/begperl1.html.

     $a = 5;
     $a++;        # $a is now 6; we added 1 to it.
     $a += 10;    # Now it's 16; we added 10.
     $a /= 2;     # And divided it by 2, so it's 8.

  8. Arrays
     *Shamelessly taken from http://www.perl.com/pub/a/2000/10/begperl1.html.
     • Lists of scalars

       @months = ("July", "August", "September");
       print $months[0];          # This prints "July".
       $months[2] = "Smarch";

     • If an array doesn't exist, you'll create it when you try to assign a value to one of its elements.

       $winterMonths[0] = "December";    # This implicitly creates @winterMonths.

  9. Arrays continued
     *Shamelessly taken from http://www.perl.com/pub/a/2000/10/begperl1.html.
     • If you want to find the last index of an array, use:

       print "> $#months\n";
       > 2

     • If the array is empty or doesn't exist, $#array is -1
     • You can also resize an array by assigning to its last index

       $#months = 0;    # Now @months only contains "July"
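The array operations above can be tried together in one short script. This is a minimal sketch: push and the negative index are standard Perl but do not appear on the slides, and the variable names $count, $last_index and $last are only illustrative.

```perl
use strict;
use warnings;

my @months = ("July", "August", "September");

# scalar(@array) gives the element count; $#array gives the last index.
my $count      = scalar(@months);
my $last_index = $#months;

# push appends to the end and grows the array by one element.
push @months, "October";

# A negative index counts back from the end of the array.
my $last = $months[-1];

# Assigning to $#array truncates (or extends) the array.
$#months = 0;

print "> $count $last_index $last @months\n";
# > 3 2 October July
```

The element count is always one more than the last index, which is why an empty array reports -1.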

  10. Hashes
      *Shamelessly taken from http://www.perl.com/pub/a/2000/10/begperl1.html.
      • Map a key to a value

        %daysInMonth = ( "July" => 31, "August" => 31, "September" => 30 );
        print "> $daysInMonth{'September'}\n";
        > 30

      • To add a new key and value:

        $daysInMonth{"February"} = 28;

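  11. Hashes continued
      *Shamelessly taken from http://www.perl.com/pub/a/2000/10/begperl1.html.
      • Counting the keys: in scalar context, keys returns the number of keys in the hash

        print "> " . keys(%daysInMonth) . "\n";
        > 4    (the three original months plus February)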
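To get at the keys themselves rather than their count, call keys in list context. The sketch below assumes the %daysInMonth hash from the previous slides; sort and exists are standard Perl built-ins that the slides do not cover, and the names @names, $howMany and $hasMarch are illustrative.

```perl
use strict;
use warnings;

my %daysInMonth = ("July" => 31, "August" => 31, "September" => 30);
$daysInMonth{"February"} = 28;

# In list context, keys returns the keys themselves. Their order is not
# guaranteed, so sort them before printing.
my @names = sort keys %daysInMonth;

# In scalar context, keys returns the number of keys.
my $howMany = keys %daysInMonth;

for my $month (@names) {
    print "> $month has $daysInMonth{$month} days\n";
}
print "> $howMany months stored\n";

# exists tests for a key without creating it.
my $hasMarch = exists $daysInMonth{"March"} ? "yes" : "no";
print "> March present? $hasMarch\n";
```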

  12. For loops
      print "> ";
      for ($i = 0; $i <= 5; $i++) {
          print "$i ";
      }
      print "\n";
      > 0 1 2 3 4 5

  13. For loops
      *Shamelessly taken from http://www.perl.com/pub/a/2000/10/begperl1.html.
      • Iterating over a list

        print "> ";
        for $i (5, 4, 3, 2, 1) {
            print "$i ";
        }
        print "\n";
        > 5 4 3 2 1

  14. For loops continued
      *Shamelessly taken from http://www.perl.com/pub/a/2000/10/begperl1.html.

      @one_to_ten = (1 .. 10);
      $top_limit = 25;
      for $i (@one_to_ten, 15, 20 .. $top_limit) {
          print "$i\n";
      }

  15. One more for loop
      *Shamelessly taken from http://www.perl.com/pub/a/2000/10/begperl1.html.

      for $marx ('Groucho', 'Harpo', 'Zeppo', 'Karl') {
          print "> $marx is my favorite Marx brother.\n";
      }
      > Groucho is my favorite Marx brother.
      > Harpo is my favorite Marx brother.
      > Zeppo is my favorite Marx brother.
      > Karl is my favorite Marx brother.

  16. While loop
      *Shamelessly taken from http://www.perl.com/pub/a/2000/10/begperl1.html.

      my $count = 0;
      print "> ";
      while ($count != 3) {
          $count++;
          print "$count ";
      }
      print "\n";
      > 1 2 3

  17. Until loop
      *Shamelessly taken from http://www.perl.com/pub/a/2000/10/begperl1.html.

      $count = 3;
      print "> ";
      until ($count == 0) {
          $count--;
          print "$count ";
      }
      print "\n";
      > 2 1 0

  18. if/elsif/else
      *Shamelessly taken from http://www.perl.com/pub/a/2000/10/begperl1.html.

      if ($a == 5) {
          print "It's five!\n";
      } elsif ($a == 6) {
          print "It's six!\n";
      } else {
          print "It's something else.\n";
      }

  19. Unless
      *Shamelessly taken from http://www.perl.com/pub/a/2000/10/begperl1.html.

      unless ($pie eq 'apple') {
          print "Ew, I don't like $pie flavored pie.\n";
      } else {
          print "Apple! My favorite!\n";
      }

  20. Comparing unless and if
      • These two statements are equivalent:

        print "I'm burning the 7 pm oil\n" unless $day eq 'Friday';
        print "I'm burning the 7 pm oil\n" if not ($day eq 'Friday');

  21. String operations
      $yes_no = 'no';
      print "> affirmative\n" if $yes_no == 'yes';
      > affirmative
      • Strings are automatically converted to numbers for operations like '=='; neither 'no' nor 'yes' looks like a number, so both become 0 and the test succeeds
      • Use eq instead of == to compare strings correctly

  22. More string comparisons
      my $five = 5;
      print "> Numeric equality!\n" if $five == " 5 ";
      print "> String equality!\n" if $five eq "5";
      > Numeric equality!
      > String equality!
      print "> No string equality!\n" if not($five eq " 5");
      > No string equality!
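The difference between the two families of comparison operators can be checked side by side. This is a minimal sketch; the variable names are illustrative, and the numeric warning is switched off on purpose for the one comparison that deliberately misuses ==.

```perl
use strict;
use warnings;

my $yes_no = 'no';

# eq compares the two strings character by character.
my $string_match = ($yes_no eq 'yes') ? 1 : 0;    # 0: the strings differ

# == converts both sides to numbers first. Neither 'no' nor 'yes' looks
# like a number, so both become 0. Perl would warn about this, so the
# warning is disabled for this block only.
my $numeric_match;
{
    no warnings 'numeric';
    $numeric_match = ($yes_no == 'yes') ? 1 : 0;  # 1: 0 == 0
}

# lt, gt and cmp order strings; <, > and <=> order numbers.
my $as_strings = ("10" lt "9") ? 1 : 0;    # 1: "1" sorts before "9"
my $as_numbers = (10 < 9)      ? 1 : 0;    # 0

print "> eq: $string_match, ==: $numeric_match\n";
print "> lt: $as_strings, <: $as_numbers\n";
```

Running scripts with warnings enabled is the easiest way to catch an accidental == on strings.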

  23. substr
      $greeting = "Welcome to Perl!\n";
      print "> " . substr($greeting, 0, 7) . "\n";
      > Welcome
      print "> ", substr($greeting, 7), "\n";
      >  to Perl!
      # The substring keeps its leading space and the original newline, so a blank line follows.
      print "> ", substr($greeting, -6, 6), ">";
      > Perl!
      >

  24. substr continued
      my $greeting = "Welcome to Java!\n";
      substr($greeting, 11, 4) = 'Perl';      # $greeting is now "Welcome to Perl!\n";
      substr($greeting, 7, 3) = '';           # ... "Welcome Perl!\n";
      substr($greeting, 0, 0) = 'Hello. ';    # ... "Hello. Welcome Perl!\n";

  25. split
      my $greeting = "Hello. Welcome Perl!\n";
      my @words = split(/ /, $greeting);
      # Three items: "Hello.", "Welcome", "Perl!\n"

      # An optional third argument limits the number of items returned.
      @words = split(/ /, $greeting, 2);
      # Two items: "Hello.", "Welcome Perl!\n";

  26. join
      my @words = ("Hello.", "Welcome", "Perl!\n");
      my $greeting = join(' ', @words);
      # "Hello. Welcome Perl!\n";
      my $andy_greeting = join(' and ', @words);
      # "Hello. and Welcome and Perl!\n";
      my $jam_greeting = join('', @words);
      # "Hello.WelcomePerl!\n";
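split and join are often used as a pair: take a record apart, work on the fields, and put it back together. The sketch below uses a made-up comma-separated record (fruit, count, price in cents); the record layout and variable names are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical record: fruit, count, price per item in cents.
my $line = "apples,5,40";

# split breaks the record apart on commas...
my ($fruit, $count, $price) = split(/,/, $line);

# ...and join glues a list back together with a separator of our choosing.
my $report = join("\t", $fruit, $count, $price * $count);

# split(' ', ...) is a special case: it skips leading whitespace and
# splits on runs of whitespace, which suits free-form text.
my @words = split(' ', "  Hello.   Welcome  Perl!\n");

print "> $report\n";
print "> ", scalar(@words), " words: ", join('|', @words), "\n";
# > 3 words: Hello.|Welcome|Perl!
```

Unlike split(/ /, ...) on the slide above, the ' ' form also drops the trailing newline, because a newline counts as whitespace.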

  27. Reading from a file
      • Contents of test.txt:

        This
        is
        a
        test

  28. Reading from a file continued
      open my $testfile, '<', 'test.txt' or die "I couldn't get at test.txt: $!";
      while ($line = <$testfile>) {
          print "> ", $line;
      }
      > This
      > is
      > a
      > test

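  29. chomp
      open my $testfile, '<', 'test.txt' or die "I couldn't get at test.txt: $!";
      print "> ";
      while ($line = <$testfile>) {
          chomp($line);    # remove the trailing newline
          print "$line ";
      }
      print "\n";
      > This is a test
      • chomp returns the number of characters it removed, so it should not be used as the loop condition: a last line without a newline would be skipped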
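The read-and-chomp loop can be tried without creating test.txt by opening a filehandle on a string, which Perl calls an in-memory file. This is a sketch under that assumption: $contents stands in for the file, and its last line deliberately has no newline.

```perl
use strict;
use warnings;

# Stand-in for the contents of test.txt. Opening a reference to a string
# gives an in-memory file, so nothing is created on disk.
my $contents = "This\nis\na\ntest";    # note: no newline after "test"

open my $testfile, '<', \$contents
    or die "I couldn't open the in-memory file: $!";

my @lines;
while (my $line = <$testfile>) {
    chomp $line;            # removes the trailing newline, if there is one
    push @lines, $line;
}
close $testfile;

print "> @lines\n";
# > This is a test
```

Because chomp is called inside the loop body rather than in the condition, the final line is kept even though it has no newline to remove.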

  30. Writing to a file
      open my $overwrite, '>', 'overwrite.txt' or die "error trying to overwrite: $!";
      # Wave goodbye to the original contents.

      open my $append, '>>', 'append.txt' or die "error trying to append: $!";
      # Original contents still there; add to the end of the file
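Opening the file is only the first step: data is written by printing to the filehandle, and the handle should be closed afterwards. A minimal sketch follows, assuming a throwaway directory from the core File::Temp module so that no real files are touched; the file contents are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a temporary directory (an assumption of this sketch); it is
# deleted automatically when the program exits.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/append.txt";

# '>' creates the file, or empties it if it already exists.
open my $overwrite, '>', $path or die "error trying to overwrite: $!";
print $overwrite "first line\n";    # no comma after the filehandle
close $overwrite or die "error closing file: $!";

# '>>' keeps what is already there and writes at the end.
open my $append, '>>', $path or die "error trying to append: $!";
print $append "second line\n";
close $append or die "error closing file: $!";

# Read the file back to confirm that both lines are present.
open my $check, '<', $path or die "error trying to read: $!";
my @lines = <$check>;
close $check;

print "> ", scalar(@lines), " lines\n";
# > 2 lines
```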

  31. Subroutines
      sub multiply {
          my (@ops) = @_;    # arguments arrive in the special array @_
          my $ret = 1;
          for my $val (@ops) {
              $ret *= $val;
          }
          return $ret;
      }
      print "> ", multiply(2 .. 5), "\n";
      > 120

  32. Programming with objects
      • An object is a programmer-defined data structure which encapsulates
        • Data
        • Behavior (methods)
      • A web browser object may have
        • Data
          • The current page
          • A history of recently visited URLs
        • Behavior
          • Can navigate to a page
          • Can display a page
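The web browser object described above might look like this in Perl. It is only a sketch: the class name Browser, its methods and the hash layout are invented for illustration, and "displaying" a page just returns a string. The object itself is an ordinary hash reference that bless attaches to the class.

```perl
use strict;
use warnings;

package Browser;

# Constructor: the data lives in a hash reference blessed into the class.
sub new {
    my ($class) = @_;
    my $self = { page => undef, history => [] };
    return bless $self, $class;
}

# Behavior: navigating sets the current page and records it in the history.
sub navigate {
    my ($self, $url) = @_;
    $self->{page} = $url;
    push @{ $self->{history} }, $url;
    return $self;
}

# Behavior: "displaying" a page here just returns a description of it.
sub display {
    my ($self) = @_;
    return "Showing " . (defined $self->{page} ? $self->{page} : "nothing");
}

# Accessor: the visited URLs (their count, when called in scalar context).
sub history {
    my ($self) = @_;
    return @{ $self->{history} };
}

package main;

my $browser = Browser->new;
$browser->navigate("http://www.perl.com/");
$browser->navigate("http://search.cpan.org/dist/WWW-Mechanize/");

print "> ", $browser->display, "\n";
print "> Visited ", scalar($browser->history), " pages\n";
```

Methods are called with the arrow operator, and Perl passes the object itself as the first argument, conventionally named $self.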

  33. An Application: Scraping Web Pages
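The transcript does not include the scraping code itself, so what follows is only a rough, offline sketch of the idea: pull the links out of a page using the string tools from the earlier slides. The HTML is a made-up page held in a string, and the regular expression is a deliberate simplification that only handles plain double-quoted href attributes. For real pages, the WWW::Mechanize library listed in the references can fetch a page with its get method and list its links with its links method.

```perl
use strict;
use warnings;

# A hypothetical page, held in a string so the sketch runs offline.
my $html = <<'END_HTML';
<html><body>
<a href="http://www.perl.com/">Perl.com</a>
<p>Some text</p>
<a href="/dist/WWW-Mechanize/">Mechanize docs</a>
</body></html>
END_HTML

# Collect (URL, link text) pairs. The pattern is a simplification; real
# pages are better handled by a proper HTML parser.
my @links;
while ($html =~ m{<a\s+href="([^"]*)"[^>]*>(.*?)</a>}gis) {
    push @links, { url => $1, text => $2 };
}

for my $link (@links) {
    print "> $link->{text} => $link->{url}\n";
}
```

Each link is stored as a small hash reference, which keeps the URL and its text together as the list is passed around.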

  34. References
      • Beginner's introduction to Perl, http://www.perl.com/pub/a/2000/10/begperl1.html
      • Perl Mechanize library documentation, http://search.cpan.org/dist/WWW-Mechanize/
      • Schwartz, R. L. and Phoenix, T., Learning Perl, 3rd Edition, 2001.
