BioPerl
• An Introduction to Perl – by Seung-Yeop Lee
• XS extension – by Sen Zhang
• BioPerl Introduction – by Hairong Zhao
• BioPerl Script Examples – by Tiequan Zhang
Part I. An Introduction to Perl by Seung-Yeop Lee
What is Perl?
• Perl is an interpreted programming language that combines features of a general-purpose programming language and a shell.
• A language for easily manipulating text, files, and processes.
• Provides a more concise and readable way to do jobs formerly accomplished using C or shell scripts.
• Perl stands for Practical Extraction and Report Language.
• Author: Larry Wall (first released in 1987)
Why use Perl?
• Easy to use
  • Basic syntax is C-like
  • Type-"friendly" (no need for explicit casting)
  • Lazy memory management
  • A small amount of code goes a long way
• Fast
  • Perl has numerous built-in optimization features that make it run faster than many other scripting languages.
• Portability
  • One script version runs everywhere (unmodified).
Why use Perl?
• Efficiency
  • For programs that perform the same task in C and in Perl, even a skilled C programmer would have to work harder to write code that:
    • Runs as fast as the Perl code
    • Is represented by fewer lines of code
• Correctness
  • Perl fully parses and pre-"compiles" a script before execution.
  • This eliminates the potential for runtime SYNTAX errors.
• Free to use
  • Comes with source code
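Hello, world!
    #!/usr/local/bin/perl
    #
    print "Hello, world\n";
• #!/usr/local/bin/perl is the interpreter path.
• '#' denotes a line comment.
• print is a function which outputs its arguments.
• The double quotes delimit a string.
• \n is the newline character.
• ; is the statement terminator character.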
Basic Program Flow • No “main” function • Statements executed from start to end of file. • Execution continues until • End of file is reached. • exit(int) is called. • Fatal error occurs.
Variables • Data of any type may be stored within three basic types of variables: • Scalar • List • Associative array (hash table) • Variables are always preceded by a “dereferencing symbol”. • $ - Scalar variables • @ - List variables • % - Associative array variables
Variables • Notice that we did NOT have to • Declare the variable before using it • Define the variable’s data type • Allocate memory for new data values
Scalar variables
• References to scalar values always begin with "$", in both assignments and accesses:
• For scalars:
    $x = 1;
    $x = "Hello World!";
    $x = $y;
• For elements of arrays (each element is a scalar):
    $a[1] = 0;
    $a[1] = $b[1];
List variables
• Lists are prefaced by an "@" symbol:
    @count = (1, 2, 3, 4, 5);
    @count = ("apple", "bat", "cat");
    @count2 = @count;
• A list is simply an array of scalar values.
• Integer indexes, starting at 0, can be used to reference elements of a list.
• To print an element of an array, do:
    print $count[2];
Associative Array variables
• Associative array variables are denoted by the % dereferencing symbol.
• Associative array variables are simply hash tables containing scalar values.
• Example:
    $fred{"a"} = "aaa";
    $fred{"b"} = "bbb";
    $fred{6} = "cc";
    $fred{1} = 2;
• To do this in one step:
    %fred = ("a", "aaa", "b", "bbb", 6, "cc", 1, 2);
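The three variable types described above can be sketched together as follows. This is a minimal illustration, not part of the original slides: the variable names are made up, and `use strict` with `my` declarations is a later habit that the slides do not show.

```perl
use strict;
use warnings;

# Scalar: holds a single value (a number or a string).
my $greeting = "Hello World!";

# List (array): an ordered collection of scalars, indexed from 0.
my @count = ("apple", "bat", "cat");

# Associative array (hash): maps keys to scalar values.
my %fred = ("a", "aaa", "b", "bbb", 6, "cc", 1, 2);

# Individual elements are scalars, so they are accessed with "$".
print "$greeting\n";
print "Third element: $count[2]\n";
print "Value for key a: $fred{a}\n";
print "Number of keys: ", scalar(keys %fred), "\n";
```

Note that the sigil changes with what is being accessed: `@count` is the whole list, while `$count[2]` is one scalar inside it.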
Statements & Input/Output
• Statements
  • Perl contains all the usual if, for, while, and more…
• Input/Output
  • Any variable not starting with "$", "@" or "%" is assumed to be a filehandle.
  • There are several predefined filehandles, including STDIN, STDOUT and STDERR.
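A short sketch of filehandles and a while loop follows. It is an illustration under assumptions: the text being read is invented, the input is opened on an in-memory string so no real file is needed, and the lexical filehandle `$in` is a newer style than the bareword filehandles the slide describes.

```perl
use strict;
use warnings;

# Predefined filehandles: STDOUT for normal output, STDERR for diagnostics.
print STDOUT "normal output\n";
print STDERR "diagnostic output\n";

# Open a filehandle on an in-memory string (illustrative data only).
my $text = "first line\nsecond line\n";
open(my $in, '<', \$text) or die "cannot open in-memory file: $!";

my @lines;
while (my $line = <$in>) {   # read one line per iteration
    chomp $line;             # strip the trailing newline
    push @lines, $line;
}
close $in;

print scalar(@lines), " lines read\n";
```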
Subroutines
• We can reuse a segment of Perl code by placing it within a subroutine.
• The subroutine is defined using the sub keyword and a name.
• The subroutine body is defined by placing code statements within the {} code block symbols.
    sub MySubroutine {
        # Perl code goes here.
    }
Subroutine call
• To call a subroutine, prepend the name with the & symbol:
    &MySubroutine;
• Subroutines may be recursive (call themselves).
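As a hedged illustration of defining, calling, and recursing, here is a small factorial subroutine. The name `factorial` is hypothetical, and the use of the argument array `@_` goes slightly beyond what the slides have introduced at this point.

```perl
use strict;
use warnings;

# Recursive subroutine: arguments arrive in the special array @_.
sub factorial {
    my ($n) = @_;
    return 1 if $n <= 1;              # base case stops the recursion
    return $n * factorial($n - 1);    # the subroutine calls itself
}

print "5! = ", factorial(5), "\n";    # call with parentheses
print "3! = ", &factorial(3), "\n";   # the & form from the slide also works
```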
Pattern Matching
• Perl lets you compare a regular expression pattern against a target string to test for a possible match.
• The outcome of the test is a boolean result (TRUE or FALSE).
• The basic syntax of a pattern match is
    $myScalar =~ /PATTERN/
• It asks: "Does $myScalar contain PATTERN?"
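The match operator can be sketched on a short string. The DNA sequence below is invented for illustration, and the anchored pattern with a character class is one possible refinement, not something the slide prescribes.

```perl
use strict;
use warnings;

my $sequence = "ATGGCCTAGGATCC";   # hypothetical DNA string

# =~ is true when the pattern occurs anywhere in the string.
my $has_start = ($sequence =~ /ATG/)  ? 1 : 0;
my $has_polyA = ($sequence =~ /AAAA/) ? 1 : 0;

# Anchors (^ and $) plus a character class test the whole string:
# only the letters A, C, G, T from start to end.
my $is_dna = ($sequence =~ /^[ACGT]+$/) ? 1 : 0;

print "contains ATG:  $has_start\n";
print "contains AAAA: $has_polyA\n";
print "pure ACGT:     $is_dna\n";
```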
Functions
• Perl provides a rich set of built-in functions to help you perform common tasks.
• Several categories of useful built-in functions include:
  • Arithmetic functions (sqrt, sin, …)
  • List functions (push, pop, …)
  • String functions (length, substr, chop, …)
  • Existence functions (defined, undef)
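One call from each category is sketched below; the data values are made up for illustration.

```perl
use strict;
use warnings;

# Arithmetic function
my $root = sqrt(16);

# List functions
my @bases = ("A", "C");
push @bases, "G", "T";          # append elements to the end of the list
my $last = pop @bases;          # remove and return the last element

# String functions
my $seq = "GATTACA";
my $len = length($seq);         # number of characters
my $sub = substr($seq, 0, 3);   # 3 characters starting at offset 0

# Existence functions
my $value;                      # declared but not yet assigned
my $before = defined($value) ? "defined" : "undefined";
$value = 0;                     # 0 is false, but it is still defined
my $after  = defined($value) ? "defined" : "undefined";

print "$root $last $len $sub $before $after\n";
```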
Perl 5
• Perl 5 introduced new features:
  • A new data type: the reference
  • A new localization: the my keyword
  • Tools to allow object-oriented programming in Perl
  • New shortcuts like "qw" and "=>"
  • An object-oriented library system focused around "Modules"
References
• A reference is a scalar value which "points to" any variable.
(Diagram labels: Reference, Variable, Value)
Creating References
• References to variables are created by using the backslash (\) operator.
    $name = "bio perl";
    $reference = \$name;
    $array_reference = \@array_name;
    $hash_reference = \%hash_name;
    $subroutine_ref = \&sub_name;
Dereferencing a Reference
• Use an extra $ and @ for scalars and arrays, and -> for hashes.
    print "$$scalar_reference\n",
          "@$array_reference\n",
          "$hash_reference->{'name'}\n";
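Creating and dereferencing can be sketched end to end as below. The variable names and data are illustrative only; the call through the subroutine reference uses the `->()` form, which the slides do not show.

```perl
use strict;
use warnings;

my $name  = "bio perl";
my @array = (1, 2, 3);
my %hash  = (name => "bioperl");
sub hello { return "hello from a subroutine"; }

# The backslash operator creates a reference to each kind of variable.
my $scalar_ref = \$name;
my $array_ref  = \@array;
my $hash_ref   = \%hash;
my $code_ref   = \&hello;

# Dereference to get back at the underlying values.
print "$$scalar_ref\n";
print "@$array_ref\n";
print "$hash_ref->{name}\n";
print $code_ref->(), "\n";

# A reference points at the original variable, not at a copy:
push @$array_ref, 4;
print "array now has ", scalar(@array), " elements\n";
```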
Variable Localization
• The local keyword limits the scope of a variable's value to its enclosing braces.
• The value is visible not only within the enclosing braces but also in all subroutines called from within those braces.
    $a = 1;
    sub mySub {
        local $a = 2;
        &mySub1($a);
    }
    sub mySub1 {
        print "a is $a\n";
    }
• Output when mySub is called: a is 2
Variable Localization – cont'd
• The my keyword hides the variable from the outside world completely.
• It is totally hidden, even from subroutines called within the enclosing braces.
    $a = 1;
    sub mySub {
        my $a = 2;
        &mySub1($a);
    }
    sub mySub1 {
        print "a is $a\n";
    }
• Output when mySub is called: a is 1
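The difference between local and my can be checked side by side in one script. This is a sketch with hypothetical names (`$level`, `show`, `with_local`, `with_my`); the global is declared with `our` so that the code runs under `use strict`.

```perl
use strict;
use warnings;

our $level = 1;                 # package (global) variable

sub show { return "level is $level"; }   # always reads the global

sub with_local {
    local $level = 2;           # temporarily changes the global itself
    return show();              # so the called subroutine sees 2
}

sub with_my {
    my $level = 2;              # a new private variable, invisible to show()
    return show();              # so the called subroutine still sees 1
}

print with_local(), "\n";
print with_my(), "\n";
print show(), "\n";             # the global is back to 1 after with_local
```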
Object Oriented Programming in Perl (1)
• Defining a class
  • A class is simply a package with subroutines that function as methods.
    #!/usr/local/bin/perl
    package Cat;
    sub new { … }
    sub meow { … }
Object Oriented Programming in Perl (2)
• Perl Object
  • To instantiate an object from a class, call the class's "new" method.
    $new_object = new ClassName;
• Using Method
  • To use the methods of an object, use the "->" operator.
    $cat->meow();
Object Oriented Programming in Perl (3)
• Inheritance
  • Declare a class array called @ISA.
  • This array stores the name(s) of the parent class(es) of the new class.
    package NorthAmericanCat;
    @NorthAmericanCat::ISA = ("Cat");
    sub new { … }
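The slides elide the method bodies, so the following is only one possible way to fill them in. It assumes a hash-based object created with `bless`, a hypothetical `name` attribute, and a subclass that simply inherits `new` and `meow` instead of defining its own constructor.

```perl
use strict;
use warnings;

package Cat;

# Constructor: builds a hash, then blesses it into the class.
sub new {
    my ($class, %args) = @_;
    my $self = { name => $args{name} || 'cat' };
    return bless $self, $class;
}

# Method: the object arrives as the first argument.
sub meow {
    my ($self) = @_;
    return "$self->{name} says meow";
}

package NorthAmericanCat;
our @ISA = ('Cat');             # inherit every method from Cat

package main;

my $cat = NorthAmericanCat->new(name => 'Tom');
print $cat->meow(), "\n";       # method is found in Cat through @ISA
```

The arrow form `NorthAmericanCat->new(...)` used here is equivalent to the `new ClassName` form shown on the slide.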
Miscellaneous Constructs
• qw
  • The "qw" keyword is used to bypass the quote and comma characters in list array definitions.
    @name = ("Tom", "Mary", "Michael");
  can be written as
    @name = qw(Tom Mary Michael);
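Miscellaneous Constructs
• =>
  • The => operator is used to make hash definitions more readable.
    %client = ("name", "Michael", "phone", "123-3456", "email", 'mich@nj.net');
  can be written as
    %client = ("name" => "Michael", "phone" => "123-3456", "email" => 'mich@nj.net');
  • A hash is initialized with parentheses; curly braces would create a reference to a hash instead.
  • The e-mail address is single-quoted so that @nj is not interpolated as an array.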
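Both shortcuts can be verified with a few lines. The sketch below reuses the slide's data; `$client_ref` is a hypothetical extra variable added only to contrast parentheses with curly braces.

```perl
use strict;
use warnings;

# qw builds the same list as the quoted, comma-separated form.
my @plain  = ("Tom", "Mary", "Michael");
my @quoted = qw(Tom Mary Michael);

# => behaves like a comma and also quotes a bare word on its left.
my %client = (
    name  => "Michael",
    phone => "123-3456",
    email => 'mich@nj.net',     # single quotes: no array interpolation
);

# Curly braces build a reference to an anonymous hash, stored in a scalar.
my $client_ref = { name => "Michael" };

print "same list\n" if "@plain" eq "@quoted";
print "$client{name}: $client{email}\n";
print "reference type: ", ref($client_ref), "\n";
```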
Perl Modules
• A Perl module is a reusable package defined in a library file whose name is the same as the name of the package.
• Similar to a C link library or a C++ class
    package Foo;
    sub bar  { print "Hello $_[0]\n" }
    sub blat { print "World $_[0]\n" }
    1;
Names • Each Perl module has a unique name. • To minimize name space collision, Perl provides a hierarchical name space for modules. • Components of a module name are separated by double colons (::). • For example, • Math::Complex • Math::Approx • String::BitCount • String::Approx
Module files • Each module is contained in a single file. • Module files are stored in a subdirectory hierarchy that parallels the module name hierarchy. • All module files have an extension of .pm.
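The mapping from module name to file can be sketched with a small helper. The function `module_to_path` is hypothetical (it is not part of Perl), and it assumes "/" as the directory separator.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a module name into its relative file path.
sub module_to_path {
    my ($module) = @_;
    my $path = $module;
    $path =~ s{::}{/}g;         # each :: becomes a directory separator
    return "$path.pm";          # module files end in .pm
}

print module_to_path($_), "\n" for qw(Math::Complex String::Approx);
```

For example, Math::Complex would live in Math/Complex.pm under one of the library directories.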
Module libraries
• The Perl interpreter has a list of directories in which it searches for modules.
• This list is held in the global array @INC:
    > perl -V
    @INC:
      /usr/local/lib/perl5/5.00503/sun4-solaris
      /usr/local/lib/perl5/5.00503
      /usr/local/lib/perl5/site-perl/5.005/sun4-solaris
      /usr/local/lib/perl5/site-perl/5.005
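The search path can also be inspected and extended from inside a script. In this sketch the directory /tmp/my-perl-lib is a made-up example and does not need to exist; `use lib` is the standard pragma for adding a directory to the front of @INC.

```perl
use strict;
use warnings;

# Add a private library directory (hypothetical path) to the search list.
# "use lib" runs at compile time, before the rest of the script.
use lib '/tmp/my-perl-lib';

# @INC is an ordinary array, so it can be printed like any other list.
print "Module search path:\n";
print "  $_\n" for @INC;
```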
Creating Modules
• To create a new Perl module:
    ../development> h2xs -X -n Foo::Bar
    Writing Foo/Bar/Bar.pm
    Writing Foo/Bar/Makefile.PL
    Writing Foo/Bar/test.pl
    Writing Foo/Bar/Changes
    Writing Foo/Bar/MANIFEST
    ../development>
Building Modules
• To build a Perl module:
    perl Makefile.PL    (create the makefile)
    make                (create the build directory blib, then install the module in it)
    make test           (run test.pl)
    make install        (install your module)
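Using Modules
• A module can be loaded with the use statement.
    use Foo;
    Foo::bar("a");
    Foo::blat("b");
• The subroutines are called by their package-qualified names because Foo, as defined earlier, does not export them.
• use loads the module file and evaluates its code, much as eval would.
• The final 1; in the module makes that evaluation return TRUE, which is required for the load to succeed.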
End of Part I. Thank You…
Part II: XS (eXternal Subroutine) extension by Sen Zhang
XS • XS is an acronym for eXternal Subroutine. • With XS, we can call C subroutines directly from Perl code, as if they were Perl subroutines.
Perl is not good at:
• very CPU-intensive things, like numerical integration.
• very memory-intensive things. Perl programs that create more than 10,000 hashes run slowly.
• system software, like device drivers.
• things that have already been written in other languages.
Usually… • These things are done in highly efficient system programming languages such as C/C++.
Can we call a C subroutine from Perl? • The solution is the Perl C API.
When Perl talks to a C subroutine using the Perl C API, two things must happen:
• control flow: control must pass from Perl to C (and back)
  • C program execution
  • Perl program execution
• data flow: data must pass from Perl to C (and back)
  • C data representation
  • Perl data representation
To use the Perl C API, you need to understand:
• What Perl's internal data structures are.
• How the Perl stack works, and how a C subroutine gets access to it.
• How C subroutines get linked into the Perl executable.
• The data paths through the DynaLoader module that associate the name of a Perl subroutine with the entry point of a C subroutine.
If you code directly to the Perl C API
• You will find that you keep writing the same little bits of code:
  • to move parameters on and off the Perl stack;
  • to convert data from Perl's internal representation to C variables;
  • to check for null pointers and other Bad Things.
• When you make a mistake, you don't get bad output: you crash the interpreter.
• It is difficult, error-prone, tedious, and repetitive.
The painkiller is • XS
What is XS?
• Narrowly, XS is the name of the glue language.
• More broadly, XS comprises a system of programs and facilities that work together:
  • MakeMaker
  • xsub glue routines
  • the XS language itself
  • xsubpp
  • h2xs
  • DynaLoader
MakeMaker (tool) • Perl's MakeMaker facility can be used to generate a Makefile that makes it easy to install your Perl modules and scripts.
Xsub
• Rather than drag the Perl C API into all our C code, we usually write glue routines. (We'll refer to an existing C subroutine as a target routine.)
• Such a glue routine is known as an xsub; the Perl interpreter calls it as if it were a Perl subroutine.