
Programming and Perl for Bioinformatics Part II


chavi


Presentation Transcript


  1. Programming and Perl for Bioinformatics Part II

  2. Basic Data Types
     • Perl has three basic data types:
         • scalar
         • array (list)
         • associative array (hash)

  3. Arrays
     • An array (list) is an ordered list of scalar values.
     • '@' refers to the entire array; '$' with an index refers to a single element. Indexes start at 0.
     • Example:
         (1,2,3)                  # array of three values 1, 2, and 3
         ("one","two","three")    # array of three values "one", "two", "three"
         @names = ("mary", "tom", "mark", "john", "jane");
         $names[1];               # extract the 2nd item from @names: "tom"
         @names[1..4];            # extract a sublist from @names: ("tom", "mark", "john", "jane")
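The two extractions on this slide can be tried out with a short script. This is a minimal sketch; the variable names `$second`, `@sublist`, and `$last` are illustrative and not from the slides, and the negative index is an extra that the slide does not cover.

```perl
use strict;
use warnings;

my @names = ("mary", "tom", "mark", "john", "jane");

my $second  = $names[1];       # single element: note the $ sigil
my @sublist = @names[1..4];    # array slice: note the @ sigil
my $last    = $names[-1];      # negative indexes count from the end

print "Second item: $second\n";
print "Sublist: @sublist\n";
print "Last item: $last\n";
```

The sigil follows what is being returned: `$` for one scalar, `@` for a list of several.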

  4. More on Arrays
     • Literal lists:
         @a = ( );                               # empty list
         @b = (1,2,3);                           # three numbers
         @c = ("Jan","Joe","Marie");             # three strings
         @d = ("Dirk",1.92,46,"20-03-1977");     # a mixed list
     • Variables and sublists are interpolated in a list:
         @b = ($a, $a+1, $a+2);                      # variable interpolation
         @c = ("Jan", ("Joe","Marie") );             # list interpolation
         @d = ("Dirk", 1.92,46,( ), "20-03-1977");   # empty list interpolation
         @e = ( @b, @c );                            # same as (1,2,3,"Jan","Joe","Marie") when $a is 1
     • Practical construction operators ($x..$y):
         @x = (1..6);                # same as (1, 2, 3, 4, 5, 6)
         @y = (2..5, 8, 11..13);     # same as (2,3,4,5,8,11,12,13)
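A small script makes the flattening behaviour visible. It is only a sketch: `$start` is a hypothetical stand-in for the slide's `$a` (which is better avoided as a name, since Perl's `sort` uses `$a` and `$b`).

```perl
use strict;
use warnings;

my $start = 1;                                # hypothetical starting value
my @b = ($start, $start + 1, $start + 2);     # variable interpolation
my @c = ("Jan", ("Joe", "Marie"));            # the nested list is flattened
my @e = (@b, @c);                             # both arrays are flattened into one list
my @y = (2..5, 8, 11..13);                    # ranges mixed with single values

print "e has ", scalar(@e), " elements: @e\n";
print "y: @y\n";
```

Because lists flatten, `@e` ends up as one flat array of six scalars; it does not contain two nested arrays.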

  5. Array Example
         # Here's one way to declare an array, initialized with a list of four
         # scalar values.
         @bases = ('A', 'C', 'G', 'T');
         # Now we'll print each element of the array
         print "Here are the array elements:";
         print "\nFirst element: ";
         print $bases[0];
         print "\nSecond element: ";
         print $bases[1];
         print "\nThird element: ";
         print $bases[2];
         print "\nFourth element: ";
         print $bases[3];
     • This code snippet prints out:
         Here are the array elements:
         First element: A
         Second element: C
         Third element: G
         Fourth element: T

  6. Print Array
     • You can print the elements one after another like this:
         @bases = ('A', 'C', 'G', 'T');
         print "\n\nHere are the array elements: ";
         print @bases;
     • It produces the output:
         Here are the array elements: ACGT

  7. Converting a string to an array
     • split splits a string into parts and puts them in an array.
         $dnastring = "ACGTGCTA";
         @dnaarray = split ( //, $dnastring );    # @dnaarray is now (A, C, G, T, G, C, T, A)
         @dnaarray = split ( /T/, $dnastring );   # @dnaarray is now (ACG, GC, A)
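Both calls to split can be checked with a short script. This is a sketch under the slide's own example string; the array names `@bases` and `@fragments` are illustrative.

```perl
use strict;
use warnings;

my $dnastring = "ACGTGCTA";

my @bases     = split //,  $dnastring;   # empty pattern: one element per character
my @fragments = split /T/, $dnastring;   # the separator T itself is discarded

print scalar(@bases), " bases: @bases\n";
print "Fragments: @fragments\n";
```

By default split drops trailing empty fields, so splitting "ACGT" on /T/ gives the single element "ACG" with no empty string after it.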

  8. Converting an array to a string
     • join combines the elements of an array into a single scalar variable (a string):
         $dnastring = join('', @dnaarray);
     • The first argument is the spacer placed between the elements (empty here); the second is the array to join.

  9. Array Manipulations
     • reverse: reverses the order of array elements
         @a = (1, 2, 3);
         @b = reverse @a;    # @b = (3, 2, 1)
     • split: splits a string into a list/array
         $line = "John Smith 28";
         ($first, $last, $age) = split (/\s/, $line);    # \s: whitespace [ \t\n\f\r]
         $DNA = "ACGTTTGA";
         @DNA = split ("", $DNA);
     • join: joins a list/array into a string
         $gene = join ( "", ($exon1, $exon3) );
         $name = join ( "-", ("Zhong", "Hui") );
     • scalar: returns the number of elements in @array
         scalar @array;
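The four functions above combine naturally. The following sketch reverses a DNA string by going through an array and back; `$reversed`, `$length`, and the extra whitespace in `$line` are illustrative choices, not from the slides.

```perl
use strict;
use warnings;

my $dna      = "ACGTTTGA";
my @bases    = split //, $dna;            # string to array
my $length   = scalar @bases;             # number of elements
my $reversed = join '', reverse @bases;   # reverse the list, then glue it back together

print "Length: $length\n";
print "Reversed: $reversed\n";

# /\s+/ treats a run of whitespace as one separator;
# with /\s/ each extra space would produce an empty field.
my $line = "John   Smith\t28";
my ($first, $last, $age) = split /\s+/, $line;
print "$last, $first ($age)\n";
```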

  10. Array Manipulations - pop
      • You can take an element off the end of an array with pop:
          @bases = ('A', 'C', 'G', 'T');
          $base1 = pop @bases;
          print "Here's the element removed from the end: ";
          print $base1, "\n\n";
          print "Here's the remaining array of bases: ";
          print "@bases";
      • which produces the output:
          Here's the element removed from the end: T

          Here's the remaining array of bases: A C G

  11. Array Manipulations - shift
      • You can take a base off of the beginning of the array with shift:
          @bases = ('A', 'C', 'G', 'T');
          $base2 = shift @bases;    # shift left
          print "Here's an element removed from the beginning: ";
          print $base2, "\n\n";
          print "Here's the remaining array of bases: ";
          print "@bases";
      • which produces the output:
          Here's an element removed from the beginning: A

          Here's the remaining array of bases: C G T

  12. Array Manipulations - push
      • You can put an element on the end of the array with push:
          @bases = ('A', 'C', 'G', 'T');
          $base2 = shift @bases;
          push (@bases, $base2);    # push returns the number of elements in the array after the push
          print "Here's the element from the beginning put on the end: ";
          print "@bases\n\n";
      • It produces the output:
          Here's the element from the beginning put on the end: C G T A

  13. Array Manipulations - unshift
      • You can put an element at the beginning of the array with unshift:
          @bases = ('A', 'C', 'G', 'T');
          $base1 = pop @bases;
          unshift (@bases, $base1);
          print "Here's the element from the end put on the beginning: ";
          print "@bases\n\n";
      • It produces the output:
          Here's the element from the end put on the beginning: T A C G
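The four operations from the last slides can be seen side by side in one sketch. The comments track the array after each step; `$count` is an illustrative name for the return value of push.

```perl
use strict;
use warnings;

my @bases = ('A', 'C', 'G', 'T');

my $last = pop @bases;              # removes T from the end:       A C G
unshift @bases, $last;              # puts T on the beginning:      T A C G
my $first = shift @bases;           # removes T from the beginning: A C G
my $count = push @bases, $first;    # puts T back on the end:       A C G T

print "Bases: @bases\n";
print "Elements after push: $count\n";
```

pop and push work on the end of the array, shift and unshift on the beginning, so each pair undoes the other.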

  14. Exercise
          # Determine the frequency of each nucleotide
          $dna = "gaTtACataCACTgttca";
          ?
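One possible approach to the exercise, offered as a sketch and not as the presenter's intended solution: convert the string to a single case with `uc`, split it into bases, and tally them in a hash (the third basic data type). The hash name `%count` is an assumption of this example.

```perl
use strict;
use warnings;

my $dna = "gaTtACataCACTgttca";

my %count;                               # base => number of occurrences
foreach my $base (split //, uc $dna) {   # uc makes the count case-insensitive
    $count{$base}++;
}

foreach my $base (sort keys %count) {
    print "$base: $count{$base}\n";
}
```

A hash avoids one counter variable per nucleotide and also picks up unexpected characters such as N without extra code.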

  15. Filehandles
      • File I/O (input/output): reading from/writing to files
      • Files are represented in Perl by a filehandle variable (for clarity, written as a bare word in UPPERCASE)
      • Open a file on a filehandle using the open function
          • for reading (input):
              open INFILE, "<datafile.txt";
            or
              open (INFILE, "<datafile.txt");
          • for writing (output), overwriting the file:
              open OUTFILE, ">output";
          • for appending to the end of the file:
              open OUTFILE, ">>output";
      • Close a file on a filehandle (Perl is case-sensitive, so the function is close, not Close):
          close (OUTFILE);
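The slide uses bareword filehandles and the two-argument form of open. A common modern alternative, shown here as a sketch and not as the slides' own style, is the three-argument form with a lexical filehandle and an explicit error check. The scratch file comes from the core File::Temp module so the example runs anywhere; the file contents are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file so the example is self-contained.
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
close $tmp_fh;

# '>' opens for writing and overwrites any existing contents.
open(my $out, '>', $filename) or die "Cannot write $filename: $!";
print $out "ACGT\n";
close $out;

# '>>' opens for appending to the end of the file.
open(my $append, '>>', $filename) or die "Cannot append to $filename: $!";
print $append "TTGA\n";
close $append;

# '<' opens for reading.
open(my $in, '<', $filename) or die "Cannot read $filename: $!";
my @lines = <$in>;
close $in;

print "Read ", scalar(@lines), " lines\n";
```

Checking the return value of open with `or die` reports a missing or unreadable file immediately; without it the script carries on silently with an unopened filehandle.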

  16. Special Filehandles
      • Special "files" that are always "open"
      • STDIN (standard input): input from the command window; read only
      • STDOUT (standard output): output to the command window; write only
          print STDOUT "Have fun with Perl!\n";
        or just
          print "Have fun with Perl!\n";

  17. Input from Filehandles
      • The "angle bracket" input operator reads one line of input (up to and including the newline)
      • From STDIN:
          print "Enter name of protein: ";
          $line = <STDIN>;
          chomp $line;    # removes \n from end of $line
          print "\nYou entered $line.\n";
      • From a file:
          open (INPUTFILE, "prot1.seq");
          $line1 = <INPUTFILE>;    # first line
          chomp $line1;
          $line2 = <INPUTFILE>;    # second line
          # Perl reads files one line at a time
          # … etc

  18. sequences.fasta
          >gi|145536|gb|L04574.1|Escherichia coli DNA polymerase III chi subunit gene, complete cds
          TAACGGCGAAGAGTAATTGCGTCAGGCAAGGCTGTTATTGCCGGATGCGGCGTGAACGCCTTATCCGACC
          TACACAGCACTGAACTCGTAGGCCTGATAAGACACAACAGCGTCGCATCAGGCGCTGCGGTGTATACCTG
          ATGCGTATTTAAATCCACCACAAGAAGCCCCATTTATGAAAAACGCGACGTTCTACCTTCTGGACAATGA
          CACCACCGTCGATGGCTTAAGCGCCGTTGAGCAACTGGTGTGTGAAATTGCCGCAGAACGTTGGCGCAGC
          GGTAAGCGCGTGCTCATCGCCTGTGAAGATGAAAAGCAGGCTTACCGGCTGGATGAAGCCCTGTGGGCGC
          GTCCGGCAGAAAGCTTTGTTCCGCATAATTTAGCGGGAGAAGGACCGCGCGGCGGTGCACCGGTGGAGAT
          CGCCTGGCCGCAAAAGCGTAGCAGCAGCCGGCGCGATATATTGATTAGTCTGCGAACAAGCTTTGCAGAT
          TTTGCCACCGCTTTCACAGAAGTGGTAGACTTCGTTCCTTATGAAGATTCTCTGAAACAACTGGCGCGCG
          AACGCTATAAAGCCTACCGCGTGGCTGGTTTCAACCTGAATACGGCAACCTGGAAATAATGGAAAAGACA
          TATAACCCACAAGATATCGAACAGCCGCTTTACGAGCACTGGGAAAAGCAGGGCTACTTTAAGCCTAATG
          GCGATGAAAGCCAGGAAAGTTTCTGCATCATGATCCCGCCGCCGAA

  19. Determine frequency of nucleotides
      • Input file: sequences.fasta
          open (INPUTFILE, "sequences.fasta") or die "Cannot open sequences.fasta: $!";    # open file for sequence
          $line1 = <INPUTFILE>;    # FASTA header line (not counted)
          $line2 = <INPUTFILE>;    # first line of sequence
          $line3 = <INPUTFILE>;    # second line of sequence
          chomp ($line2, $line3);
          $dna = $line2.$line3;    # only the first two sequence lines are counted
          $count_A = 0;
          $count_C = 0;
          $count_G = 0;
          $count_T = 0;
          @dna = split '', $dna;
          foreach $base (@dna) {
              if ($base eq 'A') {$count_A++;}
              elsif ($base eq 'C') {$count_C++;}
              elsif ($base eq 'G') {$count_G++;}
              elsif ($base eq 'T') {$count_T++;}
              else {print "error!\n";}
          }
          print "count of A = $count_A \n";
          print "count of C = $count_C \n";
          print "count of G = $count_G \n";
          print "count of T = $count_T \n";
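The script on this slide counts only the first two sequence lines. A variation that counts every sequence line might look like the sketch below. It is not the presenter's code: the FASTA record is a short made-up one held in a string and read through an in-memory filehandle so that the example is self-contained, and the counts go into a hash instead of four separate variables.

```perl
use strict;
use warnings;

# Hypothetical FASTA record, stored in a string instead of a file on disk.
my $fasta = ">example hypothetical record\nACGTAC\nGGTA\n";
open(my $fh, '<', \$fasta) or die "Cannot open in-memory file: $!";

my %count;
while (my $line = <$fh>) {
    chomp $line;
    next if $line =~ /^>/;                  # skip FASTA header lines
    $count{$_}++ for split //, uc $line;    # tally every base on the line
}
close $fh;

foreach my $base (qw(A C G T)) {
    my $n = $count{$base} || 0;             # bases never seen count as 0
    print "count of $base = $n\n";
}
```

To run it against a real file, the string reference in open would be replaced by a file name such as the slide's sequences.fasta.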

  20. Read a File: line by line
          my $my_sequence = '';
          open FILE1, "/u/doej01/prot1.seq";
          while ($line = <FILE1>) {
              chomp($line);
              $my_sequence = $my_sequence.$line;
          }
          close (FILE1);
      • Reads the whole file into the variable $my_sequence, with the newlines removed

  21. Using loops to read in a file
      • The while loop keeps running its body for as long as its condition is true, so it keeps reading lines from the file until it runs out.
      • The special variable $_ holds the line that was just read.
          my $longsequence = '';
          open FILE, 'exampleprotein.txt';
          while (<FILE>) {
              $longsequence = $longsequence . $_;
              chomp $longsequence;    # removes the newline that was just appended
          }
          close FILE;
      • This reads the whole file and appends each line to the variable $longsequence, one at a time.
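The same loop can be written with a named loop variable and a lexical filehandle. This is a sketch, not the slides' version; the file contents are made up and are read through an in-memory filehandle so the example runs without any file on disk.

```perl
use strict;
use warnings;

# Hypothetical file contents, stored in a string instead of a file on disk.
my $file_contents = "MKVLA\nGGHTR\nPLLQ\n";
open(my $fh, '<', \$file_contents) or die "Cannot open in-memory file: $!";

my $sequence = '';
while (my $line = <$fh>) {    # the loop ends when <$fh> returns undef at end of file
    chomp $line;              # strip the newline before appending
    $sequence .= $line;       # .= appends to the existing string
}
close $fh;

print "$sequence\n";
```

Chomping each line before appending keeps the intent clearer than chomping the accumulated string after each append, although both give the same result.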

  22. Read a File into an Array
      • Rather than read a file one line at a time into a scalar variable, it is often helpful to read the entire file into an array:
          open FILE1, "prot1.seq";
          @DNA = <FILE1>;    # array of strings, one per line, each still ending in \n
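Reading in list context keeps the newline on every element, so the array is usually chomped afterwards. A minimal sketch, using made-up contents and an in-memory filehandle in place of the slide's prot1.seq:

```perl
use strict;
use warnings;

# Hypothetical file contents, stored in a string instead of a file on disk.
my $file_contents = "ACGT\nTTGA\nCCAT\n";
open(my $fh, '<', \$file_contents) or die "Cannot open in-memory file: $!";

my @lines = <$fh>;     # list context: every remaining line at once
close $fh;

chomp @lines;          # chomp on an array strips the newline from each element
my $sequence = join '', @lines;

print scalar(@lines), " lines, sequence: $sequence\n";
```

This approach holds the whole file in memory, which is convenient for small files; for very large files the line-by-line loop is the safer choice.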

  23. Writing to a File
      • Writing to a file is similar to reading from it
      • Use the > operator to open a file for writing:
          open OUTPUT, '>/home/achou/output.txt';
      • This creates a new file with that name, or overwrites an existing file
      • Use >> to append text to an existing file
      • Print to the file using the filehandle (with no comma after the filehandle):
          print OUTPUT $myoutputdata;
