Perl Part I: A Biology Primer
Conceptual Biology • H. sapiens did not create the genetic code – but they did invent the transistor • Biological life is not optimized – the modern synthesis • Nature vs. Nurture • What are the best ways to understand the important differences that make the difference?
A Molecular Primer • Hierarchy of the eukaryote • Organism > System > Organ > Tissue > Cell > Organelle > Protein > RNA > DNA • Put Simply: DNA → RNA → Protein
The Building Blocks • DNA is composed of four building blocks • Nucleic acids, nucleotides, bases • Adenine, Cytosine, Guanine, Thymine • RNA also has four building blocks • Adenine, Cytosine, Guanine, Uracil • Proteins are composed of 20 building blocks • Amino acids, residues • Fragments of proteins are called peptides • DNA, RNA and Proteins are polymers
One Dimensional • Two Dimensional • Three Dimensional
Met (Start) Leu AA?, AU?, CA?, CU? -> Asn, Lys, Ile, Met, His, Gln, Val Pro UU?, UG?, UC?, CU?, CG?, CC? -> Phe, Leu, Cys, Stop, Trp, Ser, Leu, Arg, Pro UCU, UGU, GCU, GGU -> Ser, Cys, Ala, Gly
Cys Phe, Leu A?C, U?C -> Ile, Thr, Asn, Ser, Phe, Ser, Tyr, Cys Leu U?U, U?G, C?U, C?G -> Phe, Ser, Tyr, Cys, Leu, Stop, Trp, Leu, Pro, His, Arg, Gln GUU, CUU -> Val, Leu
Lecture II Part II: One-Dimensional Strings
Hello World… • A few perls of wisdom • Concatenating Sequences • Making a reverse complement • Read sequences from data files
Every journey starts with a first 10bp
#!/usr/bin/perl -w
# Storing DNA in a variable, and printing it out

# First, store the DNA in a variable called $DNA
$DNA = 'CGGGCTATTC';

# Next, print the DNA onto the screen
print $DNA;

# Finally, explicitly tell the program to end
exit;
Concatenating DNA Fragments
#!/usr/bin/perl -w
# Store DNA in two variables
$DNA1 = 'AGTGCGTCGCTAG';
$DNA2 = 'ACCGCATGCATTG';

# Concatenate using string interpolation
$DNA3 = "$DNA1$DNA2";
print "$DNA3\n\n";

# Concatenate using the dot operator
$DNA3 = $DNA1 . $DNA2;
print "$DNA3\n\n";

# Or simply print both variables one after the other
print $DNA1, $DNA2, "\n";
exit;
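The same idea extends to more than two fragments. The sketch below is only an illustration: the fragment values and variable names are made up for this example. It builds one sequence with the append operator `.=` and checks that `join` with an empty separator gives the same string.

```perl
use strict;
use warnings;

# Hypothetical fragments, chosen only for illustration
my @fragments = ('AGTGCG', 'TCGCTAG', 'ACCGCATG');

# Build the sequence piece by piece with the append operator
my $assembled = '';
foreach my $fragment (@fragments) {
    $assembled .= $fragment;
}

# join with an empty separator produces the same string in one step
my $joined = join('', @fragments);

print "$assembled\n";                                 # AGTGCGTCGCTAGACCGCATG
print "Length: ", length($assembled), " bp\n";        # Length: 21 bp
```

For a handful of fragments either form works; `join` tends to read better once the fragments already live in an array.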
Transcription: DNA to RNA
#!/usr/bin/perl -w
$DNA = 'ACGACTGCACGATCGTACG';

# Print the DNA onto the screen
print "$DNA\n\n";

# Transcribe the DNA->RNA by substituting all T's with U's
$RNA = $DNA;
$RNA =~ s/T/U/g;

# Print the result to the screen
print "Here is the result of DNA->RNA:\t$RNA\n\n";
exit;
Anatomy of a substitution: $RNA =~ s/T/U/g;
• $RNA : the variable being changed
• =~ : the binding operator
• s : the substitute operator
• / : delimiters that separate the parts of the operator
• T : the pattern to be replaced
• U : the replacement text
• g : a pattern modifier
Pattern modifiers
• g = globally
• i = case insensitive
• m = multiline
• s = single line
• x = permit comments
• o = compile only once for speed
• e = treat replacement as Perl code
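A small sketch of how the `g` and `i` modifiers change the result of the same substitution. The mixed-case input string and the variable names are assumptions made for this example only.

```perl
use strict;
use warnings;

# Hypothetical mixed-case input, chosen only for illustration
my $dna = 'ACGTacgtTTtt';

# Without g, only the first match is replaced
(my $first_only = $dna) =~ s/T/U/;

# With g, every uppercase T is replaced; lowercase t is untouched
(my $global = $dna) =~ s/T/U/g;

# With g and i, both T and t are replaced (always by an uppercase U)
(my $global_nocase = $dna) =~ s/T/U/gi;

# s/// returns the number of substitutions it made
my $copy  = $dna;
my $count = ($copy =~ s/T/U/gi);

print "$first_only\n";       # ACGUacgtTTtt
print "$global\n";           # ACGUacgtUUtt
print "$global_nocase\n";    # ACGUacgUUUUU
print "$count\n";            # 6
```

The `(my $x = $y) =~ s/.../.../` form copies first and then edits the copy, which leaves the original sequence unchanged.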
Calculating the Reverse Complement
#!/usr/bin/perl -w
$DNA = 'ACGTCAGTCGAGCT';

# Print the starting DNA onto the screen
print "Here is the starting DNA:\t$DNA\n\n";

# Calculate the reverse complement, first copying the reversed DNA
# into a new variable called $revcom
$revcom = reverse $DNA;

# Substitute all bases by their complement
# Caution: this gives the wrong answer. Each substitution acts on the result
# of the previous one, so s/T/A/g turns the T's just made by s/A/T/g back
# into A's (and likewise for C and G).
$revcom =~ s/A/T/g;
$revcom =~ s/T/A/g;
$revcom =~ s/C/G/g;
$revcom =~ s/G/C/g;
print "$revcom\n";
Calculating the Reverse Complement
#!/usr/bin/perl -w
$DNA = 'ACGTCAGTCGAGCT';

# Print the starting DNA onto the screen
print "Here is the starting DNA:\t$DNA\n\n";

# Calculate the reverse complement, first copying the reversed DNA
# into a new variable called $revcom
$revcom = reverse $DNA;

# Substitute all bases by their complement.
# tr translates every character in a single pass, so one complement
# never overwrites another.
$revcom =~ tr/ACGTacgt/TGCAtgca/;
print "$revcom\n";
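To see the difference between the two approaches side by side, here is a minimal sketch that wraps each one in a subroutine. The names `revcom` and `revcom_sequential` are hypothetical helpers introduced for this example, not part of the original slides.

```perl
use strict;
use warnings;

# Single-pass version: reverse the string, then translate every base at once
sub revcom {
    my ($dna) = @_;
    my $rc = reverse $dna;               # scalar context reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Sequential version, kept only to show why it fails
sub revcom_sequential {
    my ($dna) = @_;
    my $rc = reverse $dna;
    $rc =~ s/A/T/g;
    $rc =~ s/T/A/g;                      # undoes the line above
    $rc =~ s/C/G/g;
    $rc =~ s/G/C/g;                      # undoes the line above
    return $rc;
}

my $dna = 'ACGTCAGTCGAGCT';
my $good = revcom($dna);
my $bad  = revcom_sequential($dna);

print "tr///:      $good\n";             # AGCTCGACTGACGT
print "sequential: $bad\n";              # ACCACCACACACCA
```

A handy sanity check is that taking the reverse complement twice should give back the original sequence, which holds for `revcom` but not for the sequential version.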
Reading Data from Files
#### Sample Data in FASTA Format ####
>NM_012345 | Sample Data | Muppet Stuffing Protein
MNIDDKLEFGDEMGOSSRTMV
FGDLVRSMPHOEILAADEVLISHEE
GLOYAKLEFGDEMGOGHDDEFGVY
Reading Files
#!/usr/bin/perl -w
# The name of the file containing the sequence data
$proteinFilename = 'NM_012345.pep';

# Open the file, and associate a 'filehandle' with it
open(PROTEINFILE, $proteinFilename) or die "Cannot open $proteinFilename: $!";

# Read from the file with the input operator <>
# (assigned to a scalar, this reads only the first line)
$muppetProtein = <PROTEINFILE>;

# Print what was read
print "Here is the protein:\t$muppetProtein\n\n";

close PROTEINFILE;
exit;
Let's try this again …
#!/usr/bin/perl -w
$proteinFilename = 'NM_012345.pep';
open(PROTEINFILE, $proteinFilename) or die "Cannot open $proteinFilename: $!";

# Each use of <PROTEINFILE> in scalar context reads the next line
$muppetProtein = <PROTEINFILE>;
print "Here is the first line:\t$muppetProtein\n\n";

$muppetProtein = <PROTEINFILE>;
print "Here is the second line:\t$muppetProtein\n\n";

$muppetProtein = <PROTEINFILE>;
print "Here is the third line:\t$muppetProtein\n\n";

close PROTEINFILE;
exit;
Using Arrays to Read Files
#!/usr/bin/perl -w
$proteinFilename = 'NM_012345.pep';

# Open the file
open(PROTEINFILE, $proteinFilename) or die "Cannot open $proteinFilename: $!";

# Read the sequence data from the file, and store it in the array
# variable @protein (in list context, <> reads every line at once)
@protein = <PROTEINFILE>;

# Print the protein onto the screen
print @protein;

close PROTEINFILE;
exit;
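Once the lines are in an array, the header and the sequence can be separated. The sketch below is one possible way to do it, not the deck's own method. To stay self-contained it reads the sample record from a string through an in-memory filehandle; a real script would open `NM_012345.pep` instead. The variable names are assumptions made for this example.

```perl
use strict;
use warnings;

# Sample record from the slide, held in a string so the sketch runs on its own
my $fasta_text = <<'END_FASTA';
>NM_012345 | Sample Data | Muppet Stuffing Protein
MNIDDKLEFGDEMGOSSRTMV
FGDLVRSMPHOEILAADEVLISHEE
GLOYAKLEFGDEMGOGHDDEFGVY
END_FASTA

# Open an in-memory filehandle on the string and read every line into an array
open(my $fh, '<', \$fasta_text) or die "Cannot open in-memory file: $!";
my @lines = <$fh>;
close $fh;

# Remove the trailing newline from every line
chomp @lines;

# The first line is the header; drop its leading '>'
my $header = shift @lines;
$header =~ s/^>//;

# The remaining lines are pieces of one sequence
my $protein = join('', @lines);

print "Header:  $header\n";
print "Protein: $protein\n";
print "Length:  ", length($protein), " residues\n";
```

This assumes a file with a single record; a file holding several records would need to start a new sequence at every line that begins with `>`.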
Arrays
# Here's one way to declare an array
@bases = ('A', 'C', 'G', 'T');

# Now print each element of the array (indices start at 0)
print "\nFirst element: ", $bases[0];
print "\nSecond element: ", $bases[1];
print "\nThird element: ", $bases[2];
print "\nFourth element: ", $bases[3];
Arrays
# Here's one way to declare an array
@bases = ('A', 'C', 'G', 'T');

# Now print each element of the array in a row
print "\nHere are all of the bases: ", @bases;
# This prints out: 'Here are all of the bases: ACGT'

# But you can print them out with spaces in between
# by interpolating the array inside double quotes
print "\nHere they are with spaces: ", "@bases";
# This prints out: 'Here they are with spaces: A C G T'
Arrays
# Here's one way to declare an array
@bases = ('A', 'C', 'G', 'T');

# Here's how to take an element off of the end
$base1 = pop @bases;
print "Here's the last element: ", $base1, "\n\n";

# The other elements still remain
print "\nHere are the remaining elements: ", "@bases";
Arrays
# Here's one way to declare an array
@bases = ('A', 'C', 'G', 'T');

# Here's how to take an element off of the front
$base2 = shift @bases;
print "Here's the first element: ", $base2, "\n\n";

# The other elements still remain
print "\nHere are the remaining elements: ", "@bases";
Arrays
# Here's one way to declare an array
@bases = ('A', 'C', 'G', 'T');

# Here's how you put an element at the beginning of an array
# Our example will put the last element at the beginning
$base1 = pop @bases;
unshift(@bases, $base1);
print "Here's the last element put first: ", "@bases\n\n";
Arrays
# Here's one way to declare an array
@bases = ('A', 'C', 'G', 'T');

# Here's how you put an element at the end of an array
# Our example will put the first element at the end
$base1 = shift @bases;
push(@bases, $base1);
print "Here's the first element put last: ", "@bases\n\n";
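The four operations pair up naturally: `shift` and `unshift` work on the front of an array, `pop` and `push` on the end. As a compact illustration (a sketch, not taken from the slides), combining them rotates an array in either direction.

```perl
use strict;
use warnings;

my @bases = ('A', 'C', 'G', 'T');

# Rotate left: take from the front, put on the end
push(@bases, shift @bases);
print "@bases\n";    # C G T A

# Rotate right: take from the end, put on the front (undoes the step above)
unshift(@bases, pop @bases);
print "@bases\n";    # A C G T
```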
Arrays
# Here's one way to declare an array
@bases = ('A', 'C', 'G', 'T');

# Here's how to reverse an array
@reverse = reverse @bases;

# Here's how to get the length (the number of elements)
print scalar @bases, "\n\n";

# Here's how to insert an element at an arbitrary place:
# at offset 2, remove 0 elements, then insert 'X'
splice(@bases, 2, 0, 'X');
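`splice` takes an array, an offset, a length, and an optional list of new elements, so the same function can insert, replace, or delete. The following sketch walks through all three; the replacement values 'X' and 'N' are arbitrary choices for illustration.

```perl
use strict;
use warnings;

my @bases = ('A', 'C', 'G', 'T');

# Insert: at offset 2, remove 0 elements, add 'X'
splice(@bases, 2, 0, 'X');
print "@bases\n";                        # A C X G T

# Replace: at offset 2, remove 1 element, add 'N'
splice(@bases, 2, 1, 'N');
print "@bases\n";                        # A C N G T

# Delete: at offset 2, remove 1 element; splice returns what it removed
my @removed = splice(@bases, 2, 1);
print "@bases (removed: @removed)\n";    # A C G T (removed: N)

# The array is back to its original length
print scalar(@bases), "\n";              # 4
```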
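Arrays
# Arrays can be evaluated in list context or in scalar context
@bases = ('A', 'C', 'G', 'T');

# Here's how to print the array
print "@bases\n";

# Here's how to assign it to a scalar:
# in scalar context an array gives its number of elements
$a = @bases;
print $a;    # prints 4

# Here's how to assign an array to a list:
# the parentheses force list context, so $a gets the first element
($a) = @bases;
print $a;    # prints A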
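A short sketch of the same array read in both contexts. The variable names are assumptions for this example; it uses `my` variables rather than `$a`, since `$a` and `$b` are also used by Perl's `sort`.

```perl
use strict;
use warnings;

my @bases = ('A', 'C', 'G', 'T');

# Scalar context: the array yields its number of elements
my $count = @bases;

# List context: elements are copied position by position
my ($first) = @bases;

# Extra elements can be collected by a trailing array
my ($one, $two, @rest) = @bases;

# $#bases is the index of the last element, one less than the count
my $last_index = $#bases;

print "count=$count first=$first\n";          # count=4 first=A
print "one=$one two=$two rest=@rest\n";       # one=A two=C rest=G T
print "last_index=$last_index\n";             # last_index=3
```

The only difference between `my $count = @bases;` and `my ($first) = @bases;` is the pair of parentheses, which is an easy detail to miss when reading code.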