210 likes | 419 Views
Advanced Python Concepts: Modules. BCHB524 2013 Lecture 14. Modules. We've already used these from the python standard library and from BioPython.
E N D
Advanced Python Concepts: Modules BCHB5242013Lecture 14 BCHB524 - 2013 - Edwards
Modules • We've already used these from the python standard library and from BioPython import sysfilename = sys.argv[1]file_handle = open(filename)import Bio.SeqIOseq_records = Bio.SeqIO.parse(file_handle, "swiss")import gzipfile_handle = gzip.open(filename)import urllibfile_handle = urllib.urlopen(url)import mathx = math.sqrt(y) BCHB524 - 2013 - Edwards
Modules • Modules contain previously defined functions, variables, and (soon!) classes. • Store useful code for re-use in many programs • Structurerelated codetogether import sysimport MyNucStuffseqfilename = sys.argv[1]seq = MyNucStuff.read_seq_from_filename(seqfilename)cseq = MyNucStuff.complement(seq)rcseq = MyNucStuff.reverseComplement(seq)print" Sequence:",seqprint" C sequence:",cseqprint"RC sequence:",rcseq BCHB524 - 2013 - Edwards
Modules • In MyNucStuff.py defread_seq_from_filename(seq_filename): seq_file = open(seq_filename) dna_seq = ''.join(seq_file.read().split()) dna_seq = dna_seq.upper() seq_file.close()return dna_seqcompl_dict = {'A':'T', 'T':'A', 'C':'G', 'G':'C'}defcomplement(seq):return''.join(map(compl_dict.get,seq))defrevseq(seq):return''.join(reversed(seq))defreverseComplement(seq):return complement(revseq(seq))# plus anything else you like... BCHB524 - 2013 - Edwards
Modules • We can import specific functions directly… • And reference them without the module name import sysfrom MyNucStuff import read_seq_from_filenamefrom MyNucStuff import complementfrom MyNucStuff import reverseComplementseqfilename = sys.argv[1]seq = read_seq_from_filename(seqfilename)cseq = complement(seq)rcseq = reverseComplement(seq)print" Sequence:",seqprint" C sequence:",cseqprint"RC sequence:",rcseq BCHB524 - 2013 - Edwards
Modules • We can even import all the functions from a module… import sysfrom MyNucStuff import * seqfilename = sys.argv[1]seq = read_seq_from_filename(seqfilename)cseq = complement(seq)rcseq = reverseComplement(seq)print" Sequence:",seqprint" C sequence:",cseqprint"RC sequence:",rcseq BCHB524 - 2013 - Edwards
Packages • Packages are collections of modules, grouped together. All equivalent: • Implemented using files and folders/directories. import Bio.SeqIOBio.SeqIO.parse(handle, "swiss")from Bio import SeqIOSeqIO.parse(handle, "swiss")from Bio.SeqIO import parseparse(handle, "swiss") BCHB524 - 2013 - Edwards
What can go wrong? • Sometimes our own .py files can "collide" with Python's packages. • Test what happens with an "empty" module in files: • xml.py (and then try to import ElementTree) • Bio.py (and then try to import SeqIO) • etc… BCHB524 - 2013 - Edwards
A module for codon tables • Module is called: codon_table • Functions: • read_codons_from_filename(filename) • returns dictionary of codons – value is pair: (amino-acid symbol, initiation codon true/false) • amino_acid(codon_table,codon) • returns amino-acid symbol for codon • is_init(codon_table,codon) • returns true if codon is an initiation codon, false, otherwise • get_ambig_aa (codon_table,codon) • Returns the single amino-acid consistent with ambiguous codon (containing N's), or X. • translate(codon_table,seq,frame) • returns amino-acid sequence for DNA sequence seq BCHB524 - 2013 - Edwards
A module for codon tables from MyNucStuff import *from codon_table import *import sysiflen(sys.argv) < 3:print"Require codon table and DNA sequence on command-line." sys.exit(1)table = read_codons_from_filename(sys.argv[1])seq = read_seq_from_filename(sys.argv[2])if is_init(table,seq[:3]):print"Initial codon is an initiation codon"for frame in (1,2,3):print"Frame",frame,"(forward):",translate(table,seq,frame) BCHB524 - 2013 - Edwards
A module for codons • In codon_table.py: defread_codons_from_filename(codonfile):# magicreturn codon_tabledefamino_acid(table,codon):# magicreturn aadefis_init(table,codon):# magicreturn initdefget_ambig_aa(table,codon):# magicreturn aadeftranslate(table,seq,frame):# magicreturn aaseq BCHB524 - 2013 - Edwards
A module for codons • In codon_table.py: defread_codons_from_filename(codonfile): f = open(codonfile) data = {}for l in f: sl = l.split() key = sl[0] value = sl[2] data[key] = value f.close() b1 = data['Base1'] b2 = data['Base2'] b3 = data['Base3'] aa = data['AAs'] st = data['Starts'] codon_table = {} n = len(aa)for i inrange(n): codon = b1[i] + b2[i] + b3[i] isInit = (st[i] == 'M') codon_table[codon] = (aa[i],isInit)return codon_table BCHB524 - 2013 - Edwards
Lab exercises • Rework the lecture, and your solutions (or mine) from the homework exercises #1 through #3, to make a MyDNAStuff module. • Put as many useful nucleotide functions as possible into the module... • Rework the lecture, and your solutions (or mine) from homework exercises #4 and #5, to make the codon_table module with functions specified in this lecture. • Demonstrate the use of these modules to translate an amino-acid sequence in all six-frames with just a few lines of code. • The final result should look similar to Slide 10. • Your program should handle DNA sequence with N’s in it. BCHB524 - 2013 - Edwards
Class Project: Expectations • 40% of your grade! • Project Report • Long version of your homework write-up • Project Presentation • Demo your program • Describe your project solution BCHB524 - 2013 - Edwards
Class Project: Blast Database • Write a program that computes all pairwise blast alignments for two species' proteomes and stores the alignments in a relational database. • Write a program that retrieves the blast alignment for two proteins (specified by their accessions) from the relational database. • Write a program that finds pairs of orthologous proteins that are mutually best hits in the species' proteomes. BCHB524 - 2013 - Edwards
Class Project: MS/MS Viewer • Write a program to display peptide fragmentation spectra from an mzXML file. • The program will take an mzXML file, a scan number, and a peptide sequence as input. • The peptide's b-ion and y-ion m/z values should be computed, and peaks matching these m/z values annotated with appropriate labels. • The output figure/plot should aid the user in determining whether or not the peptide is a good match to the spectrum. BCHB524 - 2013 - Edwards
Class Project: Protein Digest • Write a simple web-server application using TurboGears to carry out anin silicoenzymatic digest of a user-provided protein sequence. • Users should be able to specify min and max length, min and max molecular weight, # of missed cleavages, and specific enzyme. • Output should be a table of peptides, with their length, molecular weight, # of missed cleavages, and amino-acids to left and right of each peptide in the protein sequence. BCHB524 - 2013 - Edwards