Advanced Python Concepts: Modules

Advanced Python Concepts: Modules BCHB5242013Lecture 14 BCHB524 - 2013 - Edwards

Modules • We've already used these from the python standard library and from BioPython import sysfilename = sys.argv[1]file_handle = open(filename)import Bio.SeqIOseq_records = Bio.SeqIO.parse(file_handle, "swiss")import gzipfile_handle = gzip.open(filename)import urllibfile_handle = urllib.urlopen(url)import mathx = math.sqrt(y) BCHB524 - 2013 - Edwards

Modules • Modules contain previously defined functions, variables, and (soon!) classes. • Store useful code for re-use in many programs • Structurerelated codetogether import sysimport MyNucStuffseqfilename = sys.argv[1]seq = MyNucStuff.read_seq_from_filename(seqfilename)cseq = MyNucStuff.complement(seq)rcseq = MyNucStuff.reverseComplement(seq)print" Sequence:",seqprint" C sequence:",cseqprint"RC sequence:",rcseq BCHB524 - 2013 - Edwards

Modules • In MyNucStuff.py defread_seq_from_filename(seq_filename): seq_file = open(seq_filename) dna_seq = ''.join(seq_file.read().split()) dna_seq = dna_seq.upper() seq_file.close()return dna_seqcompl_dict = {'A':'T', 'T':'A', 'C':'G', 'G':'C'}defcomplement(seq):return''.join(map(compl_dict.get,seq))defrevseq(seq):return''.join(reversed(seq))defreverseComplement(seq):return complement(revseq(seq))# plus anything else you like... BCHB524 - 2013 - Edwards

Modules • We can import specific functions directly… • And reference them without the module name import sysfrom MyNucStuff import read_seq_from_filenamefrom MyNucStuff import complementfrom MyNucStuff import reverseComplementseqfilename = sys.argv[1]seq = read_seq_from_filename(seqfilename)cseq = complement(seq)rcseq = reverseComplement(seq)print" Sequence:",seqprint" C sequence:",cseqprint"RC sequence:",rcseq BCHB524 - 2013 - Edwards

Modules • We can even import all the functions from a module… import sysfrom MyNucStuff import * seqfilename = sys.argv[1]seq = read_seq_from_filename(seqfilename)cseq = complement(seq)rcseq = reverseComplement(seq)print" Sequence:",seqprint" C sequence:",cseqprint"RC sequence:",rcseq BCHB524 - 2013 - Edwards

Packages • Packages are collections of modules, grouped together. All equivalent: • Implemented using files and folders/directories. import Bio.SeqIOBio.SeqIO.parse(handle, "swiss")from Bio import SeqIOSeqIO.parse(handle, "swiss")from Bio.SeqIO import parseparse(handle, "swiss") BCHB524 - 2013 - Edwards

What can go wrong? • Sometimes our own .py files can "collide" with Python's packages. • Test what happens with an "empty" module in files: • xml.py (and then try to import ElementTree) • Bio.py (and then try to import SeqIO) • etc… BCHB524 - 2013 - Edwards

A module for codon tables • Module is called: codon_table • Functions: • read_codons_from_filename(filename) • returns dictionary of codons – value is pair: (amino-acid symbol, initiation codon true/false) • amino_acid(codon_table,codon) • returns amino-acid symbol for codon • is_init(codon_table,codon) • returns true if codon is an initiation codon, false, otherwise • get_ambig_aa (codon_table,codon) • Returns the single amino-acid consistent with ambiguous codon (containing N's), or X. • translate(codon_table,seq,frame) • returns amino-acid sequence for DNA sequence seq BCHB524 - 2013 - Edwards

A module for codon tables from MyNucStuff import *from codon_table import *import sysiflen(sys.argv) < 3:print"Require codon table and DNA sequence on command-line." sys.exit(1)table = read_codons_from_filename(sys.argv[1])seq = read_seq_from_filename(sys.argv[2])if is_init(table,seq[:3]):print"Initial codon is an initiation codon"for frame in (1,2,3):print"Frame",frame,"(forward):",translate(table,seq,frame) BCHB524 - 2013 - Edwards

A module for codons • In codon_table.py: defread_codons_from_filename(codonfile):# magicreturn codon_tabledefamino_acid(table,codon):# magicreturn aadefis_init(table,codon):# magicreturn initdefget_ambig_aa(table,codon):# magicreturn aadeftranslate(table,seq,frame):# magicreturn aaseq BCHB524 - 2013 - Edwards

A module for codons • In codon_table.py: defread_codons_from_filename(codonfile): f = open(codonfile) data = {}for l in f: sl = l.split() key = sl[0] value = sl[2] data[key] = value f.close() b1 = data['Base1'] b2 = data['Base2'] b3 = data['Base3'] aa = data['AAs'] st = data['Starts'] codon_table = {} n = len(aa)for i inrange(n): codon = b1[i] + b2[i] + b3[i] isInit = (st[i] == 'M') codon_table[codon] = (aa[i],isInit)return codon_table BCHB524 - 2013 - Edwards

Lab exercises • Rework the lecture, and your solutions (or mine) from the homework exercises #1 through #3, to make a MyDNAStuff module. • Put as many useful nucleotide functions as possible into the module... • Rework the lecture, and your solutions (or mine) from homework exercises #4 and #5, to make the codon_table module with functions specified in this lecture. • Demonstrate the use of these modules to translate an amino-acid sequence in all six-frames with just a few lines of code. • The final result should look similar to Slide 10. • Your program should handle DNA sequence with N’s in it. BCHB524 - 2013 - Edwards

Class Project: Expectations • 40% of your grade! • Project Report • Long version of your homework write-up • Project Presentation • Demo your program • Describe your project solution BCHB524 - 2013 - Edwards

Class Project: Blast Database • Write a program that computes all pairwise blast alignments for two species' proteomes and stores the alignments in a relational database. • Write a program that retrieves the blast alignment for two proteins (specified by their accessions) from the relational database. • Write a program that finds pairs of orthologous proteins that are mutually best hits in the species' proteomes. BCHB524 - 2013 - Edwards

Class Project: MS/MS Viewer • Write a program to display peptide fragmentation spectra from an mzXML file. • The program will take an mzXML file, a scan number, and a peptide sequence as input. • The peptide's b-ion and y-ion m/z values should be computed, and peaks matching these m/z values annotated with appropriate labels. • The output figure/plot should aid the user in determining whether or not the peptide is a good match to the spectrum. BCHB524 - 2013 - Edwards

Class Project: Protein Digest • Write a simple web-server application using TurboGears to carry out anin silicoenzymatic digest of a user-provided protein sequence. • Users should be able to specify min and max length, min and max molecular weight, # of missed cleavages, and specific enzyme. • Output should be a table of peptides, with their length, molecular weight, # of missed cleavages, and amino-acids to left and right of each peptide in the protein sequence. BCHB524 - 2013 - Edwards

Advanced Python Concepts: Modules

Advanced Python Concepts: Modules

Presentation Transcript

Teaching Advanced Word 2007

MDROs (a.k.a. Bad Bugs)-- Current Concepts in Treatment

Python Programing: An Introduction to Computer Science

Python Programming: An Introduction To Computer Science

Python Programming: An Introduction to Computer Science

Python Programming Workshop

Python Programming: An Introduction to Computer Science

Introduction to Python, COM and PythonCOM

Introduction to Python

UNDERSTANDING ADVANCED CONCEPTS IN CORNEAL TOPOGRAPHY

Advanced Statistical Methods in NLP Ling 572 March 6, 2012

Chapter 25: Advanced Transaction Processing

Dezső Sima September 2008

Introduction to EPF Agenda

Welcome to our second (advanced) course on using AD Model Builder

An Introduction to GNSS

Center of Excellence (Electronics Sector)

Advanced TCP/IP Concepts and Practices

An Introduction to Concepts in C++0x