200 likes | 363 Views
Introduction to Python. BCHB524 2013 Lecture 5. Outline. Review DNA as a string Extracting codons in DNA Counting in-frame codons in DNA Reverse Complement Program Input/Output raw_input, command-line arguments standard-input, standard-output, redirection. Review.
E N D
Introduction to Python BCHB5242013Lecture 5 BCHB524 - 2013 - Edwards
Outline • Review • DNA as a string • Extracting codons in DNA • Counting in-frame codons in DNA • Reverse Complement • Program Input/Output • raw_input, command-line arguments • standard-input, standard-output, redirection BCHB524 - 2013 - Edwards
Review • Printing and execution • Variables and basic data-types: • integers, floats, strings • Arithmetic with, conversion between • String characters and chunks, string methods • Functions, using/calling and defining: • Use in any expression • Parameters as input, return for output • Control Flow: • if statements – conditional execution • for statements – iterative execution BCHB524 - 2013 - Edwards
DNA as a string seq = "gcatgacgttattacgactctgtgtggcgtctgctgggg"seqlen = len(seq)# set i to 0, 3, 6, 9, ..., 36for i inrange(0,seqlen,3):# extract the codon as a string codon = seq[i:i+3]print codonprint"Number of Met. amino-acids", seq.count("atg") BCHB524 - 2013 - Edwards
DNA as a string • What about upper and lower case? • ATG vs atg? • Differences between DNA and RNA sequence? • Substitute U for each T? • How about ambiguous nucleotide symbols? • What should we do with ‘N’ and other ambiguity codes (R, Y, W, S, M, K, H, B, V, D)? • Strings don’t know any biology! BCHB524 - 2013 - Edwards
DNA as a string seq = "gcatgacgttattacgactctgtgtggcgtctgctgggg"definFrameMet(seq): seqlen = len(seq) count = 0for i inrange(0,seqlen,3): codon = seq[i:i+3]if codon.upper() == "ATG": count = count + 1return countprint"Number of Met. amino-acids", inFrameMet(seq) BCHB524 - 2013 - Edwards
DNA as a string input_seq = "catgacgttattacgactctgtgtggcgtctgctgggg"defcomplement(nuc): nucleotides = 'ACGT' complements = 'TGCA' i = nucleotides.find(nuc) comp = complements[i]return compdefreverseComplement(seq): newseq = ""for nuc in seq: newseq = complement(nuc) + newseqreturn newseqprint"Reverse complement:", reverseComplement(input_seq) BCHB524 - 2013 - Edwards
DNA as a string input_seq = "catgacgttattacgactctgtgtggcgtctgctgggg"defcomplement(nuc): nucleotides = 'ACGT' complements = 'TGCA' i = nucleotides.find(nuc)if i >= 0: comp = complements[i]else: comp = nucreturn compdefreverseComplement(seq): seq = seq.upper() newseq = ""for nuc in seq: newseq = complement(nuc) + newseqreturn newseqprint"Reverse complement:", reverseComplement(input_seq) BCHB524 - 2013 - Edwards
Creating reusable programs • Need to get input data and options from the user • …often us, but sometimes others, or us later. • Sometimes, want completely new inputs • …but often, want the same or similar input. • Sometimes, typing the input is OK • …but often, want to use data in a file. • Sometimes, output to the screen is OK • …but often, want the result to go into a file. BCHB524 - 2013 - Edwards
Interactive input input_seq = raw_input("Type your codon: ")defcomplement(nuc): nucleotides = 'ACGT' complements = 'TGCA' i = nucleotides.find(nuc)if i >= 0: comp = complements[i]else: comp = nucreturn compdefreverseComplement(seq): seq = seq.upper() newseq = ""for nuc in seq: newseq = complement(nuc) + newseqreturn newseqprint"Reverse complement:", reverseComplement(input_seq) BCHB524 - 2013 - Edwards
Command-line input import sysinput_seq = sys.argv[1]defcomplement(nuc): nucleotides = 'ACGT' complements = 'TGCA' i = nucleotides.find(nuc)if i >= 0: comp = complements[i]else: comp = nucreturn compdefreverseComplement(seq): seq = seq.upper() newseq = ""for nuc in seq: newseq = complement(nuc) + newseqreturn newseqprint"Reverse complement:", reverseComplement(input_seq) BCHB524 - 2013 - Edwards
Interactive and file input import sysinput_seq = sys.stdin.read() defcomplement(nuc): nucleotides = 'ACGT' complements = 'TGCA' i = nucleotides.find(nuc)if i >= 0: comp = complements[i]else: comp = nucreturn compdefreverseComplement(seq): seq = seq.upper() newseq = ""for nuc in seq: newseq = complement(nuc) + newseqreturn newseqprint"Reverse complement:", reverseComplement(input_seq) BCHB524 - 2013 - Edwards
File input only import sysseq_file = sys.argv[1]# MAGIC: open file, read contents, and remove whitespaceinput_seq = ''.join(open(seq_file).read().split())defcomplement(nuc): nucleotides = 'ACGT' complements = 'TGCA' i = nucleotides.find(nuc)if i >= 0: comp = complements[i]else: comp = nucreturn compdefreverseComplement(seq): seq = seq.upper() newseq = ""for nuc in seq: newseq = complement(nuc) + newseqreturn newseqprint"Reverse complement:", reverseComplement(input_seq) BCHB524 - 2013 - Edwards
Input Summary • raw_input provides interactive values from the user (also copy-and-paste) • sys.stdin.read() provides interactive or file-based values from the user (also copy-and-paste) • sys.argv[1] provides command-line values from the user (also copy-and-paste) • value can be a filename that provides user-input • Terminal standard-input redirection "<" can be used to send a file's contents to raw_input or sys.stdin.read() BCHB524 - 2013 - Edwards
Output is easy… • Just use print, right? • Print statements go to the terminal's standard-output. • We can redirect to a file using ">" • Errors still get printed to the terminal. • We can also link programs together – standard-output to standard-input using "|" • Also, cat just writes its file to standard out BCHB524 - 2013 - Edwards
Connect reverse complement w/ codon counting… • Create and test rc.py from earlier slides: • Sequence from standard-input • Reverse complement sequence to standard-output • Create and test codons.py from earlier slides: • Sequence from standard-input • Count to standard-output • Place example sequence in file: test.seq • Execute: cat test.seq | python rc.py | python codons.py BCHB524 - 2013 - Edwards
In general • Windows and OS X have similar facilities • cmd in windows, terminal in OS X • Powerful mechanism for making reusable programs • No knowledge of python required for use! • Most bioinformatics software is used from the command-line w/ command-line arguments: • Files provide sequence data, etc. • I'll promote this style of program I/O. BCHB524 - 2013 - Edwards
Exercise 1 • Use UniSTS (“google UniSTS”) to look up PCR markers for your favorite gene • Write a command-line program to compute the reverse complement sequence for the forward and reverse primer. BCHB524 - 2013 - Edwards
Exercise 2 • Write a command-line program to test whether a PCR primer is a reverse complement palindrome. • Such a primer might fold and self-hybridize! • Test your program on the following primers: • TTGAGTAGACGCGTCTACTCAA • TTGAGTAGACGTCGTCTACTCAA • ATATATATATATATAT • ATCTATATATATGTAT BCHB524 - 2013 - Edwards
Homework 3 • Due Monday, September 16th. • Submit using Blackboard • Use only the techniques introduced so far. • Make sure you can run the programs demonstrated in lecture(s). • Exercises 1, 2 from Lecture 4 • Exercises 1, 2 from Lecture 5 • Rosalind exercises 6,7 BCHB524 - 2013 - Edwards