Advanced Python Data Structures
BCHB524 Lecture 7
BCHB524 - Edwards

Outline
• Review of list data-structures
• Advanced Data-structures
  • Dictionaries, Sets, Files
• Reading, parsing files (codon tables)
• Exercises

Data-structures: Lists
• Compound data-structure:
  • Many objects in order numbered from 0
  • [] indicates list.
• Item access and iteration
  • Same as for string, "l[i]" for item i
  • "for item in l" for each item of the list.
• List modification
  • Items can be changed, added, or deleted.
• Range is a list
• String ↔ List

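A short sketch of the list operations reviewed above. It is written in Python 3 syntax, unlike the Python 2 print statements used on the slides, and the variable names are illustrative only. Note that in Python 3 range() is no longer a list, so list(range(...)) is needed to get one.

```python
# Review of list basics (Python 3 syntax; names are illustrative).
l = ['A', 'C', 'G', 'T']

first = l[0]        # item access, numbered from 0
l[1] = 'c'          # change an item
l.append('N')       # add an item at the end
del l[0]            # delete an item by position

# Iterate over each item of the list.
lengths = []
for item in l:
    lengths.append(len(item))

# String <-> list conversions.
chars = list("ACGT")        # string -> list of characters
joined = ''.join(chars)     # list of strings -> string

# In Python 2, range(5) is a list; in Python 3 it must be converted.
numbers = list(range(5))
```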
Python Data-structures: Dictionaries
• Compound data-structure, stores any number of arbitrary key-value pairs.
  • Keys and/or values can be different types
  • Can be empty
• Values can be accessed by key
• Keys, values, or pairs can be accessed by iteration
• Values can be changed
• Key-value pairs can be added
• Key-value pairs can be deleted

Dictionaries: Syntax and item access

    # Simple dictionary
    d = {'a': 1, 'b': 2, 'acdef': 3}
    print d

    # Access value using its key
    print d['a']

    # Change value associated with a key
    d['acdef'] = 5
    print d

    # Add value by assigning to a dictionary key
    d['newkey'] = 10
    print d

Dictionaries: Iteration

    # Initialize
    d = {'a': 1, 'b': 2, 'acdef': 5, 'newkey': 10}

    # keys from d
    print d.keys()
    # values from d
    print d.values()
    # key-value pairs from d
    print d.items()

    # Iterate through the keys of d
    for k in d.keys():
        print k,
    print

    # Iterate through the key-value pairs of d
    for k,v in d.items():
        print k,"=",v,
    print

Dictionaries: Different from lists?

    # Initialize
    d = {}
    # Add some values, integer keys!
    d[0] = 1
    d[1] = 2
    d[10] = 1000
    # See how the dictionary looks
    print d
    # Test whether a key is in the dictionary
    print "Is key 15 in d?",d.has_key(15)
    # Access value with key 15 with default -1
    print "Value for key 15, or -1:",d.get(15,-1)
    # Access value with key 15 - error!
    print "Value for key 15:",d[15]

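The dictionary slides list deletion of key-value pairs but do not demonstrate it. The sketch below, in Python 3 syntax, shows one way it might look, together with the "in" operator (Python 3 has no has_key method) and a small counting example built on get() with a default. The example string and variable names are assumptions for illustration.

```python
# Deleting pairs and testing membership (Python 3 syntax).
d = {'a': 1, 'b': 2, 'acdef': 3}

del d['b']                       # delete a pair; KeyError if the key is missing
has_b = 'b' in d                 # Python 3 replacement for d.has_key('b')
removed = d.pop('acdef', None)   # delete and return the value, None if absent
missing = d.get('zzz', -1)       # default value instead of an error

# Count nucleotides using get() with a default of 0.
counts = {}
for nuc in "GATTACA":
    counts[nuc] = counts.get(nuc, 0) + 1
```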
Python Data-structures: Sets
• Compound data-structure, stores any number of arbitrary distinct data-items.
  • Data-items can be different types
  • Can be empty
• Items can be accessed by iteration only.
• Items can be tested for membership.
• Items can be added
• Items can be deleted

Sets: Add and Test Elements

    # Make an empty set
    s = set()
    print s

    # Add an element, and then a list of elements
    s.add('a')
    s.update(['b','c','d'])
    print s

    # Test for membership
    print "e is in s",('e' in s)
    print "e is not in s",('e' not in s)
    print "c is in s",('c' in s)

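The set slides mention deletion and iteration without showing them. A minimal sketch in Python 3 syntax follows; the variable names and the example string are illustrative. Because a set has no defined order, sorted() is used here to get a predictable sequence when iterating.

```python
# Deleting from and iterating over a set (Python 3 syntax).
s = set(['a', 'b', 'c', 'd'])

s.remove('a')     # delete an item; KeyError if it is not in the set
s.discard('z')    # delete if present; no error when it is absent

# Sets can only be accessed by iteration; sort for a stable order.
items = []
for item in sorted(s):
    items.append(item)

# A set keeps only distinct items, so repeats collapse.
distinct = set("GATTACA")
```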
Python Data-structures: Files
• Read strings from file, or
• Write strings to file.
• Get access to lines as strings by iteration.
• …or get the entire contents of the file as a string
• Write by printing strings to file.
• MUST open and close files:
  • Need to indicate whether we want to read or write.

Files: Reading

    # Open a file, store "handle" in f
    f = open('anthrax_sasp.nuc')
    # MAGIC!
    print ''.join(f.read().split())
    # Close the file.
    f.close()

    # Slowly, now...
    f = open('anthrax_sasp.nuc')
    # Store the entire file's contents in s (as string)
    s = f.read()
    print s
    # Split s at whitespace
    sl = s.split()
    print sl
    # Join split s with nothing in between
    jl = ''.join(sl)
    print jl
    # Close the file
    f.close()

Files: Reading

    # Open a file
    f = open('anthrax_sasp.nuc')
    # Iterate line-by-line
    for line in f:
        print line
    # Close the file
    f.close()

    # Open a file
    f = open('anthrax_sasp.nuc')
    # Iterate line-by-line, and accumulate the sequence
    seq = ""
    for line in f:
        seq += line.strip()
    print "The sequence is",seq
    # Close the file
    f.close()

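The file slides show reading only. Writing "by printing strings to file" might look like the sketch below, in Python 3 syntax, where print() takes a file argument (Python 2 would use print >>f instead). The file name example.nuc, the temporary directory, and the two sequence lines are made up for illustration.

```python
import os
import tempfile

# Hypothetical output file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'example.nuc')

# Open for writing: the second argument 'w' indicates write mode.
f = open(path, 'w')
print("ATGGCT", file=f)   # each print writes one line to the file
print("AAATAA", file=f)
f.close()

# Open for reading (the default mode) and accumulate the sequence.
f = open(path)
seq = ""
for line in f:
    seq += line.strip()
f.close()
```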
DNA Translation
• First read a codon table from a file
  • Codon table from NCBI's on-line taxonomy resource
• Read line by line and use initial word to store 3rd word appropriately.

    AAs    = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
    Starts = ---M---------------M---------------M----------------------------
    Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
    Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
    Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

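A minimal sketch of the "initial word, 3rd word" parsing step, in Python 3 syntax. To stay self-contained it reads the table lines from a string (the hypothetical name table_text) instead of a file, and it assumes every useful line has the form "name = value", so that split() yields the name as word 0 and the value as word 2. Skipping lines with fewer than three words is an added safeguard, not something the slides do.

```python
# Codon table lines as shown on the slide, held in a string for illustration.
table_text = """
AAs    = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Starts = ---M---------------M---------------M----------------------------
Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
"""

data = {}
for line in table_text.splitlines():
    sl = line.split()        # e.g. ['AAs', '=', 'FFLL...']
    if len(sl) < 3:          # skip blank or malformed lines (added safeguard)
        continue
    data[sl[0]] = sl[2]      # initial word is the key, 3rd word is the value
```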
DNA Translation

    f = open('standard.code')
    data = {}
    for l in f:
        sl = l.split()
        key = sl[0]
        value = sl[2]
        data[key] = value
    f.close()

    b1 = data['Base1']
    b2 = data['Base2']
    b3 = data['Base3']
    aa = data['AAs']
    st = data['Starts']

    codons = {}
    init = {}
    n = len(aa)
    for i in range(n):
        codon = b1[i] + b2[i] + b3[i]
        codons[codon] = aa[i]
        init[codon] = (st[i] == 'M')

DNA Translation

    f = open('anthrax_sasp.nuc')
    seq = ''.join(f.read().split())
    f.close()

    seqlen = len(seq)
    aaseq = []
    for i in range(0,seqlen,3):
        codon = seq[i:i+3]
        aa = codons[codon]
        aaseq.append(aa)
    print ''.join(aaseq)

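The translation loop on the slide raises a KeyError if the sequence length is not a multiple of 3 or if a codon is missing from the table. One possible variation, sketched in Python 3 syntax, ignores a trailing partial codon and writes 'X' for unknown codons; both choices are assumptions, not part of the lecture. To avoid needing the standard.code file, the table is built with nested loops over 'TCAG', which reproduces the Base1/Base2/Base3 ordering of the standard table. The function names build_codon_table and translate are hypothetical.

```python
def build_codon_table():
    """Build the standard codon table without reading a file (illustrative)."""
    bases = 'TCAG'
    aas = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
    codons = {}
    i = 0
    for b1 in bases:
        for b2 in bases:
            for b3 in bases:
                codons[b1 + b2 + b3] = aas[i]
                i += 1
    return codons


def translate(seq, codons):
    """Translate seq codon by codon, ignoring any trailing partial codon."""
    usable = len(seq) - len(seq) % 3       # drop 1 or 2 leftover bases
    aaseq = []
    for i in range(0, usable, 3):
        codon = seq[i:i + 3]
        aaseq.append(codons.get(codon, 'X'))   # 'X' for codons not in the table
    return ''.join(aaseq)


codons = build_codon_table()
protein = translate("ATGGCTAAATAA", codons)
partial = translate("ATGGC", codons)
unknown = translate("ATGNNN", codons)
```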
Exercise 1
• Using just the concepts introduced so far, find as many ways as possible to code DNA reverse complement (at least 3!)
• You may use any built-in function or string or list method.
• You may use only basic data-types and lists and dictionaries.
• Compare and critique each technique for robustness, speed, and correctness.

Exercise 2
• Write a program that takes a codon table file (such as standard.code from the lecture) and a file containing nucleotide sequence (anthrax_sasp.nuc) as command-line arguments, and outputs the amino-acid sequence.
• Modify your program to indicate whether or not the initial codon is consistent with the codon table's start codons.
• Use NCBI's taxonomy resource to look up and download the correct codon table for the anthrax bacterium. Re-run your program using the correct codon table. Is the initial codon of the anthrax SASP gene a valid translation start site?

Homework 4
• Due Monday, October 1st.
• Submit using Canvas
• Use only the techniques introduced so far.
• Make sure you can run the programs demonstrated in lecture(s).
• Exercises 1, 2 from Lecture 7