Advanced Python Data Structures

edwardgray

  1. Advanced Python Data Structures
     BCHB524 Lecture 7
     BCHB524 - Edwards

  2. Outline
     • Review of list data-structures
     • Advanced Data-structures
       • Dictionaries, Sets, Files
     • Reading, parsing files (codon tables)
     • Exercises

  3. Data-structures: Lists
     • Compound data-structure:
       • Many objects in order, numbered from 0
       • [] indicates a list.
     • Item access and iteration
       • Same as for strings: "l[i]" for item i
       • "for item in l" for each item of the list.
     • List modification
       • Items can be changed, added, or deleted.
     • Range is a list
     • String ↔ List
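
The list operations reviewed on this slide can be sketched as follows. This is a minimal illustration, not code from the lecture: it is written in Python 3 syntax (the slides use the Python 2 print statement), and the variable names are hypothetical. Note that in Python 3, range() no longer returns a list, so list(range(n)) is used to get one.

```python
# A list holds many objects in order, numbered from 0.
bases = ['A', 'C', 'G', 'T']

# Item access works the same way as for strings.
first = bases[0]

# Iteration visits each item in order.
for base in bases:
    print(base)

# Lists can be modified: items changed, added, or deleted.
bases[0] = 'a'      # change
bases.append('N')   # add
del bases[1]        # delete the item at index 1 ('C')

# In Python 2, range(4) is a list; in Python 3, convert it explicitly.
indices = list(range(4))

# String <-> list conversions.
letters = list("GATTACA")
joined = ''.join(letters)
```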

  4. Python Data-structures: Dictionaries
     • Compound data-structure, stores any number of arbitrary key-value pairs.
       • Keys and/or values can be different types
       • Can be empty
     • Values can be accessed by key
     • Keys, values, or pairs can be accessed by iteration
     • Values can be changed
     • Key-value pairs can be added
     • Key-value pairs can be deleted

  5. Dictionaries: Syntax and item access

      # Simple dictionary
      d = {'a': 1, 'b': 2, 'acdef': 3}
      print d

      # Access value using its key
      print d['a']

      # Change value associated with a key
      d['acdef'] = 5
      print d

      # Add value by assigning to a dictionary key
      d['newkey'] = 10
      print d

  6. Dictionaries: Iteration

      # Initialize
      d = {'a': 1, 'b': 2, 'acdef': 5, 'newkey': 10}

      # keys from d
      print d.keys()

      # values from d
      print d.values()

      # key-value pairs from d
      print d.items()

      # Iterate through the keys of d
      for k in d.keys():
          print k,
      print

      # Iterate through the key-value pairs of d
      for k,v in d.items():
          print k,"=",v,
      print

  7. Dictionaries: Different from lists?

      # Initialize
      d = {}

      # Add some values, integer keys!
      d[0] = 1
      d[1] = 2
      d[10] = 1000

      # See how the dictionary looks
      print d

      # Test whether a key is in the dictionary
      print "Is key 15 in d?",d.has_key(15)

      # Access value with key 15 with default -1
      print "Value for key 15, or -1:",d.get(15,-1)

      # Access value with key 15 - error!
      print "Value for key 15:",d[15]
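
Slide 4 notes that key-value pairs can be deleted, which the dictionary slides do not demonstrate. The sketch below covers deletion alongside the missing-key behavior from slide 7. It assumes Python 3, where d.has_key(k) no longer exists and the "in" operator is used instead; the helper variable names are hypothetical.

```python
d = {0: 1, 1: 2, 10: 1000}

# Python 3 replaces d.has_key(15) with the "in" operator.
has_15 = 15 in d

# get() returns the supplied default instead of raising an error.
value_or_default = d.get(15, -1)

# Plain indexing with a missing key raises KeyError.
try:
    d[15]
    raised = False
except KeyError:
    raised = True

# Delete a key-value pair by key.
del d[10]
print(d)
```

Unlike a list, the dictionary has no "gap" to fill after the deletion: keys 0 and 1 remain, and key 10 is simply absent.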

  8. Python Data-structures: Sets
     • Compound data-structure, stores any number of arbitrary distinct data-items.
       • Data-items can be different types
       • Can be empty
     • Items can be accessed by iteration only.
     • Items can be tested for membership.
     • Items can be added
     • Items can be deleted

  9. Sets: Add and Test Elements

      # Make an empty set
      s = set()
      print s

      # Add an element, and then a list of elements
      s.add('a')
      s.update(['b','c','d'])
      print s

      # Test for membership
      print "e is in s",('e' in s)
      print "e is not in s",('e' not in s)
      print "c is in s",('c' in s)
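
Slide 8 also lists iteration and deletion for sets, which slide 9 does not show. A minimal sketch in Python 3 syntax (an assumption; the slides use Python 2) might look like this:

```python
s = set(['a', 'b', 'c', 'd'])

# Items are reached by iteration only; a set has no guaranteed order,
# so sort it when a stable order is needed.
for item in sorted(s):
    print(item)

# Delete items: remove() raises KeyError if the item is missing,
# discard() silently does nothing in that case.
s.remove('a')
s.discard('z')

# Items are distinct: adding an existing item changes nothing.
s.add('b')
size = len(s)
```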

  10. Python Data-structures: Files
      • Read strings from file, or
      • Write strings to file.
      • Get access to lines as strings by iteration.
        • …or get the entire contents of the file as a string
      • Write by printing strings to file.
      • MUST open and close files:
        • Need to indicate whether we want to read or write.

  11. Files: Reading

      # Open a file, store "handle" in f
      f = open('anthrax_sasp.nuc')

      # MAGIC!
      print ''.join(f.read().split())

      # Close the file.
      f.close()

      # Slowly, now...
      f = open('anthrax_sasp.nuc')

      # Store the entire file's contents in s (as string)
      s = f.read()
      print s

      # Split s at whitespace
      sl = s.split()
      print sl

      # Join split s with nothing in between
      jl = ''.join(sl)
      print jl

      # Close the file
      f.close()

  12. Files: Reading

      # Open a file
      f = open('anthrax_sasp.nuc')

      # Iterate line-by-line
      for line in f:
          print line

      # Close the file
      f.close()

      # Open a file
      f = open('anthrax_sasp.nuc')

      # Iterate line-by-line, and accumulate the sequence
      seq = ""
      for line in f:
          seq += line.strip()
      print "The sequence is",seq

      # Close the file
      f.close()
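
The slides show reading only. Writing "by printing strings to file" (slide 10) can be sketched as below. This is an illustration under several assumptions: it uses Python 3, where print takes a file= argument; the file name example.nuc and the two sequence lines are made up for the example; and it uses a with-block, which closes the file automatically rather than calling f.close() as the slides do.

```python
import os
import tempfile

# Hypothetical file in a temporary directory, for illustration only.
path = os.path.join(tempfile.mkdtemp(), 'example.nuc')

# Open for writing: the mode 'w' must be given explicitly.
with open(path, 'w') as f:
    print('ATGGCT', file=f)
    print('AATTAA', file=f)

# Open for reading (the default mode) and accumulate the sequence,
# stripping the newline at the end of each line.
seq = ''
with open(path) as f:
    for line in f:
        seq += line.strip()

print("The sequence is", seq)
```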

  13. DNA Translation
      • First read a codon table from a file
        • Codon table from NCBI's on-line taxonomy resource
      • Read line by line and use the initial word to store the 3rd word appropriately.

      AAs    = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
      Starts = ---M---------------M---------------M----------------------------
      Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
      Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
      Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG

  14. DNA Translation

      # Read the codon table: first word is the key, third word the value
      f = open('standard.code')
      data = {}
      for l in f:
          sl = l.split()
          key = sl[0]
          value = sl[2]
          data[key] = value
      # Close the file only after every line has been read
      f.close()

      b1 = data['Base1']
      b2 = data['Base2']
      b3 = data['Base3']
      aa = data['AAs']
      st = data['Starts']

      # Build codon -> amino-acid and codon -> is-start-codon dictionaries
      codons = {}
      init = {}
      n = len(aa)
      for i in range(n):
          codon = b1[i] + b2[i] + b3[i]
          codons[codon] = aa[i]
          init[codon] = (st[i] == 'M')
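
To try the table-parsing step without the standard.code file, the five lines from slide 13 can be embedded as a string. The sketch below assumes Python 3 and adds one guard not in the lecture code: lines with fewer than three words (for example, blank lines) are skipped rather than causing an IndexError. The name TABLE is hypothetical.

```python
# Codon table text copied from slide 13.
TABLE = """\
AAs    = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Starts = ---M---------------M---------------M----------------------------
Base1  = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2  = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3  = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
"""

# First word is the key, second is "=", third is the value.
data = {}
for line in TABLE.splitlines():
    words = line.split()
    if len(words) < 3:
        continue  # skip blank or malformed lines
    data[words[0]] = words[2]

# Column i of the table describes one codon.
codons = {}
init = {}
for i in range(len(data['AAs'])):
    codon = data['Base1'][i] + data['Base2'][i] + data['Base3'][i]
    codons[codon] = data['AAs'][i]
    init[codon] = (data['Starts'][i] == 'M')

print(codons['ATG'], init['ATG'])
```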

  15. DNA Translation

      # Read the sequence, removing all whitespace
      f = open('anthrax_sasp.nuc')
      seq = ''.join(f.read().split())
      f.close()

      # Translate codon by codon
      seqlen = len(seq)
      aaseq = []
      for i in range(0,seqlen,3):
          codon = seq[i:i+3]
          aa = codons[codon]
          aaseq.append(aa)
      print ''.join(aaseq)
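
The translation loop can also be wrapped in a function and run on a short test sequence. This is a sketch under stated assumptions, not the lecture's code: it uses Python 3; the function name translate and the test sequence are hypothetical; the codon dictionary is rebuilt from the AAs line of slide 13 with nested loops over "TCAG" (the same order as the Base1/Base2/Base3 lines); and, unlike the slide, it stops at the last complete codon so that one or two trailing bases do not raise a KeyError.

```python
# Amino acids in table order, copied from the AAs line on slide 13.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Rebuild the codon dictionary: Base1 varies slowest, Base3 fastest.
codons = {}
i = 0
for b1 in "TCAG":
    for b2 in "TCAG":
        for b3 in "TCAG":
            codons[b1 + b2 + b3] = AAS[i]
            i += 1

def translate(seq, table):
    """Translate seq codon by codon, ignoring a trailing partial codon."""
    aaseq = []
    last = len(seq) - len(seq) % 3   # end of the last complete codon
    for pos in range(0, last, 3):
        aaseq.append(table[seq[pos:pos+3]])
    return ''.join(aaseq)

# Made-up test sequence: ATG GCT AAT TAA plus two leftover bases.
protein = translate("ATGGCTAATTAAGG", codons)
print(protein)
```

Whether to drop, report, or reject a trailing partial codon is a design choice; silently ignoring it is the simplest option but can hide a truncated input file.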

  16. Exercise 1
      • Using just the concepts introduced so far, find as many ways as possible to code DNA reverse complement (at least 3!)
        • You may use any built-in function or string or list method.
        • You may use only basic data-types and lists and dictionaries.
      • Compare and critique each technique for robustness, speed, and correctness.

  17. Exercise 2
      • Write a program that takes a codon table file (such as standard.code from the lecture) and a file containing a nucleotide sequence (anthrax_sasp.nuc) as command-line arguments, and outputs the amino-acid sequence.
      • Modify your program to indicate whether or not the initial codon is consistent with the codon table's start codons.
      • Use NCBI's taxonomy resource to look up and download the correct codon table for the anthrax bacterium. Re-run your program using the correct codon table. Is the initial codon of the anthrax SASP gene a valid translation start site?

  18. Homework 4
      • Due Monday, October 1st.
      • Submit using Canvas
      • Use only the techniques introduced so far.
      • Make sure you can run the programs demonstrated in lecture(s).
      • Exercises 1, 2 from Lecture 7
