
Introduction to Python


Presentation Transcript


  1. Introduction to Python
     BCHB524, 2008, Lecture 1
     BCHB524 - 2008 - Edwards

  2. Outline
     • Why Python?
     • Installation
     • Basic Data Types
     • Variables
     • Functions
     • Control Flow
     • Useful References
     • Reverse Complement

  3. Why Python?
     • Free
     • Portable
     • Object-oriented
     • Clean syntax
     • Dynamic
     • Scientific, Commercial
     • Support libraries
     • Extensible
     • Interactive
     • Modern
     http://xkcd.com/353/

  4. Why Python for Bioinformatics?
     • Good with
       • Strings
       • Files and Formats
       • Web and Databases
       • Objects and Concepts
     • BioPython
       • www.biopython.org

  5. Installation
     • Python Homepage
       • www.python.org
       • >> Download >> Select Operating System
     • We’ll install version 2.5.x on Windows
     • OS X & Linux versions also readily available.
     • Integrated development environment – IDLE

  6. Basic Data Types
     • String
     • Integer
     • Floats
     • Boolean
     • None
     • Tuples

  7. Basic Data Types: Integers
         >>> 3
         3
         >>> 3*4
         12
         >>> 3/4
         0
         >>> abs(-10)
         10
         >>> 3%4
         3
         >>> 2**32
         4294967296L
         >>> 2**64
         18446744073709551616L
         >>> 2**128
         340282366920938463463374607431768211456L
         >>> print 2**128
         340282366920938463463374607431768211456

  8. Basic Data Types: Floats
         >>> 3.0
         3.0
         >>> 3.0*4.0
         12.0
         >>> 3.0/4.0
         0.75
         >>> abs(-10.0)
         10.0
         >>> 2.0**32
         4294967296.0
         >>> 2.0**64
         1.8446744073709552e+019
         >>> 2.0**128
         3.4028236692093846e+038
         >>> print 2.0**128
         3.40282366921e+038

  9. Basic Data Types: Strings
         >>> 'gcatgacgttattacgactctgtgtggcgtctgctggg'
         'gcatgacgttattacgactctgtgtggcgtctgctggg'
         >>> 'gcatgacgttattacgactctgtgtggcgtctgctggg'[0]
         'g'
         >>> 'gcatgacgttattacgactctgtgtggcgtctgctggg'[-1]
         'g'
         >>> 'gcatgacgttattacgactctgtgtggcgtctgctggg'[0:4]
         'gcat'
         >>> 'ATTCG'+'ATTCG'
         'ATTCGATTCG'
         >>> 'ATTCG'*6
         'ATTCGATTCGATTCGATTCGATTCGATTCG'
         >>> len('gcatgacgttattacgactctgtgtggcgtctgctggg')
         38
         >>> 'gcatgacgttattacgactctgtgtggcgtctgctggg'.upper()
         'GCATGACGTTATTACGACTCTGTGTGGCGTCTGCTGGG'
         >>> 'gcatgacgttattacgactctgtgtggcgtctgctggg'.count('a')
         5

  10. Basic Data Types: The Rest
      • Special literal values:
        • True, False, and None
          >>> print True, False, None
          True False None
      • Tuples – pairs, triples, etc.
          >>> print ('A','T','G')
          ('A', 'T', 'G')
          >>> print (2.25,4.125,'a')
          (2.25, 4.125, 'a')

  11. Variables
      • Variables store values for later use
          >>> seq = 'gcatgacgttattacgactctgtgtggcgtctgctggg'
          >>> len(seq)
          38
          >>> seq = seq * 3
          >>> len(seq)
          114
          >>> met = ('A','T','G')
          >>> print met
          ('A', 'T', 'G')

  12. Using Functions
      • Execute a small (predefined) task
          >>> abs(-10)
          10
          >>> min(1,2,3,4,5,6)
          1
          >>> max(1,2,3,4,5,6)
          6
          >>> int(2.6)
          2
          >>> float('2.5')
          2.5
          >>> int(float('2.5'))
          2

  13. Using Methods
      • Execute a small task with a specific object
          >>> seq = 'gcatgacgttattacgactctgtgtggcgtctgctggg'
          >>> seq.count('a')
          5
          >>> seq.upper()
          'GCATGACGTTATTACGACTCTGTGTGGCGTCTGCTGGG'
          >>> seq.endswith('tggg')
          True
          >>> seq.find('tggg')
          34
          >>> seq.upper().find('TGGG')
          34

  14. Defining New Functions
      • Describe how to execute a small task
          >>> def bytwo(x):
          ...     return x*2
          ...
          >>> bytwo(2)
          4
          >>> bytwo(2.5)
          5.0
          >>> bytwo(2.75)
          5.5

  15. If Statements
      • Conditional execution
          if seq.startswith('atg'):
              initMet = True
              seq = seq[3:]
          else:
              initMet = False
      • Note use of indentation to define a block!
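The conditional above can be tried on two short sequences. This is a minimal sketch, assuming Python 3 (where `print` is a function, unlike the Python 2.5 `print` statement used in the slides); the wrapper name `strip_init_met` and the test sequences are hypothetical and not part of the lecture.

```python
# Hypothetical wrapper around the slide's if/else block (Python 3 syntax).
def strip_init_met(seq):
    """Return (initMet, seq): whether seq starts with 'atg', and seq without that codon."""
    if seq.startswith('atg'):
        initMet = True
        seq = seq[3:]      # drop the first three characters
    else:
        initMet = False
    return initMet, seq

print(strip_init_met('atggcatga'))   # sequence that starts with atg
print(strip_init_met('gcatgacg'))    # sequence that does not
```

Both branches set `initMet`, so the variable is always defined after the `if` statement, whichever block runs.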

  16. For Statements
      • Sequential execution
          count = 0
          for nuc in seq:
              if nuc == 'a':
                  count = count + 1
          print count
      • Note use of indentation to define a block!
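The loop above does by hand what the `count` string method from the earlier slides does in one call. A minimal sketch, assuming Python 3 syntax and the 38-character example sequence from the Variables slide, shows that the two agree:

```python
# Python 3 version of the slide's counting loop, checked against str.count.
seq = 'gcatgacgttattacgactctgtgtggcgtctgctggg'

count = 0
for nuc in seq:            # visit each character of the string in order
    if nuc == 'a':
        count = count + 1

print(count)
print(count == seq.count('a'))   # the loop and the method should agree
```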

  17. References
      • Websites
        • www.python.org
          • >> Documentation >> Library Reference
        • “Module Docs” in Windows
          • >> Start Menu >> Python >> Module Docs
        • www.biopython.org
          • >> Documentation
      • Books
        • Lutz and Archer, “Learning Python”
        • Kinser, “Python for Bioinformatics”

  18. DNA as a string
          seq = 'gcatgacgttattacgactctgtgtggcgtctgctgggg'
          seqlen = len(seq)
          # set i to 0, 3, 6, 9, ..., 36
          for i in range(0,seqlen,3):
              # As a tuple
              codon = (seq[i],seq[i+1],seq[i+2])
              # As a string
              codon = seq[i:i+3]
              print codon
          print "Number of Met. amino-acids", seq.count('atg')

  19. DNA as a string
      • What about upper and lower case?
        • ATG vs atg?
      • Differences between DNA and RNA sequence?
        • Substitute U for each T?
      • How about ambiguous nucleotide symbols?
        • What should we do with ‘N’ and other ambiguity codes (R, Y, W, S, M, K, H, B, V, D)?
      • Strings don’t know any biology!
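One possible way to approach the questions above with plain string methods is sketched here. It assumes Python 3; the function names `to_rna` and `count_ambiguous` are hypothetical, and treating every symbol other than A, C, G, T as ambiguous is an assumption of this sketch, not something the lecture prescribes.

```python
# Hypothetical helpers; strings still "don't know any biology", so every
# rule below is a choice made by the programmer.

def to_rna(seq):
    """Upper-case the sequence and substitute U for each T."""
    return seq.upper().replace('T', 'U')

def count_ambiguous(seq):
    """Count symbols that are not A, C, G, or T (case-insensitive)."""
    count = 0
    for nuc in seq.upper():
        if nuc not in 'ACGT':      # assumption: anything else is ambiguous
            count = count + 1
    return count

print(to_rna('atgcatga'))
print(count_ambiguous('ATGNNRYcat'))
```

Note that `count_ambiguous` would also flag the U in an RNA sequence, which illustrates why the DNA/RNA distinction has to be handled explicitly.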

  20. DNA as a string
          seq = 'gcatgacgttattacgactctgtgtggcgtctgctgggg'
          def inFrameMet(seq):
              seqlen = len(seq)
              count = 0
              for i in range(0,seqlen,3):
                  codon = seq[i:i+3]
                  if codon.upper() == 'ATG':
                      count = count + 1
              return count
          print "Number of Met. amino-acids", inFrameMet(seq)
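The difference between `seq.count('atg')` and `inFrameMet(seq)` is easiest to see by running both on the same sequence. A minimal sketch, assuming Python 3 syntax (the second test sequence is made up for illustration):

```python
# Python 3 version of the slide's function, compared with str.count.
seq = 'gcatgacgttattacgactctgtgtggcgtctgctgggg'

def inFrameMet(seq):
    """Count ATG codons at positions 0, 3, 6, ... only."""
    count = 0
    for i in range(0, len(seq), 3):
        codon = seq[i:i+3]
        if codon.upper() == 'ATG':
            count = count + 1
    return count

# str.count finds 'atg' at any offset; inFrameMet only looks at whole codons.
print(seq.count('atg'), inFrameMet(seq))
print(inFrameMet('atggcaatgtga'))   # made-up sequence with in-frame ATG codons
```

In the example sequence the only `atg` starts at index 2, so it is found by `count` but is not an in-frame codon.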

  21. DNA as a string
          seq = 'gcatgacgttattacgactctgtgtggcgtctgctgggg'
          def reverseComplement(seq):
              newseq = ''
              for nuc in seq:
                  if nuc == 'A':
                      newseq = 'T'+newseq
                  elif nuc == 'C':
                      newseq = 'G'+newseq
                  elif nuc == 'G':
                      newseq = 'C'+newseq
                  elif nuc == 'T':
                      newseq = 'A'+newseq
              return newseq
          print "Reverse complement:", reverseComplement(seq)

  22. DNA as a string
          seq = 'gcatgacgttattacgactctgtgtggcgtctgctgggg'
          def reverseComplement(seq):
              seq = seq.upper()
              newseq = ''
              for nuc in seq:
                  if nuc == 'A':
                      newseq = 'T'+newseq
                  elif nuc == 'C':
                      newseq = 'G'+newseq
                  elif nuc == 'G':
                      newseq = 'C'+newseq
                  elif nuc == 'T':
                      newseq = 'A'+newseq
                  else:
                      newseq = nuc+newseq
              return newseq
          print "Reverse complement:", reverseComplement(seq)
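A quick sanity check for a reverse-complement function is that applying it twice returns the original (upper-cased) sequence. The sketch below assumes Python 3 syntax and repeats the function from this slide so that it runs on its own; the round-trip check itself is an addition, not part of the lecture.

```python
# Python 3 version of the slide's reverseComplement, plus a round-trip check.
def reverseComplement(seq):
    seq = seq.upper()
    newseq = ''
    for nuc in seq:
        # Prepending each complemented base reverses the order as we go.
        if nuc == 'A':
            newseq = 'T' + newseq
        elif nuc == 'C':
            newseq = 'G' + newseq
        elif nuc == 'G':
            newseq = 'C' + newseq
        elif nuc == 'T':
            newseq = 'A' + newseq
        else:
            newseq = nuc + newseq   # unknown symbols are kept unchanged
    return newseq

seq = 'gcatgacgttattacgactctgtgtggcgtctgctgggg'
rc = reverseComplement(seq)
print("Reverse complement:", rc)
print(reverseComplement(rc) == seq.upper())   # round trip
```

Because the `else` branch passes unrecognized symbols through unchanged, the round trip also holds for sequences containing N, although the other ambiguity codes are not complemented correctly by this version.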

  23. Creating and Running Python Scripts
      • Creating new scripts:
        • File >> New Window
        • Write script as desired
        • Save in My Documents >> BCHB524
      • In IDLE:
        • File >> Open (browse to script.py)
        • Run >> Run Module (or just hit F5)
        • Results are in command window
      • From Windows Command-Line:
        • Start >> Run (type “cmd”)
        • cd "My Documents\BCHB524"
        • script.py
      • Double-click on script.py

  24. Getting user input
      • Most programs operate on user input supplied at run-time.
      • raw_input function
          >>> seq = raw_input('Enter the DNA sequence: ')
          Enter the DNA sequence: ACTGACTGACTG
          >>> print seq
          ACTGACTGACTG
      • Command-line arguments
          import sys
          seq = sys.argv[1]
          print seq

          C:\BCHM524>script.py ACTGACTGACTG
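The two input styles above can be combined so that a script uses the command-line argument when one is given and prompts otherwise. This is a sketch under stated assumptions: it targets Python 3, where `input` replaces Python 2's `raw_input`; the function name `get_sequence` and its `ask` parameter are hypothetical, introduced only so the example can run without a terminal.

```python
import sys

# Hypothetical helper: prefer argv[1], fall back to prompting the user.
def get_sequence(argv, ask=input):
    """Return the sequence from argv[1] if present, otherwise prompt for it."""
    if len(argv) > 1:
        return argv[1].strip()
    return ask('Enter the DNA sequence: ').strip()

# Simulated runs: a fake argument list, and a stand-in for the prompt.
print(get_sequence(['script.py', 'ACTGACTGACTG']))
print(get_sequence(['script.py'], ask=lambda prompt: 'ACTG'))
```

In a real script the call would be `seq = get_sequence(sys.argv)`. Checking `len(argv)` first avoids the `IndexError` that a bare `sys.argv[1]` raises when no argument is supplied.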

  25. Lab Exercises
      • Install Python from www.python.org in “My Documents\Python25”
      • Run IDLE, check out installed and on-line help.
      • Try each of the examples shown in these slides

  26. Lab Exercises
      • Download or copy-and-paste the anthrax_sasp.nuc file from the course web-site. Write Python scripts to answer:
        • Does the sequence start with Met?
        • How many nucleotides in the SASP gene?
        • How many amino-acids in the SASP protein?
      • Use UniSTS (“google UniSTS”) to look up PCR markers for your favorite gene
        • For each forward and reverse primer, compute the reverse complement sequence

  27. Lab Exercises
      • Write a program to determine whether or not a given DNA sequence consists of a number of (perfect) tandem repeats.
      • Test it on sequences:
        • AAAAAAAAAAAAAAAA
        • CACACACACACACAC
        • ATTCGATTCGATTCG
        • GTAGTAGTAGTAGTA
        • TCAGTCACTCACTCAG
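One possible starting point for this exercise is sketched below; it is not the course's reference solution. It assumes Python 3 and one particular reading of "perfect tandem repeat": the whole sequence equals some shorter unit repeated two or more times, with no partial copy at the end. The function name `isTandemRepeat` is hypothetical.

```python
# Sketch only: try every unit length that divides the sequence length.
def isTandemRepeat(seq):
    """True if seq is some unit repeated a whole number (>= 2) of times."""
    seq = seq.upper()
    n = len(seq)
    for unitlen in range(1, n // 2 + 1):
        if n % unitlen == 0:                  # unit must tile the sequence exactly
            unit = seq[0:unitlen]
            if unit * (n // unitlen) == seq:  # string repetition, as in 'ATTCG'*6
                return True
    return False

for s in ('AAAAAAAAAAAAAAAA', 'CACACACACACACAC', 'ATTCGATTCGATTCG',
          'GTAGTAGTAGTAGTA', 'TCAGTCACTCACTCAG'):
    print(s, isTandemRepeat(s))
```

A different definition, for example one that allows a trailing partial repeat, would change the answer for some inputs, so the definition should be stated alongside any solution.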

  28. Lab Exercises
      • Write a program to test whether a PCR primer is a reverse complement palindrome.
        • Such a primer might fold and self-hybridize!
      • Test your program on the following primers:
        • TTGAGTAGACGCGTCTACTCAA
        • TTGAGTAGACGTCGTCTACTCAA
        • ATATATATATATATAT
        • ATCTATATATATGTAT
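A reverse complement palindrome is a sequence that equals its own reverse complement, so the test can be built directly on the `reverseComplement` function from the earlier slides. The sketch below assumes Python 3 and repeats that function so the example is self-contained; the name `isRCPalindrome` is hypothetical, and this is one possible approach rather than the expected answer.

```python
# Same logic as the lecture's reverseComplement, in Python 3 syntax.
def reverseComplement(seq):
    seq = seq.upper()
    newseq = ''
    for nuc in seq:
        if nuc == 'A':
            newseq = 'T' + newseq
        elif nuc == 'C':
            newseq = 'G' + newseq
        elif nuc == 'G':
            newseq = 'C' + newseq
        elif nuc == 'T':
            newseq = 'A' + newseq
        else:
            newseq = nuc + newseq
    return newseq

# Hypothetical helper: a primer is a palindrome if it equals its reverse complement.
def isRCPalindrome(primer):
    primer = primer.upper()
    return primer == reverseComplement(primer)

for p in ('TTGAGTAGACGCGTCTACTCAA', 'TTGAGTAGACGTCGTCTACTCAA',
          'ATATATATATATATAT', 'ATCTATATATATGTAT'):
    print(p, isRCPalindrome(p))
```

An odd-length sequence of A, C, G, and T can never pass this test, because its middle base would have to be its own complement.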

  29. Lab Exercises
      • Using just the concepts introduced in Lecture 1, find as many ways as possible to code DNA reverse complement.
        • You may use any built-in function or string method.
        • You may use only basic data-types.
      • Compare and critique each technique for robustness, speed, and correctness.
