BioPython Tutorial
Joe Steele, Ishwor Thapa
BioPython home page
• http://biopython.org/wiki/Main_Page
• http://biopython.org/DIST/docs/tutorial/Tutorial.html
Content
Automatically parses files into Python data structures, with support for:
• BLAST output
• Clustalw
• FASTA
• GenBank
• PubMed and Medline
• SwissProt
• UniGene
Interfaces to:
• Standalone Blast
• Clustalw
• EMBOSS command line tools
• BioSQL
• many others…
Where?
Requires Python
Installed on:
• biobase.ist.unomaha.edu
• bio-linux.ist.unomaha.edu
> python my_biopython_routine.py
Handling Sequences
from Bio.Seq import Seq
#from Bio import *
#from Bio import Entrez
my_seq = Seq("AGTACACTGGTT")
print(my_seq)
# The .alphabet attribute exists only in Biopython releases before 1.78
print(my_seq.alphabet)
print("my_seq complement")
print(my_seq.complement())
print("my_seq reverse_complement")
print(my_seq.reverse_complement())
print("Change the case.")
print(my_seq.lower())
print(my_seq.upper())
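To make clear what complement() and reverse_complement() return for the example sequence, here is a minimal plain-Python sketch. It is an illustration only and not how Biopython implements these methods; the names COMPLEMENT, complement and reverse_complement are made up for this example, and it assumes an uppercase, unambiguous DNA string.

```python
# Illustration only: plain-Python equivalents for an uppercase,
# unambiguous DNA string. Biopython's Seq methods handle more cases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(dna):
    """Swap each base for its Watson-Crick partner, keeping the order."""
    return "".join(COMPLEMENT[base] for base in dna)

def reverse_complement(dna):
    """Complement the sequence, then read it backwards."""
    return complement(dna)[::-1]

seq = "AGTACACTGGTT"
print(complement(seq))          # TCATGTGACCAA
print(reverse_complement(seq))  # AACCAGTGTACT
```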
Sequences
from Bio import SeqIO
from Bio.Seq import Seq
my_seq = Seq("AGTACACTGGTT")
print("How many G's are in my_seq?")
print(my_seq.count("G"))
print("my_seq can be indexed like a string. Print elements 2 to 7 (counting from 0).")
print(my_seq[2:8])
print("Print every other element.")
print(my_seq[0::2])
print("Reverse it.")
print(my_seq[::-1])
print("I just want a regular string.")
print(str(my_seq))
print("Make the sequence longer.")
more_seq = Seq("GGGGGGGGG")
print(my_seq + more_seq)
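A Seq object supports the same counting, slicing and concatenation syntax as a Python string, so the expected results for the example sequence can be checked on a plain str. The sketch below does exactly that; the variable names are chosen for this illustration and Biopython is not needed to run it.

```python
# Same operations as on the slide, applied to an ordinary string
s = "AGTACACTGGTT"

g_count = s.count("G")    # number of G's: 3
middle = s[2:8]           # indices 2..7 (zero-based): TACACT
every_other = s[0::2]     # every second base from index 0: ATCCGT
backwards = s[::-1]       # reversed, NOT complemented: TTGGTCACATGA
longer = s + "GGGGGGGGG"  # concatenation: 12 + 9 = 21 bases

print(g_count, middle, every_other, backwards, len(longer))
```

Note that [::-1] only reverses the order of the bases; it is not the same as reverse_complement().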
Translate
# Continues from the previous slide, where my_seq = Seq("AGTACACTGGTT")
print("my_seq is a CDS. What protein does it make?")
print(my_seq.translate())
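As a rough sketch of what translation does, the plain-Python example below reads the sequence three bases at a time and looks each codon up in the standard genetic code. It is not Biopython's implementation: CODON_TABLE and translate are names invented for this illustration, and the sketch assumes uppercase, unambiguous DNA and simply ignores a trailing incomplete codon.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate codon by codon; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3  # drop any incomplete final codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

# AGT ACA CTG GTT -> Ser Thr Leu Val
print(translate("AGTACACTGGTT"))  # STLV
```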
Read a FASTA file
from Bio import SeqIO
print("Run over a fasta file:")
for seq_record in SeqIO.parse("af193789.fasta", "fasta"):
    print(seq_record.id)
    print(repr(seq_record.seq))
    print(len(seq_record))
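To show what kind of information each record in the loop carries (an id taken from the header line, plus the sequence), here is a minimal plain-Python FASTA reader. It is a simplified sketch and not a replacement for SeqIO.parse; read_fasta and the two-record example text are hypothetical and exist only for this illustration.

```python
import io

def read_fasta(handle):
    """Yield (record_id, sequence) pairs from FASTA-formatted text."""
    record_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if record_id is not None:
                yield record_id, "".join(chunks)
            words = line[1:].split()
            record_id = words[0] if words else ""  # id = first word after '>'
            chunks = []
        else:
            chunks.append(line)  # sequence may span several lines
    if record_id is not None:
        yield record_id, "".join(chunks)

# Hypothetical two-record input, standing in for a real FASTA file
example = io.StringIO(
    ">seq1 first example\nAGTACA\nCTGGTT\n>seq2 second example\nGGGG\n"
)
records = list(read_fasta(example))
for record_id, sequence in records:
    print(record_id, len(sequence))
```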
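Read a GenBank file
from Bio import SeqIO
print("Run over a genbank file:")
# Keep every record so that all of them can be written out afterwards
records = list(SeqIO.parse("ls_orchid.gbk", "genbank"))
for seq_record in records:
    print(seq_record.id)
    print(repr(seq_record.seq))
    print(len(seq_record))
# Passing the whole list writes every record, not only the last one from the loop
count = SeqIO.write(records, "ls_orchid.fasta", "fasta")
print("Converted %i records" % count)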
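Convert files
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord
count = SeqIO.convert("ls_orchid.gbk", "genbank", "ls_orchid.fasta", "fasta")
print("Converted %i records" % count)
##help(SeqIO.convert)

Write out the reverse complement:
# make_rc_record is not defined on the slide; this definition is an assumption
# modelled on the helper of the same name in the Biopython tutorial
def make_rc_record(record):
    return SeqRecord(seq=record.seq.reverse_complement(),
                     id="rc_" + record.id,
                     description="reverse complement")
# Generator expression: only records shorter than 700 bases are converted
records = (make_rc_record(rec)
           for rec in SeqIO.parse("ls_orchid.fasta", "fasta")
           if len(rec) < 700)
SeqIO.write(records, "rev_comp.fasta", "fasta")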