Biological Information Integration Toolkit
Jeremy Praissman, Dawei Lin, John Rose, Bi-Cheng Wang
Motivation
• Calculate simple properties of genomes and group genes according to these properties
• Tie together and integrate data and analyses (BLAST results, annotation, etc.)
• Accomplish the above in a way that provides additional support for implementing new bioinformatics algorithms
Genomes & Genes
Genome Data Structure
• Loads data and initializes other data structures
• Exports most of the functionality of the Strand data structure (described below)
• Contains “Gene” objects, which are wrappers for BioPython SeqFeatures
Gene Data Structure
• Calculates upstream/downstream intergenic distance (IGD)
• Easily generates subsequences relative to gene location
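The two Gene operations above can be sketched in a few lines. This is a minimal illustration, not the toolkit's code: the `Gene` class here is a hypothetical stand-in for the toolkit's wrapper, the function names are invented, and the sketch assumes forward-strand, half-open coordinates with genes sorted by start position.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Gene:
    """Hypothetical stand-in for the toolkit's Gene wrapper."""
    name: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def upstream_igd(genes: List[Gene], i: int) -> Optional[int]:
    """Gap between gene i and the previous gene; None for the first gene.

    A negative value means the two genes overlap.
    """
    if i == 0:
        return None
    return genes[i].start - genes[i - 1].end


def downstream_igd(genes: List[Gene], i: int) -> Optional[int]:
    """Gap between gene i and the next gene; None for the last gene."""
    if i == len(genes) - 1:
        return None
    return genes[i + 1].start - genes[i].end


def relative_subsequence(sequence: str, gene: Gene, offset: int, length: int) -> str:
    """Return `length` bases starting `offset` bases from the gene start.

    A negative offset reaches upstream; the window is clipped at position 0.
    """
    begin = gene.start + offset
    end = begin + length
    return sequence[max(0, begin):max(0, end)]
```

On the reverse strand, "upstream" and "downstream" swap roles, so a fuller version would need to take gene orientation into account.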
Strands
• A list of Genes; represents a strand of biological information
• Implicit intergenic regions
Functionality
• Base object for computing statistics
• Number of sequence features (total, overlapping, etc.)
• Number of bases (in overlapping areas of features, in features, etc.)
• Generate new Strands based on IGD
• Filter and map operations for obtaining “Sets”
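The base-count statistics listed above can be computed with a sort and a single sweep over feature boundaries. The sketch below is one possible approach, not the toolkit's implementation: features are reduced to hypothetical `(start, end)` half-open intervals and both function names are invented for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), half-open; illustrative only


def bases_in_features(features: List[Interval]) -> int:
    """Count bases covered by at least one feature (merged-interval length)."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(features):
        if cur_end is None or start > cur_end:
            # Close the previous merged run, if any, and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def bases_in_overlaps(features: List[Interval]) -> int:
    """Count bases covered by two or more features, using a sweep line."""
    events = []
    for start, end in features:
        events.append((start, 1))   # a feature opens
        events.append((end, -1))    # a feature closes
    events.sort()
    depth = 0
    total = 0
    prev = None
    for pos, delta in events:
        if prev is not None and depth >= 2:
            total += pos - prev
        depth += delta
        prev = pos
    return total
```

For features `[(0, 10), (5, 15), (20, 30)]`, the first function counts 25 bases inside features and the second counts 5 bases in overlapping areas.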
Sets
• Unordered collection of objects
• Supports:
  • Membership testing
  • Union
  • Intersection
  • Other set operations
Example:
set_1 = all Genes in a Genome with start codon GTG
set_2 = all Genes in a Genome with upstream igd < 20
set_3 = set_1.intersect(set_2) is the set containing all Genes in the Genome with both properties
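The example above maps directly onto Python's built-in `set` type. The sketch below uses a hypothetical `Gene` record with made-up `start_codon` and `upstream_igd` fields and invented sample data; the toolkit's own `intersect` method is replaced here by the built-in `&` operator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes instances hashable, so they can live in sets
class Gene:
    """Hypothetical gene record; field names are assumptions."""
    name: str
    start_codon: str
    upstream_igd: int


# Invented sample data standing in for the Genes of a Genome.
genome = [
    Gene("g1", "GTG", 12),
    Gene("g2", "ATG", 5),
    Gene("g3", "GTG", 150),
    Gene("g4", "GTG", 19),
]

set_1 = {g for g in genome if g.start_codon == "GTG"}   # filter by start codon
set_2 = {g for g in genome if g.upstream_igd < 20}      # filter by upstream IGD
set_3 = set_1 & set_2                                   # genes with both properties

print(sorted(g.name for g in set_3))  # prints ['g1', 'g4']
```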
Graphs
• Combinatorial graph data structures and algorithms
• Vertices tied to Genes, edges represent relationships between Genes
Functionality
• Support for linking vertices based on gene properties
• Additional support class (Similarity) for building graphs using BLAST data and user-supplied parameters
• Tarjan’s fast O(v + e) algorithm for finding strongly connected components
• Johnson’s fast O((v + e)(c + 1)) algorithm for finding elementary circuits, where c is the number of circuits
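For readers unfamiliar with Tarjan's algorithm, here is a compact recursive sketch. It is a textbook formulation, not the toolkit's code: the graph is assumed to be a plain dictionary mapping each vertex (for example a gene name) to a list of successors, and the function name is invented.

```python
from typing import Dict, Hashable, List


def strongly_connected_components(
    graph: Dict[Hashable, List[Hashable]]
) -> List[List[Hashable]]:
    """Tarjan's algorithm: each vertex and edge is examined once, O(v + e)."""
    index_of: Dict[Hashable, int] = {}  # discovery order of each vertex
    lowlink: Dict[Hashable, int] = {}   # smallest index reachable from the vertex
    stack: List[Hashable] = []
    on_stack = set()
    components: List[List[Hashable]] = []
    counter = 0

    def visit(v: Hashable) -> None:
        nonlocal counter
        index_of[v] = lowlink[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index_of:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index_of[w])
        if lowlink[v] == index_of[v]:
            # v is the root of a component: pop the stack down to v.
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            components.append(component)

    for v in graph:
        if v not in index_of:
            visit(v)
    return components
```

Calling it on `{"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"], "e": ["d"]}` groups `a`, `b`, `c` into one component and `d`, `e` into another. Because the sketch is recursive, a graph with very long paths could exceed Python's default recursion limit; an iterative variant avoids that.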
Acknowledgements
• Dr. Dawei Lin
• Dr. John Rose
• Dr. B.C. Wang
• The BioPython people
• The BioPerl people
http://www.secsg.org