Connections between Computer Science and Biology
connections • bioinformatics: computational approach to problems in molecular biology • biological processes inspire algorithms and data structures in computer science • biomolecules “compute” • biological organisms “compute”
bioinformatics • sequencing the genome • predicting the structure of molecules • predicting genes, molecular function • constructing evolutionary trees • modeling cellular networks • ...
constructing evolutionary trees “The affinities of all the beings of the same class have sometimes been represented by a great tree. I believe this simile largely speaks the truth. The green and budding twigs may represent existing species; and those produced during each former year may represent the long succession of extinct species.” - Darwin, On the Origin of Species
constructing evolutionary trees • traditional approach: use morphological features of organisms (number of legs, etc.) • current approach: use base sequences of universal molecules such as RNA
RNA molecules • strings of ribonucleotides, of which there are four types, denoted by A, C, G, U; for example, 5’ - ACCAUGGAC - 3’ • some “universal” RNA molecules function in life’s most basic processes, and so mutate slowly
two possible evolutionary trees • taxa and their sequences: Aardvark CAGG; Bison CAGA; Chimp CGCG; Dog UGCA; Elephant UGCG • [figure: two candidate trees over these five taxa, with internal nodes labeled by sequences including UGCG, CACG, and CAGG] • which is a better fit with the data? why?
parsimony score • to get a parsimony score for a tree, count the number of places where a nucleotide differs from a parent to a child
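The counting rule above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the leaf sequences come from the slides, but the tree shape and the internal-node labels below are hypothetical, chosen only to have something to count, and the names `hamming`, `parsimony_score`, and `leaf` are made up for this sketch.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def parsimony_score(node):
    """Sum parent-to-child differences over every edge below `node`.

    A node is a pair (sequence, list_of_child_nodes).
    """
    label, children = node
    return sum(hamming(label, child[0]) + parsimony_score(child)
               for child in children)


def leaf(sequence):
    """Helper: a node with no children."""
    return (sequence, [])


# Hypothetical labeled tree (shape and internal labels are assumptions).
tree = ("UGCG", [
    ("CAGG", [leaf("CAGG"),                      # Aardvark
              leaf("CAGA")]),                    # Bison
    ("UGCG", [leaf("CGCG"),                      # Chimp
              ("UGCG", [leaf("UGCA"),            # Dog
                        leaf("UGCG")])]),        # Elephant
])

print(parsimony_score(tree))
```

A lower score means fewer mutations are needed to explain the data, so when two labeled trees are compared, the one with the smaller score is the more parsimonious.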
parsimony problem • input: RNA sequences for some taxa (species) • output: the most parsimonious tree for the input taxa • the more taxa there are, the more candidate trees there are for the output
application of parsimony (Luo et al., Nature, Jan 2001) • did mammals evolve independently on the northern and southern continents?
how many trees are there? • unfortunately, the number of possible trees grows at least exponentially with the number of taxa (organisms) • example of an exponential function: 2^n (2 multiplied by itself n times) • if there are n taxa, there are even more than 2^n possible evolutionary trees (once n is larger than a handful)
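To get a feel for the growth, the short sketch below compares 2^n with the number of candidate trees. It assumes the trees are rooted, binary, and leaf-labeled, which the slides do not specify; under that assumption the count for n taxa is the double factorial (2n-3)!! = 1 x 3 x 5 x ... x (2n-3). The function name `rooted_tree_count` is made up for this example.

```python
def rooted_tree_count(n):
    """Number of rooted binary trees with n labeled leaves, for n >= 2.

    Adding the k-th taxon to a tree with k-1 leaves can be done on any of
    its 2(k-1)-1 = 2k-3 edges (counting the edge above the root), so the
    counts multiply: 1 * 3 * 5 * ... * (2n-3).
    """
    count = 1
    for k in range(3, n + 1):
        count *= 2 * k - 3
    return count


for n in (2, 3, 4, 5, 10, 20):
    print(n, 2 ** n, rooted_tree_count(n))
```

For very small n the tree count is still below 2^n, but from n = 5 onward (105 trees versus 32) it pulls ahead and keeps accelerating.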
complexity of the parsimony problem • all known algorithms for exactly solving the parsimony problem require an exponential number of steps in the worst case; the problem is NP-hard • in practice, heuristic algorithms are typically used: they search in an intelligent way for a good tree, but offer no guarantee of finding the best tree
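The contrast between exact and heuristic search can be sketched as follows. This is one possible illustration under stated assumptions, not the method used in the slides: trees are rooted and binary, each candidate tree is scored with Fitch's algorithm (a standard way to find the fewest changes when only the leaves are labeled), and the heuristic is a simple greedy "stepwise addition" that inserts one taxon at a time wherever the score is lowest. All function names are made up for this example; only the sequences come from the slides.

```python
# Sequences from the slides; the rest of this sketch is illustrative.
SEQS = {"Aardvark": "CAGG", "Bison": "CAGA", "Chimp": "CGCG",
        "Dog": "UGCA", "Elephant": "UGCG"}


def fitch(tree, seqs):
    """Return (per-site candidate base sets, minimum number of changes).

    A tree is either a taxon name (a leaf) or a pair (left, right).
    """
    if isinstance(tree, str):
        return [{base} for base in seqs[tree]], 0
    left_sets, left_cost = fitch(tree[0], seqs)
    right_sets, right_cost = fitch(tree[1], seqs)
    sets, cost = [], left_cost + right_cost
    for a, b in zip(left_sets, right_sets):
        common = a & b
        if common:
            sets.append(common)      # children agree: no change needed
        else:
            sets.append(a | b)       # children disagree: one change
            cost += 1
    return sets, cost


def score(tree, seqs):
    """Parsimony score of a tree topology (fewest changes over all labelings)."""
    return fitch(tree, seqs)[1]


def insertions(tree, leaf):
    """Yield every tree obtained by attaching `leaf` to one edge of `tree`."""
    yield (tree, leaf)               # attach above the current root
    if not isinstance(tree, str):
        left, right = tree
        for t in insertions(left, leaf):
            yield (t, right)
        for t in insertions(right, leaf):
            yield (left, t)


def greedy_tree(seqs):
    """Heuristic: add taxa one at a time at the locally best position."""
    names = sorted(seqs)
    tree = (names[0], names[1])
    for name in names[2:]:
        tree = min(insertions(tree, name), key=lambda t: score(t, seqs))
    return tree


def all_trees(seqs):
    """Exact search space: every rooted binary tree on the taxa."""
    names = sorted(seqs)
    trees = [(names[0], names[1])]
    for name in names[2:]:
        trees = [t for tree in trees for t in insertions(tree, name)]
    return trees


heuristic = greedy_tree(SEQS)
best = min(all_trees(SEQS), key=lambda t: score(t, SEQS))
print("greedy:    ", heuristic, score(heuristic, SEQS))
print("exhaustive:", best, score(best, SEQS))
```

With five taxa the exhaustive search looks at only 105 trees, so both approaches are instant. The difference shows up as taxa are added: the greedy search scores a few candidates per added taxon, while the exhaustive list grows by a factor of 2k-3 each time and quickly becomes impractical. The price is that the greedy result is never better than the exhaustive one and may be worse.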
connections: biologically inspired data structures • tree structures for organizing data are ubiquitous in computing (e.g. folders in a windows environment) • programming language environments support operations on trees (add-node, find-parent, etc.) for the programmer to use
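As a small illustration of the tree operations mentioned above, here is a minimal sketch of a tree data structure in Python. The class and method names (`Node`, `add_node`, `find_parent`, `path`) and the folder names are hypothetical, loosely modeled on the add-node and find-parent operations named in the slide; real libraries differ in their details.

```python
class Node:
    """A tree node that knows its parent and its children."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def add_node(self, name):
        """Create a child with the given name and return it."""
        child = Node(name)
        child.parent = self
        self.children.append(child)
        return child

    def find_parent(self):
        """Return the parent node, or None for the root."""
        return self.parent

    def path(self):
        """Names from the root down to this node, joined like a folder path."""
        names = []
        node = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return "/".join(reversed(names))


# Folders as a tree (the names are made up for this example).
root = Node("Documents")
biology = root.add_node("Biology")
notes = biology.add_node("trees.txt")

print(notes.find_parent().name)
print(notes.path())
```

The same structure works for an evolutionary tree: each node would hold a sequence instead of a folder name, and walking from a leaf up through `find_parent` traces a species back through its ancestors.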
summary • strong connections between biology and cs • many computational problems, such as constructing parsimonious evolutionary trees, are “intractable” • algorithms for intractable problems are often heuristic
vocabulary • bioinformatics • evolutionary tree construction; parsimony problem • exponential running time, intractable problem (technically sometimes called NP-hard problem) • heuristic algorithms