300 likes | 493 Views
Computational Questions. Bioinformatics. Where CS and Biology Meet. Bioinformatics: Applications of CS to the life sciences What are the computational issues? Storage and retrieval of genetic data, data mining, tools Analysis of genetic data: similarities, differences, structure
E N D
Computational Questions Bioinformatics
Where CS and Biology Meet • Bioinformatics: Applications of CS to the life sciences • What are the computational issues? • Storage and retrieval of genetic data, data mining, tools • Analysis of genetic data: similarities, differences, structure • Processing experimental data
Problem Solving inComputer Science • Program: Sequence of instructions that perform a particular task • Task (problem) expressed as: Given data (input), produce results (output) • From problems to programs • Formulate the problem • Develop and verify an algorithm • Write and test the program
Algorithm Analysis • Algorithm: Conceptual/theoretical form of a program • What is analyzed? • Correctness: does it solve the problem? • Complexity: how much resources (time and memory) does it consume? • Tradeoffs: sometimes, we need to sacrifice correctness for efficiency
Example 1: Searching for an Element in a List • Problem formulation: • Input: sorted list L of n elements (e.g., names) and a target element x • Output: the position of the target element if it exists in the list • Possible algorithms • Linear search • Binary search
Linear Search • Algorithm: • For each element in L (from the first to the last element), compare it with x and return the position if equal • Time complexity: • Up to n comparisons performed • On the average, n/2 comparisons • Runs in linear time (proportional to the list size n)
Binary Search • Algorithm: • Compare middle element of the list with x, return the position if equal; if not, reduce the list to either the lower half or the upper half of original list; repeat the process • Time complexity • Up to log2n comparisons performed • Runs in logarithmic time
Comparing Running Times • Exercise: tabulate values of the following run-time functions for different values of n • Functions: • log n (logarithmic) • n (linear) • n2 (quadratic) • n3 (cubic) • 2n (exponential) • n!
Example 2: Substring Search • Problem formulation: • Input: Strings s and t of characters • Output: If s is a substring of t, its position in t • Example: • Input: s = “ctct”, t = “agtctcttctaac”, • Output: 4 • Algorithm? Time Complexity?
Example 3: Traveling Salesman • Problem Formulation: • Input: n cities, distances between cities • Output: shortest tour of all cities • Algorithm: • Consider all permutations of the cities, compute total distances for each permutation, select the minimum among all total distances
Exponential Algorithms and Intractable Problems • The Traveling Salesman problem is an example of an intractable (“NP-complete”) problem • Characterized by: • The existence of a correct exponential algorithm • No known polynomial algorithm • Exponential algorithm is impractical. Now what?
Heuristics • There are polynomial algorithms for intractable problems that do not always yield the correct answer • Example: Start with any city, go to the nearest unvisited city, repeat process • Not always correct. Counterexample? • Selection of nearest city is called a heuristic • Compromise: Can prove some statements on the (incorrect) algorithm and that may be enough in practice
Back to Bioinformatics: Some Objectives • Formulate problems relevant to biology • Devise/understand algorithms for these problems • Computer scientists and biologists need to talk more • Computer scientists have a tendency to make (often unreasonable) assumptions • Biologists may place too much faith on results returned by automated systems
Overview: Selected Problems in Bioinformatics • Sequence alignment • Phylogeny • Dealing with experimental results
DNA Sequence Databases • Data representation, integrity, accuracy • Search and scoring methods • Meaning and reliability of results • e.g., how does BLAST (Basic Local Alignment Search Tool) respond to random data?
Sequence Alignment Problem • Given two nucleotide sequence, obtain an optimal alignment between the sequences • Example: AT-C-TGAT-TGCAT-A-
Phylogeny • Construction of phylogenetic trees based on genomic distance • Problems to be solved: • Determining genomic distance • Tree construction from the distances
Determining Genomic Distance • Given two genomes, determine the number of mutations necessary to obtain one from the other • Common distance model (least number of mutations) • Mutation on the genome level: rearrangement (sorting!) operations on permutations
Sorting Permutations and a Graph Theoretic Model 0 3 5 6 7 2 1 4 8 9 0 1 2 7 6 5 3 4 8 9
Phylogenetic Tree Reconstruction • Given a set of species and genomic distances between the species, construct a phylogenetic tree that is (most) consistent with the distances • Problem shown to be NP-complete • This means we should try some heuristics
Phylogenetic Tree Mouse Monkey Human
Experimental Results • Image or data directly drawn from a device • e.g., microarray, scanner • Need to make objective, discrete conclusions • e.g., pixel intensity vs. gene expression • Need to handle errors and imperfections
Image Analysis to Aid Microarray Experiments • Automatically locating the grid of spots • Use Fourier transforms to compute periods and offsets • Extracting intensity • Refine spot sample to collect significant, normalized data • Make conclusions on genetic function
Summary • Bioinformatics: a perfect opportunity for interdisciplinary research within the sciences • Academics from the different backgrounds need to study, discuss, debate with each other