This educational content covers the intersection of Computer Science and Biology, focusing on bioinformatics applications, computational problems, and algorithm analysis. It introduces problem formulation and complexity analysis through examples such as list search (linear and binary search), substring search, and the Traveling Salesman problem. It also addresses the impracticality of exponential algorithms, intractable problems, and the role of heuristics. The overview then turns to selected bioinformatics problems (sequence alignment, phylogeny, and dealing with experimental results) and emphasizes that computer scientists and biologists need to communicate well to tackle biological problems effectively.
Computational Questions Bioinformatics
Where CS and Biology Meet • Bioinformatics: Applications of CS to the life sciences • What are the computational issues? • Storage and retrieval of genetic data, data mining, tools • Analysis of genetic data: similarities, differences, structure • Processing experimental data
Problem Solving in Computer Science • Program: Sequence of instructions that performs a particular task • Task (problem) expressed as: Given data (input), produce results (output) • From problems to programs • Formulate the problem • Develop and verify an algorithm • Write and test the program
Algorithm Analysis • Algorithm: Conceptual/theoretical form of a program • What is analyzed? • Correctness: does it solve the problem? • Complexity: how much time and memory does it consume? • Tradeoffs: sometimes, we need to sacrifice correctness for efficiency
Example 1: Searching for an Element in a List • Problem formulation: • Input: sorted list L of n elements (e.g., names) and a target element x • Output: the position of the target element if it exists in the list • Possible algorithms • Linear search • Binary search
Linear Search • Algorithm: • For each element in L (from the first to the last element), compare it with x and return the position if equal • Time complexity: • Up to n comparisons performed • On the average, n/2 comparisons • Runs in linear time (proportional to the list size n)
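The linear search described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `linear_search`, the 0-based positions, and the convention of returning -1 when the target is absent are assumptions made here, not part of the slides.

```python
def linear_search(items, target):
    """Return the 0-based position of target in items, or -1 if it is absent."""
    for index, value in enumerate(items):
        # One comparison per element: up to n comparisons in the worst case.
        if value == target:
            return index
    return -1


# Hypothetical example data (a sorted list of names, as in the problem statement).
names = ["Ana", "Ben", "Cruz", "Dela", "Eva"]
print(linear_search(names, "Cruz"))
print(linear_search(names, "Zed"))
```

Linear search does not actually need the list to be sorted; that property only becomes essential for binary search.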
Binary Search • Algorithm: • Compare middle element of the list with x, return the position if equal; if not, reduce the list to either the lower half or the upper half of original list; repeat the process • Time complexity • Up to log₂ n comparisons performed • Runs in logarithmic time
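A possible Python sketch of the halving process follows. The name `binary_search`, the 0-based positions, and the -1 return value for a missing target are illustrative assumptions; the input must already be sorted, as stated in the problem formulation.

```python
def binary_search(sorted_items, target):
    """Return the 0-based position of target in sorted_items, or -1 if absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1    # target can only be in the upper half
        else:
            high = mid - 1   # target can only be in the lower half
    return -1


# Hypothetical example data: a sorted list of names.
names = ["Ana", "Ben", "Cruz", "Dela", "Eva"]
print(binary_search(names, "Dela"))
print(binary_search(names, "Zed"))
```

Each pass through the loop discards about half of the remaining range, which is where the logarithmic bound comes from.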
Comparing Running Times • Exercise: tabulate values of the following run-time functions for different values of n • Functions: • log n (logarithmic) • n (linear) • n² (quadratic) • n³ (cubic) • 2ⁿ (exponential) • n! (factorial)
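One way to approach the exercise is to let a short script build the table. The sketch below assumes base-2 logarithms and an arbitrary choice of sample sizes; the helper name `growth_table` is made up for this illustration.

```python
import math


def growth_table(sizes):
    """Return one row (n, log2 n, n, n^2, n^3, 2^n, n!) per input size."""
    rows = []
    for n in sizes:
        rows.append((n, math.log2(n), n, n ** 2, n ** 3, 2 ** n, math.factorial(n)))
    return rows


# Sample sizes chosen only for illustration.
for row in growth_table([1, 2, 4, 8, 16]):
    print(row)
```

Even at n = 16 the last two columns dwarf the polynomial ones, which is the point of the exercise.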
Example 2: Substring Search • Problem formulation: • Input: Strings s and t of characters • Output: If s is a substring of t, its position in t • Example: • Input: s = “ctct”, t = “agtctcttctaac” • Output: 4 (counting positions from 1) • Algorithm? Time Complexity?
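One straightforward answer to the question above is the naive method: try every starting position in t and compare character by character. The sketch below is only one possible algorithm, not necessarily the one the slides had in mind. The name `find_substring` and the convention of returning 0 when there is no match are assumptions; positions are 1-based to agree with the example.

```python
def find_substring(s, t):
    """Return the 1-based position of the first occurrence of s in t, or 0 if none."""
    for start in range(len(t) - len(s) + 1):
        # Compare s against the window of t that begins at this start position.
        if t[start:start + len(s)] == s:
            return start + 1
    return 0


print(find_substring("ctct", "agtctcttctaac"))  # prints 4, matching the example
```

With m = len(s) and n = len(t), this performs up to about m × (n − m + 1) character comparisons in the worst case. Faster methods exist, but this is the simplest to verify.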
Example 3: Traveling Salesman • Problem Formulation: • Input: n cities, distances between cities • Output: shortest tour of all cities • Algorithm: • Consider all permutations of the cities, compute total distances for each permutation, select the minimum among all total distances
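The exhaustive algorithm can be sketched as follows. Two assumptions are made here that the slide does not state: the tour is closed (it returns to the starting city), and city 0 is fixed as the start, which does not change the minimum for closed tours. The distance matrix is invented example data.

```python
from itertools import permutations


def tour_length(tour, dist):
    """Total length of a closed tour that returns to its first city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def brute_force_tsp(dist):
    """Try every ordering of the cities after city 0 and keep the shortest tour."""
    best_tour, best_length = None, float("inf")
    for rest in permutations(range(1, len(dist))):
        tour = (0,) + rest
        length = tour_length(tour, dist)
        if length < best_length:
            best_tour, best_length = tour, length
    return best_tour, best_length


# Hypothetical symmetric distances between 4 cities.
dist = [
    [0, 1, 4, 3],
    [1, 0, 2, 5],
    [4, 2, 0, 1],
    [3, 5, 1, 0],
]
print(brute_force_tsp(dist))  # ((0, 1, 2, 3), 7)
```

The loop examines (n − 1)! orderings, so the running time grows factorially with the number of cities.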
Exponential Algorithms and Intractable Problems • The Traveling Salesman problem is an example of an intractable problem (NP-hard; its decision version is “NP-complete”) • Characterized by: • The existence of a correct exponential algorithm • No known polynomial algorithm • Exponential algorithm is impractical. Now what?
Heuristics • For intractable problems, there are polynomial algorithms that do not always yield the correct answer • Example: Start with any city, go to the nearest unvisited city, repeat process • Not always correct. Counterexample? • Selection of nearest city is called a heuristic • Compromise: We can sometimes prove statements about the (incorrect) algorithm, such as how far its answer can be from the best one, and that may be enough in practice
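The nearest-city rule can be sketched as below, together with one possible counterexample. The four cities are assumed to lie on a line at positions 0, 1, -2 and 5 (invented data), tours are assumed to be closed, and the function names are hypothetical.

```python
def tour_length(tour, dist):
    """Total length of a closed tour that returns to its first city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def nearest_neighbor_tour(dist, start=0):
    """Greedy heuristic: always move to the closest city not yet visited."""
    unvisited = set(range(len(dist))) - {start}
    tour = [start]
    while unvisited:
        current = tour[-1]
        # sorted() makes tie-breaking deterministic (lowest city index wins).
        nearest = min(sorted(unvisited), key=lambda city: dist[current][city])
        tour.append(nearest)
        unvisited.remove(nearest)
    return tour


# Cities on a line; the distance is the absolute difference of positions.
positions = [0, 1, -2, 5]
dist = [[abs(p - q) for q in positions] for p in positions]

greedy = nearest_neighbor_tour(dist)
print(greedy, tour_length(greedy, dist))   # [0, 1, 2, 3] 16
print(tour_length([0, 1, 3, 2], dist))     # 14, a shorter tour
```

Starting at position 0, the heuristic steps right to 1, then jumps left to -2, and must then cross the whole line to reach 5. A tour that sweeps the line once in each direction has length 14, so the greedy answer of 16 is not optimal.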
Back to Bioinformatics: Some Objectives • Formulate problems relevant to biology • Devise/understand algorithms for these problems • Computer scientists and biologists need to talk more • Computer scientists have a tendency to make (often unreasonable) assumptions • Biologists may place too much faith in results returned by automated systems
Overview: Selected Problems in Bioinformatics • Sequence alignment • Phylogeny • Dealing with experimental results
DNA Sequence Databases • Data representation, integrity, accuracy • Search and scoring methods • Meaning and reliability of results • e.g., how does BLAST (Basic Local Alignment Search Tool) respond to random data?
Sequence Alignment Problem • Given two nucleotide sequences, obtain an optimal alignment between the sequences • Example: AT-C-TGAT aligned with -TGCAT-A- (sequences ATCTGAT and TGCATA, where “-” marks a gap)
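The slide does not say what makes an alignment "optimal", so the sketch below has to assume a scoring scheme: +1 for a match, -1 for a mismatch, and -1 per gap. Under that assumption, a standard dynamic-programming approach (global alignment in the style of Needleman and Wunsch) fills a score table and then traces back one best alignment. The function name and all parameters are illustrative.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap          # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one best alignment.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


result = global_alignment("ATCTGAT", "TGCATA")
print(result[1])
print(result[2])
```

Several alignments can share the best score, so the output need not be identical to the alignment shown in the example. The table has (m + 1) × (n + 1) cells for sequences of lengths m and n, so the running time is proportional to m × n.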
Phylogeny • Construction of phylogenetic trees based on genomic distance • Problems to be solved: • Determining genomic distance • Tree construction from the distances
Determining Genomic Distance • Given two genomes, determine the number of mutations necessary to obtain one from the other • Common distance model (least number of mutations) • Mutation on the genome level: rearrangement (sorting!) operations on permutations
Sorting Permutations and a Graph Theoretic Model • Example permutations: 0 3 5 6 7 2 1 4 8 9 and 0 1 2 7 6 5 3 4 8 9 (the second is the first with the segment 3 5 6 7 2 1 reversed)
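The two permutations on this slide differ by a single reversal, which can be checked with a short sketch. The breakpoint count is not mentioned on the slide; it is included here as a commonly used measure of how far a permutation is from sorted order, and genes are assumed to be unsigned. Function names are hypothetical.

```python
def reverse_segment(perm, i, j):
    """Return a copy of perm with positions i..j (0-based, inclusive) reversed."""
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


def count_breakpoints(perm):
    """Count neighboring pairs whose values are not consecutive integers."""
    return sum(1 for x, y in zip(perm, perm[1:]) if abs(x - y) != 1)


before = [0, 3, 5, 6, 7, 2, 1, 4, 8, 9]
after = reverse_segment(before, 1, 6)

print(after)                                                  # [0, 1, 2, 7, 6, 5, 3, 4, 8, 9]
print(count_breakpoints(before), count_breakpoints(after))    # 5 3
```

A sorted permutation has no breakpoints, and a single reversal changes only the two adjacencies at its ends, so it can remove at most two breakpoints. That gives a simple lower bound on the number of reversals needed.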
Phylogenetic Tree Reconstruction • Given a set of species and genomic distances between the species, construct a phylogenetic tree that is (most) consistent with the distances • Problem shown to be NP-complete • This means we should try some heuristics
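As one example of a heuristic for this problem, the sketch below uses UPGMA-style greedy clustering: repeatedly join the two closest clusters and average their distances to the rest. This technique is a choice made for illustration, not one named in the slides, and the distance values are invented rather than measured.

```python
def upgma(names, dist):
    """Greedy average-linkage clustering; returns the tree as nested tuples."""
    clusters = {i: (names[i], 1) for i in range(len(names))}   # id -> (tree, size)
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        a, b = min(d, key=d.get)            # closest pair of clusters
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        new_d = {}
        for c in clusters:
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            # Size-weighted average distance from the merged cluster to c.
            new_d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(new_d)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Made-up distances, for illustration only.
names = ["Mouse", "Monkey", "Human"]
dist = [
    [0, 8, 9],
    [8, 0, 2],
    [9, 2, 0],
]
print(upgma(names, dist))   # ('Mouse', ('Monkey', 'Human'))
```

Like the nearest-city rule for the Traveling Salesman problem, this runs in polynomial time but is not guaranteed to produce the tree that best fits the distances.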
Phylogenetic Tree • [Figure: example phylogenetic tree with leaves Mouse, Monkey, Human]
Experimental Results • Image or data directly drawn from a device • e.g., microarray, scanner • Need to make objective, discrete conclusions • e.g., pixel intensity vs. gene expression • Need to handle errors and imperfections
Image Analysis to Aid Microarray Experiments • Automatically locating the grid of spots • Use Fourier transforms to compute periods and offsets • Extracting intensity • Refine spot sample to collect significant, normalized data • Make conclusions on genetic function
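The idea of using a Fourier transform to find the spot period can be illustrated in one dimension. The sketch below assumes the image has already been reduced to a row of summed intensities, uses a synthetic cosine profile in place of real microarray data, and computes a plain discrete Fourier transform with the standard library. The function name `dominant_period` is hypothetical.

```python
import cmath
import math


def dominant_period(signal):
    """Return the period (in samples) of the strongest nonzero frequency."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [value - mean for value in signal]   # remove the constant offset
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(value * cmath.exp(-2j * math.pi * k * t / n)
                    for t, value in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k


# Synthetic intensity profile: bright spots every 8 pixels across 64 pixels.
profile = [1 + math.cos(2 * math.pi * t / 8) for t in range(64)]
print(dominant_period(profile))   # 8.0
```

The phase of the winning coefficient (for example via `cmath.phase`) would indicate the grid offset. A real pipeline would more likely use a fast Fourier transform on both image axes.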
Summary • Bioinformatics: a perfect opportunity for interdisciplinary research within the sciences • Academics from different backgrounds need to study, discuss, and debate with each other