
Bioinformatics and Computational Questions in CS: Exploring Algorithms for Genetic Data Analysis

This educational content explores the intersection of computer science and biology, focusing on bioinformatics applications, computational problems, and algorithm analysis. It introduces problem formulation and complexity analysis through examples such as list search (linear and binary search), substring search, and the Traveling Salesman problem. It then discusses exponential algorithms, intractable problems, and the role of heuristics, before surveying selected bioinformatics problems: sequence alignment, phylogeny, and the handling of experimental results. Throughout, it stresses that computer scientists and biologists need to communicate and collaborate to tackle biological problems effectively.

theresem

Presentation Transcript


  1. Computational Questions Bioinformatics

  2. Where CS and Biology Meet • Bioinformatics: Applications of CS to the life sciences • What are the computational issues? • Storage and retrieval of genetic data, data mining, tools • Analysis of genetic data: similarities, differences, structure • Processing experimental data

  3. Problem Solving in Computer Science • Program: Sequence of instructions that performs a particular task • Task (problem) expressed as: Given data (input), produce results (output) • From problems to programs • Formulate the problem • Develop and verify an algorithm • Write and test the program

  4. Algorithm Analysis • Algorithm: Conceptual/theoretical form of a program • What is analyzed? • Correctness: does it solve the problem? • Complexity: how many resources (time and memory) does it consume? • Tradeoffs: sometimes, we need to sacrifice correctness for efficiency

  5. Example 1: Searching for an Element in a List • Problem formulation: • Input: sorted list L of n elements (e.g., names) and a target element x • Output: the position of the target element if it exists in the list • Possible algorithms • Linear search • Binary search

  6. Linear Search • Algorithm: • For each element in L (from the first to the last element), compare it with x and return the position if equal • Time complexity: • Up to n comparisons performed • On average, about n/2 comparisons when x is in the list • Runs in linear time (proportional to the list size n)
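The linear search described on this slide can be sketched in a few lines. This is a minimal illustration, not code from the presentation; the function name `linear_search`, the 0-based positions, and the sample list of names are assumptions made for the example.

```python
def linear_search(items, target):
    """Return the 0-based position of target in items, or None if absent."""
    # Compare each element with the target, from first to last.
    for position, element in enumerate(items):
        if element == target:
            return position
    return None  # up to n comparisons were made without a match


# Hypothetical sample data (the slide only says "e.g., names").
names = ["Ana", "Ben", "Carl", "Dina", "Eli"]
print(linear_search(names, "Dina"))  # 3
print(linear_search(names, "Zed"))   # None
```

Linear search does not rely on the list being sorted, which is why it works on any list but cannot skip elements.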

  7. Binary Search • Algorithm: • Compare middle element of the list with x, return the position if equal; if not, reduce the list to either the lower half or the upper half of original list; repeat the process • Time complexity • Up to about log_2 n comparisons performed • Runs in logarithmic time
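A minimal sketch of the halving process on this slide follows. It assumes the list is sorted in ascending order (as in the problem formulation) and uses 0-based positions; the name `binary_search` and the sample data are illustrative assumptions.

```python
def binary_search(sorted_items, target):
    """Return the 0-based position of target in sorted_items, or None."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return None


# Hypothetical sample data, already sorted.
names = ["Ana", "Ben", "Carl", "Dina", "Eli"]
print(binary_search(names, "Dina"))  # 3
print(binary_search(names, "Zed"))   # None
```

Each iteration halves the remaining range, which is where the logarithmic bound on comparisons comes from.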

  8. Linear vs Logarithmic Time

  9. Comparing Running Times • Exercise: tabulate values of the following run-time functions for different values of n • Functions: • log n (logarithmic) • n (linear) • n^2 (quadratic) • n^3 (cubic) • 2^n (exponential) • n! (factorial)
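One way to approach the exercise is to let a short script build the table. The sketch below assumes base-2 logarithms and an arbitrary choice of n values; the helper name `growth_row` is made up for the example.

```python
import math


def growth_row(n):
    """Evaluate each run-time function from the exercise at a given n."""
    return {
        "log n": math.log2(n),  # assumption: logarithm taken base 2
        "n": n,
        "n^2": n ** 2,
        "n^3": n ** 3,
        "2^n": 2 ** n,
        "n!": math.factorial(n),
    }


# Arbitrary sample sizes chosen for illustration.
for n in (2, 4, 8, 16, 32):
    row = growth_row(n)
    print(n, *(f"{name}={value:g}" for name, value in row.items()))
```

Even for these small inputs, the last two columns grow far faster than the polynomial ones.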

  10. Example 2: Substring Search • Problem formulation: • Input: Strings s and t of characters • Output: If s is a substring of t, its position in t • Example: • Input: s = “ctct”, t = “agtctcttctaac” • Output: 4 (counting positions from 1) • Algorithm? Time Complexity?
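One possible answer to the "Algorithm?" question is the naive approach: try every starting position in t and compare. The sketch below is an illustration under that assumption (the slides do not say which algorithm is intended); it reports 1-based positions to match the slide's example, and the name `find_substring` is hypothetical.

```python
def find_substring(s, t):
    """Return the 1-based position of the first occurrence of s in t, or None."""
    # Try every alignment of s against t.
    for start in range(len(t) - len(s) + 1):
        if t[start:start + len(s)] == s:
            return start + 1  # convert 0-based index to 1-based position
    return None


print(find_substring("ctct", "agtctcttctaac"))  # 4
```

In the worst case this performs on the order of len(s) * len(t) character comparisons, since each of the candidate positions may require comparing up to len(s) characters.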

  11. Example 3: Traveling Salesman • Problem Formulation: • Input: n cities, distances between cities • Output: shortest tour of all cities • Algorithm: • Consider all permutations of the cities, compute the total distance for each permutation, select the minimum among all total distances
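The exhaustive algorithm on this slide can be sketched as follows. Two assumptions are made for the example: a tour returns to its starting city, and the start is fixed at city 0 (which only removes rotations of the same tour). The distance matrix is made-up illustrative data, and the function names are hypothetical.

```python
from itertools import permutations


def tour_length(tour, dist):
    """Total length of a closed tour that returns to its first city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def brute_force_tsp(dist):
    """Try every ordering of cities 1..n-1 after a fixed start at city 0."""
    n = len(dist)
    best_tour, best_length = None, float("inf")
    for rest in permutations(range(1, n)):
        tour = (0,) + rest
        length = tour_length(tour, dist)
        if length < best_length:
            best_tour, best_length = tour, length
    return best_tour, best_length


# Made-up symmetric distances between 4 cities (illustration only).
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
print(brute_force_tsp(dist))  # ((0, 1, 3, 2), 18)
```

The loop runs (n-1)! times, so the approach stops being practical after a dozen or so cities.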

  12. Exponential Algorithms and Intractable Problems • The Traveling Salesman problem is an example of an intractable (“NP-complete”) problem • Characterized by: • The existence of a correct exponential algorithm • No known polynomial algorithm • Exponential algorithm is impractical. Now what?

  13. Heuristics • There are polynomial algorithms for intractable problems that do not always yield the correct answer • Example: Start with any city, go to the nearest unvisited city, repeat process • Not always correct. Counterexample? • Selection of nearest city is called a heuristic • Compromise: We can often prove some guarantees about the (not always correct) algorithm, and that may be enough in practice
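The nearest-city heuristic, together with one possible counterexample, can be sketched as below. The distance matrix is invented purely to make the heuristic fail (it does not even satisfy the triangle inequality), the tour is assumed to return to its start, and ties are broken by lowest city index; none of this comes from the slides.

```python
def tour_length(tour, dist):
    """Total length of a closed tour that returns to its first city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def nearest_neighbor_tour(dist, start=0):
    """Greedy heuristic: always move to the nearest unvisited city."""
    tour = [start]
    unvisited = set(range(len(dist))) - {start}
    while unvisited:
        current = tour[-1]
        # sorted() makes tie-breaking deterministic (lowest index wins).
        nearest = min(sorted(unvisited), key=lambda city: dist[current][city])
        tour.append(nearest)
        unvisited.remove(nearest)
    return tour


# Made-up distances chosen so that the greedy choice backfires.
dist = [
    [0, 1, 2, 10],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [10, 2, 1, 0],
]
greedy = nearest_neighbor_tour(dist)
print(greedy, tour_length(greedy, dist))    # [0, 1, 2, 3] 13
print(tour_length([0, 1, 3, 2], dist))      # 6
```

The greedy tour takes three cheap steps and is then forced to close with the expensive edge of length 10, while the tour 0, 1, 3, 2 avoids that edge entirely.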

  14. Back to Bioinformatics: Some Objectives • Formulate problems relevant to biology • Devise/understand algorithms for these problems • Computer scientists and biologists need to talk more • Computer scientists have a tendency to make (often unreasonable) assumptions • Biologists may place too much faith in results returned by automated systems

  15. Overview: Selected Problems in Bioinformatics • Sequence alignment • Phylogeny • Dealing with experimental results

  16. BLAST Search

  17. Blast Results

  18. DNA Sequence Databases • Data representation, integrity, accuracy • Search and scoring methods • Meaning and reliability of results • e.g., how does BLAST (Basic Local Alignment Search Tool) respond to random data?

  19. Sequence Alignment Problem • Given two nucleotide sequences, obtain an optimal alignment between the sequences • Example: AT-C-TGAT (top row) aligned with -TGCAT-A- (bottom row)

  20. Dynamic Programming
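As an illustration of how dynamic programming applies to alignment, the sketch below uses the simplest scoring one could assume: each matching column scores 1, gaps are free, and mismatched columns are not allowed. Under that assumption the problem is the longest common subsequence. The slides do not state a scoring scheme, so this is only one possibility. The input strings ATCTGAT and TGCATA are the slide 19 example with gaps removed, assuming that example splits into two rows of nine characters; the function name is hypothetical.

```python
def lcs_alignment(v, w):
    """Align v and w to maximize the number of matching columns (LCS scoring)."""
    n, m = len(v), len(w)
    # score[i][j] = best number of matches between v[:i] and w[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if v[i - 1] == w[j - 1]:
                score[i][j] = score[i - 1][j - 1] + 1
            else:
                score[i][j] = max(score[i - 1][j], score[i][j - 1])

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and v[i - 1] == w[j - 1]:
            top.append(v[i - 1])
            bottom.append(w[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or score[i - 1][j] >= score[i][j - 1]):
            top.append(v[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(w[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


matches, row1, row2 = lcs_alignment("ATCTGAT", "TGCATA")
print(matches)
print(row1)
print(row2)
```

The table has (n+1)(m+1) cells and each is filled in constant time, so the running time is proportional to the product of the sequence lengths. Several alignments can share the optimal score, so the rows printed here need not coincide with the ones on the slide.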

  21. Phylogeny • Construction of phylogenetic trees based on genomic distance • Problems to be solved: • Determining genomic distance • Tree construction from the distances

  22. Determining Genomic Distance • Given two genomes, determine the number of mutations necessary to obtain one from the other • Common distance model (least number of mutations) • Mutation on the genome level: rearrangement (sorting!) operations on permutations

  23. Sorting Permutations and a Graph Theoretic Model • Example: 0 3 5 6 7 2 1 4 8 9 → 0 1 2 7 6 5 3 4 8 9 (reversal of the segment 3 5 6 7 2 1)
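The two permutations on this slide differ by a single reversal, which the sketch below reproduces. It also counts breakpoints (adjacent entries that are not consecutive numbers), a measure commonly used when sorting permutations by reversals; breakpoints are not mentioned in the slides and are included here only as an assumption about what the graph-theoretic model tracks. The function names are hypothetical.

```python
def reverse_segment(perm, i, j):
    """Return a copy of perm with the entries at positions i..j reversed."""
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


def count_breakpoints(perm):
    """Count adjacent pairs whose values are not consecutive integers."""
    return sum(1 for a, b in zip(perm, perm[1:]) if abs(a - b) != 1)


before = [0, 3, 5, 6, 7, 2, 1, 4, 8, 9]
after = reverse_segment(before, 1, 6)
print(after)                                                 # [0, 1, 2, 7, 6, 5, 3, 4, 8, 9]
print(count_breakpoints(before), count_breakpoints(after))   # 5 3
```

The fully sorted permutation has zero breakpoints, so the count gives a rough sense of how far a genome arrangement is from the target.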

  24. Phylogenetic Tree Reconstruction • Given a set of species and genomic distances between the species, construct a phylogenetic tree that is (most) consistent with the distances • Problem shown to be NP-complete • This means we should try some heuristics
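One well-known heuristic of this kind is UPGMA (average-linkage clustering): repeatedly merge the two closest clusters and average their distances to the rest. The slides do not name a specific heuristic, so UPGMA is an assumption here, and the distances below are invented for illustration only; they are not real genomic distances. The species names are borrowed from the tree slide.

```python
def upgma_tree(names, distances):
    """Build a nested-tuple tree by repeatedly merging the closest clusters."""
    sizes = {name: 1 for name in names}  # cluster -> number of leaves
    d = {frozenset(pair): value for pair, value in distances.items()}
    while len(sizes) > 1:
        closest = min(d, key=d.get)
        a, b = sorted(closest, key=str)  # fixed order for reproducible output
        merged = (a, b)
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # Distance from the merged cluster to each other cluster is the
        # size-weighted average of the two old distances.
        for other in sizes:
            d[frozenset({merged, other})] = (
                size_a * d[frozenset({a, other})] + size_b * d[frozenset({b, other})]
            ) / (size_a + size_b)
        # Drop every entry that still refers to the two merged clusters.
        d = {pair: value for pair, value in d.items() if a not in pair and b not in pair}
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


species = ["Mouse", "Monkey", "Human"]
# Made-up distances, for illustration only.
made_up_distances = {
    ("Human", "Monkey"): 2,
    ("Human", "Mouse"): 7,
    ("Monkey", "Mouse"): 6,
}
print(upgma_tree(species, made_up_distances))  # (('Human', 'Monkey'), 'Mouse')
```

UPGMA runs in polynomial time but, like the nearest-city heuristic for the Traveling Salesman problem, it is not guaranteed to return the tree most consistent with the distances.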

  25. Phylogenetic Tree • (Figure: tree with leaves Mouse, Monkey, Human)

  26. Experimental Results • Image or data directly drawn from a device • e.g., microarray, scanner • Need to make objective, discrete conclusions • e.g., pixel intensity vs. gene expression • Need to handle errors and imperfections

  27. Microarray Image

  28. Image Analysis to Aid Microarray Experiments • Automatically locating the grid of spots • Use Fourier transforms to compute periods and offsets • Extracting intensity • Refine spot sample to collect significant, normalized data • Make conclusions on genetic function
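To illustrate how a Fourier transform can reveal the spacing of a regular grid, the sketch below computes a plain discrete Fourier transform of a one-dimensional intensity profile and reads the period off the strongest non-constant frequency. The profile is synthetic (a cosine with a period of 8 pixels), standing in for a row of summed pixel intensities across a microarray image; real images would be noisy and two-dimensional, and the function names are hypothetical.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitudes of the discrete Fourier transform (direct O(n^2) evaluation)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n)
    ]


def dominant_period(signal):
    """Estimate the period from the strongest frequency, ignoring the mean (k = 0)."""
    n = len(signal)
    magnitudes = dft_magnitudes(signal)
    k = max(range(1, n // 2 + 1), key=lambda index: magnitudes[index])
    return n / k


# Synthetic intensity profile: bright spots repeating every 8 pixels.
profile = [1 + math.cos(2 * math.pi * i / 8) for i in range(64)]
print(dominant_period(profile))  # 8.0
```

The phase of the same Fourier coefficient indicates where the first spot sits, which is one way the offset mentioned on the slide could be recovered.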

  29. Summary • Bioinformatics: a perfect opportunity for interdisciplinary research within the sciences • Academics from different backgrounds need to study, discuss, and debate with one another
