Bioinformatics Research Overview: Developing Algorithms for Biological Problem Solving

This overview covers the development of new algorithms and statistical learning methods for solving biological problems. The motivating goals are to understand correlations between genotype and phenotype and to predict one from the other, with phenotypes such as protein function and drug response. Projects include genome sequencing and assembly, gene regulatory network inference, and comparative genomics.

Presentation Transcript


  1. Bioinformatics Research Overview
     Li Liao
     Develop new algorithms and (statistical) learning methods that help solve biological problems
     > Capable of incorporating domain knowledge
     > Effective, Expressive, Interpretable
     Li Liao, SIG NewGrad, 09/29/2008

  2. Motivations
     • Understanding correlations between genotype and phenotype
     • Predicting genotype <=> phenotype
     • Some phenotype examples:
       • Protein function
       • Drug/therapy response
       • Drug-drug interactions for expression
       • Drug mechanism
       • Interacting pathways of metabolism

  3. Bioinformatics in a … cell

  4. Credit: Kellis & Indyk

  5. Projects
     • Genome sequencing and assembly (funded by NSF)
     • Homology detection, protein family classification (funded by a DuPont S&E award)
       • Support vector machines
       • Hidden Markov models
       • Graph-theoretic methods
     • Probabilistic modeling for BioSequence (funded by NIH)
       • HMMs, and beyond
       • Motif finding
       • Secondary structure
     • Systems Bioinformatics
       • Prediction of protein-protein interactions
       • Inference of gene regulatory networks
       • Prediction of other regulatory elements
       • Pattern analysis for RNAi (funded by UDRF)
     • Comparative Genomics
       • Identify genome features for diagnostic and therapeutic purposes (funded by an Army grant)

  6. People
     Current members:
     • Dr. Wen-Zhong Wang (Postdoc Fellow)
     • Roger Craig (PhD student)
     • Alvaro Gonzalez (PhD student)
     • Kevin McCormick (PhD student)
     • Colin Kern (PhD student)
     Past members:
     • Robel Kahsay (Ph.D., currently at DuPont Central Research & Development)
     • Kishore Narra (M.S., currently at VistaPrint, Inc.)
     • Arpita Gandhi (M.S., currently at Colgate-Palmolive Company)
     • Gaurav Jain (M.S., currently at Institute of Genomics, Univ. of Maryland)
     • Shivakundan Singh Tej (M.S.)
     • Tapan Patel (B.S., currently in MD/PhD program at U Penn)
     • Laura Shankman (B.S., currently in PhD program at U Virginia)

  9. Hybrid Hierarchical Assembly
     • Three types of reads: Sanger (~1000 bp), 454 (~100 bp), and SBS (~30 bp).
     • Assembly of individual types using the best-suited assemblers:
       • Phrap, TIGR, etc. for Sanger reads
       • Euler assembler and Newbler for 454 reads
       • Euler short, Shorty for SBS reads
     • Hybrid and hierarchical:
       • Use longer reads as scaffolding to resolve repeat regions that are difficult for shorter reads
       • Use contigs from shorter reads (pyrosequencing) as pseudoreads to bridge gaps (nonclonable and hard stops) with Sanger reads.
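The scaffolding idea on this slide can be illustrated with a toy sketch. It is not the pipeline from the talk: it assumes error-free reads and exact substring matches, and the function name `order_contigs_on_scaffold` and the sequences are hypothetical. It places short-read contigs on one longer read and reports their order and the gaps between them.

```python
def order_contigs_on_scaffold(long_read, contigs):
    """Place each contig on the long read by exact match and sort by position.

    Toy assumption: reads are error-free, so str.find is enough.
    Contigs that do not occur in the long read are skipped.
    """
    placed = []
    for contig in contigs:
        position = long_read.find(contig)
        if position != -1:
            placed.append((position, contig))
    placed.sort()
    return placed


def gaps_between(long_read, placed):
    """Return the stretches of the long read not covered between consecutive contigs."""
    gaps = []
    for (pos_a, contig_a), (pos_b, _) in zip(placed, placed[1:]):
        end_a = pos_a + len(contig_a)
        gaps.append(long_read[end_a:pos_b])
    return gaps


# Hypothetical data: one long read and three short-read contigs in arbitrary order.
long_read = "ACGTTTGACCGTAGGCTA"
contigs = ["GGCTA", "ACGTT", "GACCG"]

placed = order_contigs_on_scaffold(long_read, contigs)
gaps = gaps_between(long_read, placed)
```

Real assemblers work with approximate overlaps, quality scores, and repeats, so this only shows how a longer read can fix the order of shorter pieces.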

  10. Major Findings
      • Hybrid hierarchical assembly proved to be an effective way of assembling short reads
      • An incremental approach to selecting ABI reads is more effective than a random approach in generating high-coverage contigs
      • Staged assembly using Phrap is an effective alternative to the proprietary Newbler assembler.
      Publications: Gonzalez & Liao, BMC Bioinformatics 2008, 9:102.

  11. Blue lines are contigs generated from hybrid assembly

  12. Detect remote homologues
      Attributes:
      • Sequence similarity, aggregate statistics (e.g., protein families), pattern/motif, and more attributes (presence in the phylogenetic tree).
      How can domain-specific knowledge be incorporated into the model so that a classifier is more accurate?
      Results:
      • Quasi-consensus based comparison of profile HMM for protein sequences (Kahsay et al., Bioinformatics 2005)
      • Using extended phylogenetic profiles and support vector machines for protein family classification (Narra & Liao, SNPD04; Craig & Liao, ICMLA’05; Craig & Liao, SAC’06; Craig & Liao, Int’l J. Bioinfo & DM 2007)
      • Combining Pairwise Sequence Similarity and Support Vector Machines for Detecting Remote Protein Evolutionary and Structural Relationships (JCB 2003)

  13. Non-linear mapping to a feature space
      $\Phi: x_i \mapsto \Phi(x_i),\; x_j \mapsto \Phi(x_j)$
      $L(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, \Phi(x_i) \cdot \Phi(x_j)$
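To make the role of the feature map concrete, here is a minimal sketch that evaluates the standard SVM dual objective for given multipliers, with the inner product taken after an explicit mapping Φ. The two maps (`identity_map`, `quadratic_map`), the data points, and the multiplier values are illustrative assumptions, not taken from the talk. No optimization is performed; the code only evaluates the objective.

```python
import math


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def identity_map(x):
    """Phi(x) = x: the linear case, used as a baseline."""
    return tuple(x)


def quadratic_map(x):
    """Phi(x1, x2) = (x1^2, sqrt(2) x1 x2, x2^2), so Phi(u).Phi(v) = (u.v)^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dual_objective(alpha, labels, points, feature_map):
    """L(alpha) = sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j Phi(x_i).Phi(x_j)."""
    mapped = [feature_map(x) for x in points]
    n = len(alpha)
    quadratic_term = sum(
        alpha[i] * alpha[j] * labels[i] * labels[j] * dot(mapped[i], mapped[j])
        for i in range(n)
        for j in range(n)
    )
    return sum(alpha) - 0.5 * quadratic_term


# Hypothetical two-point problem.
points = [(1.0, 0.0), (-1.0, 0.0)]
labels = [1, -1]
alpha = [0.5, 0.5]

linear_value = dual_objective(alpha, labels, points, identity_map)
quadratic_value = dual_objective(alpha, labels, points, quadratic_map)
```

In practice the mapping is rarely computed explicitly; a kernel function returns Φ(x_i)·Φ(x_j) directly, which is why only inner products appear in the objective.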

  14. Data: phylogenetic profiles
      • How to account for correlations among profile components?
        • Profile extension (Narra & Liao, SNPD 04)
        • Transductive learning (Craig & Liao, ICMLA’05, SAC’06, IJBDM, 2007)
      Figure: example binary profiles x, y, and z compared by Hamming distance and by tree-based distance.
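As a point of reference for the comparison on this slide, the following sketch computes the Hamming distance between binary phylogenetic profiles. The three profiles are hypothetical and are not the values from the slide's figure. The function treats every genome position as independent, which is the limitation that the tree-based distance is meant to address.

```python
def hamming_distance(profile_a, profile_b):
    """Count the positions (genomes) at which two binary profiles differ."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same set of genomes")
    return sum(1 for a, b in zip(profile_a, profile_b) if a != b)


# Hypothetical profiles over five genomes (1 = homolog present, 0 = absent).
x = [0, 1, 1, 1, 1]
y = [1, 1, 1, 1, 1]
z = [1, 1, 1, 1, 0]

d_xy = hamming_distance(x, y)
d_yz = hamming_distance(y, z)
d_xz = hamming_distance(x, z)
```

Here x and z are equally far from y under Hamming distance, regardless of how closely related the differing genomes are in the phylogenetic tree.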

  15. Post-order traversal
      Figure: a tree whose nodes carry the scores 0.55, 0.34, 0.75, 0.67, 1, 0.33, 0.5, 1; the flattened sequence reads 0.33 0.67 0.34 0.5 0.75 0.55 1 1 0 1 0 0 0 1 1.
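A post-order traversal visits all children of a node before the node itself, so scores can be propagated from the leaves up to the root in one pass. The sketch below assumes that each internal node is scored as the average of its children's scores; that scoring rule, the `Node` class, and the small tree are assumptions made for illustration, since the transcript does not state how the figure's values were obtained.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    score: Optional[float] = None  # leaves carry the 0/1 profile bit
    children: List["Node"] = field(default_factory=list)


def score_post_order(node: Node, visited: List[Tuple[str, float]]) -> float:
    """Score children first, then the node itself (post-order).

    Assumed rule: an internal node's score is the mean of its children's scores.
    """
    if node.children:
        child_scores = [score_post_order(child, visited) for child in node.children]
        node.score = sum(child_scores) / len(child_scores)
    visited.append((node.name, node.score))
    return node.score


# Hypothetical tree over five genomes g1..g5 with a binary profile at the leaves.
left = Node("A", children=[Node("g1", 1), Node("g2", 1), Node("g3", 0)])
right = Node("B", children=[Node("g4", 1), Node("g5", 0)])
root = Node("root", children=[left, right])

visited: List[Tuple[str, float]] = []
score_post_order(root, visited)

# One possible extended profile: leaf bits followed by internal-node scores.
leaf_bits = [score for name, score in visited if name.startswith("g")]
internal_scores = [score for name, score in visited if not name.startswith("g")]
extended_profile = leaf_bits + internal_scores
```

The ordering of leaf and internal entries in `extended_profile` is a design choice of this sketch; any fixed ordering works as long as it is applied consistently to every profile.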

  17. Sequence Models (HMMs and beyond)
      Motivations:
      • What is responsible for the function?
        • Patterns/motifs
        • Secondary structure
      • To capture long-range correlations of bio sequences
        • Transporter proteins
        • RNA secondary structure
      Methods: generative versus discriminative
      • Linear dependent processes
      • Stochastic grammars
      • Model equivalence

  18. TMMOD: An improved hidden Markov model for predicting transmembrane topology (Kahsay, Gao & Liao. Bioinformatics 2005)
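For readers unfamiliar with how a hidden Markov model labels a sequence, here is a minimal Viterbi decoding sketch. It is not TMMOD: it assumes a toy two-state model (M for membrane, L for loop), a two-letter alphabet (H for hydrophobic, P for polar), and made-up probabilities, all of which are hypothetical.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state path for the observations (log-space Viterbi)."""
    score = {
        s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]]) for s in states
    }
    backpointers = []
    for symbol in observations[1:]:
        previous = score
        score = {}
        pointers = {}
        for s in states:
            # Best predecessor state for arriving in state s at this position.
            best = max(states, key=lambda r: previous[r] + math.log(trans_p[r][s]))
            score[s] = (
                previous[best] + math.log(trans_p[best][s]) + math.log(emit_p[s][symbol])
            )
            pointers[s] = best
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))


# Hypothetical toy model: membrane segments favor hydrophobic residues.
states = ["L", "M"]
start_p = {"L": 0.5, "M": 0.5}
trans_p = {"L": {"L": 0.9, "M": 0.1}, "M": {"L": 0.1, "M": 0.9}}
emit_p = {"L": {"H": 0.2, "P": 0.8}, "M": {"H": 0.8, "P": 0.2}}

topology = viterbi("PPPHHHHPPP", states, start_p, trans_p, emit_p)
```

With these parameters the run of four hydrophobic residues is long enough to outweigh the cost of two state switches, so the decoded path marks it as a membrane segment. A realistic topology model uses many more states and the full amino acid alphabet.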

  22. Inferring Regulatory Networks from Time Course Expression Data (Gandhi, Cogburn & Liao, 2008)
      • Expression profile clustering (K-means)
      • Binary heat map
      • Boolean network algorithm
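The last two steps named on this slide can be sketched in a few lines. This is a simplified illustration, not the algorithm of the cited work: it assumes each expression series is binarized around its own mean, and it only searches for single-regulator rules in which a target at time t+1 either copies or negates a regulator at time t. The gene names, values, and function names are hypothetical, and the clustering step is omitted.

```python
def binarize(series):
    """Turn an expression time course into 0/1 states by thresholding at its mean."""
    mean = sum(series) / len(series)
    return [1 if value > mean else 0 for value in series]


def single_input_rules(states, target):
    """Find regulators whose state at time t determines the target at time t+1.

    Returns (regulator, "copy") when target(t+1) == regulator(t) at every step,
    and (regulator, "not") when target(t+1) == 1 - regulator(t) at every step.
    """
    rules = []
    next_target = states[target][1:]
    for regulator, series in states.items():
        current = series[:-1]
        if current == next_target:
            rules.append((regulator, "copy"))
        elif [1 - bit for bit in current] == next_target:
            rules.append((regulator, "not"))
    return rules


# Hypothetical time-course expression values for three genes.
expression = {
    "A": [0.1, 0.9, 0.8, 0.2, 0.9, 0.1],
    "B": [0.2, 0.1, 0.9, 0.8, 0.1, 0.9],
    "C": [0.9, 0.8, 0.1, 0.2, 0.9, 0.1],
}

states = {gene: binarize(series) for gene, series in expression.items()}
rules_for_b = single_input_rules(states, "B")
rules_for_c = single_input_rules(states, "C")
```

In this toy data, B follows A with a one-step delay and C follows the negation of A. Boolean network inference in general also considers multi-input functions and has to tolerate noise in the binarized states.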
