Interactive tools and programming environments for sequence analysis

Explore Matlab and Darwin bioinformatics tools for sequence analysis, alignment, and evolutionary distances. Utilize dynamic programming algorithms, scoring matrices, and statistical significance calculations. Benefit from a unified approach for alignment and structure prediction.


Presentation Transcript


  1. Interactive tools and programming environments for sequence analysis • Bernardo Barbiellini, Northeastern University • TATACATAAAGACCCAAATGGAACTGTTCTAGATGATACACTAGCATTAAGAGAAAAATTCGAAGAATCAGTCGATAAATACAAACTTCATTTTACTGGATTAATCGCTGACAAAATTGCAAAAGAAAAACTGAATACTTACGTCCTCACTTATAAAAAAGCAGACGAAGCTATGCCTGCAGACGAAGCTATGCCAACTGATGTACCTAGTACTTCTGTTACTGGATCAACAATGGCAAAC…

  2. Overview • Matlab and Darwin – bioinformatics tools • Dotplot and statistical significance of alignments • Scoring matrices from an evolution model • Evolutionary distances and phylogenetic trees • Unified approach for sequence alignment and structure prediction

  3. Matlab toolbox and Darwin • Computer language appropriate for bioinformatics • A workbench to automate repetitive tasks • Based on Linear Algebra & Statistics • Matlab toolbox developed by Mathworks • Darwin developed by Gaston Gonnet (ETHZ)

  4. Extra features • Loading of and retrieval from sequence databases • Fast searching for sequence fragments • Sequence alignment • Generation of random sequences, distributions and mutations • Creation of phylogenetic trees • Plotting functions, matrix and vector arithmetic • I/O to communicate with other programs

  5. Calling Bioperl functions in MATLAB • Documentation by Brian Madsen (NU and co-op at The MathWorks)

      >> help perl
      PERL calls perl script using appropriate operating system
      PERL(PERLFILE) calls perl script specified by the file PERLFILE using appropriate perl executable.
      PERL(PERLFILE,ARG1,ARG2,...) passes the arguments ARG1,ARG2,... to the perl script file PERLFILE, and calls it by using appropriate perl executable.
      RESULT=PERL(...) outputs the result of attempted perl call.

  6. Visual Tool: Dotplot (1) Pairwise sequence comparison
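A dotplot can be sketched in a few lines. The example below is a minimal illustration in plain Python, not the Matlab or Darwin routine used in the talk; the function names `dotplot` and `render` and the `window`/`min_matches` parameters are assumptions made for this sketch. With `window=1` every identical pair of residues gets a dot; a larger window keeps only runs of matches.

```python
def dotplot(seq_a, seq_b, window=1, min_matches=1):
    """Return a 0/1 matrix. Cell [i][j] is 1 when at least `min_matches`
    of the `window` residues starting at seq_a[i] and seq_b[j] are identical."""
    rows = len(seq_a) - window + 1
    cols = len(seq_b) - window + 1
    plot = []
    for i in range(rows):
        row = []
        for j in range(cols):
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            row.append(1 if matches >= min_matches else 0)
        plot.append(row)
    return plot


def render(plot):
    """Draw the matrix as text: '*' for a dot, '.' for an empty cell."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in plot)


if __name__ == "__main__":
    a = "TATACATAAAGACC"
    b = "TATACTTAAAGACC"
    print(render(dotplot(a, b)))                            # every matching pair
    print()
    print(render(dotplot(a, b, window=4, min_matches=4)))   # runs of 4 only
```

Similar regions show up as diagonal lines; the windowed call removes most of the isolated dots that come from chance matches.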

  7. Visual Tool: Dotplot (2) • Filtered image • The best alignment is found with dynamic programming, which also yields an alignment score.
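As a rough illustration of how dynamic programming produces an alignment score, here is a minimal Needleman-Wunsch sketch in plain Python. The match, mismatch and gap values are placeholder assumptions; the talk uses a scoring matrix instead of a flat match/mismatch score, and the function name is hypothetical.

```python
def global_alignment_score(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score of the best global alignment (linear gap penalty).

    Only the previous row of the dynamic programming table is kept,
    because the score alone is needed, not the alignment itself.
    """
    prev = [j * gap for j in range(len(seq_b) + 1)]
    for i in range(1, len(seq_a) + 1):
        curr = [i * gap]
        for j in range(1, len(seq_b) + 1):
            s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + s,    # align the two residues
                            prev[j] + gap,      # gap in seq_b
                            curr[j - 1] + gap)) # gap in seq_a
        prev = curr
    return prev[-1]
```

Recovering the alignment itself would require storing the full table and tracing back from the last cell.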

  8. Quantitative Tools to Check Statistical Significance • Extreme value distribution • Score in bits • Simulation with random sequences
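One way to carry out such a simulation is sketched below, under several assumptions: local alignment scores come from a simple Smith-Waterman routine with placeholder match/mismatch/gap values, random sequences are produced by shuffling one of the two sequences, and the extreme value (Gumbel) parameters are estimated by the method of moments. None of these choices is confirmed by the slides, and the conversion to a score in bits is not shown.

```python
import math
import random


def local_alignment_score(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman score of the best local alignment (linear gap penalty)."""
    best = 0
    prev = [0] * (len(seq_b) + 1)
    for i in range(1, len(seq_a) + 1):
        curr = [0]
        for j in range(1, len(seq_b) + 1):
            s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            cell = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def fit_gumbel(scores):
    """Method-of-moments estimate of the Gumbel location mu and scale beta."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    beta = math.sqrt(6.0 * var) / math.pi
    mu = mean - 0.5772156649 * beta  # Euler-Mascheroni constant
    return mu, beta


def p_value(score, mu, beta):
    """P(S >= score) under a Gumbel distribution with parameters mu, beta."""
    z = (score - mu) / beta
    if z < -700.0:  # avoid overflow in exp; the probability is 1 here
        return 1.0
    return -math.expm1(-math.exp(-z))


def significance(seq_a, seq_b, trials=200, seed=0):
    """Observed local score and its p-value against shuffled versions of seq_b."""
    rng = random.Random(seed)
    observed = local_alignment_score(seq_a, seq_b)
    letters = list(seq_b)
    null_scores = []
    for _ in range(trials):
        rng.shuffle(letters)
        null_scores.append(local_alignment_score(seq_a, "".join(letters)))
    mu, beta = fit_gumbel(null_scores)
    return observed, p_value(observed, mu, beta)
```

Shuffling preserves the composition of the sequence, so the null scores reflect chance similarity between sequences with the same residue frequencies.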

  9. PAM Evolution Model • The score of a pairwise alignment is obtained by using a scoring matrix. • We need a model to build scoring matrices. • The model is based on evolution, so it can also be used to calculate evolutionary distances between species. • PAM stands for Point Accepted Mutation.

  10. Step 1: Order of the Amino Acids

  11. Step 2: Mutation Matrices • Markov model: pamX = (pam1)^X • Stochastic matrices
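The relation pamX = (pam1)^X can be tried out on a toy example. The sketch below uses plain Python and a made-up 3-letter alphabet; `PAM1_TOY` is a hypothetical matrix chosen for illustration (real PAM matrices are 20 x 20), with entry [i][j] read as the probability that letter j mutates into letter i in one PAM unit, so each column sums to 1.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """m raised to a non-negative integer power (exponentiation by squaring)."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = [row[:] for row in m]
    while power > 0:
        if power & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        power >>= 1
    return result


# Hypothetical 1-PAM mutation matrix for a 3-letter alphabet.
# Column j holds the probabilities of what letter j becomes; columns sum to 1.
PAM1_TOY = [
    [0.992, 0.010, 0.0050],
    [0.006, 0.987, 0.0045],
    [0.002, 0.003, 0.9905],
]

if __name__ == "__main__":
    pam250 = mat_pow(PAM1_TOY, 250)
    for row in pam250:
        print(["%.4f" % x for x in row])
```

Because a product of stochastic matrices is again stochastic, the columns of every power still sum to 1.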

  12. Step 3: Distribution of Amino Acids Eigenvector of the mutation matrix (eigenvalue 1)
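The eigenvector with eigenvalue 1 can be approximated by applying the mutation matrix repeatedly to any starting distribution. The sketch below uses this power iteration in plain Python rather than a linear algebra library call such as an eigenvalue solver; the 3-letter matrix `PAM1_TOY` is a hypothetical example constructed so that its stationary distribution is (0.5, 0.3, 0.2).

```python
def stationary_distribution(m, tol=1e-13, max_iter=100000):
    """Power iteration for the eigenvector f with m f = f, normalized to sum 1.

    m is column-stochastic: m[i][j] is the probability that letter j
    mutates into letter i.
    """
    n = len(m)
    f = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(m[i][j] * f[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        nxt = [x / total for x in nxt]
        if max(abs(x - y) for x, y in zip(nxt, f)) < tol:
            return nxt
        f = nxt
    raise RuntimeError("power iteration did not converge")


# Hypothetical 1-PAM mutation matrix for a 3-letter alphabet (columns sum to 1).
PAM1_TOY = [
    [0.992, 0.010, 0.0050],
    [0.006, 0.987, 0.0045],
    [0.002, 0.003, 0.9905],
]

if __name__ == "__main__":
    print(stationary_distribution(PAM1_TOY))
```

The result is the letter distribution that the mutation process leaves unchanged, which is what the slide identifies with the distribution of amino acids.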

  13. Step 4: Evolutionary time vs. sequence differences
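The relation between evolutionary time and observed differences can be illustrated with a toy model. In the plain-Python sketch below, the expected fraction of differing positions after X PAM units is taken as 1 - sum_j f_j (pam1^X)_jj, where f is the stationary distribution; the 3-letter matrix `PAM1_TOY` and the frequencies `FREQS_TOY` are hypothetical values chosen for illustration.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def fraction_different(pam1, freqs, max_pam):
    """curve[x] = expected fraction of sites that differ after x PAM units."""
    n = len(pam1)
    current = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    curve = [0.0]
    for _ in range(max_pam):
        current = mat_mul(pam1, current)  # current is now pam1 ** x
        unchanged = sum(freqs[j] * current[j][j] for j in range(n))
        curve.append(1.0 - unchanged)
    return curve


# Hypothetical 1-PAM matrix (columns sum to 1) and its stationary frequencies.
PAM1_TOY = [
    [0.992, 0.010, 0.0050],
    [0.006, 0.987, 0.0045],
    [0.002, 0.003, 0.9905],
]
FREQS_TOY = [0.5, 0.3, 0.2]

if __name__ == "__main__":
    curve = fraction_different(PAM1_TOY, FREQS_TOY, 300)
    for x in (1, 50, 100, 250):
        print(x, round(100.0 * curve[x], 1), "% different")
```

The curve grows almost linearly at first and then saturates, because repeated and reverse mutations at the same site are not visible in the observed difference. For this toy model the limit is 1 - (0.5^2 + 0.3^2 + 0.2^2) = 0.62.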

  14. Step 5: Scoring Matrix The Dayhoff scoring matrix is symmetric
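A Dayhoff-style score can be written as S_ij = 10 log10((pamX)_ij / f_i), the log-odds of seeing letters i and j aligned at distance X versus by chance. The plain-Python sketch below applies this to a hypothetical 3-letter model (`PAM1_TOY`, `FREQS_TOY`) that was constructed to satisfy f_j M_ij = f_i M_ji; under that assumption the resulting matrix is symmetric. The factor 10 and the base-10 logarithm follow the usual Dayhoff convention and are not taken from the slides.

```python
import math


def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def scoring_matrix(pam1, freqs, distance):
    """Log-odds scores 10*log10(pamX[i][j] / freqs[i]) at X = distance >= 1."""
    n = len(pam1)
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(distance):
        m = mat_mul(pam1, m)
    return [[10.0 * math.log10(m[i][j] / freqs[i]) for j in range(n)]
            for i in range(n)]


# Hypothetical 1-PAM matrix (columns sum to 1) and its stationary frequencies.
PAM1_TOY = [
    [0.992, 0.010, 0.0050],
    [0.006, 0.987, 0.0045],
    [0.002, 0.003, 0.9905],
]
FREQS_TOY = [0.5, 0.3, 0.2]

if __name__ == "__main__":
    for row in scoring_matrix(PAM1_TOY, FREQS_TOY, 250):
        print(["%6.2f" % x for x in row])
```

Identical pairs score positively because a letter is more likely to be aligned with itself than chance would predict; at short distances substitutions score negatively.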

  15. Tree Construction 1: Evolutionary distance calculations • Maximum likelihood
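A maximum likelihood distance can be sketched as follows: for a gap-free aligned pair, the log-likelihood at X PAM units is the sum over positions of log((pam1^X)_ab), and the estimate is the X that maximizes it. The plain-Python example below scans integer values of X on a hypothetical 3-letter model; the alphabet "ABC", the matrix `PAM1_TOY`, and the simple grid search are all assumptions of this sketch, not the procedure used in Darwin.

```python
import math


def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def ml_pam_distance(seq_a, seq_b, pam1, alphabet, max_pam=500):
    """Integer PAM distance in 1..max_pam maximizing the alignment likelihood."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    index = {letter: i for i, letter in enumerate(alphabet)}
    pairs = [(index[x], index[y]) for x, y in zip(seq_a, seq_b)]
    n = len(pam1)
    current = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    best_x, best_ll = 0, -math.inf
    for x in range(1, max_pam + 1):
        current = mat_mul(pam1, current)  # current is now pam1 ** x
        # Background frequencies do not depend on x, so they are left out.
        ll = sum(math.log(current[i][j]) for i, j in pairs)
        if ll > best_ll:
            best_x, best_ll = x, ll
    return best_x


# Hypothetical 1-PAM mutation matrix for the alphabet "ABC" (columns sum to 1).
PAM1_TOY = [
    [0.992, 0.010, 0.0050],
    [0.006, 0.987, 0.0045],
    [0.002, 0.003, 0.9905],
]

if __name__ == "__main__":
    a = "AAAAAAAAAABBBBBBCCCC"
    b = "AAAAAAAAABBBBBBBCCCA"
    print(ml_pam_distance(a, b, PAM1_TOY, "ABC"))
```

Unlike a score from one fixed matrix, this estimate picks the distance at which the model best explains the observed substitutions.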

  16. Tree Construction 2: Table of distances

      PAM        Spinach    Rice  Mosquito  Monkey   Human
      Spinach        0.0    84.9     105.6    90.8    86.3
      Rice          84.9     0.0     117.8   122.4   122.6
      Mosquito     105.6   117.8       0.0    84.7    80.8
      Monkey        90.8   122.4      84.7     0.0     3.3
      Human         86.3   122.6      80.8     3.3     0.0

  17. Tree Construction 3: Neighbor joining algorithm
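The neighbor joining procedure can be sketched in plain Python and run on the five-species PAM distance table from the talk. This is a textbook version of the algorithm, not necessarily the implementation used in Darwin or Matlab; the function name and the tuple layout of the result are choices made for this example.

```python
def neighbor_joining(names, dist):
    """Neighbor joining on a symmetric distance matrix.

    Returns (joins, last_edge). Each join is (a, b, len_a, len_b): nodes a
    and b are attached to a new internal node by branches of those lengths.
    last_edge is (a, b, length) for the final two remaining nodes.
    """
    nodes = list(names)
    d = {a: {b: dist[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2.0 * (n - 2))
        len_b = d[a][b] - len_a
        new = "(" + a + "," + b + ")"
        d[new] = {new: 0.0}
        for c in nodes:
            if c != a and c != b:
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c != a and c != b] + [new]
        joins.append((a, b, len_a, len_b))
    a, b = nodes
    return joins, (a, b, d[a][b])


NAMES = ["Spinach", "Rice", "Mosquito", "Monkey", "Human"]
PAM_DIST = [
    [0.0, 84.9, 105.6, 90.8, 86.3],
    [84.9, 0.0, 117.8, 122.4, 122.6],
    [105.6, 117.8, 0.0, 84.7, 80.8],
    [90.8, 122.4, 84.7, 0.0, 3.3],
    [86.3, 122.6, 80.8, 3.3, 0.0],
]

if __name__ == "__main__":
    joins, last_edge = neighbor_joining(NAMES, PAM_DIST)
    for join in joins:
        print(join)
    print(last_edge)
```

On this table the first pair to be joined is Monkey and Human, the two species separated by only 3.3 PAM units.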

  18. Unified approach for the sequence alignment and structure prediction

                                              Protein-Protein                      Protein-Structure
      Optimization with dynamic programming   Needleman-Wunsch or Smith-Waterman   Viterbi algorithm (HMM)
      Query                                   Protein                              Protein
      Subject                                 Protein (letters of amino acids)     Structure (α, β, coil)
      Scoring matrix                          Log(Aij/pi)                          Log(P(im)/pi)
      Penalties                               Gaps                                 Transition from one structure to another
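The structure prediction side of this analogy can be illustrated with a small Viterbi sketch in plain Python. It reads P(im) as the probability of residue i in structural state m, which is an interpretation rather than something the slide states. All numbers are invented for illustration: a 3-letter residue alphabet, three states H (α), E (β) and C (coil), and hypothetical emission, background and transition probabilities. Each residue contributes a log-odds score, and each change of state costs a log transition probability, playing the role that gap penalties play in sequence alignment.

```python
import math


def viterbi(sequence, states, emit, background, trans, start):
    """Most probable state path under a log-odds scored HMM.

    Score of residue r in state s: log(emit[s][r] / background[r]).
    Moving from state t to state s adds log(trans[t][s]).
    Returns (path as a string of state labels, total score).
    """
    if not sequence:
        raise ValueError("sequence must not be empty")

    def odds(s, r):
        return math.log(emit[s][r] / background[r])

    score = {s: math.log(start[s]) + odds(s, sequence[0]) for s in states}
    back = []
    for residue in sequence[1:]:
        new_score, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda t: score[t] + math.log(trans[t][s]))
            pointers[s] = prev
            new_score[s] = score[prev] + math.log(trans[prev][s]) + odds(s, residue)
        score = new_score
        back.append(pointers)
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return "".join(path), score[last]


# Hypothetical toy model: three residues, three structural states.
STATES = ["H", "E", "C"]
BACKGROUND = {"A": 0.4, "V": 0.3, "G": 0.3}
EMIT = {
    "H": {"A": 0.60, "V": 0.25, "G": 0.15},
    "E": {"A": 0.20, "V": 0.60, "G": 0.20},
    "C": {"A": 0.20, "V": 0.15, "G": 0.65},
}
TRANS = {t: {s: (0.8 if s == t else 0.1) for s in STATES} for t in STATES}
START = {s: 1.0 / 3.0 for s in STATES}

if __name__ == "__main__":
    print(viterbi("AAAAAAGGGGGG", STATES, EMIT, BACKGROUND, TRANS, START)[0])
```

The recursion has the same shape as the alignment recursion: at each position the best score so far is extended by a match-like term and a penalty-like term.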

  19. Conclusions • The highly efficient dynamic programming algorithms used in this integrated environment are particularly well suited to high-performance computers. • Trees constructed using optimal PAM distances are better than trees based on the routine distance scores obtained using a single scoring matrix. • The unified approach for sequence alignment and structure prediction provides a powerful formalism for biologists.

  20. ASCC Northeastern University

  21. Northeastern University (NU)/Hewlett-Packard (HP) Company Collaborative Research Program on Bioinformatics • Bernardo Barbiellini, Assoc. Director, ASCC • Arun Bansil, Professor of Physics & Director, ASCC • Bill Detrich, Prof. Biochem. & Marine Biology, Director Bioinformatics M.S. • Kostia Bergman, Prof. Biology • Mike Malioutov, Stone Professor of Applied Statistics • Mary Jo Ondrechen, Professor of Chemistry • Nagarajan Sankrithi, graduate student NU • Imtiaz Khan, graduate student NU • Alper Uzun, graduate student NU • Larry Weissman, staff HP/Compaq • Barry Latham, staff HP/Compaq • Bob Morgan, staff HP/Compaq

  22. Other Bioinformatics activities at ASCC • BIO3580: DNA and Protein Sequence Analysis (2001, 2002) • MATLAB BIOINFORMATICS TOOL presentation (Robert Henson) • Summer Institute of Mathematical Studies on Bioinformatics (2002) (Professor Mike Malioutov) • Student projects proposed by Dr. Matteo Pellegrini (Proteinpathways/UCLA)
