
VISTA family of computational tools for comparative genomics



Presentation Transcript


  1. VISTA family of computational tools for comparative genomics • How can we leverage genome sequences from many species to learn about genome function? • Microbial applications • Inna Dubchak, Genomics Division, LBNL, JGI (ildubchak@lbl.gov, vista@lbl.gov)

  2. Human Genome Annotation Gene A • only 1–2% coding • efficient identification of regulatory sequences?

  3. Sequence conservation implies function • Functional region = conservation; divergence = non-functional • 80 million years since the last common ancestor • Sequence fragments from the figure: AGTTGAAAC, GGAGCTGATGGAGC, GGTGGGC, TACATTTCG, ACTGTATCGCCTCG, CAACCCT, and the motif CTATAAATGC, which appears in both sequences (potential functional region)

  4. Comparative Genomics Introduction • Species: Human, Chimp, Urchin, Mouse, Drosophila • Sequence alignment • Similar genes • Synteny

  5. VISTA is an integrated system for global sequence alignment and visualization for comparative genomic analysis: http://genome.lbl.gov/vista

  6. How Does VISTA Work: Global Genomic Alignments • Step 1, anchoring: identify regions of strong similarity • Step 2, chaining: join regions of weak or no similarity • (Diagram: sequence 1 vs. sequence 2) • Algorithm and feature: AVID* can handle draft sequence; LAGAN** produces true multiple alignments; Shuffle-LAGAN** handles rearrangements (inversions, translocations) • * Lior Pachter, UC Berkeley • ** Michael Brudno, U. Toronto
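The two steps on this slide can be illustrated with a toy sketch. It is not the AVID, LAGAN, or Shuffle-LAGAN implementation: it assumes that an "anchor" is an exact k-mer match that is unique in both sequences, and that "chaining" means keeping the largest set of anchors that appear in the same order in both sequences. The function names and the parameter `k` are hypothetical. A real global aligner would then align the weakly similar stretches between consecutive anchors.

```python
from bisect import bisect_left


def find_anchors(seq1, seq2, k=8):
    """Return sorted (pos1, pos2) pairs for k-mers that occur exactly once in each sequence."""
    def unique_kmers(seq):
        positions = {}
        for i in range(len(seq) - k + 1):
            positions.setdefault(seq[i:i + k], []).append(i)
        return {kmer: pos[0] for kmer, pos in positions.items() if len(pos) == 1}

    u1, u2 = unique_kmers(seq1), unique_kmers(seq2)
    return sorted((u1[kmer], u2[kmer]) for kmer in u1.keys() & u2.keys())


def chain_anchors(anchors):
    """Keep a longest subset of anchors whose positions increase in both sequences.

    Anchors must be sorted by pos1 (as find_anchors returns them); the chain is
    then a longest increasing subsequence of the pos2 values.
    """
    tail_js = []   # smallest pos2 that ends a chain of each length
    tails = []     # index of the anchor that ends that chain
    prev = [None] * len(anchors)
    for idx, (_, j) in enumerate(anchors):
        pos = bisect_left(tail_js, j)
        if pos == len(tail_js):
            tail_js.append(j)
            tails.append(idx)
        else:
            tail_js[pos] = j
            tails[pos] = idx
        prev[idx] = tails[pos - 1] if pos > 0 else None

    # Walk back from the end of the longest chain.
    chain = []
    idx = tails[-1] if tails else None
    while idx is not None:
        chain.append(anchors[idx])
        idx = prev[idx]
    return chain[::-1]


if __name__ == "__main__":
    anchors = find_anchors("AAAACCCCGGGGTTTT", "GGGGTTTTAAAACCCC", k=4)
    print(chain_anchors(anchors))
```

In the usage example the second sequence is the first with its two halves swapped, so only one half can be kept in a collinear chain. Handling that kind of rearrangement is what the slide attributes to Shuffle-LAGAN.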

  7. Global Genomic Aligner Output (each pair of rows shows the two aligned sequences with their start and end coordinates; "-" marks a gap)

      104670599 TCCCCAACTATAAATGGATGAAATTGCAGGAAATGACAGGTA-----TGACCCCTTCTCT 104670653
      052328645 TCCTCAATTCAGAATGGAGGGAAGCACACAGGACACAGAGATCCCTTTACCCCCTTCGCT 052328704

      104670654 ACCAGAGGCTTGGATTTTTTTTCTTCTTCTCCTCCCTTAGCCCGTGTTGAGCTATTTCGG 104670713
      052328705 ATGT----------------------------------------TATCAGGCCACTCAAG 052328724

      104670714 AGTTTCCTGGCAGGGAAGAGCGAGTGAGGCTGCCTTACCTTCAGGATGACCACTAGCAGG 104670773
      052328725 AGTTCCTTGTCAAG-AAGAGTGAGTGAGTCCACCTCACCTTCAAGATGACCACCAGCAGG 052328783

      104670774 CCAGCGCTCACAAGAAGAGGAATGAGGCTACTAATGAACCAGCTAAACCAGAGGATGCTG 104670833
      052328784 CCAGCGCTCACAAGCAGAGGGATGAGGCTGCTAACAAACCAGCTAAACCAGAGGATGCCA 052328843

      104670834 TTGTCCAGGCCCATGATCCGCATGGTCTCTTTCAGCCGTGCCTCCTTCTCATACACGATG 104670893
      052328844 TTGTCCAGACCCATGATCCGCATGGTCTCCTTCAGCCGAGCCTCCTTCTCATACACAATG 052328903

      104670894 CCCTTGATGATCACAGCCACTGAGTAAATCCAGGCCAGCGTCATGAAGAGGGGCATTGAC 104670953
      052328904 CTCTTGATGATCACAGCGACAGAGTAGATCCAGGCTAGAGTCATGAAGAGGGGCATTGAC 052328963

      104670954 CGGCTCATCACCCGCAGAAAGCTGGAGGCCCCAAGGAAGGACAAGGGGAGAAAGAAAGAC 104671013
      052328964 CGGCTCATGACCCGCAGAAAACTGGAGGCACAGAGAAAAGGCATGGGAAAAATGAAAAGT 052329023

      104671014 ACACGTGAGCCAGGGTGATGGGCCAAGGCCTCTGAGCCTGCATGCTAGAGGGAGCACCAC 104671073
      052329024 ----GTGAGCCCGG-CACCGATCCAAGGCCT-------TGCACACTGGAGGACAAACCTC 052329071

      104671074 ATCTGGGCCACAGAAGGACAGGCCCTCTAGACTCTGAAATGTACGTATGATCCAATGCTT 104671133
      052329072 ATCAGGGTCGCTTATGAA-AGGCCCACTGAACTCTCAAATG--------ACCAAAGGTTT 052329122

      104671134 CACGAGCAATGCAATGTAGAGAGAAAAACGAGGCTAACAAAGTGTTGCCAAACCAAATTT 104671193
      052329123 CATTAGCAGTGGA---CAGAGATGAAACCTGGGTTTCGAGGGTATGGCCGTGCAAAATTT 052329179

      104671194 CTTTGGGGGCTTGCTTCAGTAACTAGGTAACTGTGAGCGATAC-TTAAACTAAAGGTAGA 104671252
      052329180 TTTCAGGGGCTCTCTTTAATAGCTAGGAAATGGATAGGGTAATATTAAGATAAATATAAG 052329239

      104671253 TTATGTTA--AAGTACTAAAAACCAAAACA------AAAAAACAACTCATTCTCTCACAA 104671304
      052329240 TTACTCTACTAAGTACTAAACACAAAGGGCGGGGGCAGAATCCAACTTGGTCTTCCGCTA 052329299

  8. VISTA visualization • Graphical presentation of sequence conservation as a "peaks-and-valleys" curve (y-axis: % identity; x-axis: base sequence coordinates) • A "sliding window" measures sequence conservation (default window size 100 bp) • Threshold shown: >70% identity • Example alignment:

      104637349 GTAGTGCCACTGAGTGTGACAGGGATGGCAAGAAAAGCATTAAGTTCCAAGGGGAAAGAA 104637408
      052290302 GAGATGTCACCAAGTA-AACAGAGATGGCAAGAGGACCAATAGGTTCTAGTGGGAAAGAC 052290360
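The sliding-window measure behind the curve can be sketched as follows. This is an illustration under stated assumptions, not VISTA's own code: the window is counted in alignment columns (VISTA plots against base sequence coordinates), a column with a gap counts as a mismatch, and the function names are hypothetical. The defaults of 100 bp and 70% come from the slide.

```python
def window_identity(aligned_base, aligned_other, window=100):
    """Percent identity for every window of `window` consecutive alignment columns."""
    if len(aligned_base) != len(aligned_other):
        raise ValueError("aligned sequences must have equal length")
    # 1 where both rows carry the same base, 0 for mismatches and gaps.
    matches = [int(a == b and a != "-") for a, b in zip(aligned_base, aligned_other)]
    if len(matches) < window:
        return []
    running = sum(matches[:window])
    scores = [100.0 * running / window]
    for end in range(window, len(matches)):
        # Slide by one column: add the new column, drop the oldest one.
        running += matches[end] - matches[end - window]
        scores.append(100.0 * running / window)
    return scores


def conserved_regions(scores, threshold=70.0):
    """Return (start, end) window-index ranges, end exclusive, with identity above threshold."""
    regions, start = [], None
    for i, score in enumerate(scores):
        if score > threshold and start is None:
            start = i
        elif score <= threshold and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(scores)))
    return regions


if __name__ == "__main__":
    scores = window_identity("ACGTACGTAC", "ACGTTTTAAC", window=4)
    print(scores, conserved_regions(scores))
```

Plotting `scores` against position gives the peaks-and-valleys curve; the ranges from `conserved_regions` correspond to the stretches of the curve above the threshold.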

  9. VISTA homepage: http://genome.lbl.gov/vista • Access servers, browsers, other information • VISTA Servers (submit your own data) • VISTA Browsers (precomputed alignments) • Other VISTA-related projects

  10. VISTA Servers • mVISTA: align and compare sequences • wgVISTA: align and compare sequences, including microbial assemblies • rVISTA: search for TFBS combined with a comparative sequence analysis • GenomeVISTA: align DNA sequence to a genome

  11. Precomputed Alignments • VISTA Browser: browse through pre-computed whole-genome alignments • VISTA-Point: browse and obtain sequence and alignment data • Whole Genome rVISTA: whole-genome analysis for conserved TFBS over-represented in upstream regions of genes

  12. VISTA Browser: Access

  13. VISTA Browser: Input Menu • Choose "base" genome • Select location • Determine visualization preference • Screenshot callouts: genome, position, visualization • Visualization options: VISTA Browser (Java 2, if needed), VISTA tracks on UCSC Browser, VISTA-Point

  14. VISTA Browser: Alignment Details • Labeled features: direction, exon, gene, repeats, SNPs, alignment

  15. VISTA Browser: Result Menu & Icons • Control Panel • Position on chromosome • Graphical display of genome alignments • 1 row • Cursor Info • Color Legend • Curve annotation (species)

  16. VISTA Browser: Zooming vs. rhesus vs. dog

  17. VISTA browser

  18. VISTA Point: Access Overview

  19. VISTA Point: Graphics Table

  20. VISTA Point: Alignments Table (sequence)

  21. Google map-like Dot-Plot

  22. BlockView – Synteny Plot tool

  23. Principal components • RegTransBase (experimental data): manually curated database of regulatory interactions captured from the literature; 6000 papers (NAR Database issue, 2007) • RegPrecise (computational predictions): manually curated database of regulons inferred by a comparative genomics approach (NAR Database issue, 2010; Featured Article) • RegPredict (web tool for regulon inference): integrated system for fast and accurate inference of regulons by comparative genomics (NAR Web Server issue, 2010; Featured Article)

  24. mVISTA: Access

  25. mVISTA: Interface • Our example will show 3 sequences • Align up to 100 sequences

  26. mVISTA: Input of Sequences • Provide your email address • Upload your sequences • Or enter GenBank ID • Screenshot callouts: your email; upload file or GenBank ID

  27. mVISTA: Input Parameters • Shuffle-LAGAN: multiple pairwise alignments; detects sequence rearrangements and inversions • AVID: multiple pairwise alignments; accepts finished or draft sequences • LAGAN: true multiple alignments

  28. mVISTA: Results • Output options: PDF, VISTA Browser, VISTA-Point

  29. wgVISTA: Microbial Assemblies Comparison • wgVISTA: whole genome VISTA • Compares 2 sequences (up to 10 Mb) • Draft or finished microbial assembly sequences can be used

  30. rVISTA: Access

  31. Regulatory VISTA (rVISTA): prediction of transcription factor binding sites Simultaneous searches of the major transcription factor binding site database (Transfac) and the use of global sequence alignment to sieve through the data • rVISTA search is automatically run when submitting: • mVISTA • genomeVISTA

  32. Regulatory VISTA (rVISTA) • Step 1: identify potential transcription factor binding sites for each sequence using a library of matrices (TRANSFAC) • Step 2: identify aligned sites using VISTA • Step 3: identify conserved sites using a dynamic shifting window (20 bp, >80% identity) • Example alignment; sites labeled on the figure: Ikaros-2, Ikaros-2, NFAT, Ikaros-2

      Human   TGATTTCTCGGCAGCAAGGGAGGGCCCCATGACAAAGCCATTTGAAATCCCAGAAGCAATTTTCTACTTACGACCTCACTTTCTGTTGCTGTCTCTCCCTTCCCCTCTG
      Mouse   TGATTTCTCGGCAGCCAGGGAGGGCCCCATGACGAAGCCACTCGAAATCCCAGAAGCAATTTTCTACTTACGACCTCACTTTCTGTTGCTCTCTCTTCCTCCCCCTCCA
      Dog     TGATTTCTCGGCAGCAAGGGAGGGCCCCATGACGAAGCCATTTGAAATCCCAGAAGCGATTTTCTACCTACGACCTCACTTTCTGTTGCGCTCACTCCCTTCCCCTGCA
      Rat     TGATTTCTCGGCAGCCAGGGAGGGCCCCATGACGAAGCCACTCGAAATCCCAGAAGCAATTTTCTACTTACGACCTCACTTTCTGTTGTTCTCTCTTCCTCCCCCTCCA
      Cow     TGATTTCTCGGCAGCCAGGGAGGGCCCCATGACGAAGCCATTTGAAATCCCAGAAGCAATTTTCTACTTACGACCTCACTTTCTGTTGCGTTCTCTCCCTTCCCCTCCT
      Rabbit  TGATTTCTCGGCAGCCAGGGAGGGCCCCACGAC-AAGCCATTCAAAATCCCAGAAGTGATTTTCTACTTACGACCTCACTTTCTGTTG----CTCTCTCCTTCCCTCCA
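A minimal sketch of the three rVISTA steps for one pair of aligned sequences, under several assumptions that are not in the slides. Exact matching of a consensus string stands in for scoring with TRANSFAC matrices. A site counts as "aligned" when it starts at the same alignment column in both sequences. The "dynamic shifting window" is interpreted as: try every 20-column window that covers the site start and keep the best identity. All function names, and the motif used in the example, are hypothetical.

```python
def find_sites(seq, motif):
    """Step 1 (simplified): start positions of exact motif matches in an ungapped sequence."""
    return [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]


def column_of(aligned):
    """Map each ungapped position of an aligned row to its alignment column."""
    return [col for col, ch in enumerate(aligned) if ch != "-"]


def identity(a, b):
    """Percent of columns where both rows carry the same base (gaps count as mismatches)."""
    same = sum(x == y and x != "-" for x, y in zip(a, b))
    return 100.0 * same / len(a)


def conserved_aligned_sites(aln1, aln2, motif, window=20, min_identity=80.0):
    """Return (aligned, conserved) lists of alignment columns where motif sites start."""
    cols1, cols2 = column_of(aln1), column_of(aln2)
    sites1 = {cols1[p] for p in find_sites(aln1.replace("-", ""), motif)}
    sites2 = {cols2[p] for p in find_sites(aln2.replace("-", ""), motif)}

    # Step 2: keep sites that start at the same column in both rows.
    aligned = sorted(sites1 & sites2)

    # Step 3: shift a window across the site and keep the best identity seen.
    conserved = []
    for col in aligned:
        first = max(0, col - window + 1)
        last = min(col, len(aln1) - window)
        best = max(
            (identity(aln1[s:s + window], aln2[s:s + window]) for s in range(first, last + 1)),
            default=0.0,
        )
        if best > min_identity:
            conserved.append(col)
    return aligned, conserved


if __name__ == "__main__":
    # Toy rows and a made-up motif, chosen only to exercise the three steps.
    print(conserved_aligned_sites("AC-GGGAAACTTT", "ACTGGGAAACAAA", "GGGAA", window=5))
```

A site found in step 1 but absent from `aligned` matches the motif in only one sequence; a site in `aligned` but not in `conserved` lines up in both sequences but sits in a poorly conserved stretch.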

  33. rVISTA: Interface • rVISTA sequence submission: set number • Submit email address and sequences, and set parameters • Key step: check the box "Find potential transcription factors" • Screenshot callouts: your email, sequences

  34. rVISTA: Select TRANSFAC Matrices

  35. rVISTA: Mailed Results • Emailed results will provide a link • Choose which binding site matrices to display • You can then choose visualization options

  36. rVISTA: Results Graphic • Blue: all transcription factor (TF) binding sites • Red: TF sites that are aligned in both sequences • Green: TF sites that are aligned and in conserved regions

  37. Whole Genome rVISTA: Access

  38. Whole Genome rVISTA: Select Alignment • Screenshot callouts: upstream range; IDs or symbols

  39. Whole Genome rVISTA: Results • Screenshot callouts: sites found; view genes

  40. Examples of VISTA usage • Non-coding regulatory regions, for example enhancers • Genes from the same gene families • Alternative splicing • Transcriptional regulation • Genetic studies • Collected references are available through the Publications link at the VISTA home page: http://genome.lbl.gov/vista

  41. VISTA-related Publications

  42. http://www.openhelix.com

  43. VISTA thanks • Genomics Division, LBNL, led by Dr. Edward Rubin • Biology: Dario Boffelli, Kelly Frazer, Gaby Loots, Len Pennacchio, Marcelo Nobrega, Axel Visel • Bioinformatics: Michael Brudno, Olivier Couronne, Simon Minovitsky, Igor Ratner, Alexander Poliakov, Lior Pachter (UCB), Shyam Prabhakar, Dmitriy Ryaboy, Nameeta Shah, Inna Dubchak
