
Visual Analytics and Biological Information

Visual Analytics and Biological Information. Chris Shaw, School of Interactive Arts & Technology, Simon Fraser University. SCHOOL OF INTERACTIVE ARTS + TECHNOLOGY [SIAT] | WWW.SIAT.SFU.CA


Presentation Transcript


  1. Visual Analytics and Biological Information • Chris Shaw, School of Interactive Arts & Technology, Simon Fraser University • SCHOOL OF INTERACTIVE ARTS + TECHNOLOGY [SIAT] | WWW.SIAT.SFU.CA

  2. Visual Analytics: Integrated Interdisciplinary R&D • Cognitive Science • Information Systems • Visual Analytics • Graphic & Interaction Design • Mathematical & Statistical Methods

  3. Interdisciplinary Know-how • SFU School of Interactive Arts & Tech • Design focus • Technology and Science • Cross-disciplinary Ph.D., M.Sc., B.Sc. • UBC Media & Graphics Interdisciplinary Centre (MAGIC) • 15 years of use-inspired basic research, co-development with industry & government

  4. People • IMAS: Interactive Multigenomic Analysis System

  5. Visual Analytics • Two broad task domains: • Analysis of large datasets • Overall interaction time: hours to weeks • E.g., the VAST Contest: Find the threat represented in this large collection of text, images, emails… • Monitoring and emergency response • Overall interaction time: seconds to minutes • E.g., airport security screening: Find the smuggled weapon

  6. NSERC Strategic Grant: “Visual Analytics for Safety & Security” • Application-aware basic research on human-computer cognitive systems • Perceptual & spatial cognition stream • Sensemaking, levels/types of user expertise • New cognitive collaboration & coordination stream • Application development stream • Goal is to co-develop human & tech aspects • Understanding users, system customization and training are as important as new technology

  7. New Research Methods • Conventional qualitative and quantitative methods (e.g. grounded theory, stats) • Add advanced statistical, computational and math models • Integrated mixed-methods analysis • Calls for visual analytics tools for visual analytics research: we are our own testbed

  8. Goal is integrated Visual Analytics R&D • improved understanding • improved technology • pure basic research • use-inspired basic research • applied research and development • existing understanding • existing technology

  9. Interaction Science • Design for key human information processing systems • Walkthrough or experiment • Assess specific aspects of interaction • Implement prototype • Science in the development process

  10. Steps From Research to Practice • VA science gives us design know-how • VA-aware designers work with user community to apply principles in design • VA users trained to get maximum advantage from VA • VA-sophisticated organizations work with designers to co-evolve new technology and work practices

  11. Biological Sequence Analysis • Visual Analytics in the domain of Biology • Large data • Non-spatial • Many different layers of abstraction

  12. Biological Sequence Analysis • DNA Sequencing projects • Visualization Systems • IMAS • Zoomable Sequence visualization • Gene Finding • BLAST Pairwise alignments • Multialignments • Results

  13. DNA Sequencing Projects • Similar sequence yields similar structure • Similar structure yields similar function

  14. IMAS supports • Initial stages of analyzing DNA sequence: • Find genes • Find and Analyze similar genes • Multialign like genes to find active sites • Pipeline structure

  15. Existing Tools • Typically web-based • Copy and paste sequence into text entry box • Await search or analysis on remote database • Get an isolated report that the user must organize • Visualization often done as a reporting function • UCSC Genome browser, LLNL ECR browser, NCBI annotation viewer

  16. Desktop Workbenches • Local sequence data • Mix of local and remote analyses • Web queries to remote data • Bluejay, Apollo, Vector NTI, CLC Workbench • User must work to integrate analyses • Workbench is point of collection

  17. IMAS • Integrates analysis and display • Horizontally zoomable along sequence • Selectable vertical detail • Maintains a sequence analysis data collection • Visual display aligned to sequence

  18. IMAS Screenshot

  19. IMAS Contents • DNA Sequence (Nucleotide, or NT Seq) • GC % plot • 3 forward & 3 reverse complement Amino Acid (AA) sequences
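
The tracks listed on this slide can be computed directly from the nucleotide string. The following is a minimal sketch, not IMAS code: the function names, the 100-base window, and the use of the standard genetic code are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def gc_percent(seq, window=100):
    """Return (start, GC %) for consecutive non-overlapping windows."""
    seq = seq.upper()
    points = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        points.append((start, 100.0 * gc / len(chunk)))
    return points


def reverse_complement(seq):
    """Complement each base, then reverse the strand."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Three forward and three reverse-complement amino acid sequences."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for k in range(3):
        frames[f"+{k + 1}"] = translate(seq[k:])
        frames[f"-{k + 1}"] = translate(rc[k:])
    return frames


if __name__ == "__main__":
    print(gc_percent("GGCCAATT", window=4))
    print(six_frames("ATGGCCTAA"))
```

Each of the six frames lines up with the nucleotide track at a fixed offset, which is what allows all of them to be drawn against the same horizontal sequence axis.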

  20. Genes • Built-in access to the Glimmer 3.02 gene finder • The labelled boxes are anchors for sequence analysis • Segments of DNA can also be marked as a Feature for further analysis

  21. Analyses • Rricke104 gene has • 2 NT BLAST pairs • 1 AA BLAST pair • 1 NT multialignment • 1 AA multialignment

  22. Pairwise Sequence Analysis • Activated by selecting a Gene/Feature and then selecting an NT or AA similarity search • NCBI’s BLAST is called to search local databases of NT or AA sequences • Can also search the NCBI central database

  23. BLAST Alignments • High Scoring Pairs are stacked from most to least significant score • Detail shown when zoomed in • Pair similarity is shown using background color • Darker blue indicates higher similarity • When zoomed out, text is hidden and only similarity is shown
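
One way to picture the stacking and colouring described on this slide is the sketch below. It is an illustration under assumptions, not the IMAS layout: the `HSP` record, the greedy row packing, and the white-to-dark-blue colour ramp are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class HSP:
    start: int       # first aligned position on the query sequence
    end: int         # last aligned position on the query sequence
    score: float     # BLAST score; higher is more significant
    identity: float  # fraction of identical positions, 0.0 to 1.0


def stack_hsps(hsps):
    """Assign each HSP to a row, most significant first.

    An HSP goes into the first row where it does not overlap in X;
    otherwise a new row is opened below the existing ones.
    """
    rows = []       # rows[r] holds the (start, end) spans already placed
    placement = []  # (hsp, row index) in order of decreasing score
    for hsp in sorted(hsps, key=lambda h: h.score, reverse=True):
        for r, spans in enumerate(rows):
            if all(hsp.end < s or hsp.start > e for s, e in spans):
                spans.append((hsp.start, hsp.end))
                placement.append((hsp, r))
                break
        else:
            rows.append([(hsp.start, hsp.end)])
            placement.append((hsp, len(rows) - 1))
    return placement


def similarity_color(identity):
    """Blend from white (no similarity) to dark blue (identical)."""
    t = min(max(identity, 0.0), 1.0)
    light, dark = (255, 255, 255), (0, 0, 139)
    return tuple(round(a + (b - a) * t) for a, b in zip(light, dark))


if __name__ == "__main__":
    demo = [HSP(0, 100, 50, 0.9), HSP(50, 150, 80, 0.7), HSP(200, 300, 10, 0.5)]
    for hsp, row in stack_hsps(demo):
        print(row, hsp.start, hsp.end, similarity_color(hsp.identity))
```

Because the colour depends only on similarity, the same rows remain readable when zoomed out and the alignment text is hidden.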

  24. Multialignments (figure labels: Originating Gene, BLAST Results) • Select BLAST alignments to be multialigned • Clustal-W performs multialignment • Aligns • The originating IMAS gene sequence • The “Full” sequence found by BLAST • Not just the high-quality section • Useful to align entire genes, or entire corresponding segments of DNA

  25. IMAS: Interactive Multigenomic Analysis System • Oct 30, 2007

  26. Results • Analyzed Orientia tsutsugamushi (Scrub Typhus) • Found little similarity in NT sequence • Found a large number of SMART domains not found in the related Rickettsia organisms • IMAS benefit was data organization

  27. Discussion • Visualization Problems • Pair alignments need better organization • Local visibility and organization needed • Overlap in X causes stacking layout problems • Need selective relaxation of vertical alignment rule

  28. Discussion • Analysis Problems • More flexible access to tools: Restriction enzyme sites, methylation sites, Motifs, Primers, Transcription regulation, Intergenic signals... • Database mediation problem: Please use XML! • More flexible manipulation of sequence parts • Right now IMAS is somewhat rigid in its worldview

  29. Multiple Genomes • Lots of organisms now sequenced: • Learn from individual similarities • Learn from similar gene organization • Co-location “Synteny” of genes helps infer similar function: • Located together -> expressed together

  30. Synteny Visualization • Line up the similar organisms below primary organism • Draw links to connect them • Take care to manage visual salience
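
Before drawing links between lined-up organisms, it can help to group orthologous genes into runs that keep the same order (or the exactly reversed order) in both genomes. The sketch below is a simplified illustration, not the IMAS algorithm: it assumes genes are identified by their rank along each genome and that each ortholog pair is given as a hypothetical `(primary_rank, secondary_rank)` tuple.

```python
def synteny_blocks(pairs):
    """Group ortholog pairs into collinear runs.

    pairs: iterable of (primary_rank, secondary_rank).
    Returns a list of (first_primary, last_primary, orientation), where
    orientation is "forward", "reversed", or "single".
    """
    pairs = sorted(pairs)
    labels = {1: "forward", -1: "reversed", 0: "single"}
    blocks = []
    i = 0
    while i < len(pairs):
        j = i
        direction = 0  # unknown until a second gene joins the run
        while j + 1 < len(pairs):
            step = pairs[j + 1][1] - pairs[j][1]
            # A run continues only while the secondary rank moves by
            # exactly one gene, always in the same direction.
            if step not in (1, -1) or (direction and step != direction):
                break
            direction = step
            j += 1
        blocks.append((pairs[i][0], pairs[j][0], labels[direction]))
        i = j + 1
    return blocks


if __name__ == "__main__":
    orthologs = [(0, 0), (1, 1), (2, 2), (3, 6), (4, 5), (5, 4), (6, 7)]
    for block in synteny_blocks(orthologs):
        print(block)
```

Links inside a "reversed" block cross one another when the two genomes are drawn as parallel tracks, so flagging such blocks up front is one way to decide where visual salience needs managing.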

  31. Synteny Visualization Final

  32. IMAS Synteny • Not so good with reversals

  33. Alternative: Spring Synteny • Orthologs as a node-link diagram • 2 Link types • Neighbors on same organism • Sequence alignment (orthologous) links

  34. Alignment Links • Percent Identity Plot (PIP) along sequence • Framed to show PIP range • R. rickettsii linked to R. conorii, R. prowazekii, R. typhi, R. akari
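
A percent identity plot can be derived from any gapped pairwise alignment. The sketch below is illustrative only: it assumes the alignment is given as two equal-length strings with `-` for gaps, uses a hypothetical 50-column window, and counts gap columns as mismatches, which may differ from how IMAS computes its plot.

```python
def percent_identity_plot(aligned_a, aligned_b, window=50):
    """Return (column, percent identity) per window along an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    points = []
    for start in range(0, len(aligned_a), window):
        columns = list(zip(aligned_a[start:start + window],
                           aligned_b[start:start + window]))
        # A column matches when both residues are equal and not a gap.
        matches = sum(1 for x, y in columns if x == y and x != "-")
        points.append((start, 100.0 * matches / len(columns)))
    return points


def pip_range(points):
    """Lowest and highest identity, usable as the frame of the plot."""
    values = [value for _, value in points]
    return min(values), max(values)


if __name__ == "__main__":
    plot = percent_identity_plot("ACGTACGT", "ACGTTT-T", window=4)
    print(plot)
    print(pip_range(plot))
```

Framing the plot to its own range makes small differences between closely related genomes visible, at the cost of making absolute values harder to compare across links.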

  35. Springs • Primary organism is the central spine • Secondary sequences have parallel tracks connected by similarity links • Each secondary sequence has its own resting length for similarity links • Length of neighbor links is a blend of • NT coordinate difference • ln(length) * ln(length)
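
The blended resting length for neighbor links might look like the sketch below. The slide names the two terms but not how they are combined, so the linear blend, the `blend` weight, the `nt_scale` factor, and the reading of "length" as the NT coordinate difference are all assumptions.

```python
import math


def neighbor_rest_length(nt_a, nt_b, blend=0.5, nt_scale=1.0):
    """Resting length of the spring between two neighboring genes.

    blend=1.0 uses only the NT coordinate difference;
    blend=0.0 uses only ln(distance) squared.
    """
    distance = abs(nt_b - nt_a)
    if distance < 1:
        return 0.0  # coincident genes: no separation requested
    log_term = math.log(distance) ** 2
    return blend * nt_scale * distance + (1.0 - blend) * log_term


if __name__ == "__main__":
    for gap in (10, 1_000, 100_000):
        print(gap,
              neighbor_rest_length(0, gap, blend=1.0),
              round(neighbor_rest_length(0, gap, blend=0.0), 2))
```

The logarithmic term compresses long intergenic gaps, so a blend weighted toward it keeps distant neighbors on screen, while a blend weighted toward the raw NT distance preserves relative genome lengths.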

  36. Neighbor links • Using NT distance gives network shapes with many acute angles • Directly displays relative lengths of genomes

  37. R. rickettsii Genomes • Genomic Spring-Synteny Visualization with IMAS

  38. Results • Advantages: • Shows reversals clearly • Shows gene “splits” with respect to primary genome • Shows insertions/deletions • Disadvantages • Obscures length relationships • Force-directed layout requires fiddling • Rotating the similarity edge makes comparing similarity difficult

  39. Results • Trade-off: • Free 2 dimensions for gene placement • Get to locate similar items close to each other • Get ability to see gross rearrangements • Lose ability to see detailed similarity along DNA sequence • Lose geometric location information • Lose regulatory info (not represented)

  40. IMAS • Supports annotation pipeline • Tree or DAG visualization, where • Branches are individual BLAST runs • Branches converge on multialignments • Biologists want more! • Analyze arbitrary collections of sequence
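
The pipeline shape described here, with BLAST runs branching from a gene and converging on multialignments, can be modelled as a small directed acyclic graph. This is a hypothetical data structure for illustration, not the IMAS implementation; the class, method, and node names other than the Rricke104 gene from the earlier Analyses slide are invented.

```python
class AnalysisGraph:
    """Provenance graph: genes feed BLAST runs, which feed multialignments."""

    def __init__(self):
        self.kind = {}     # node name -> "gene", "blast", or "multialignment"
        self.parents = {}  # node name -> names of the analyses it consumes

    def add(self, name, kind, parents=()):
        """Add a node; inputs must already exist, which rules out cycles."""
        if name in self.kind:
            raise ValueError(f"duplicate node: {name}")
        missing = [p for p in parents if p not in self.kind]
        if missing:
            raise KeyError(f"unknown inputs: {missing}")
        self.kind[name] = kind
        self.parents[name] = list(parents)

    def provenance(self, name):
        """Every upstream node that the given analysis depends on."""
        seen = set()
        stack = list(self.parents[name])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.parents[node])
        return seen


if __name__ == "__main__":
    graph = AnalysisGraph()
    graph.add("Rricke104", "gene")
    graph.add("nt_blast_1", "blast", ["Rricke104"])
    graph.add("nt_blast_2", "blast", ["Rricke104"])
    graph.add("nt_multi", "multialignment", ["nt_blast_1", "nt_blast_2"])
    print(sorted(graph.provenance("nt_multi")))
```

Requiring inputs to exist before a node is added keeps the graph acyclic by construction, which matches a pipeline in which each analysis is derived from earlier results.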

  41. More • Want ability to interactively cut, edit, and analyze sequence • “Genomic Spreadsheet” where • Manage Sequences • Compare & Align sequences • Search for similar sequences • Manage sequences at levels of abstraction higher than sequence + annotation text

  42. CzSaw • A Visual Analytics System for Text Data • Built by the SIAT CzSaw group • Victor Chen, Dustin Dunsmuir, Nazanin Kadivar, Eric Lee, Cheryl Qian • John Dill, Chris Shaw, Rob Woodbury

  43. Exploring Data • Data • Analysis Process • Data • Visualizing Analysis Model • Capturing Analysis Process • Analysis Model • Analysis History

  44. CzSaw • Data Views • Exploring Data • Script • Visualizing Analysis Model • Capturing Analysis Process • Analysis History • Dependency Graph • History View

  45. Data Views • Script • Dependency Graph • History View

  46. Script • Dependency Graph • History View

  47. Dependency Graph • History View

  48. History View

  49. Script • History View

  50. Conclusions • Building IMAS helped us discover that IMAS is not yet what you want • Supports pipeline • Need to analyze with respect to many data types • Genome & other ontologies • Phylogeny • Metabolic networks • Regulatory networks
