
An Evaluation of Microarray Visualization Tools for Biological Insight

Purvi Saraiya and Chris North, Dept. of Computer Science, Virginia Polytechnic Institute and State University. Karen Duca, Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University.


Presentation Transcript


  1. Purvi Saraiya, Chris North, Dept. of Computer Science, Virginia Polytechnic Institute and State University • Karen Duca, Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University • An Evaluation of Microarray Visualization Tools for Biological Insight • Presented by Tugrul Ince and Nir Peer, University of Maryland

  2. Goals • Evaluate five popular visualization tools • Cluster/TreeView • TimeSearcher • Hierarchical Clustering Explorer (HCE) • Spotfire • GeneSpring • Do so in the context of bioinformatics data exploration

  3. Goals • Research Questions • How successful are these tools in stimulating insight? • How do various visualization techniques affect the users’ perception of data? • How does users’ background affect the tool usage? • How do these tools support hypothesis generation? • Can insight be measured in a controlled experiment?

  4. Visualization Evaluations • Typically evaluations consist of • controlled measurements of user performance and accuracy on predetermined tasks • We are looking for an evaluation that better simulates a bioinformatics data analysis scenario • We use a protocol that focuses on • recognition and quantification of insights gained from actual exploratory use of visualizations

  5. Insights • Hard to define what is an “insight” • We need this term to be quantifiable and reproducible • Solution • Encourage users to think aloud • and report any findings they have about the dataset • Videotape a session to capture and characterize individual insights as they occur • generally provides more information than subjective measures from post-experiment surveys

  6. Insights • Define insight as • an individual observation about the data by the participant • a unit of discovery • Essentially, any data observation made during the think aloud protocol • Now we can quantify some characteristics of each insight

  7. Insight Characteristics • Observation • The actual finding about the data • Time • The amount of time taken to reach the insight • Domain Value • The significance of the insight. Coded by a domain expert. • Hypotheses • Hypothesis and direction of research • Directed vs. Unexpected • Recall: participants are asked to identify questions they want to explore • Correctness • Breadth vs. Depth

  8. Insight Characteristics • Category • Overview – overall distributions of gene expression • Patterns – identification or comparison across data attributes • Groups – identification or comparison of groups of genes • Details – focused information about specific genes
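
The characteristics on slides 7 and 8 could be recorded as one coded record per insight. The following is only a minimal sketch: the field names, the integer scale for domain value, and the example values are assumptions made for illustration, not the coding scheme the authors used.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Category(Enum):
    """The four insight categories listed on slide 8."""
    OVERVIEW = "overview"
    PATTERNS = "patterns"
    GROUPS = "groups"
    DETAILS = "details"


@dataclass
class Insight:
    observation: str                  # the actual finding about the data
    minutes_from_start: float         # time taken to reach the insight
    domain_value: int                 # assumed integer score from a domain expert
    category: Category
    hypothesis: Optional[str] = None  # set only if the insight led to a hypothesis
    directed: bool = True             # False for an unexpected insight
    correct: bool = True
    depth: bool = False               # False = breadth, True = depth


# Hypothetical example record; the values are invented for illustration.
example = Insight(
    observation="A group of genes rises together at the later time points",
    minutes_from_start=12.5,
    domain_value=3,
    category=Category.PATTERNS,
    directed=False,
)
```

Keeping every characteristic on a single record makes it straightforward to count, sum, or filter insights per tool later on.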

  9. Experiment Design • A 3×5 between-subjects design • between-subjects → different subjects for each tool-dataset pair • Dataset: 3 treatments • Visualization tool: 5 treatments

  10. Experiment Design • Participants • 2 participants per dataset per tool • Have at least a Bachelor’s degree in a biological field • Assigned to tools they had never worked with before • to prevent advantage • measure learning time • Categories • 10 Domain Experts • Senior researchers with extensive experience in microarray experiments and microarray data analysis • 11 Domain Novices • Lab technicians or graduate student research assistants • 9 Software Developers • Professionals who implement microarray software tools
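
The arithmetic behind the design can be checked with a short sketch: 3 datasets × 5 tools gives 15 cells, and 2 participants per cell gives 30 participants, which matches the 10 + 11 + 9 split by background. The dataset labels below are placeholders, since the slides do not list all three datasets by name.

```python
from itertools import product

TOOLS = ["Cluster/TreeView", "TimeSearcher", "HCE", "Spotfire", "GeneSpring"]
DATASETS = ["dataset_1", "dataset_2", "dataset_3"]  # placeholder labels (assumption)
PARTICIPANTS_PER_CELL = 2

# Every (dataset, tool) pair is one cell of the between-subjects design.
cells = list(product(DATASETS, TOOLS))
total_participants = len(cells) * PARTICIPANTS_PER_CELL

# Participant counts by background, as given on the slide.
by_background = {"domain experts": 10, "domain novices": 11, "software developers": 9}

print(len(cells), total_participants, sum(by_background.values()))  # 15 30 30
```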

  11. Protocol and Measures • Chose new users with only minimal tool training • Success in the initial usage period is critical for the tool’s adoption by biologists • Participants received an initial training • Background description about the dataset • 15-minute tool tutorial • Participants listed some analysis questions • Instructed to examine the data with the tool as long as needed • They were allowed to ask for help about the tool • Simulates training by colleagues

  12. Protocol and Measures • Every 15 minutes, participants estimated percent of total potential insight they obtained so far • Finally, assessed overall experience with the tools during session • Entire session was videotaped for later analysis • Later, all individual occurrences of insights were identified and codified
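
Once insights have been identified and coded from the video, per-tool measures such as the number of insights, total domain value, and time to first insight can be tallied. The sketch below is illustrative only: the record layout, the tool labels, and the numbers are assumptions, and it pools all records for a tool rather than averaging over participants.

```python
from collections import defaultdict


def summarize_by_tool(records):
    """Tally coded insights per tool.

    records: iterable of (tool, minutes_from_start, domain_value) tuples.
    Returns a dict mapping tool -> count, total value, and earliest insight time.
    """
    summary = defaultdict(
        lambda: {"count": 0, "total_value": 0, "first_insight_min": None}
    )
    for tool, minutes, value in records:
        entry = summary[tool]
        entry["count"] += 1
        entry["total_value"] += value
        # Keep the earliest timestamp seen so far for this tool.
        if entry["first_insight_min"] is None or minutes < entry["first_insight_min"]:
            entry["first_insight_min"] = minutes
    return dict(summary)


# Hypothetical coded records, invented for illustration.
records = [("ToolA", 4.0, 2), ("ToolA", 10.5, 3), ("ToolB", 7.0, 1)]
result = summarize_by_tool(records)
```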

  13. Show me pictures • Here are the tools!!!

  14. Cluster/TreeView = ClusterView • Cluster • to cluster data • TreeView • Visualize the clusters • Uses heat-maps

  15. TimeSearcher 1 • Parallel Coordinate Visualization • Interactive Filtering • Line Graphs for each data entity

  16. HCE • Clusters data • Several Visualizations • Heat-Maps • Parallel Coordinates • Scatter Plots • Histograms • Brushing and Linking

  17. Spotfire • General Purpose Visualization Tool • Several Displays • Scatter Plots • Bar Graphs • Histograms • Pie/Line Charts • Others… • Dynamic Query Sliders • Brushing and Linking

  18. GeneSpring • Suitable for Microarray data analysis • Shows physical positions on genomes • Array layouts • Pathways • Gene-to-gene comparison • Brushing and Linking • Clustering capability

  19. Enough about Tools, Tell me the Results!!!

  20. Number of Insights ClusterView TimeSearcher 1 HCE Spotfire GeneSpring • Spotfire: Highest number of insights • HCE: poorest

  21. Total Domain Value ClusterView TimeSearcher 1 HCE Spotfire GeneSpring • Spotfire: Highest insight value • HCE, GeneSpring: poorer

  22. Avg. Final Amount Learned ClusterView TimeSearcher 1 HCE Spotfire GeneSpring • Spotfire: high value in learning • ClusterView and HCE are poor

  23. Avg. Time to First Insight ClusterView TimeSearcher 1 HCE Spotfire GeneSpring • ClusterView: very short time to first insight • TimeSearcher 1 and Spotfire are also quick

  24. Avg. Total Time ClusterView TimeSearcher 1 HCE Spotfire GeneSpring • Total time users spent using the tool • Low Values: Efficient or Not useful for insight

  25. Unexpected Insights • HCE revealed several unexpected results • ClusterView provided a few • TimeSearcher 1 for time series data • Spotfire contributed to 2 unexpected insights • Hypotheses • A few insights led to hypotheses • Spotfire → 3 • ClusterView → 2 • TimeSearcher 1 → 1 • HCE → 1

  26. Tools vs. Datasets

  27. Insight Categories • Overall Gene Expression • Overview of genes in general • Expression Patterns • Searching patterns is critical • Clustering is useful • Grouping • Some users wanted to group genes • GeneSpring enables grouping • Detail Information • Users want detailed information about genes that are familiar to them

  28. Visual Representations and Interactions • Although some tools have many visualization techniques, users tend to use only a few • Spotfire users preferred heat-maps • GeneSpring users preferred parallel coordinates • Lupus dataset: visualized best with heat-maps • Most users preferred outputs of clustering algorithms • HCE not useful when a particular column arrangement is useful

  29. Running out of time, So, wrap up • Use a Visualization tool (that’s why we’re here!) • Spotfire: best general performance • GeneSpring: Hard to use • Dataset dictates best tool! • Time Series data: TimeSearcher • Others: Spotfire, GeneSpring? • Interaction is the key • Grouping and Clustering are necessary features

  30. Critique • In all fairness, measuring insights is really hard! Here are some possible issues • Subjectivity • Experiment relies on users always thinking aloud • Also, depends on a domain expert to evaluate insights • Results may vary widely based on participants' expertise (only two per tool-dataset pair) • Some insight characteristics are inherently subjective • Domain Value • Breadth vs. Depth

  31. Critique • How does one count insights? • Assumes honest reporting by participants • Some insights may be of no great value • What if a discovery just reaffirms a known fact? Is that an insight? • Measuring time taken to reach an insight • Maybe instead of measuring from beginning of session we should measure from last insight
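
The alternative timing suggested in the last bullet can be sketched as follows: instead of reporting each insight's time from the start of the session, report the gap since the previous insight. The function name and the timestamps are hypothetical, and the first insight is still measured from session start because it has no predecessor.

```python
def times_since_previous(timestamps):
    """Convert insight times (minutes from session start) into gaps.

    Each gap is the time elapsed since the previous insight; the first
    gap is measured from the start of the session (time 0).
    """
    gaps = []
    previous = 0.0
    for t in sorted(timestamps):
        gaps.append(t - previous)
        previous = t
    return gaps


# Hypothetical session with insights at 4, 10, and 25 minutes.
stamps = [4.0, 10.0, 25.0]
gaps = times_since_previous(stamps)
print(gaps)  # [4.0, 6.0, 15.0]
```

Under the from-start measure the third insight looks slow (25 minutes), while the gap measure shows it took 15 minutes of exploration after the second one.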
