Purvi Saraiya Chris North Dept. of Computer Science Virginia Polytechnic Institute and State University. Karen Duca Virginia Bioinformatics Institute Virginia Polytechnic Institute and State University. An Evaluation of Microarray Visualization Tools for Biological Insight.
Purvi Saraiya, Chris North, Dept. of Computer Science, Virginia Polytechnic Institute and State University • Karen Duca, Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University • An Evaluation of Microarray Visualization Tools for Biological Insight • Presented by Tugrul Ince and Nir Peer, University of Maryland
Goals • Evaluate five popular visualization tools • Cluster/TreeView • TimeSearcher • Hierarchical Clustering Explorer (HCE) • Spotfire • GeneSpring • Do so in the context of bioinformatics data exploration
Goals • Research Questions • How successful are these tools in stimulating insight? • How do various visualization techniques affect the users’ perception of data? • How does users’ background affect the tool usage? • How do these tools support hypothesis generation? • Can insight be measured in a controlled experiment?
Visualization Evaluations • Typically, evaluations consist of • controlled measurements of user performance and accuracy on predetermined tasks • We are looking for an evaluation that better simulates a bioinformatics data analysis scenario • We use a protocol that focuses on • recognition and quantification of insights gained from actual exploratory use of visualizations
Insights • Hard to define what an “insight” is • We need this term to be quantifiable and reproducible • Solution • Encourage users to think aloud • and report any findings they have about the dataset • Videotape each session to capture and characterize individual insights as they occur • generally provides more information than subjective measures from post-experiment surveys
Insights • Define insight as • an individual observation about the data by the participant • a unit of discovery • Essentially, any data observation made during the think aloud protocol • Now we can quantify some characteristics of each insight
Insight Characteristics • Observation • The actual finding about the data • Time • The amount of time taken to reach the insight • Domain Value • The significance of the insight. Coded by a domain expert. • Hypotheses • Hypothesis and direction of research • Directed vs. Unexpected • Recall: participants are asked to identify questions they want to explore • Correctness • Breadth vs. Depth
Insight Characteristics • Category • Overview – overall distributions of gene expression • Patterns – identification or comparison across data attributes • Groups – identification or comparison of groups of genes • Details – focused information about specific genes
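The characteristics listed on the two slides above could be recorded as one structured record per insight. The following is only a sketch: the field names, the integer domain-value rating, and the example observation are assumptions made for illustration, not the coding scheme actually used in the study.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Category(Enum):
    """The four insight categories named on the slide."""
    OVERVIEW = "overview"   # overall distributions of gene expression
    PATTERNS = "patterns"   # identification or comparison across data attributes
    GROUPS = "groups"       # identification or comparison of groups of genes
    DETAILS = "details"     # focused information about specific genes


@dataclass(frozen=True)
class Insight:
    observation: str                  # the actual finding about the data
    minutes_elapsed: float            # time taken to reach the insight
    domain_value: int                 # rating by a domain expert (scale is an assumption)
    category: Category
    hypothesis: Optional[str] = None  # set only if the insight led to a hypothesis
    directed: bool = True             # False for an unexpected insight
    correct: bool = True
    deep: bool = False                # False = breadth, True = depth


# Hypothetical example record, invented purely for illustration.
example = Insight(
    observation="Several genes rise together at the later time points",
    minutes_elapsed=12.5,
    domain_value=2,
    category=Category.PATTERNS,
)
```

Keeping each insight as an immutable record makes the later counting and summing steps simple aggregations over a list.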
Experiment Design • A 3×5 between-subjects design • between-subjects: different subjects for each dataset-tool pair • Dataset: 3 treatments • Visualization tool: 5 treatments
Experiment Design • Participants • 2 participants per dataset per tool • Have at least a Bachelor’s degree in a biological field • Assigned to tools they had never worked with before • to prevent advantage • measure learning time • Categories • 10 Domain Experts • Senior researchers with extensive experience in microarray experiments and microarray data analysis • 11 Domain Novices • Lab technicians or graduate student research assistants • 9 Software Developers • Professionals who implement microarray software tools
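The arithmetic of the design can be checked with a short sketch: 3 datasets crossed with 5 tools gives 15 cells, and 2 participants per cell gives 30 participants, which matches the 10 + 11 + 9 breakdown by background. The dataset names and participant IDs below are placeholders, and the sequential assignment is an assumption; the slides do not say how participants were allocated to cells.

```python
from itertools import product

TOOLS = ["Cluster/TreeView", "TimeSearcher", "HCE", "Spotfire", "GeneSpring"]
DATASETS = ["dataset_1", "dataset_2", "dataset_3"]  # placeholder names
PARTICIPANTS_PER_CELL = 2

# Participant counts by background, as given on the slide.
BACKGROUNDS = {"domain_expert": 10, "domain_novice": 11, "software_developer": 9}

# Every dataset-tool pair is one cell of the between-subjects design.
cells = list(product(DATASETS, TOOLS))

# Between-subjects: each participant appears in exactly one cell.
# Sequential IDs are hypothetical; the real allocation is not described.
assignments = [
    (f"P{idx:02d}", dataset, tool)
    for idx, (dataset, tool) in enumerate(
        (cell for cell in cells for _ in range(PARTICIPANTS_PER_CELL)),
        start=1,
    )
]

total_participants = len(assignments)
```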
Protocol and Measures • Chose new users with only minimal tool training • Success in the initial usage period is critical for the tool’s adoption by biologists • Participants received an initial training • Background description about the dataset • 15-minute tool tutorial • Participants listed some analysis questions • Instructed to examine the data with the tool as long as needed • They were allowed to ask for help about the tool • Simulates training by colleagues
Protocol and Measures • Every 15 minutes, participants estimated percent of total potential insight they obtained so far • Finally, assessed overall experience with the tools during session • Entire session was videotaped for later analysis • Later, all individual occurrences of insights were identified and codified
Cluster/TreeView = ClusterView • Cluster • to cluster data • TreeView • Visualize the clusters • Uses heat-maps
TimeSearcher 1 • Parallel Coordinate Visualization • Interactive Filtering • Line Graphs for each data entity
HCE • Clusters data • Several Visualizations • Heat-Maps • Parallel Coordinates • Scatter Plots • Histograms • Brushing and Linking
Spotfire • General Purpose Visualization Tool • Several Displays • Scatter Plots • Bar Graphs • Histograms • Pie/Line Charts • Others… • Dynamic Query Sliders • Brushing and Linking
GeneSpring • Suitable for Microarray data analysis • Shows physical positions on genomes • Array layouts • Pathways • Gene-to-gene comparison • Brushing and Linking • Clustering capability
Number of Insights (chart comparing ClusterView, TimeSearcher 1, HCE, Spotfire, GeneSpring) • Spotfire: Highest number of insights • HCE: poorest
Total Domain Value (chart comparing ClusterView, TimeSearcher 1, HCE, Spotfire, GeneSpring) • Spotfire: Highest insight value • HCE, GeneSpring: poorer
Avg. Final Amount Learned (chart comparing ClusterView, TimeSearcher 1, HCE, Spotfire, GeneSpring) • Spotfire: high value in learning • ClusterView and HCE are poor
Avg. Time to First Insight (chart comparing ClusterView, TimeSearcher 1, HCE, Spotfire, GeneSpring) • ClusterView: very short time to first insight • TimeSearcher 1 and Spotfire are also quick
Avg. Total Time (chart comparing ClusterView, TimeSearcher 1, HCE, Spotfire, GeneSpring) • Total time users spent using the tool • Low Values: Efficient or Not useful for insight
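Per-tool measures like those charted above (number of insights, total domain value, average time to first insight) could be derived from coded insight records roughly as follows. This is a sketch under assumptions: the record layout, the tool labels `tool_a` and `tool_b`, and every number are made up for illustration and are not results from the study.

```python
from collections import defaultdict

# (tool, participant, minutes_elapsed, domain_value) -- invented toy records
records = [
    ("tool_a", "P01", 4.0, 2),
    ("tool_a", "P01", 11.0, 3),
    ("tool_a", "P02", 6.0, 1),
    ("tool_b", "P03", 9.0, 4),
]


def summarize(records):
    """Aggregate coded insights into per-tool summary measures."""
    per_tool = defaultdict(lambda: {"count": 0, "total_value": 0, "first": {}})
    for tool, participant, minutes, value in records:
        stats = per_tool[tool]
        stats["count"] += 1
        stats["total_value"] += value
        # Track the earliest insight time seen for each participant.
        first = stats["first"]
        first[participant] = min(minutes, first.get(participant, minutes))
    return {
        tool: {
            "count": stats["count"],
            "total_value": stats["total_value"],
            # Mean over participants of their time to first insight.
            "avg_time_to_first": sum(stats["first"].values()) / len(stats["first"]),
        }
        for tool, stats in per_tool.items()
    }


summary = summarize(records)
```

With only two participants per tool-dataset cell, averages like these rest on very few observations, which is worth keeping in mind when reading the comparisons.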
Unexpected Insights • HCE revealed several unexpected results • ClusterView provided a few • TimeSearcher 1 for time series data • Spotfire contributed to 2 unexpected insights
Hypotheses • A few insights led to hypotheses • Spotfire: 3 • ClusterView: 2 • TimeSearcher 1: 1 • HCE: 1
Insight Categories • Overall Gene Expression • Overview of genes in general • Expression Patterns • Searching patterns is critical • Clustering is useful • Grouping • Some users wanted to group genes • GeneSpring enables grouping • Detail Information • Users want detailed information about genes that are familiar to them
Visual Representations and Interactions • Although some tools have many visualization techniques, users tend to use only a few • Spotfire users preferred heat-maps • GeneSpring users preferred parallel coordinates • Lupus dataset: visualized best with heat-maps • Most users preferred outputs of clustering algorithms • HCE not useful when a particular column arrangement is useful
Running out of time, so to wrap up • Use a visualization tool (that’s why we’re here!) • Spotfire: best general performance • GeneSpring: hard to use • Dataset dictates the best tool! • Time series data: TimeSearcher • Others: Spotfire, GeneSpring? • Interaction is the key • Grouping and clustering are necessary features
Critique • In all fairness, measuring insights is really hard! Here are some possible issues • Subjectivity • Experiment relies on users always thinking aloud • Also depends on a domain expert to evaluate insights • Results may vary widely based on participants’ expertise (only two per tool-dataset pair) • Some insight characteristics are inherently subjective • Domain Value • Breadth vs. Depth
Critique • How does one count insights? • Assumes honest reporting by participants • Some insights may be of no great value • What if a discovery just reaffirms a known fact? Is that an insight? • Measuring time taken to reach an insight • Maybe instead of measuring from the beginning of the session we should measure from the last insight
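The last suggestion can be made concrete by contrasting the two ways of timing an insight. The function names and timestamps below are hypothetical; the sketch only shows how measuring from the previous insight changes the numbers compared with measuring from the start of the session.

```python
def time_from_start(timestamps):
    """Minutes from session start to each insight (measuring from the beginning)."""
    return sorted(timestamps)


def time_since_previous(timestamps):
    """Minutes from the previous insight (or session start) to each insight."""
    ordered = sorted(timestamps)
    previous = [0.0] + ordered[:-1]
    return [current - prior for current, prior in zip(ordered, previous)]


# Invented timestamps, in minutes since the session began.
stamps = [4.0, 11.0, 30.0]
from_start = time_from_start(stamps)
gaps = time_since_previous(stamps)
```

Measured from the start, the third insight looks slow at 30 minutes; measured from the previous insight, it took 19 minutes, which describes the effort behind that particular insight more directly.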