EVALUATING VISUALIZATIONS: USING A TAXONOMIC GUIDE By • E. MORSE, M. LEWIS & K. A. OLSEN • PRESENTERS • CHANNA P. WITANA • CALVIN OR
CONTENT • Introduction • Visual and domain tasks • Methodology • Tasks • Results • Discussion • Conclusion
INTRODUCTION • Previously published papers: • Morse, E. & Lewis, M. (1997). Why information retrieval visualizations sometimes fail. Proceedings of the 1997 IEEE International Conference on Systems, Man, and Cybernetics, Oct. 12-15, Orlando, FL • Morse, E., Lewis, M., Korfhage, R., & Olsen, K. (1998). Evaluation of text, numeric and graphical presentations for information retrieval interfaces: User preference and task performance measures. Proceedings of the 1998 IEEE International Conference on Systems, Man, and Cybernetics, Oct. 12-14, San Diego, CA, 1026-1031
Information Retrieval Visualization Systems • Bead (Chalmers, 1996) • InfoCrystal (Spoerri, 1993) • BIRD • GUIDO • VIBE These have been developed as visual information exploration tools to aid in retrieval tasks.
In TILE BARS (Hearst, 1995) • Paragraphs on X-axis • Query terms on Y-axis • Each query term tile is shaded according to how well the paragraph matches the query term. • By glancing at the Tile Bar, a user can see which query terms match, which sections are most relevant, and the distribution and coincidence of topics throughout the document.
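The shading rule described on this slide can be sketched in a few lines of Python. This is only an illustration: it assumes that "how well the paragraph matches" is measured by a raw count of the term in the paragraph, and the names `tile_matrix`, `render`, and `SHADES` are hypothetical, not taken from the Tile Bars system.

```python
from typing import List

# Hypothetical shade scale, from lightest (no match) to darkest (strong match).
SHADES = " .:#"


def tile_matrix(paragraphs: List[str], query_terms: List[str]) -> List[List[int]]:
    """Return one row per query term (Y-axis) and one column per paragraph (X-axis).

    Assumption: match strength is the raw count of the term in the paragraph.
    """
    matrix = []
    for term in query_terms:
        row = []
        for paragraph in paragraphs:
            words = paragraph.lower().split()
            row.append(words.count(term.lower()))
        matrix.append(row)
    return matrix


def render(matrix: List[List[int]]) -> List[str]:
    """Map each count to a shade character, capping at the darkest shade."""
    darkest = len(SHADES) - 1
    return ["".join(SHADES[min(count, darkest)] for count in row) for row in matrix]


if __name__ == "__main__":
    paragraphs = ["oil prices rose in texas", "texas oil oil output", "weather report"]
    terms = ["oil", "texas"]
    for term, line in zip(terms, render(tile_matrix(paragraphs, terms))):
        print(f"{term:>6} |{line}|")
```

Reading across a row shows where a term is concentrated; reading down a column shows which terms coincide in the same paragraph.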
In VIBE • VIBE represents query terms as moveable circles with documents as variously sized rectangles suspended between them
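One way to picture documents "suspended between" query terms is a weighted average of the term positions. The sketch below assumes that a document's position is the centroid of the term circles weighted by the document's score on each term; the function name `place_document` and the sample coordinates are hypothetical.

```python
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]


def place_document(term_positions: Dict[str, Point],
                   term_weights: Dict[str, float]) -> Optional[Point]:
    """Place a document at the weight-averaged position of the query terms.

    Assumption: a document with no weight on any term has no position (None).
    """
    total = sum(term_weights.values())
    if total == 0:
        return None
    x = sum(term_positions[t][0] * w for t, w in term_weights.items()) / total
    y = sum(term_positions[t][1] * w for t, w in term_weights.items()) / total
    return (x, y)


if __name__ == "__main__":
    positions = {"oil": (0.0, 0.0), "texas": (10.0, 0.0)}
    # A document weighted 3:1 toward "oil" sits closer to the "oil" circle.
    print(place_document(positions, {"oil": 3.0, "texas": 1.0}))
```

Under this assumption, moving a term circle moves every document that mentions the term, which is what makes the layout behave like a spring analogue.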
VISUAL AND DOMAIN TASKS • Basic Forms • Map Systems • Dimensions & Reference Point Systems • Visualization Types
Visualization Type: System
• Word: Ordered text such as search engine output
• Icon list: Tile Bars, Cougar
• Graph (Cartesian): GUIDO, BIRD, InfoCrystal, Component State
• Spring (physical analogue): VIBE
Task Classification of WEHREND & LEWIS • Locate • Identify • Distinguish • Categorize • Cluster • Distribution • Rank • Compare between relations • Associate • Correlate
ZHOU & FEINER Visual Task Taxonomy (Implication / Type / Subtype: Elemental tasks)
• Organization / Visual grouping / Proximity: associate, cluster, locate
• Organization / Visual grouping / Similarity: categorize, cluster, distinguish
• Organization / Visual grouping / Continuity: associate, locate, reveal
• Organization / Visual grouping / Closure: cluster, locate, outline
• Organization / Visual attention: cluster, distinguish, emphasize, locate
• Organization / Visual sequence: emphasize, identify, rank
• Organization / Visual composition: associate, correlate, identify, reveal
• Signaling / Structuring: tabulate, plot, structure, trace, map
• Signaling / Encoding: label, symbolize, portray, quantify
• Transformation / Modification: emphasize, generalize, reveal
• Transformation / Transition: switch
METHODOLOGY • Dependent Variables • Number of correct answers. • Time to completion of a task set. • Independent Variables • Display Type • Order of Presentation • Individual Task • Scenario Difficulty • 195 subjects took part in the study over the web • Each subject was randomly given either the 2-term or the 3-term test
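The design above (random choice of the 2-term or 3-term test, plus a randomized order of presentation for the display types) can be sketched as follows. The display names come from the results slides; the function `assign_subject` and the equal-probability choice between the two tests are assumptions made for illustration.

```python
import random
from typing import Dict, List, Union

# Display types named in the results slides.
DISPLAYS: List[str] = ["word", "icon", "table", "spring"]


def assign_subject(rng: random.Random) -> Dict[str, Union[str, List[str]]]:
    """Pick a test at random and shuffle the order in which displays are shown.

    Assumption: both tests are equally likely for every subject.
    """
    study = rng.choice(["2-term", "3-term"])
    order = DISPLAYS[:]  # copy so the module-level list is not mutated
    rng.shuffle(order)
    return {"study": study, "order": order}


if __name__ == "__main__":
    rng = random.Random()  # pass a seed here to make a session reproducible
    for subject_id in range(3):
        print(subject_id, assign_subject(rng))
```

Shuffling the order per subject is what later allows order of presentation to be analysed as its own independent variable.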
Generating Experimental Tasks • Sample as broadly as possible rather than deeply • Select tasks whose parameter lists varied significantly [Diagram labels: Visual displays, Specific IR subtasks, Taxonomic categories, Generalized questions]
2-Term Test
2-1. Are there more documents that contain ONLY the term ‘Romania’ or ONLY the term ‘Czechoslovakia’?
2-2. Which is the most frequent key term in this set of documents? A. Oil; B. York
2-3. One of the documents is unlike any of the others. Can you identify it? Place the document number in the text box.
2-4. Rank documents A, B, and C with respect to the amount of term ‘Soviet’ that they contain.
2-5. Which of the following documents are most similar with respect to the relative amount of the key terms?
2-6. Which of the following statements is true? A. There are no documents that contain roughly equal amounts for the two terms. B. If a document talks about ‘Oil’ then it also talks about ‘Texas’. C. ‘Texas’ and ‘Oil’ are not very highly related. D. A and C E. All of the above
2-7. Location
3-Term Test
3-1. Are there more documents that contain ONLY the term ‘earthquake’ or ONLY the term ‘California’ or ONLY the term ‘death’?
3-2. Which is the most frequent key term in this set of documents? A. Vatican; B. Embassy; C. Noriega
3-3. One of the documents is unlike any of the others. Can you identify it? Place the document number in the text box.
3-4. Rank documents A, B, and C with respect to the amount of term ‘Company’ that they contain.
3-5. Which of the following documents are most similar with respect to the relative amount of the key terms?
3-6. Which of the following statements is true? A. At least one document contains all three terms. B. At least one document contains the terms ‘Arab’ and ‘bomb’. C. ‘Vatican’ and ‘Arab’ are not very highly related. D. B and C E. All of the above.
3-7. Location
Evaluating visualizations: using a taxonomic guide Results – subjects 1. Subjects • Mean age in the 2- and 3-term studies was 23.2 and 23.6 years. • Variables compared: 1. Gender 2. Current educational level 3. Native language • No significant differences between the studies for any of these variables. • The skill level of subjects did not differ significantly between the 2- and 3-term groups.
Results – time to completion 2. For the 2-term study • The display types differed significantly in completion time (p<0.001). • With “spring” as the pivot case, every other display type took significantly longer to complete the task.
Results – time to completion 2. For the 3-term study • The ANOVA shows that the four displays were significantly different (p<0.001). • With “spring” as the pivot case, its completion time differs markedly from that of each of the other displays. [Table: Within-subjects contrasts for 3-term display]
Results – time to completion 2. Analysis by pair-wise contrasts • The “word” and “table” displays were roughly equivalent in terms of speed of performance. • The “icon” display was faster. • The “spring” display was fastest.
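The idea of contrasting every display against a pivot case can be illustrated with paired t statistics on per-subject completion times. This is a simplified stand-in, not the within-subjects ANOVA contrasts reported in the slides; the function names are hypothetical and the numbers in the usage example are invented purely for illustration.

```python
import math
from statistics import mean, stdev
from typing import Dict, List


def paired_t(a: List[float], b: List[float]) -> float:
    """Paired t statistic for per-subject differences a[i] - b[i].

    Requires at least two subjects and non-constant differences.
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))


def pivot_contrasts(times: Dict[str, List[float]], pivot: str = "spring") -> Dict[str, float]:
    """Contrast every display against the pivot display."""
    return {
        name: paired_t(values, times[pivot])
        for name, values in times.items()
        if name != pivot
    }


if __name__ == "__main__":
    # Invented completion times in seconds, one value per subject (NOT study data).
    times = {
        "word": [62.0, 70.0, 58.0, 66.0, 75.0],
        "table": [60.0, 72.0, 55.0, 69.0, 71.0],
        "icon": [45.0, 50.0, 41.0, 52.0, 49.0],
        "spring": [30.0, 33.0, 29.0, 35.0, 31.0],
    }
    for display, t in pivot_contrasts(times).items():
        print(f"{display} vs spring: t = {t:.2f}")
```

A positive t here means the display took longer than the pivot; judging significance would additionally require comparing t against a t distribution with n - 1 degrees of freedom.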
Results – time to completion 2. Comparison across study types (2- and 3-term) • Between-subjects factor. • For the “word”, “icon”, and “table” displays, subjects needed more time to complete the tasks in the 3-term conditions than in the corresponding 2-term conditions. • For the “spring” display, the difference did not reach significance (p=0.086). [Figure: Effect of display type on time to complete task set: 2-term vs. 3-term]
Results – correctness of answers 3. Correctness of answers • Second method of assessing performance. • The “word” display produced fewer correct answers than the other displays (pair-wise comparisons all p<0.001). • No significant differences in the number of correct answers between the 2-term and 3-term studies.
Results – order of presentation 4. Order effect (time performance) • The order of presentation of the display types was randomized. • Performance was poorer when a display was presented first in the series. • Completion time decreased progressively over the subsequent trials. [Figure panels: 2-term study, 3-term study]
Results – order of presentation 4. Order effect (correctness of answers) • There was no significant effect of the presentation order on performance as measured by the correctness of answers. [Figure: Order effect, number of correct answers]
Results – performance with respect to task types 5. Performance with respect to task types • “Associate”, “identify” and “rank” tasks … were performed in very short time periods and … associated with a very high fraction of correct answers. • “Cluster”, “locate”, and some of the compare tasks … took significantly longer to perform and … have a high fraction of errors.
Results – preferences 6. Preferences (for both the 2- and 3-term studies) • Analysis showed no relationship between completion time and preferences. • However, there was a correlation between correctness and preferences. • In the non-parametric analysis, … no correlation between the position in which any display was seen and any positional ranking assigned by the subjects.
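A rank-based check of whether presentation position relates to preference ranking can be sketched with Spearman's rho. The slides do not say which non-parametric test was used, so this is one common choice shown only as an illustration; the function name is hypothetical, and the formula assumes both inputs are already ranks 1..n with no ties.

```python
from typing import List


def spearman_rho(x_ranks: List[int], y_ranks: List[int]) -> float:
    """Spearman rank correlation for two tie-free rankings of the same items.

    Uses rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), which is valid only
    when neither ranking contains ties.
    """
    n = len(x_ranks)
    if n != len(y_ranks) or n < 2:
        raise ValueError("need two rankings of equal length, at least 2 items")
    d_squared = sum((a - b) ** 2 for a, b in zip(x_ranks, y_ranks))
    return 1 - 6 * d_squared / (n * (n * n - 1))


if __name__ == "__main__":
    # Position in which one subject saw four displays, and the preference
    # rank that subject gave each display (invented values, NOT study data).
    seen_position = [1, 2, 3, 4]
    preference_rank = [2, 1, 4, 3]
    print(spearman_rho(seen_position, preference_rank))
```

A value near zero across subjects would be consistent with the finding that preference rankings did not depend on the order in which displays were seen.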
Discussion • The “word” and “text” displays were always associated with poor time performance (as in the preliminary studies reported earlier). • The “spring” display is superior in producing quick responses. • A visual taxonomy promises to be a useful guide for developing visual interfaces in general and IR interfaces in particular.
Conclusion • Following a back-to-basics strategy, the study tested the visualization techniques themselves rather than complete systems. • The studies show that the “spring” and “icon” displays can provide an efficient and effective way to present information. • The technique of asking questions could be redesigned to improve internal validity. ---- END ----
Results – order of presentation 4. Order effect (time performance) • Statistical analysis shows that the first point was different from the others. • However, the subsequent presentations were not different from each other. [Figure panels: 2-term study, 3-term study]
Results – order of presentation 4. Order effect (time performance) • The slopes of the lines are initially steeper. • The curve for the “spring” display appears flatter than the other curves. [Figure panels: 2-term study, 3-term study]
Results – order of presentation 4. Order effect (correctness of answers) • There was no significant effect of the presentation order on performance as measured by the correctness of answers. • The “spring” display is the only display that is not influenced by the increased complexity of the 3-term conditions. [Figure: Order effect, number of correct answers]
Results – performance with respect to task types 5. Paired contrasts • For paired contrasts, using the first question (compare) as the pivot group … • … both performance measures (completion time and correct answers) showed a significant difference for each pair of values … • EXCEPT … 1. for the “Distinguish” question for time and 2. for the “Rank” question for correctness. [Figure panels: Completion time, Correctness]