Explore the use of word clouds in language and text analysis, and the challenges they pose in conveying meaningful information. Discover how semantically grouped word cloud designs can improve understanding while retaining engagement.
The Intersection of Language, Algorithms, and Design Marti Hearst, UC Berkeley UCSD Design@Large October 31, 2018
Measure what they can, not what they should “Good” Model: baseball stats Do not adjust agilely to error WMD: US News College Rankings Not answerable, secret formula Create their own distorted reality
How often do we have the designs we want versus those our algorithms can (easily) make?
Tag Cloud Order: Surprise! • 7 interviewees DID NOT REALIZE alphabetical ordering • “What order are tags shown in?” • hadn’t thought about it • don’t think about tag clouds that way • random order • ordered by semantic similarity • This result was also found by Wattenberg et al. 2008
Main Reasons For Using: • To signal the presence of tags on the site • An inviting way to get people interacting with the site • A good way to get the gist of the site • Easy to implement
New Perspective: Tag Clouds are Social! • It’s not about the “information”! • Self-reflection • Showing off topics to others, socially. • Probably a fad.
Word Size Variations in Word Clouds are Problematic: Even with few words in the cloud, the relative values are difficult to perceive. Jonathan Schwabish, http://www.allanalytics.com/author.asp?section_id=3072
Answer: Hamlet’s famous “to be or not to be” soliloquy. But you couldn’t tell. Why not?
Feinberg on Wordles: “The commonly used trick of scaling by the square root of the word’s weight (to compensate for the fact that words have area, not just length) simply makes a Wordle look boring.” “There’s not much evidence that [tag clouds are] all that useful for navigation or other interactive tasks. … Once I decided to build a system for viewing text rather than tags, it seemed superfluous to have the words do anything other than merely exist on the page. I decided I would design something primarily for pleasure.” “Color means absolutely nothing in Wordle”; it is used for contrast and aesthetics. Some of Wordle’s success is due to its “one-paste/one-click instant gratification.” Feinberg, Ch 3, Beautiful Visualization, 2010
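The square-root trick mentioned in the quote can be sketched as follows. This is a minimal illustration, not Wordle's actual code: the function name `scaled_size`, the `max_pt` parameter, and the choice of points as the unit are all assumptions made for the example. Linear scaling makes a word's height proportional to its weight, so its area grows with the square of the weight; square-root scaling makes the area roughly proportional to the weight instead.

```python
import math


def scaled_size(weight, max_weight, max_pt=72.0, mode="linear"):
    """Map a word weight to a font size (hypothetical helper, sizes in points).

    mode="linear": font size is proportional to weight.
    mode="sqrt":   font size is proportional to sqrt(weight), so the
                   word's area is roughly proportional to weight.
    """
    if max_weight <= 0:
        raise ValueError("max_weight must be positive")
    # Clamp the ratio into [0, 1] so out-of-range weights cannot break the layout.
    ratio = max(0.0, min(1.0, weight / max_weight))
    if mode == "sqrt":
        ratio = math.sqrt(ratio)
    elif mode != "linear":
        raise ValueError("mode must be 'linear' or 'sqrt'")
    return max_pt * ratio


# A word with a quarter of the top weight:
linear = scaled_size(25, 100, mode="linear")  # a quarter of the maximum height
root = scaled_size(25, 100, mode="sqrt")      # half the maximum height
```

With square-root scaling the smaller word is half as tall as the largest one, so it covers about a quarter of the area, matching its quarter share of the weight. The sizes are also closer together, which is the flattening effect the quote describes.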
Feinberg, Wattenberg, and Viegas 2009 surveyed 4,306 Wordle users and found: • 50% did not understand what font size indicated • 57% wrote the text they visualized • Color “often” interpreted as having meaning Other studies find: • Varying font size is detrimental to understanding statistics • Font size can guide visual search for certain tasks, but users prefer search boxes for word lookup tasks • Column layouts or bar charts are better for recognizing frequencies of values
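As a rough illustration of the bar-chart alternative for reading frequencies, the sketch below counts words and renders a plain-text bar per word. The function name `frequency_bars`, the tokenizing regular expression, and the `#` bar character are assumptions chosen for the example, not something taken from the studies cited above.

```python
import re
from collections import Counter


def frequency_bars(text, top_n=5, width=30):
    """Return text lines of the form 'word  #####  count' for the top_n words.

    Bar length is proportional to the count, so relative frequencies can be
    compared along one aligned axis instead of by font size.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words).most_common(top_n)
    if not counts:
        return []
    longest = max(len(word) for word, _ in counts)
    top_count = counts[0][1]
    lines = []
    for word, n in counts:
        bar = "#" * max(1, round(n / top_count * width))
        lines.append(f"{word:<{longest}} {bar} {n}")
    return lines


for line in frequency_bars("to be or not to be", top_n=4, width=10):
    print(line)
```

Every bar starts at the same left edge and the exact count is printed beside it, which is what makes this layout easier to read for frequency than varying font sizes.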
Why Are They Used Generally? • Word clouds are easy to make. • Word clouds are visually engaging. • Word clouds are commonly used.
Why This Matters: Word clouds continue to be used as evidence in scientific settings.
Presented at Vis 2018 The word cloud “shows a summary of tweets” Urban Space Explorer: A Visual Analytics System for Urban Planning, Karduni et al., IEEE CG&A 2018
Presented at Vis 2018 “We see this distribution covers a variety of ethnic surnames, perhaps giving insight into how immigrants migrated after coming to Ellis Island.” Name Profiler Toolkit, Wang et al., IEEE CG&A 2018
Presented at ACL 2018 for 28 seconds. “Here we find differences among the words in large letters. We find for example, learning networks and embeddings being heavily represented in ACL 2018 titles.”
We wouldn’t plot numerical axes incorrectly. Why is it ok to show text in this way? Why?
Why Are They Used in Science? • Word clouds are easy. • Word clouds are visually engaging. • Word clouds are commonly used. • There is no alternative with the same properties. • Training in usability is generally lacking. • Also …
Almost any text outcome can look ok: People are great at making up associations among words. It’s hard to conjure what isn’t there: People are really bad at noticing what is missing from text collections. www.randomlists.com
An approximate alternative … These are accurate, but do not make the words as prominent.
New work Goal: retain the engaging aspect of word clouds, while imparting some useful semantic information. Hearst et al., An Evaluation of Semantically Grouped Word Cloud Designs, under review, TVCG
Hypothesis: Organizing the words both semantically and visually will improve understanding while retaining engagement.
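One way to picture semantic plus visual organization is to group the words by category and order each group by weight, giving one column per category. The sketch below does only that grouping step. The function name `group_into_columns`, the category labels, and the sample weights are hypothetical, and the sketch assumes the word-to-category mapping is already available; it does not reproduce the designs evaluated in the study.

```python
from collections import defaultdict


def group_into_columns(weighted_words, category_of):
    """Group words by category; sort each group by descending weight.

    weighted_words: dict mapping word -> weight (e.g. frequency).
    category_of:    dict mapping word -> category label (assumed given).
    Words without a category go into an 'other' column.
    """
    columns = defaultdict(list)
    for word, weight in weighted_words.items():
        columns[category_of.get(word, "other")].append((word, weight))
    return {
        category: [w for w, _ in sorted(items, key=lambda p: (-p[1], p[0]))]
        for category, items in sorted(columns.items())
    }


# Hypothetical sample data, for illustration only.
weights = {"menu": 5, "waiter": 3, "goal": 4, "referee": 2, "bill": 3}
categories = {
    "menu": "restaurant", "waiter": "restaurant", "bill": "restaurant",
    "goal": "soccer", "referee": "soccer",
}
columns = group_into_columns(weights, categories)
```

Each resulting column could then be drawn in its own color and region of the display, so that position and color carry the category while size still carries the weight.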
Evaluating Word Clouds • Most papers are vague about this • “gist”, “summary”, “navigate”, “see trends” • Most evaluations do not assess these; instead: • Identify the largest word • Identify a given word • How to evaluate more deeply?
A New Task: Given a set of words, identify the category. menu, waiter, dishes, tablecloth, bill, restaurant
Hypothesis: the standard Wordle is worst; color + space is best; mixed color, but with coherent color assignments, falls in between. Results were consistent with the hypothesis. 88% preferred the column layout for the task. (Chart: average score, out of 5.)
Views compared: White Space Separation; Color Mapped, Spatial Jumble; Color Mapped, Spatial Organized. All views had larger font size variation than the prior study.
Hypothesis was that the column layout would outperform Spatial Organized. The column layout scored best; Spatial Organized was significantly better than Spatial Jumbled (Wordle) but not significantly different from Column. (Chart: average score, out of 5.) 90% preferred the color column for “task”; 56% preferred the color column for “visually pleasing”, with the rest split between the other two.