Visualizing textual data CPSC 601.28 A. Butt / Feb. 26 '09
Overview • Project implications • Summarize "TileBars" • Hearst / PARC (Xerox) • Summarize "Visualizing the Non-Visual" • Wise et al. / Pacific Northwest Lab (Battelle) • Key Issues • Summary • References
Project Implications • Research area is partly based on text-based environmental reports • textual reporting feeds into a textual (quasi-judicial) regulatory framework • rooms of binders (e.g. >20,000 pages for the Mackenzie Pipeline Project) • vocabulary is specialized / semantically complete • e.g. "no significant adverse environmental impacts"
TileBars • goals are to simultaneously view: • length of a document • relative frequency of specific words • distribution of words with respect to each other • benefits include: • easier judgment of the relevance of search results • patterns of frequency by document / author • compact presentation of information
TileBars • Visual representation via • rectangular block: length equates to document length • three bars within the block: each corresponds to a query • in each bar, tile position indicates location in the document and tile saturation indicates term frequency • example: 5 articles, 3 search queries • 1st, 2nd, 5th appear compact / relevant • 1st and 2nd show better co-occurrence of the query terms • 3rd and 4th are potentially less relevant and require a greater time investment to read
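The encoding described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Hearst's implementation: it splits the document into fixed-size word windows, whereas the paper segments text into topical passages (TextTiling), and the four-level shade ramp and the helper names (`segment`, `tilebar`, `render`, `cooccurrence`) are hypothetical. Each row is one query term, row length grows with document length, and a darker tile means more hits in that segment.

```python
import re

# Hypothetical saturation ramp: index 0 = term absent, last index = most hits.
SHADES = " .:#"

def segment(text, size):
    """Split text into consecutive windows of `size` words.
    Assumption: a stand-in for TextTiling's topical segments."""
    words = re.findall(r"[a-z']+", text.lower())
    return [words[i:i + size] for i in range(0, len(words), size)]

def tilebar(text, terms, size=50):
    """One row per query term; each cell is that term's hit count in one segment.
    Row length equals the number of segments, so it grows with document length."""
    segs = segment(text, size)
    return [[seg.count(term.lower()) for seg in segs] for term in terms]

def render(rows, terms):
    """Draw rows as text; a darker shade means more hits, scaled to the peak count."""
    peak = max((c for row in rows for c in row), default=0) or 1
    top = len(SHADES) - 1
    lines = []
    for term, row in zip(terms, rows):
        cells = "".join(SHADES[round(c * top / peak)] for c in row)
        lines.append(f"{term:>10} |{cells}|")
    return "\n".join(lines)

def cooccurrence(rows):
    """Fraction of segments in which every given term appears at least once."""
    n = len(rows[0]) if rows else 0
    if n == 0:
        return 0.0
    shared = sum(all(row[i] > 0 for row in rows) for i in range(n))
    return shared / n

if __name__ == "__main__":
    doc = ("pipeline route crosses caribou habitat " * 3
           + "regulatory hearing schedule " * 10
           + "pipeline caribou impacts mitigation " * 2)
    terms = ["pipeline", "caribou", "hearing"]
    rows = tilebar(doc, terms, size=10)
    print(render(rows, terms))
    print(cooccurrence(rows[:2]))  # pipeline and caribou share 3 of 6 segments: 0.5
```

Read the way the slide reads its five articles: the "hearing" row is dark in the middle segments where the other two terms are absent, so the passages that mention pipeline and caribou together sit only at the start and end of this made-up document.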
Visualizing the Non-Visual • goals are to: • overcome time constraints in processing textual information • overcome attention constraints; avoid becoming overwhelmed by the volume of textual information • benefits include: • escape the limitations of traditional text • increase throughput and comprehension when processing information • feedback on text structure to enhance visualization
Visualizing the Non-Visual • Employ a "natural landscape" metaphor • represent information as natural scenes to leverage the psychological adaptations humans evolved for perceiving landscapes • galaxy or star-fields ("night sky") • themescapes ("cartographic" or "landscape") • although statistical measures are used for clustering, they are not displayed as directly as in TileBars • self-organizing maps
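Self-organizing maps are listed above as one way to cluster documents before drawing them as a landscape. The sketch below is a generic Kohonen map over bag-of-words vectors, not PNL's SPIRE algorithm; the 3x3 grid, the linear decay schedules, the seed and the sample documents are all assumptions. Counting how many documents settle on each grid cell gives a density grid that a themescape-style view could render as terrain height.

```python
import math
import random
import re
from collections import Counter

def vectorize(docs):
    """Bag-of-words vectors over a shared vocabulary, scaled to unit length."""
    tokens = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    vocab = sorted({w for t in tokens for w in t})
    vecs = []
    for t in tokens:
        counts = Counter(t)
        v = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vecs.append([x / norm for x in v])
    return vocab, vecs

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def bmu(weights, v):
    """Best-matching unit: the grid cell whose weight vector is closest to v."""
    return min(weights, key=lambda cell: sq_dist(weights[cell], v))

def train_som(vecs, rows=3, cols=3, epochs=200, lr0=0.5, radius0=1.5, seed=0):
    """Online SOM: pull the winning cell and its grid neighbours toward each input."""
    rng = random.Random(seed)
    dim = len(vecs[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)  # neighbourhood shrinks, floored at 0.5
        for v in rng.sample(vecs, len(vecs)):    # visit documents in random order
            win = bmu(weights, v)
            for cell, w in weights.items():
                d2 = (cell[0] - win[0]) ** 2 + (cell[1] - win[1]) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])
    return weights

def quantization_error(weights, vecs):
    """Mean squared distance from each document to its best-matching unit."""
    return sum(sq_dist(weights[bmu(weights, v)], v) for v in vecs) / len(vecs)

if __name__ == "__main__":
    docs = ["caribou habitat migration", "caribou migration route",
            "pipeline route construction", "pipeline construction schedule",
            "regulatory hearing schedule", "regulatory hearing transcript"]
    _, vecs = vectorize(docs)
    som = train_som(vecs)
    for doc, v in zip(docs, vecs):
        print(bmu(som, v), doc)
    density = Counter(bmu(som, v) for v in vecs)  # documents per cell ~ "terrain height"
    print(dict(density))
```

Calling `train_som(vecs, epochs=0)` returns the untrained random grid, which is a convenient baseline: its quantization error should be clearly higher than the trained map's.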
Galaxies • PNL software development (DOE) • example display: a review of cancer literature • Branched into SPIRE / IN-SPIRE for government documents
Themescapes • PNL software development (DOE) • Branched into SPIRE / IN-SPIRE for government documents (renamed "ThemeView") • Branched into NVAC (National Visualization and Analytics Center), part of the Homeland Security infrastructure
Themescapes (2.0?) • Commercial offshoot of themescapes • Used in searching intellectual property / patents • Subscription service • Failed metaphors?
Key Issues • Vocabulary / semantics: how do you interpret meaning from text statistics? • earlier failures of natural language processing • contingent semantics • Employing metaphors (Zhang 2008) • metaphors rely on unusual linkages (rather than analogy) for emphasis • the degree of "unusual-ness" is critical: too much or too little leads to confusion
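The first key issue can be made concrete with the boilerplate phrase quoted under Project Implications. In a bag-of-words model, the kind of term statistic that TileBars and similar tools build on, a finding and its negation look almost identical. The sketch below compares the two phrasings with cosine similarity; the tokenizer and the one-word stop list are assumptions for illustration (some standard stop lists do include "no").

```python
import math
import re
from collections import Counter

def bag_of_words(text, stop_words=frozenset()):
    """Term counts, ignoring word order and any words in stop_words."""
    return Counter(w for w in re.findall(r"[a-z]+", text.lower()) if w not in stop_words)

def cosine(a, b):
    """Cosine similarity of two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0

if __name__ == "__main__":
    finding = "no significant adverse environmental impacts"
    opposite = "significant adverse environmental impacts"
    # 4 shared terms: 4 / (sqrt(5) * 2), about 0.894
    print(cosine(bag_of_words(finding), bag_of_words(opposite)))
    stop = frozenset({"no"})  # hypothetical stop list that happens to contain "no"
    # with "no" discarded the two vectors are identical: 1.0
    print(cosine(bag_of_words(finding, stop), bag_of_words(opposite, stop)))
```

The opposite regulatory conclusions score as near-duplicates, and as exact duplicates once "no" is treated as a stop word, which is why meaning cannot simply be read off term frequencies.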
Summary www.wordle.net
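The summary slide appears to be a word cloud made with Wordle (www.wordle.net), which sizes each word by how often it occurs in the input text. The sketch below reproduces only that sizing step, using a linear scale between two hypothetical point sizes and a small hypothetical stop list; Wordle's own weighting and word-placement layout are not reproduced.

```python
import re
from collections import Counter

# Hypothetical stop list; real word-cloud tools ship much longer ones.
STOP_WORDS = frozenset({"the", "of", "and", "a", "to", "in", "is", "for"})

def word_sizes(text, top=10, min_pt=10, max_pt=48):
    """Return (word, point size) for the `top` most frequent non-stop words.
    Sizes scale linearly from min_pt (least frequent kept) to max_pt (most frequent)."""
    counts = Counter(w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS)
    common = counts.most_common(top)
    if not common:
        return []
    hi, lo = common[0][1], common[-1][1]
    span = hi - lo or 1  # all counts equal: every word gets min_pt
    return [(w, min_pt + (c - lo) * (max_pt - min_pt) // span) for w, c in common]

if __name__ == "__main__":
    slide_text = ("visualizing textual data tilebars text visualization "
                  "themescapes galaxies text metaphors text data")
    for word, size in word_sizes(slide_text, top=5):
        print(f"{size:>3}pt  {word}")
```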
References • Marti A. Hearst. TileBars: Visualization of Term Distribution Information in Full Text Information Access. Proc. CHI 1995, pp. 59-66. • James A. Wise, James J. Thomas, Kelly Pennock, David Lantrip, Marc Pottier, Anne Schur and Vern Crow. Visualizing the Non-Visual: Spatial Analysis and Interaction with Information from Text Documents. Proc. IEEE Symposium on Information Visualization (InfoVis '95), pp. 51-58, IEEE Computer Society Press, October 30-31, 1995. (in text pages 442-450) • Jin Zhang. The Implication of Metaphors in Information Retrieval. In Visualization in Information Retrieval, Elsevier, 2008, pp. 215-237.