Document Collections 2


  1. Document Collections 2 • cs5984: Information Visualization • Chris North

  2. Approaches • Clustering (last time) • Themescapes, … • Network • Keyword

  3. Clustering with full text • Galaxy of News, pg 452

  4. Clustering
     • Good:
        • Map of collection
        • Major themes and sizes
        • Relationships between themes
        • Scales up
     • Bad:
        • Where to locate documents with multiple themes?
           • Both mountains, between mountains, …?
        • Relationships between documents, within documents?
        • Algorithm becomes (too) critical

  5. Network • Show inter-relationships • Matrix or complete graph • Similarity measure between all pairs of docs • Threshold level: link only pairs whose similarity exceeds it • Salton, pg 413
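The all-pairs similarity plus threshold idea can be sketched in a few lines of Python. This is a minimal sketch under assumptions the slide does not state: each document is a bag-of-words term-frequency vector (lowercase, whitespace tokens, no stemming), similarity is cosine similarity, and an edge is kept when similarity is at or above the threshold. The function names and threshold values are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def term_vector(text: str) -> Counter:
    """Term-frequency vector; assumes lowercase whitespace tokenization."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_network(docs: dict[str, str], threshold: float) -> list[tuple[str, str, float]]:
    """Compare every pair of documents; keep an edge when similarity >= threshold."""
    vectors = {name: term_vector(text) for name, text in docs.items()}
    edges = []
    for a, b in combinations(sorted(vectors), 2):
        sim = cosine(vectors[a], vectors[b])
        if sim >= threshold:
            edges.append((a, b, round(sim, 3)))
    return edges
```

Comparing all pairs costs on the order of n² similarity computations for n documents, which is one reason this approach suits smaller, more detailed maps. Raising the threshold thins the graph; lowering it moves toward the complete graph.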

  6. Variations • Docs + paragraphs • Themes

  7. Network
     • Better for smaller, more detailed maps
     • Scale up: network visualization
     • Good:
        • Can see more complex relationships between/within documents
        • Can act like hyperlinks!
     • Bad:
        • Hard to find specific documents
        • Difficult to scale up

  8. Combination: Thinkmap • http://www.thinkmap.com/article.cfm?articleID=38

  9. Keyword • Search engine, keyword query • Rank ordered list • “Information Retrieval”
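A rank-ordered result list needs some scoring function, and the slide does not name one. The sketch below assumes a simple tf-idf score: for each query term, term frequency in the document times log(N / df), summed over the query terms. `rank_documents` is a hypothetical name, and tokenization is assumed to be lowercase whitespace splitting.

```python
import math
from collections import Counter


def rank_documents(docs: dict[str, str], query: str) -> list[tuple[str, float]]:
    """Return (doc, score) pairs with a positive score, best match first.

    Assumed scoring: sum over query terms of tf * log(N / df).
    """
    tf = {name: Counter(text.lower().split()) for name, text in docs.items()}
    n_docs = len(tf)
    terms = query.lower().split()
    # Document frequency: how many documents contain each query term.
    df = {t: sum(1 for counts in tf.values() if t in counts) for t in terms}
    scores = {}
    for name, counts in tf.items():
        # Terms found in no document (df == 0) are skipped to avoid dividing by zero.
        score = sum(counts[t] * math.log(n_docs / df[t]) for t in terms if df[t])
        if score > 0:  # keep only documents that match an informative term
            scores[name] = score
    # Highest score first; ties broken alphabetically by document name.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

A query whose terms occur in no document returns an empty list (the zero-hit case). With this particular idf, a term that appears in every document contributes nothing to any score.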

  10. Today • Hearst, “TileBars”, web • Umer, Ashwini

  11. VIBE • Korfhage, http://www.pitt.edu/~korfhage/interfaces.html • Documents located between query keywords using a spring model
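The spring model can be sketched by solving for the point where the springs balance. Assume each query keyword is an anchor at a fixed 2D position and a document is tied to each anchor by a spring whose stiffness is the document's weight for that keyword (for example a term count; this weighting is an assumption). At equilibrium the net force sum_i k_i (p_i - x) is zero, so x = sum_i k_i p_i / sum_i k_i, a weighted average of the anchor positions. The function name and coordinates are hypothetical.

```python
from typing import Optional

Point = tuple[float, float]


def vibe_position(anchors: dict[str, Point], weights: dict[str, float]) -> Optional[Point]:
    """Place a document at the equilibrium of springs tied to keyword anchors.

    Assumption: spring stiffness k_i is the document's non-negative weight for
    keyword i. Balance of forces sum_i k_i * (p_i - x) = 0 gives
    x = sum_i k_i * p_i / sum_i k_i.
    """
    # Only keywords that have an anchor and a positive weight pull on the document.
    springs = [(w, anchors[k]) for k, w in weights.items() if k in anchors and w > 0]
    total = sum(w for w, _ in springs)
    if total == 0:
        return None  # no matching keyword, so nothing places the document
    x = sum(w * p[0] for w, p in springs) / total
    y = sum(w * p[1] for w, p in springs) / total
    return (x, y)
```

Because only the ratio of weights matters, weights {"A": 1, "B": 1} and {"A": 3, "B": 3} land on the same point, so documents with different keyword profiles can overlap in the display.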

  12. VR-VIBE

  13. InfoCrystal • Spoerri, pg 140 • Venn diagram of all possible keyword combinations, e.g. A&B&C&D, A&C&D, C&B, C
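A Venn-style layout has one region for every non-empty combination of the query keywords, so n keywords give 2^n - 1 regions (15 for A, B, C, D). The sketch below enumerates those regions and counts documents per region, assuming a document belongs to the region that matches exactly the set of keywords it contains (case-insensitive whitespace tokens). The function names are hypothetical, and the keyword list is assumed to have no duplicates.

```python
from itertools import combinations


def all_combinations(keywords: list[str]) -> list[frozenset[str]]:
    """Every non-empty subset of the keywords, largest combinations first."""
    return [
        frozenset(combo)
        for size in range(len(keywords), 0, -1)
        for combo in combinations(keywords, size)
    ]


def infocrystal_counts(docs: dict[str, str], keywords: list[str]) -> dict[frozenset[str], int]:
    """Count documents whose exact set of matched keywords equals each combination."""
    counts = {combo: 0 for combo in all_combinations(keywords)}
    for text in docs.values():
        words = set(text.lower().split())
        present = frozenset(k for k in keywords if k.lower() in words)
        if present:  # documents with none of the keywords fall outside every region
            counts[present] += 1
    return counts
```

Counting exact matches keeps the regions disjoint, so a document with keywords A, C and D is counted in A&C&D only, not also in C.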

  14. Keyword
     • Good:
        • Reduces the browsing space
        • Map according to user’s interests
     • Bad:
        • What keywords do I use?
        • What about other related documents that don’t use these keywords?
        • No initial overview
        • Mega-hit, zero-hit problem

  15. Assignment
     • Mid-project status report: due today
     • Read for Thurs:
        • Fox, “Envision”, web, video
        • Aejaaz, Ravi

  16. Upcoming Weeks
     • I’m at CHI all next week
     • Tues: Go to VE, SciViz lab: Torg 3050
        • Bowman, Kriz, Kelso
     • Thurs: McCrickard
     • Read for Tues Apr 10:
        • DeFanti, “Scientific Visualization”, pg 39
        • Sayle, “Rasmol”, web
        • Yuying, ?
