Information Visualization Design for Multidimensional Data: Integrating the Rank-by-Feature Framework with Hierarchical Clustering

Dissertation Defense. Human-Computer Interaction Lab & Dept. of Computer Science. Jinwook Seo.

Presentation Transcript


  1. Information Visualization Design for Multidimensional Data: Integrating the Rank-by-Feature Framework with Hierarchical Clustering • Dissertation Defense • Human-Computer Interaction Lab & Dept. of Computer Science • Jinwook Seo

  2. Outline • Research Problems • Clustering Result Visualization in HCE • GRID Principles • Rank-by-Feature Framework • Evaluation • Case studies • User survey via emails • Contributions and Future work

  3. Exploration of Multidimensional Data • To understand the story that the data tells • To find features in the data set • To generate hypotheses • Lost in multidimensional space • Tools and techniques are available in many areas • Strategy and interface to organize them to guide discovery

  4. Constrained by Conventions User/Researcher Conventional Tools Statistical Methods Data Mining Algorithms Multidimensional Data

  5. Boosting Information Bandwidth User/Researcher Information Visualization Interfaces Statistical Methods Data Mining Algorithms Multidimensional Data

  6. Contributions • Graphics, Ranking, and Interaction for Discovery (GRID) principles • Rank-by-Feature Framework • The design and implementation of the Hierarchical Clustering Explorer (HCE) • Validation through case studies and user surveys

  7. Hierarchical Clustering Explorer: Understanding Clusters Through Interactive Exploration • Overview of the entire clustering results: compressed overview • The right number of clusters: minimum similarity bar • Overall pattern of each cluster (aggregation): detail cutoff bar • Compare two results: brushing and linking using pair-tree
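
The minimum similarity bar on slide 7 can be read as a threshold that cuts the clustering tree: subtrees joined at a similarity below the bar become separate clusters. The sketch below illustrates that idea only and is not HCE's implementation. The similarity measure (1 / (1 + Euclidean distance)), the average-linkage rule, and the function names are assumptions chosen to keep the example short.

```python
import math

def similarity(a, b):
    """Assumed similarity in (0, 1]: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(a, b))

def cluster_at_min_similarity(points, min_similarity):
    """Agglomerative clustering (average linkage) that stops merging once
    the best available merge falls below min_similarity.

    For this linkage, stopping early gives the same groups as building the
    full tree and cutting it at the threshold.
    Returns a list of clusters, each a list of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        best = None  # (similarity, index_i, index_j) of the best pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [similarity(points[p], points[q])
                        for p in clusters[i] for q in clusters[j]]
                avg = sum(sims) / len(sims)
                if best is None or avg > best[0]:
                    best = (avg, i, j)
        avg, i, j = best
        if avg < min_similarity:
            break  # every remaining merge is below the bar
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Toy data: two tight pairs that are far apart from each other.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
two_groups = cluster_at_min_similarity(points, 0.4)
one_group = cluster_at_min_similarity(points, 0.0)
all_separate = cluster_at_min_similarity(points, 0.9)
```

Raising the threshold yields more, tighter clusters; lowering it yields fewer, looser ones, which is the trade-off a user explores by dragging the bar.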

  8. HCE History • Document-View Architecture • 72,274 lines of C++ code, 76 C++ classes • About 2,500 downloads since April 2002 • Commercial license to a biotech company (www.vialactia.com) • Freely downloadable at www.cs.umd.edu/hcil/hce

  9. Goal: Find Interesting Features in Multidimensional Data • Finding clusters, outliers, correlations, gaps, … is difficult in multidimensional data • Cognitive difficulties in >3D • Therefore utilize low-dimensional projections • Perceptual efficiency in 1D and 2D • Orderly process to guide discovery

  10. Do you see anything interesting?

  11. Do you see any interesting feature?

  12. Correlation…What else?

  13. Outliers He Rn

  14. GRID Principles • Graphics, Ranking, and Interaction for Discovery in Multidimensional Data • study 1D • study 2D • then find features • ranking guides insight • statistics confirm

  15. Rank-by-Feature Framework • Based on the GRID principles • 1D → 2D • 1D : Histogram + Boxplot • 2D : Scatterplot • Ranking Criteria • statistical methods • data mining algorithms • Graphical Overview • Rapid & Interactive Browsing

  16. Demo A Ranking Example 3138 U.S. counties with 17 attributes Uniformness (entropy) (6.7, 6.1, 4.5, 1.5) Pearson correlation (0.996, 0.31, 0.01, -0.69)
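
The two ranking criteria named on slide 16 can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in HCE: the function names, the 8-bin equal-width histogram, and the toy table are all made up for the example. Each column is scored by the Shannon entropy of its histogram (higher means closer to uniform), each pair of columns by the Pearson correlation coefficient, and both lists are sorted by score.

```python
import math
from itertools import combinations

def histogram_entropy(values, bins=8):
    """Shannon entropy in bits of an equal-width histogram (assumed binning)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant column occupies a single bin
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def rank_1d(table):
    """Columns ordered from most to least uniform."""
    scored = [(name, histogram_entropy(col)) for name, col in table.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

def rank_2d(table):
    """Column pairs ordered from most positive to most negative correlation."""
    scored = [((a, b), pearson(table[a], table[b]))
              for a, b in combinations(table, 2)]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical toy table: column name -> values.
table = {
    "a": [1, 2, 3, 4, 5, 6, 7, 8],
    "b": [2, 4, 6, 8, 10, 12, 14, 16],
    "c": [8, 7, 6, 5, 4, 3, 2, 1],
    "d": [1, 1, 1, 1, 1, 1, 1, 9],
}
ranking_1d = rank_1d(table)
ranking_2d = rank_2d(table)
```

In an interface built on this idea, the sorted lists would drive the order in which histograms and scatterplots are offered to the user; swapping in another scoring function changes the ranking criterion without touching the rest.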

  17. Categorical Variables in RFF • New ranking criteria • Chi-square, ANOVA • Significance and Strength • How strong is a relationship? • How significant is a relationship? • Partitioning and Comparison • partition by a column (categorical variable) • partition by a row (class info for columns) • compare clustering results for partitions
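
For a numeric column partitioned by a categorical column, one-way ANOVA gives a score that could serve as a ranking criterion. The sketch below is an illustration only: the function name is hypothetical, and using eta squared as the "strength" measure is an assumption, since the slides do not say which strength measure accompanies ANOVA. Turning the F statistic into a significance value needs the F distribution, which is left out here.

```python
def one_way_anova(values, labels):
    """Return (F statistic, eta squared) for values grouped by labels.

    Assumes at least two groups and more observations than groups.
    """
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups.values())
    # Variation of the observations around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2
                    for g in groups.values() for x in g)
    if ss_within == 0:
        f_stat = float("inf")  # groups are perfectly separated
    else:
        f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    total = ss_between + ss_within
    eta_squared = ss_between / total if total > 0 else 0.0
    return f_stat, eta_squared

# Hypothetical example: six measurements split into two categories.
f_stat, eta_squared = one_way_anova([1, 2, 3, 4, 5, 6], list("aaabbb"))
```

A larger F suggests the category separates the numeric values more clearly, while eta squared reports what share of the total variation the category accounts for.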

  18. • color: contingency coefficient C; size: chi-square p-value • color: quadracity; size: least-square error
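
The chi-square statistic and the contingency coefficient C for a pair of categorical columns can be computed as sketched here. The function names and toy data are assumptions for illustration, and this is not taken from HCE. C is computed with the standard formula C = sqrt(chi2 / (chi2 + n)). The p-value is not computed because it needs the chi-square distribution; a statistics library such as SciPy provides that.

```python
import math
from collections import Counter

def chi_square(xs, ys):
    """Return (chi-square statistic, degrees of freedom) for two categorical columns."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))
    chi2 = 0.0
    for r, r_count in row_totals.items():
        for c, c_count in col_totals.items():
            expected = r_count * c_count / n  # count expected under independence
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof

def contingency_coefficient(chi2, n):
    """Pearson's contingency coefficient C, a strength measure in [0, 1)."""
    return math.sqrt(chi2 / (chi2 + n))

# Hypothetical toy columns: one perfectly associated pair, one independent pair.
xs = ["a", "a", "b", "b"]
associated = ["x", "x", "y", "y"]
independent = ["x", "y", "x", "y"]

chi2_assoc, dof_assoc = chi_square(xs, associated)
c_assoc = contingency_coefficient(chi2_assoc, len(xs))
chi2_indep, _ = chi_square(xs, independent)
c_indep = contingency_coefficient(chi2_indep, len(xs))
```

Keeping the statistic (significance) and the coefficient (strength) separate mirrors the distinction the slides draw between how significant and how strong a relationship is.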

  19. Categorical Variables in RFF • New ranking criteria • Chi-square, ANOVA • Significance and Strength • How strong is a relationship? • How significant is a relationship? • Partitioning and Comparison • partition by a column (categorical variable) • partition by a row (class info for columns) • compare clustering results for partitions

  20. Partitioning and Comparison • Compare two column-clustering results

  21. Partitioning and Comparison • Compare two row-clustering results

  22. Qualitative Evaluation • Case studies • Individual 30-minute weekly meetings for 6 weeks • observe how participants use HCE • improve HCE according to their requirements • 1 molecular biologist (acute lung injuries in mice) • 1 biostatistician (FAMuSS Study data) • 1 meteorologist (aerosol measurement)

  23. Lessons Learned • Rank-by-Feature Framework • Enables systematic/orderly exploration • Prevents important features from being missed • Helps confirm known features • Helps identify unknown features • Reveals outliers as signal/noise • More work needed • Transformation of variables • More ranking criteria • More interactions

  24. User Survey via Emails • 1500 user survey emails • 13 questions on HCE and RFF • 60% successfully sent out • 85 users replied • 60 users answered a majority of questions • 25 just curious users

  25. Which features have you used? Do you think HCE improved the way you analyze your data set?

  26. Future Work • Integrating RFF with Other Tools • More ranking criteria • GRID principles available in other tools • Scaling-up • Selection/Filtering to handle large number of dimensions • Interaction in RFF • Further Evaluation

  27. Future Work • Integrating RFF with Other Tools • More ranking criteria • GRID principles available in other tools • Scaling-up • Selection/Filtering to handle large number of dimensions • Interaction in RFF • Further Evaluation

  28. Contributions • Graphics, Ranking, and Interaction for Discovery (GRID) principles • Rank-by-Feature Framework • The design and implementation of the Hierarchical Clustering Explorer (HCE) • Validation through case studies and user surveys

  29. Thank you!
