Enhancing Set-Analysis through Scalable Visualizations
Presented by:
Hamid Haidarian Shahri (hamid@cs.umd.edu)
Mudit Agrawal (mudit@cs.umd.edu)
Content
• Problem Definition
• Motivation
• Dataset
• Architecture
• Visualization Methods
• Interaction Tools
• Demo
• Future Work
CMSC 838S Information Visualization Spring 2006
Problem Definition
• Analysis of sets by
  • representing the clusters graphically
  • depicting their internal and external links
• Scaling visualization
Motivation
• Sets are encountered in various domains
  • websites
  • commodities
  • publications
  • anything that has attributes
• Visualizing sets in a way that aids human perception is still an unsolved problem
  • there are no direct relations between sets (or their elements) in the spatial domain
  • sets can be grouped based on various attributes
Dataset
• 2700 law cases
• Each case is identified by a numerical id ranging from 1000 to 3718
• Each tuple in the dataset represents one case referencing another
• The relation is unidirectional and not symmetric (a case can only reference cases decided before it, so a reference also implies a temporal constraint on the cases)
Snapshot of the data
First 50 links (approximately 0.1 percent of whole dataset)
(1001,1105,'100 S.Ct. 318'),(1001,1612,'101 S.Ct. 2352'),(1001,1018,'107 S.Ct. 1232'),
(1001,1016,'112 S.Ct. 2886'),(1001,2923,'113 S.Ct. 2264'),(1001,1016,'120 L.Ed.2d 798'),
(1001,2923,'124 L.Ed.2d 539'),(1001,2286,'138 F.3d 1036'),(1001,2396,'238 F.3d 382'),
(1001,3410,'438 U.S. 104'),(1001,1105,'444 U.S. 51'),(1001,1612,'452 U.S. 264'),
(1001,1018,'480 U.S. 470'),(1001,1016,'505 U.S. 1003'),(1001,2923,'508 U.S. 602'),
(1001,3410,'57 L.Ed.2d 631'),(1001,1105,'62 L.Ed.2d 210'),(1001,1612,'69 L.Ed.2d 1'),
(1001,1789,'926 F.2d 1169'),(1001,1018,'94 L.Ed.2d 472'),(1001,3410,'98 S.Ct. 2646'),
(1002,1276,'100 S.Ct. 2138'),(1002,1101,'105 S.Ct. 3108'),(1002,1018,'107 S.Ct. 1232'),
(1002,1098,'107 S.Ct. 2378'),(1002,1016,'112 S.Ct. 2886'),(1002,1015,'114 S.Ct. 2309'),
(1002,1016,'120 L.Ed.2d 798'),(1002,1013,'121 S.Ct. 2448'),(1002,1012,'122 S.Ct. 1465'),
(1002,1015,'129 L.Ed.2d 304'),(1002,2316,'142 F.3d 1319'),(1002,1013,'150 L.Ed.2d 592'),
(1002,1012,'152 L.Ed.2d 517'),(1002,1121,'266 F.3d 487'),(1002,3028,'306 F.3d 113'),
(1002,3410,'438 U.S. 104'),(1002,1276,'447 U.S. 255'),(1002,1101,'473 U.S. 172'),
(1002,1018,'480 U.S. 470'),(1002,1098,'482 U.S. 304'),(1002,1016,'505 U.S. 1003'),
(1002,1015,'512 U.S. 374'),(1002,1013,'533 U.S. 606'),(1002,1012,'535 U.S. 302'),
(1002,3410,'57 L.Ed.2d 631'),(1002,2091,'59 F.3d 852'),(1002,1276,'65 L.Ed.2d 106'),
(1002,1889,'746 F.2d 135'),(1002,1101,'87 L.Ed.2d 126'),(1002,1018,'94 L.Ed.2d 472'),
(1002,2319,'953 F.2d 1299'),(1002,1098,'96 L.Ed.2d 250'),(1002,3410,'98 S.Ct. 2646'),
(1002,1022,'980 F.2d 84'),(1002,2670,'989 F.2d 362'),
(1003,1104,'100 S.Ct. 383'),(1003,1611,'104 S.Ct. 2862'),(1003,1100,'106 S.Ct. 1018'),
(1003,1099,'107 S.Ct. 2076'),(1003,1016,'112 S.Ct. 2886'),(1003,3110,'116 S.Ct. 2432'),
(1003,1016,'120 L.Ed.2d 798'),(1003,1012,'122 S.Ct. 1465'),(1003,1881,'13 F.3d 1192'),
(1003,3054,'133 F.3d 893'),(1003,3110,'135 L.Ed.2d 964'),(1003,1012,'152 L.Ed.2d 517'),
(1003,1047,'18 F.3d 1560'),(1003,1886,'265 F.3d 1237'),(1003,2689,'271 F.3d 1090'),
(1003,1358,'271 F.3d 1327'),(1003,1149,'28 F.3d 1171'),(1003,1040,'331 F.3d 891')
Example tuple: (1001,1105,'100 S.Ct. 318')
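A minimal sketch of how tuples like these could be turned into one set of references per case. It assumes the first field is the referencing case, the second is the referenced case, and the third is a citation string; the slides do not state this layout explicitly, and the function name `build_reference_sets` is hypothetical. Because the same pair can appear several times with different citation strings, collecting the ids into sets removes the duplicates.

```python
import ast
from collections import defaultdict


def build_reference_sets(raw):
    """Parse "(a,b,'cite'),(a,b,'cite'),..." into {case_id: set of referenced ids}.

    Assumption: field 1 is the referencing case, field 2 the referenced case.
    """
    # Wrapping the text in brackets makes it a valid Python list literal.
    tuples = ast.literal_eval("[" + raw + "]")
    refs = defaultdict(set)
    for referencing, referenced, _citation in tuples:
        refs[referencing].add(referenced)
    return dict(refs)


# A few tuples copied from the snapshot above.
sample = (
    "(1001,1105,'100 S.Ct. 318'),(1001,1105,'444 U.S. 51'),"
    "(1001,3410,'438 U.S. 104'),(1002,3410,'438 U.S. 104')"
)
reference_sets = build_reference_sets(sample)
print(reference_sets)
```

Each case is then described by a set of ids rather than by coordinates, which is the representation the clustering slides refer to.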
Architecture
[Diagram with components: Data, Similarity Metric, Clustering Module, Clustered Data, Visualization Module]
Routine K-Means Clustering
• Data points are in a vector space.
• The data points x and the cluster centroids are vectors.
• This assumption does not hold for cases represented as sets.
• Centroids are not simple geometric means.
• In fact, a mean is not meaningful for sets.
Routine Self Organizing Map
• Wv and D are assumed to be vectors.
• Wv(t + 1) = Wv(t) + Θ(t) α(t) [D(t) - Wv(t)]
• This assumption does not hold for cases represented as sets.
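The update rule above can be sketched for ordinary numeric vectors as follows. This is only an illustration of the routine (vector) case; the function name `som_update` and the sample numbers are assumptions, and plain Python lists stand in for vectors.

```python
def som_update(w, d, theta, alpha):
    """One SOM step for a single node: w + theta * alpha * (d - w), element-wise.

    w     -- current weight vector of the node
    d     -- input vector presented at this step
    theta -- neighbourhood factor for this node
    alpha -- learning rate
    """
    return [wi + theta * alpha * (di - wi) for wi, di in zip(w, d)]


w_next = som_update([0.0, 0.0], [1.0, 2.0], theta=1.0, alpha=0.5)
print(w_next)  # [0.5, 1.0]
```

The step relies on subtracting two points and scaling the result by a fraction. Neither operation has an obvious counterpart when a data point is a set of referenced case ids, which is why the rule cannot be applied as written.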
Similarity Measures
• Jaccard similarity
• Reference-based similarity
• Weighted reference-based similarity
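Of the measures listed, Jaccard similarity has a standard definition: the size of the intersection divided by the size of the union. A small sketch follows, using the distinct ids that appear as the second field for cases 1001 and 1002 in the data snapshot. Treating that second field as the referenced case is an assumption, and the function name is hypothetical. The two reference-based measures are not defined on the slides, so they are not sketched here.

```python
def jaccard_similarity(a, b):
    """Return |a ∩ b| / |a ∪ b|; two empty sets are treated as identical (1.0)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


# Distinct ids referenced by cases 1001 and 1002 in the snapshot.
refs_1001 = {1105, 1612, 1018, 1016, 2923, 2286, 2396, 3410, 1789}
refs_1002 = {1276, 1101, 1018, 1098, 1016, 1015, 1013, 1012, 2316,
             1121, 3028, 3410, 2091, 1889, 2319, 1022, 2670}

# The two cases share 3 references out of 23 distinct ones.
similarity = jaccard_similarity(refs_1001, refs_1002)
print(similarity)
```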
Contribution to clustering
• We apply K-means and SOM to produce better visualizations
• It is not apparent at first glance, but these algorithms are not directly applicable to set visualization
• They assume a 2D or nD (vector) representation for each data point (i.e. law case). More specifically, the attributes must form a vector space.
• This assumption does not hold
  • there is no clear geometric attribute corresponding to the dataset
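One generic way to cluster items when only a pairwise similarity is available is to replace the mean with a medoid, that is, an actual member of the cluster that is closest to all the others. The sketch below is a toy k-medoids loop over Jaccard distance. It is a swapped-in illustration, not necessarily the adaptation used in this project, and the names `k_medoids`, `jaccard_distance`, and the toy data are hypothetical.

```python
def jaccard_distance(a, b):
    """1 - Jaccard similarity; two empty sets have distance 0."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def k_medoids(items, k, iterations=10):
    """Cluster {item_id: set} into k groups; returns {medoid_id: [member ids]}."""
    ids = sorted(items)
    medoids = ids[:k]  # naive, deterministic initialisation
    clusters = {}
    for _ in range(iterations):
        # Assignment step: each item joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in ids:
            nearest = min(medoids, key=lambda m: jaccard_distance(items[i], items[m]))
            clusters[nearest].append(i)
        # Update step: the new medoid is the member with the smallest
        # total distance to the other members of its cluster.
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # keep the old medoid if its cluster is empty
                new_medoids.append(m)
                continue
            best = min(
                members,
                key=lambda c: sum(jaccard_distance(items[c], items[o]) for o in members),
            )
            new_medoids.append(best)
        if sorted(new_medoids) == sorted(medoids):
            break  # converged
        medoids = new_medoids
    return clusters


# Toy data: items 1 and 2 share references, as do items 3 and 4.
toy = {1: {10, 11}, 2: {10, 11, 12}, 3: {20, 21}, 4: {20, 21, 22}}
toy_clusters = k_medoids(toy, k=2)
print(sorted(sorted(members) for members in toy_clusters.values()))
```

No step in this loop averages or subtracts data points, so it needs only a distance between sets. The price is that each update compares every pair of members in a cluster.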
Similarity Metrics
Geometric Metrics
• 1-D Partitioning
• 2-D Partitioning
• Sequential arrangement
• Distance-based arrangement
[Figure: K-Means (two slides)]
[Figure: SOM after K-Means]
Various Interactive Tools
• Referencing pattern (activating all links)
• Local referencing
• Density map
• Representative element
• Tool tip
• Link follow-up
• Search
[Figure: Referencing Pattern]
[Figure: Local Referencing (two slides)]
[Figure: Density Map (two slides)]
[Figure: Representative Element]
[Figure: Link Follow-up (eight slides)]
Future Work
• Other clustering algorithms can be explored:
  • Spectral
  • Fuzzy C-means
• More similarity functions
• Better initial posting of data
• Zooming and Panning