Data Exploration, Analysis, and Representation: Integration through Visual Analytics Remco Chang, PhD UNC Charlotte Charlotte Visualization Center
Problem Statement • The growth of data is exceeding our ability to analyze them. • The amount of digital information generated is growing exponentially… • 2002: 22 EB (exabytes; 1 EB = 10^18 bytes) • 2006: 161 EB • 2010: 988 EB (almost 1 ZB) 1: Data courtesy of Dr. Joseph Kielman, DHS 2: Image courtesy of Dr. Maria Zemankova, NSF
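As a rough check on "growing exponentially", the three quoted data points imply a compound annual growth rate. This sketch assumes smooth compound growth between 2002 and 2010, which the slide does not claim. It uses only the volumes quoted above.

```python
import math

# Volumes quoted on the slide, in exabytes (1 EB = 10**18 bytes).
volumes = {2002: 22, 2006: 161, 2010: 988}

def annual_growth_rate(v0: float, v1: float, years: int) -> float:
    """Compound annual growth rate that takes v0 to v1 in `years` years."""
    return (v1 / v0) ** (1 / years) - 1

rate = annual_growth_rate(volumes[2002], volumes[2010], 2010 - 2002)
# Years needed for the volume to double at that constant rate.
doubling_years = math.log(2) / math.log(1 + rate)
print(f"implied growth: {rate:.0%} per year, doubling every {doubling_years:.1f} years")
```

Applying the same calculation to each four-year half gives roughly 64% per year (2002 to 2006) and 57% per year (2006 to 2010), so the rate itself is not constant.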
Problem Statement • The data is often complex, ambiguous, and noisy, and analyzing it requires human understanding. • About 2 GB of data is being produced per person per year • 95% of the Digital Universe’s information is unstructured • There isn’t enough manpower to analyze all the data, and the problem is getting worse! • Solution: help the user • Find patterns • Filter out noise • Focus on the important stuff 1: Data courtesy of Dr. Joseph Kielman, DHS 2: Image courtesy of Dr. Maria Zemankova, NSF
Example: What Does (Wire) Fraud Look Like? • Financial institutions like Bank of America have a legal responsibility to report all suspicious wire transaction activities (money laundering, supporting terrorist activities, etc.) • Data size: approximately 200,000 transactions per day (73 million transactions per year) • Problems: • Automated approaches can only detect known patterns • Bad guys are smart: patterns are constantly changing • Data is messy: the lack of international standards results in ambiguous data • Current methods: • 10 analysts monitoring and analyzing all transactions • Using SQL queries and spreadsheet-like interfaces • Limited time scale (2 weeks)
WireVis: Financial Fraud Analysis • In collaboration with Bank of America • Develop a visual analytical tool (WireVis) • Visualizes 7 million transactions over 1 year • Currently beta-deployed at WireWatch • Uses interaction to coordinate four perspectives: • Keywords to Accounts • Keywords to Keywords • Keywords/Accounts over Time • Account similarities (search by example) R. Chang et al., Scalable and interactive visual analysis of financial wire transactions for fraud detection. Information Visualization, 2008. R. Chang et al., WireVis: Visualization of categorical, time-varying data from financial transactions. IEEE VAST, 2007.
WireVis: A Visual Analytics Approach [Figure panels: Search by Example (Find Similar Accounts); Heatmap View (Accounts to Keywords Relationship); Keyword Network (Keyword Relationships); Strings and Beads (Relationships over Time)]
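The heatmap view relates accounts to keywords, and search by example finds accounts with similar profiles. Below is a minimal sketch of that data side. It assumes each transaction has already been tagged with keywords. Per-account keyword counts form the heatmap rows, and cosine similarity ranks the other accounts against an example account. Cosine similarity is our assumption, because the slides do not name the measure WireVis uses. The records, account IDs, and function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical wire records: (account id, keywords found in the transaction text).
transactions = [
    ("A1", ["electronics", "hong kong"]),
    ("A1", ["electronics"]),
    ("A2", ["electronics", "hong kong"]),
    ("A3", ["charity", "gold"]),
    ("A3", ["gold"]),
]

def keyword_heatmap(records) -> Dict[str, Counter]:
    """Rows = accounts, columns = keywords, cells = occurrence counts."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for account, keywords in records:
        counts[account].update(keywords)
    return counts

def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse keyword-count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def search_by_example(counts: Dict[str, Counter], example: str,
                      top_k: int = 5) -> List[Tuple[str, float]]:
    """Rank the other accounts by how similar their keyword profile is to `example`."""
    target = counts.get(example, Counter())  # avoid inserting into the dict while iterating
    scores = [(acct, cosine(target, row)) for acct, row in counts.items() if acct != example]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top_k]

heatmap = keyword_heatmap(transactions)
print(search_by_example(heatmap, "A1"))  # A2 ranks first; A3 shares no keywords
```

Normalizing by vector length means an account with many transactions is not judged "similar" just because it has large counts. Only the mix of keywords matters.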
Introducing Visual Analytics • Visual analytics is the science of analytical reasoning facilitated by interactive visual interfaces [Thomas & Cook 2005] • Since 2004, the field has grown significantly. Beyond its tens to hundreds of domestic and international partners, it now has an IEEE conference (IEEE VAST), an NSF program (FODAVA), and a forthcoming IEEE Transactions journal. [Diagram: Graphics & Visualization; Interaction & Reasoning; Computing]
Visual Analytics, A Graphics Perspective • Master’s Thesis -- • Simulating dynamic motion based on kinematic motion • Jiggling of muscles • Skinnable Mesh • Volumetric deformation • Compared 3 types of mass-spring systems • Regular (unconstrained) mass-spring • Reduced degree of freedom • Approximate finite element method with implicit integration • Is this applicable beyond graphics and simulation? R. Chang, Simulation Techniques for Deformable Animated Characters. Master’s Thesis, Brown University, 2000.
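The thesis compared mass-spring systems, including one that uses implicit integration. The toy below shows why implicit integration matters for stiff springs. It uses a single damped 1-D spring, not any of the thesis's systems. With a stiff spring and a large time step, explicit (forward) Euler blows up, while implicit (backward) Euler stays stable. The stiffness, damping, mass, and step size are arbitrary values chosen only for this demonstration.

```python
from typing import Callable, Tuple

State = Tuple[float, float]  # (position x, velocity v)

def explicit_euler_step(x, v, k, c, m, h) -> State:
    """Forward Euler: spring and damping forces evaluated at the start of the step."""
    a = (-k * x - c * v) / m
    return x + h * v, v + h * a

def implicit_euler_step(x, v, k, c, m, h) -> State:
    """Backward Euler: forces evaluated at the end of the step.

    Solves v1 = v + h * (-k * x1 - c * v1) / m with x1 = x + h * v1.
    The spring is linear, so the solve has a closed form.
    """
    v1 = (v - h * k * x / m) / (1 + h * c / m + h * h * k / m)
    return x + h * v1, v1

def simulate(step: Callable[..., State], steps: int = 100,
             x: float = 1.0, v: float = 0.0,
             k: float = 1000.0, c: float = 1.0, m: float = 1.0, h: float = 0.05) -> State:
    """Run `steps` integration steps from the initial state (x, v)."""
    for _ in range(steps):
        x, v = step(x, v, k, c, m, h)
    return x, v

print(simulate(explicit_euler_step))  # magnitude explodes
print(simulate(implicit_euler_step))  # settles toward rest at x = 0
```

For these values, each explicit step amplifies the state, so errors grow without bound. Backward Euler shrinks the state every step, at the cost of adding some numerical damping of its own.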
From Graphics to Visual Analytics: An Example in Urban Simplification • (left) Original model, 285k polygons • (center) e=100, 129k polygons (45% of original) • (right) e=1000, 53k polygons (18% of original) R. Chang et al., Legible simplification of textured urban models. IEEE Computer Graphics and Applications, 28(3):27–36, 2008. R. Chang et al., Hierarchical simplification of city models to maintain urban legibility. ACM SIGGRAPH 2006 Sketches, page 130, 2006.
Urban Simplification [Figure panels: Original Model; Simplified Model using QSlim; Our Textured Model; Our Model] • Which polygons to remove? • Visually different, but quantitatively similar!
Urban Simplification • The goal is to retain the “Image of the City” • Based on Kevin Lynch’s concept of “Urban Legibility” [1960] • Paths: highways, railroads • Edges: shorelines, boundaries • Districts: industrial, historic • Nodes: Times Square in NYC • Landmarks: Empire State Building
Algorithm for Preserving Legibility • Paths & Edges • Hierarchical (single-link) clustering • Nodes • Merging clusters • Polyline simplification using convex hulls • Landmarks • Pixel-based skyline preservation • That’s pretty good, right?
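The slide mentions polyline simplification using convex hulls when merging clusters, but gives no details. The sketch below covers only the hull step. It uses Andrew's monotone-chain algorithm to turn the footprint vertices of a hypothetical cluster of buildings into one convex outline. How the published method combines hulls with the rest of the simplification is not reproduced here.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:  # build the lower hull left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):  # build the upper hull right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]

# Hypothetical footprints of two adjacent buildings in one cluster.
cluster = [(0, 0), (1, 0), (1, 1), (0, 1), (2, 0), (3, 0), (3, 1), (2, 1)]
print(convex_hull(cluster))
```

Here the call returns the four corners of the combined footprint, counter-clockwise from (0, 0). The `<= 0` test drops collinear points along the hull edges, which is itself a small simplification.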
Urban Visualization with Semantics • How do people think about a city? • Describe New York… • Response 1: “New York is large, compact, and crowded.” • Response 2: “The area where I live has a strong mix of ethnicities.” Geometric, Information, View Dependent (Cognitive)
Urban Visualization • Geometric • Create a hierarchy of shapes based on the rules of legibility • Information • Matrix view and Parallel Coordinates show relationships between clusters and dimensions • View Dependence (Cognitive) • Uses interaction to alter the position of focus R. Chang et al., Legible cities: Focus-dependent multi-resolution visualization of urban relationships. IEEE Transactions on Visualization and Computer Graphics, 13(6):1169–1175, 2007.
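In parallel coordinates, each cluster is drawn as a polyline across one vertical axis per dimension. Below is a minimal sketch of the usual per-axis min-max scaling. The cluster attributes (population, median income, crime rate) are hypothetical and do not come from the system's actual data.

```python
from typing import Dict, List

def parallel_coordinates(records: List[Dict[str, float]], dims: List[str]) -> List[List[float]]:
    """Map each record to a polyline: one y value in [0, 1] per axis."""
    lo = {d: min(r[d] for r in records) for d in dims}
    hi = {d: max(r[d] for r in records) for d in dims}

    def scale(r: Dict[str, float], d: str) -> float:
        span = hi[d] - lo[d]
        return (r[d] - lo[d]) / span if span else 0.5  # constant axis: draw mid-height

    return [[scale(r, d) for d in dims] for r in records]

# Hypothetical per-cluster attributes.
clusters = [
    {"population": 1200, "median_income": 40000, "crime_rate": 0.05},
    {"population": 800, "median_income": 65000, "crime_rate": 0.02},
    {"population": 2000, "median_income": 52000, "crime_rate": 0.03},
]
lines = parallel_coordinates(clusters, ["population", "median_income", "crime_rate"])
for line in lines:
    print([round(y, 2) for y in line])
```

Scaling each axis independently keeps dimensions with very different units comparable. The trade-off is that absolute magnitudes are no longer visible on the axes.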
Probe-based Interface • Probes allow multiple regions of interest to be compared simultaneously R. Chang et al., Multi-focused geospatial analysis using probes. IEEE Transactions on Visualization and Computer Graphics, 14(6):1165–1172, Nov.-Dec. 2008.
Urban Visualization: Graphics + Visual Analytics • Applying graphics approaches • Data transformation (clustering, LOD, simplification) • Screen-based metrics • Hardware acceleration • Applying visual analytics principles • Multi-dimensional data representation • Interactive exploration • Broader applicability
Extending Visual Analytics Principles • Global Terrorism Database • With University of Maryland • Application of the investigative 5 W’s • Bridge Maintenance • With US DOT • Exploring subjective inspection reports • Biomechanical Motion • With U. Minnesota and Brown • Interactive motion comparison methods [Figure panels: Who; Where; What; When; Evidence Box; Original Data] R. Chang et al., Investigative Visual Analysis of Global Terrorism, Computer Graphics Forum, 2008.
Extending Visual Analytics Principles: Bridge Maintenance • R. Chang et al., An Interactive Visual Analytics System for Bridge Management, Computer Graphics Forum, 2010. To Appear.
Extending Visual Analytics Principles: Biomechanical Motion • R. Chang et al., Interactive Coordinated Multiple-View Visualization of Biomechanical Motion Data, IEEE Vis (TVCG) 2009.
Human + Computer: A Mixed-Initiative Perspective • Our approach is great and successful! But it’s mostly user-driven… • Human vs. Artificial Intelligence: Garry Kasparov vs. Deep Blue (1997) • Computer takes a “brute force” approach without analysis • “As for how many moves ahead a grandmaster sees,” Kasparov concludes: “Just one, the best one” • Artificial Intelligence vs. Augmented Intelligence: Hydra vs. Cyborgs (2005) • Grandmaster + 1 chess program > Hydra (equiv. of Deep Blue) • Amateur + 3 chess programs > Grandmaster + 1 chess program1 • How to systematically repeat the success? • Unsupervised machine learning + User • User’s interactions with the computer [Diagram linking Human and Computer Process via Translate] 1. http://www.collisiondetection.net/mt/archives/2010/02/why_cyborgs_are.php
Human + Computer: Dimension Reduction – Lost in Translation • Dimension reduction using principal component analysis (PCA) • Quick refresher on PCA • Find the most dominant eigenvectors (of the data’s covariance matrix) as principal components • Data points are re-projected into the new coordinate system • For reducing dimensionality • For finding clusters • For many (especially novices), PCA is easy to understand mathematically, but difficult to understand “semantically”. [Figure: data on GPA, age, and height axes; what does a component such as 0.5*GPA + 0.2*age + 0.3*height mean?]
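Here is a pure-Python sketch of the refresher above: center the data, form the covariance matrix, find its dominant eigenvectors, and re-project the points. It finds the eigenvectors by power iteration with deflation, which is one of several ways to compute them. iPCA is not claimed to work this way. The student records with GPA, age, and height are hypothetical and simply echo the slide's example.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

def center(data: Matrix) -> Matrix:
    """Subtract each column's mean so the data is centered at the origin."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in data]

def covariance(centered: Matrix) -> Matrix:
    """Sample covariance matrix of already-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def power_iteration(cov: Matrix, iters: int = 1000) -> Tuple[float, List[float]]:
    """Dominant eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    d = len(cov)
    v = [1.0 / (j + 1) for j in range(d)]  # arbitrary non-zero start vector
    s = math.sqrt(sum(x * x for x in v))
    v = [x / s for x in v]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # no variance left in this subspace
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return eigenvalue, v

def pca(data: Matrix, k: int) -> Tuple[Matrix, Matrix]:
    """Top-k principal components and the data re-projected onto them."""
    centered = center(data)
    cov = covariance(centered)
    d = len(cov)
    components: Matrix = []
    for _ in range(k):
        lam, v = power_iteration(cov)
        components.append(v)
        # Deflation: remove the found direction so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    projected = [[sum(r[j] * c[j] for j in range(d)) for c in components]
                 for r in centered]
    return components, projected

# Hypothetical students: (GPA, age in years, height in cm).
students = [[3.2, 20, 170], [3.8, 22, 165], [2.9, 19, 180], [3.5, 21, 175], [3.0, 23, 160]]
components, projected = pca(students, k=2)
print("PC1 loadings (GPA, age, height):", [round(w, 3) for w in components[0]])
```

The columns here are not standardized, so height in centimetres has by far the largest variance and dominates the first component. Standardizing each column first is common when units differ. The loadings in `components[0]` are exactly the kind of weighted sum (like 0.5*GPA + 0.2*age + 0.3*height) that is hard to interpret semantically.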
Human + Computer: Exploring Dimension Reduction: iPCA R. Chang et al., iPCA: An Interactive System for PCA-based Visual Analytics. Computer Graphics Forum (EuroVis), 2009.
Human + Computer: Comparing iPCA to SAS/INSIGHT • Results • Users seem to understand the intuition behind PCA better • A bit more accurate • Not faster • People don’t “give up” • Overall preference • Using letter grades (A through F), with “A” representing excellent and “F” a failing grade. • Problem is worse with non-linear dimension reduction • A lot more work needs to be done…
Human + Computer: User Interactions • Capture a user’s interactions in a visual analytics system • Translate the interactions into something that would affect the computation in a meaningful way • Challenge: • Can we extract a user’s reasoning and intent by capturing the user’s interactions?
What is in a User’s Interactions? • Goal: determine whether a user’s reasoning and intent are reflected in the user’s interactions. [Diagram: the analysts’ strategies, methods, and findings are compared (manually) against grad-student coders’ guesses of the analysts’ thinking, made from the analysts’ logged (semantic) WireVis interactions shown in an interaction-log visualization]
What’s in a User’s Interactions • From this experiment, we find that interactions contain at least: • 60% of the (high-level) strategies • 60% of the (mid-level) methods • 79% of the (low-level) findings R. Chang et al., Recovering Reasoning Process From User Interactions. IEEE Computer Graphics and Applications, 2009. R. Chang et al., Evaluating the Relationship Between User Interaction and Financial Visual Analysis. IEEE Symposium on VAST, 2009.
User Interactions, A Computational Approach • Now that we’ve shown that interaction ≈ reasoning • Can we automate the process? • Consider each of a user’s interactions as a fixed-length vector (Design Galleries [Marks et al. SIGGRAPH 97]). • User interaction in the left application can be represented as a one-dimensional vector <P> • User interaction in the right application can be represented as a two-dimensional vector <P, S>
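Here is a minimal sketch of treating interactions as fixed-length vectors. Each logged interaction stores the parameter values the user set: <P> in a one-parameter application, <P, S> in a two-parameter one. Interactions can then be compared with ordinary vector distances. The `Interaction` class, the Euclidean metric, and the nearest-neighbour query are illustrative assumptions, not the system described in the talk.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Interaction:
    """Hypothetical log entry: the parameter values set by one user interaction."""
    params: Tuple[float, ...]  # (P,) in a one-parameter app, (P, S) in a two-parameter app

def distance(a: Interaction, b: Interaction) -> float:
    """Euclidean distance between two interactions from the same application."""
    if len(a.params) != len(b.params):
        raise ValueError("interactions come from different parameter spaces")
    return math.dist(a.params, b.params)

def most_similar(log: List[Interaction], query: Interaction) -> int:
    """Index of the logged interaction closest to `query`."""
    return min(range(len(log)), key=lambda i: distance(log[i], query))

# Hypothetical log from the two-parameter <P, S> application.
log_2d = [Interaction((0.1, 0.9)), Interaction((0.5, 0.5)), Interaction((0.8, 0.2))]
print(most_similar(log_2d, Interaction((0.7, 0.3))))  # -> 2
```

Once interactions live in a common vector space, standard tools such as clustering or nearest-neighbour search can be applied to interaction logs. Comparisons only make sense within one application's parameter space, which is why `distance` rejects vectors of different lengths.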
Conclusion • Visual analytics is a growing new area that aims to address some pressing needs • Too much (messy) data, too little time • By integrating interaction, graphics, and data computation, we have demonstrated that • There are some great benefits • But there are also some difficult challenges • With great challenges come great opportunities… • Government agencies • Industrial partners
Future Work (Funded Projects) • NSF SciSIP: • Title: A Visual Analytics Approach to Science and Innovation Policy. • PI: William Ribarsky, Co-PIs: Jim Thomas, Remco Chang, Jing Yang. • $746,567. 2009-2012 (3 years). • Abstract: developing metrics and visual tools for identifying patterns in science policies. • NSF/DOD (Minerva Initiative): • Title: Collaborative Project: Terror, Conflict Processes, Organizations, & Ideologies: Completing the Picture. • PI: Remco Chang • $100,000. 2009-2010 (2 years). • Abstract: designing and developing visual analytical tools to identify the causal relationships between government policies and domestic conflicts. • DHS International Program: • Title: Deriving and Applying Cognitive Principles for Human/Computer Approaches to Complex Analytical Problems. • PI: William Ribarsky, Co-PIs: Brian Fisher, Remco Chang, John Dill. • $200,000. 2009-2010 (1 year). • Abstract: identifying new evaluation methods for visual analytical systems, and applying computational methods for analyzing user interactions. • Quantitative Analysis Division at Bank of America • Exploration and analysis of financial risk
Future Work (On-going Collaborations) • With NSF FODAVA Center at Georgia Tech (Dr. Haesun Park, director) • Interpreting user interactions to affect machine learning algorithms • Visual PCA: using perceptual metrics to find principal components • Applying perceptual constraints to dimension reduction: when animating temporal data in dimension reduction, find methods to maintain hysteresis • With University of Kentucky (Drs. Judy Goldsmith, Jinze Liu, Phillip Chang, MD) • Integrating data mining (KDD), POMDP, and visual analytics to prevent sepsis by identifying biomarkers (Proposal in submission to NSF CDI) • With a geographer and an architect at UNC Charlotte (Dr. Jean-Claude Thill and Eric Sauda) • Designing computational methods for identifying neighborhood characteristics (Proposal in submission to NSF IIS) • Applying the UrbanVis system to analyzing crime (proposal in preparation for DOJ/NIJ) • With Virginia Tech (Dr. Chris North) and Pacific Northwest National Lab (Dr. Bill Pike and Richard May) • Developing a research agenda for analytic provenance (Workshop proposal in submission to DHS)
Thank you! rchang@uncc.edu http://www.viscenter.uncc.edu/~rchang
Acknowledgement From the Data Visualization Group (DVG) at UNC Charlotte: Bill Ribarsky, Zach Wartell, Dong Hyun Jeong, Tom Butkiewicz, Xiaoyu Wang, Wenwen Dou, Tera Green
Acknowledgement From the Urban Visualization Group at UNC Charlotte: Eric Sauda, Jean-Claude Thill, Ginette Wessel, Elizabeth Unruh
Acknowledgement More Collaborators… Clockwise, starting on the left: Nancy Pollard (CMU), Evan Suma (UNCC), Heather Lipford (UNCC), Dan Keefe (UMN), Caroline Ziemkiewicz (UNCC), Robert Kosara (UNCC), Mohammad Ghoniem
Acknowledgement • And many, many others… Joseph Kielman (DHS), Bill Pike (PNNL), Theresa O'Connell (NIST), Seok-Won Lee (UNCC), Brian Fisher (Simon Fraser), Alvin Lee (BofA), Jing Yang (UNCC), Daniel Kern (BofA), Agust Sudjianto (BofA), Erin Miller (UMD), Kathleen Smarick (UMD), Felesia Stukes (UNCC), Marcus Ewert (UMN), Larry Hodges (Clemson), Michael Butkiewicz (UC Riverside), Josh Jones (BofA), Alex Godwin (Charles River Analytics), Edd Hauser (UNCC), Shenen Chen (UNCC), Bill Tolone (UNCC), Wanqiu Liu (UNCC), Rashna Vatcha (UNCC)
Final Thought… • “The sexy job in the next 10 years will be statisticians,” said Hal Varian, chief economist at Google. “And I’m not kidding.” • Yet data is merely the raw material of knowledge. “We’re rapidly entering a world where everything can be monitored and measured,” said Erik Brynjolfsson, an economist and director of the Massachusetts Institute of Technology’s Center for Digital Business. “But the big problem is going to be the ability of humans to use, analyze and make sense of the data.” • “The key is to let computers do what they are good at, which is trawling these massive data sets for something that is mathematically odd,” said Daniel Gruhl, an I.B.M. researcher whose recent work includes mining medical data to improve treatment. “And that makes it easier for humans to do what they are good at — explain those anomalies.”1 1. New York Times. “For Today’s Graduate, Just One Word: Statistics,” August 5, 2009.
Individually Not Unique • Analytical Reasoning and Interaction: Interaction Design, Cognitive Psychology, Intelligence Analysis, etc. • Data Representation and Transformation: Data Mining, Machine Learning, Databases, Information Retrieval, etc. • Visual Representation: InfoVis, SciVis, Graphics, etc. • Production, Presentation, and Dissemination: Tech Transfer, Report Generation, etc. • Validation and Evaluation: Quality Assurance, User Studies (HCI), etc.
In Combinations of 2 or 3… [Diagram highlighting Data Representation and Transformation (Data Mining, Machine Learning, Databases, Information Retrieval, etc.) and Visual Representation (InfoVis, SciVis, Graphics, etc.); other components: Analytical Reasoning and Interaction; Production, Presentation, and Dissemination; Validation and Evaluation]
In Combinations of 2 or 3… [Diagram highlighting Analytical Reasoning and Interaction (Interaction Design, Cognitive Psychology, Intelligence Analysis, etc.) and Production, Presentation, and Dissemination (Tech Transfer, Report Generation, etc.); other components: Data Representation and Transformation; Visual Representation; Validation and Evaluation]
This Talk Focuses On… [Diagram highlighting Analytical Reasoning and Interaction (Interaction Design, Cognitive Psychology, Intelligence Analysis, etc.), Data Representation and Transformation (Data Mining, Machine Learning, Databases, Information Retrieval, etc.), and Visual Representation (InfoVis, SciVis, Graphics, etc.); other components: Production, Presentation, and Dissemination; Validation and Evaluation]
Eureka: Visual Analytics!! “Saunders, perhaps you’re getting a bit carried away with the visual analytics!”1 1: Slide courtesy of Dr. Maria Zemankova, NSF
Case Study on WireVis • User-Centric • Designed system based on domain expertise • Visual Interface • Multiple coordinated views that link multiple dimensions • Interactive • Overview, drill-down, reclustering • Data Clustering • Clustering by accounts, and search by example • Production • Connected to a live database and beta-deployed at BofA • (Validation) • Expert evaluation [Diagram: Analytical Reasoning and Interaction; Data Representation and Transformation; Visual Representation; Production, Presentation, and Dissemination; Validation and Evaluation]
Algorithm to Preserve Legibility • Identify and preserve Paths and Edges • Create logical Districts and Nodes • Simplify model while preserving Paths, Edges, Districts, and Nodes • Hierarchically apply appropriate amount of texture • Highlight Landmarks and choose models to render
Identifying and Preserving Paths and Edges (1) • Single-Link Clustering • Iteratively groups the two “closest” clusters together based on Euclidean distance • Produces a binary tree (dendrogram) • Penalizes large clusters to create a more balanced tree [Figure: example dendrogram over points a to f]
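Below is a naive sketch of agglomerative single-link clustering on 2-D points. At each step it merges the two clusters whose closest members are nearest in Euclidean distance, and it records each merge. Read bottom-up, the merge list is the dendrogram. The large-cluster penalty mentioned on the slide is deliberately omitted, and the point coordinates are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Point = Tuple[float, float]

def single_link(points: Dict[str, Point]) -> List[Tuple[FrozenSet[str], float]]:
    """Naive agglomerative single-link clustering.

    Returns the merges in order as (merged cluster, linkage distance).
    """
    def link(a: FrozenSet[str], b: FrozenSet[str]) -> float:
        # Single link: distance between the closest pair of members.
        return min(math.dist(points[p], points[q]) for p in a for q in b)

    clusters = [frozenset([name]) for name in points]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: link(*pair))
        merged = a | b
        merges.append((merged, link(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
    return merges

# Hypothetical points (e.g. building centroids) laid out roughly along a street.
points = {"a": (0.0, 0.0), "b": (5.0, 0.0), "c": (5.5, 0.0),
          "d": (10.0, 0.0), "e": (10.4, 0.0), "f": (11.5, 0.0)}
for cluster, dist in single_link(points):
    print("".join(sorted(cluster)), round(dist, 2))
```

With these coordinates the merges come out in the order de, bc, def, bcdef, abcdef. This brute-force search is cubic in the number of points, which is fine for illustration. Large city models would need a more efficient construction, such as one based on a minimum spanning tree.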
Creating logical Districts and Nodes