NSF/DHS FODAVA-LEAD: Missions and Plans. Haesun Park, Computational Science and Engineering Division, Georgia Institute of Technology. FODAVA Kick-off Meeting, September 2008
Data and Visual Analytics (DAVA) • Data Representation and Transformation • Analytical Reasoning • Visual Representation and Interaction • Production, Presentation, Dissemination
Data and Visual Analytics (DAVA) • Data Representation and Transformation • Representing dynamic, incomplete, conflicting data to convey important content in a form and level of abstraction appropriate to the analytical task to enable understanding • Transforming data among possible representations to support analysis and discovery • Analytical Reasoning • Apply human judgment to reach conclusions • Methods to maximally utilize human capacity to derive deep understanding and insight into complex situations in a minimum amount of time • Visual Representation and Interaction • Visual presentation of information in ways that instantly convey important content taking advantage of human vision • Interaction techniques (e.g., search) between the analyst and data to facilitate the analytical reasoning process • Production, Presentation, Dissemination • Seamless integration of data acquisition, analysis, decision making, and action
A Discipline in Data & Visual Analytics • I think, therefore I am. • FODAVA is concerned with defining the mathematical and computational foundations for the Data and Visual Analytics Discipline • “Solving a problem simply means representing it so that the solution is obvious.” Herbert Simon, 96 • [Diagram labels: Foundations; Data Representation and Transformation; Analytical Reasoning; Visual Representation and Interaction; Production, Presentation, and Dissemination]
Applications • FODAVA team will perform foundational research that can be applied to many different fields • Common end objective is to apply knowledge in the decision-making process, at the time and place that a decision is needed • Common challenges across applications as well as application-specific challenges • Epidemiology • Bioinformatics • Medical Informatics: Theory and practice of knowledge integration, management and use in healthcare delivery, med, public health • Text Analysis • Astrophysics • Homeland Security • Biometric Recognition • Social Networks
VISION: Establishing DAVA as a Distinct Discipline • Develop FODAVA community, engage larger DAVA field • Researchers • Educators • Practitioners • Establish Body of Knowledge • Foundations, subareas, applications • Curriculum • Education programs • [Diagram labels: Visualization; Data Analytics; Production, Presentation, Dissemination; Analytic Reasoning; Data and Visual Analytics; Mathematical and Computational Foundations]
Data and Visual Analytics Communities • FODAVA will interact with several communities of researchers & practitioners • FODAVA: FODAVA lead, FODAVA partners (08, 09, …) • National Visualization and Analytics Center (NVAC)/VAC Consortium • RVAC/DHS Science & Technology Center of Excellence • “This partnership with NSF is the most important event since the creation of NVAC in March 2004. It brings to the front stage efforts by folks within DHS, NVAC and NSF to jointly fund the development of basic research in visual analytics supporting DHS applied mission needs.” ~Jim Thomas, NVAC Director
FODAVA-Lead Mission • Research and Education: Serve as a central facility that will involve all FODAVA awardees in a common effort to develop the scientific foundations for data and visual analytics • Effective Liaison between FODAVA Researchers and NVAC: Interface with DHS NVAC/RVAC and DHS S&T Center of Excellence in research and educational opportunities • Community Building: Integrate diverse DAVA communities and reach out for broader participation
FODAVA-Lead Challenges Research and Collaboration • Creation of the Mathematical and Computational Sciences Foundations required to represent and transform all types of digital data in ways to enable efficient and effective Visualization and Analytic Reasoning • Intrinsic Challenges: Data sets massive, heterogeneous, multi-dimensional, dirty, incomplete, time-varying; solutions must be produced with time and space constraints, …. • Understanding Fundamental issues/needs in VA and Communicating results • Isolated theoretical research is not enough • Problem driven foundational research is needed
FODAVA-Lead Challenges (cont’d) • Education and Research • Defining Foundations of Data and Visual Analytics • Undergraduate and Graduate Curriculum (core body of knowledge) for Data and Visual Analytics • Community Building/Integration • A community of researchers who claim DAVA as their own discipline and FODAVA as an essential part • Conferences, journals, books, professional society engagement • Industry, tech transfer, …
FODAVA-Lead PIs at GAtech • Vladimir Koltchinskii: Mathematics; Machine Learning Theory, Computational Statistics • Alex Gray: CSE; Machine Learning, Fast Algorithms for Massive DA • Haesun Park, Director: CSE, Associate Chair; Numerical Computing, Data Analysis; Research, FODAVA Community Building • John Stasko, Associate Director: IC, Associate Chair; SRVAC Co-Director; Information Vis.; Collaboration with NVAC and RVACs; Liaison with Vis. community • Renato Monteiro: ISyE; Continuous Optimization, Statistical Computing
FODAVA-Lead Senior Personnel • James Foley: Associate Dean, CoC; Graphics and Visualization, HCI; Visual Analytics Digital Library • Richard Fujimoto, Associate Director: CSE, Chair; Modeling and Simulation; Education and Outreach • Guy Lebanon: CSE; Machine Learning, Computational Statistics • Arkadi Nemirovski: ISyE; Optimization, Non-parametric Stat. • Santosh Vempala: CS; Theory of Computing; Director of ARC • Hongyuan Zha: CSE; Numerical Computing, Data Analysis; Director of Graduate Studies • Alexander Shapiro: ISyE; Stochastic Programming, Optimization, Multivariate Stat. Analysis • Hao-Min Zhou: Mathematics; Wavelet and PDE Image Processing
2008 FODAVA Partners • Global Structure Discovery on Sampled Spaces: Leonidas Guibas and Gunnar Carlsson (Stanford University) • Visualizing Audio for Anomaly Detection: Mark Hasegawa-Johnson, Thomas Huang, Hank Kaczmarski, Camille Goudeseune (University of Illinois Urbana-Champaign) • Principles for Scalable Dynamic Visual Analytics: H. Jagadish and George Michailidis (University of Michigan) • Efficient Data Reduction and Summarization: Ping Li (Cornell University) • Uncertainty-Aware Data Transformations for Collaborative Reasoning: Kwan-Liu Ma (UC Davis) • Mathematical Foundations of Multiscale Graph Representations and Interactive Learning: Mauro Maggioni, Rachael Brady, Eric Monson (Duke University) • Visually-Motivated Characterizations of Point Sets Embedded in High-Dimensional Geometric Spaces: Leland Wilkinson and Robert Grossman (University of Illinois Chicago), Adilson Motter (Northwestern University)
Expertise of FODAVA team • Computational Math & Statistics • Machine Learning • Numeric & Geometric Computing • Human Computer Interaction • Optimization • Simulation • Gaming • Graphics and Vis. • Information Visualization • Information Retrieval • Discrete/Graph Algorithms • Database • Speech Recognition • High Performance Computing • Real-time Systems
FODAVA Activities • Body of Knowledge • Curriculum development • Repository for education materials • Distinguished lecture series • Outreach to underrepresented groups • Community Development • Communications: project description and results • FODAVA web site • Repository of FODAVA data sets and results • Conferences and meetings • Annual FODAVA Workshop • NVAC Consortium meetings • Activities at established meetings • Meetings to establish new research directions
Curriculum Development • Goals • Identify and catalog curriculum development efforts in Data and Visual Analytics • Individual courses, minors, degree programs • Undergraduate and graduate level • Leverage existing efforts (e.g., RVAC) • Share experiences, develop best practices • Develop curriculum recommendations • Curriculum workshop • POCs: Cook (NVAC), Fujimoto (FODAVA), Stasko (RVAC and FODAVA) • December 2008, Atlanta, Georgia
Visual Analytics Digital Library (http://vadl.cc.gatech.edu) • Developed by Georgia Tech (Foley et al.) in Southeast Regional Visual Analytics Center • Repository for curriculum and education materials • Lecture notes • Homeworks, projects • Reference materials, videos, etc. • Includes evolving taxonomy for Data and Visual Analytics • FODAVA will build upon this resource to • Provide a library and web portal of FODAVA educational materials • Expand support to DAVA community to include FODAVA areas • Document curriculum development efforts
Distinguished Lecture Series • Goal: Provide forum for leaders in DAVA community to articulate vision and DAVA-related research and education activities and applications • Plans (2009) • Lecture series featuring leaders in the data and visual analytics community • Develop in collaboration with FODAVA partners, NVAC, RVAC, DHS/S&T CoE • Webcast • Photo: Joe Kielman, VAC Consortium meeting, 2008
Outreach to Underrepresented Groups. Example: GT CRUISE Program • CRUISE: CSE Research Undergraduate Intern Summer Experience • Encourage students to consider PhD studies • Diverse student participation • Multicultural, emphasizing minorities, women • U.S. and international students • Ten-week summer research projects in areas such as data and visual analytics, high performance computing, modeling & simulation • Interdisciplinary individual and group projects • Year-long collaboration with North Carolina A&T University • CRUISE-wide events • Weekly seminars (technical, grad studies) • Social events • Symposium: conference-style presentations
FODAVA Website http://fodava.gatech.edu • Forum for FODAVA Community • Maintain close collaboration with NVAC • Functionality • Dissemination of results to user communities • DAVA community events and meeting information depot • Repository of data sets for FODAVA community
FODAVA Annual Workshop (from Fall 2009) • Annual Theme • Initially more mathematically/computationally oriented • Increasing emphasis over time on visualization, human-computer interaction, cognitive science, … • Organizers • Co-organized in collaboration among FODAVA-Lead, FODAVA-Partners, NVAC, and DHS S&T Center of Excellence • Time • Co-locate with NVAC Fall Consortium meeting • Location • PNNL/NVAC, Richland, WA
FODAVA Annual Workshop 2009 • Theme: Machine Learning & Geometric Computing in Visual Analytics • Organizers: Vladimir Koltchinskii (GATech) and Mauro Maggioni (Duke) • Time: November, 2009 • Location: PNNL/NVAC, Richland, WA
VAC Consortium Meetings • Provides broader exposure of work to DHS and NVAC communities • Semi-annual; next meeting: Nov 11-13, 2008, PNNL • Nov. 11: University Technical Exchange Day • FODAVA Panel session • FODAVA Demo/Poster session • Please participate!
Additional Workshops • FODAVA workshops at major conferences and meetings • IEEE VAST Conference • Birds of a Feather session at VAST, Oct. 2008 • Workshop on Temporal Analytics • Other potential venues • International Conference on Machine Learning • Neural Information Processing Systems (NIPS) • SIAM CSE / SIAM Optimization / SIAM ALA Conferences • ACM Knowledge Discovery and Data Mining (KDD) • AAAS meeting • Others?
Calendar of Events • Sept 2008: FODAVA Kick-Off Meeting • Oct 2008: VAST 2008 BoF Session • Nov 2008: VAC Consortium meeting, FODAVA Panel and Poster/Demo Session • Dec 2008: DAVA Curriculum Workshop • May 2009: VAC Consortium Meeting • Oct 2009: VAST Conference • Nov 2009: VAC Consortium and FODAVA Annual Workshop • Temporal Analytics Workshop under consideration
Project Materials • Goal: Articulate contributions being made by the FODAVA community • Benefits • Potential collaborators • Foster technology transition opportunities • Broader exposure to potential sponsors • Materials requested • Project brochures and other collateral material • Videos especially welcome • Tell us what you’re doing! • POC: Richard Fujimoto
Concluding Remarks • DAVA represents a new, exciting discipline that brings together diverse communities • Research is motivated and driven by real-world problems • FODAVA will play a key role in developing and defining the foundations for DAVA • Communication and collaboration with other elements of DAVA (e.g., NVAC, RVAC, DHS/S&T CoE) is essential • We need to educate ourselves! Thank you!
Student Interns • Support deep research collaboration between FODAVA lead, FODAVA partners, and PNNL / NVAC • Fundamental research driven by real-world applications • Leverage existing intern programs at PNNL • Summer interns • Leverage GT distance learning capability for academic year interns • Details to be determined
Undergraduate Education • Georgia Tech Threads curriculum • Undergraduate program defined as a set of 8 threads • Thread is a body of coursework targeting a certain career path, e.g., modeling and simulation, human computer interaction, embedded systems, etc. • Students take two threads to complete BS in CS degree • Existing threads • Modeling and Simulation: representing processes/systems • Devices: embedded computing • Theory: theoretical foundations of computing • Information Networks: information communication • Intelligence: human-level intelligence • Media: systems for creative expression • People: human-centric computing • Platforms: computing systems, architecture, languages
Modeling & Simulation Thread • Many students come to Georgia Tech with an inherent love for math and science • Computation provides a framework to view, understand, analyze, and design systems • Involves developing mathematical / conceptual abstractions of systems that can be represented by efficient software • Computational modeling is about going from … to … [Figure labels: Fluid flow model; Queueing Model; Cellular Automata]
A Data and Visual Analytics Thread? • Curriculum • Foundational mathematics, computing, science • Data analytics, information visualization • Application-oriented specialization • Integrated approach with capstone design project • Natural complement to modeling and simulation thread • [Diagram labels: Application Discipline (pick one): Aero, Civil, Elect., EAS, Biology, Chemistry, Math, Physics, Industrial Eng.; Computational Methods for Data Analysis and Visualization; Foundations: Math, Computing, Science, Theory, Software, Hardware, Algorithms, Physics, Biology, Chemistry, Discrete Math, Continuous Math]
Application Domains • DHS: Intelligence analysis, Law Enforcement, Emergency response, Intrusion and fraud detection, …. • BioMedical Informatics • Bioinformatics/Systems Biology • Astronomy • Text Analysis: Documents, e-mails, … • Cybersecurity • Transportation • …
Vladimir Koltchinskii, School of Mathematics • Machine Learning • Learning Theory • Feature Selection • Theory of Sparse Recovery • Empirical Risk Minimization • Computational Statistics • Sparse Recovery: For automatic determination of relevant features (basis pursuit, soft thresholding, LASSO, …); a comprehensive theory is only starting to be developed • Penalized Empirical Risk Minimization: Basis for many solutions to basic problems of learning theory, e.g. regression, classification, density estimation • Challenge: extend the theory of sparse recovery to the broader framework of learning theory, e.g. infinite classes of functions
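To make the sparse recovery bullet concrete, here is a minimal sketch of soft thresholding and its use in an iterative LASSO solver (often called ISTA). It is an illustration under stated assumptions, not the method from the slides: the function names `soft_threshold` and `lasso_ista`, the fixed iteration count, and the conservative step size are all choices made for this example.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; small values become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_ista(A, b, lam, n_iter=500):
    """Approximately minimize 0.5*||A x - b||^2 + lam*||x||_1.

    A is a list of rows, b a list of targets. Hypothetical helper for
    illustration: plain gradient steps followed by soft thresholding.
    """
    n_rows, n_cols = len(A), len(A[0])
    # The squared Frobenius norm upper-bounds the largest eigenvalue of A^T A,
    # so 1 / ||A||_F^2 is a conservative but safe step size.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n_cols
    for _ in range(n_iter):
        # residual = A x - b
        residual = [sum(A[i][j] * x[j] for j in range(n_cols)) - b[i]
                    for i in range(n_rows)]
        # gradient of the smooth part: A^T residual
        grad = [sum(A[i][j] * residual[i] for i in range(n_rows))
                for j in range(n_cols)]
        # gradient step, then shrinkage; the shrinkage is what produces zeros
        x = [soft_threshold(x[j] - step * grad[j], step * lam)
             for j in range(n_cols)]
    return x
```

With an identity design matrix the solver reduces to soft thresholding each target by `lam`, so a weak feature (a small coefficient) is set exactly to zero while strong ones are only shrunk. That exact zeroing is the feature selection behavior the slide refers to.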
Renato Monteiro, School of Industrial & Sys. Eng. • Continuous Optimization • Interior-point methods • Semidefinite programming • Cone programming • Algorithms for large-scale optimization • Computational Statistics and Graph Theory • Dimension Reduction and Semidefinite Programming • Higher level of reduction with more difficult objective function • Learning manifolds which preserve ordering of distances • Off-the-shelf SDP software does not scale • Design of efficient algorithms based on the first-order method, convex-concave saddle point problem
Alexander Gray, Computational Sci. & Eng. Goal: make machine learning efficient • For massive datasets, e.g. for astronomy, Large Hadron Collider, network traffic • For fast visualization, e.g. our new manifold learning methods • Developed fastest practical algorithms for many learning methods • Coming in Dec 2008: MLPACK library
John Stasko, School of Interactive Computing and GVU Center • Information Visualization • Human Computer Interaction • Visualization for Investigative Analysis - Putting the Pieces Together with Jigsaw • Help investigative analysts discover plans, plots and threats embedded across large document collections • Multiple visualizations (views) of the documents, entities, & their connections • Views are highly interactive and coordinated • Analysts explore the documents and entities through the views • Building a collaborative version • Representing reliability and uncertainty • Entity aliasing and hierarchy support • Visualizing the investigative process
Haesun Park, Computational Sci. & Eng. • Numerical Computing • Algorithms for Massive Data Analysis • Dimension Reduction • Clustering and Classification • Bioinformatics • Microarray analysis • Protein structure prediction Effective Dimension Reduction with Prior Knowledge • Dimension Reduction for Clustered Data: Linear Discriminant Analysis (LDA), Generalized LDA (LDA/GSVD), Orthogonal Centroid Method (OCM) • Dimension Reduction for Nonnegative Data: Nonnegative Matrix Factorization (NMF) • Applications: Text Classification, Face Recognition, Fingerprint Classification, Gene Clustering in Microarray Analysis …
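As an illustration of dimension reduction for clustered data, the sketch below follows the Orthogonal Centroid Method as it is commonly described: compute one centroid per cluster, orthonormalize the centroids, and project every point onto that basis, giving k coordinates for k clusters. The function names are hypothetical, Gram-Schmidt stands in for a library QR factorization, and the centroids are assumed to be linearly independent.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def orthonormalize(vectors):
    """Modified Gram-Schmidt; assumes the vectors are linearly independent."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            coeff = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        basis.append([wi / norm for wi in w])
    return basis


def orthogonal_centroid_reduce(clusters):
    """Map each point to k coordinates, where k is the number of clusters.

    clusters is a list of clusters, each a list of points (lists of floats).
    The output keeps the same nesting, with every point replaced by its
    coordinates in the orthonormalized centroid basis.
    """
    basis = orthonormalize([centroid(c) for c in clusters])

    def project(x):
        return [sum(xi * qi for xi, qi in zip(x, q)) for q in basis]

    return [[project(x) for x in cluster] for cluster in clusters]
```

For example, two clusters of 3-dimensional points are reduced to 2 coordinates per point, and directions orthogonal to both centroids are discarded. The cluster labels are the prior knowledge here: the projection is built from them rather than from overall variance.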
Education and Outreach Goals FODAVA lead will • Encourage and coordinate development of FODAVA Curriculum • Encourage and coordinate knowledge exchange toward creating a workforce pipeline • Undergraduate education • Graduate education • Lifelong learning • Facilitate research collaboration • Facilitate outreach to underrepresented groups
Engaging FODAVA Community • FODAVA program provides a platform to bring together community of researchers, educators and practitioners • Activities might include • Education workshops to share experiences, develop best practices • Curriculum development • Repository of information and teaching materials (e.g., SRVAC, VADL)