Visual Data Mining with MineSet™ MineSet Web site URL: www.sgi.com/software/mineset
What is MineSet™?
• Visual data mining technology that helps your business quickly turn large amounts of data into actionable business insights
[Diagram: Data Warehouses or Business Data → MineSet™ → Business Insights]
Why is MineSet™ so important?
• Business Intelligence Solutions
• Visualization is essential (IDC, 1998)
  • 80% of users find visualization to be desirable
  • 51% find it very or extremely desirable
• Data mining algorithms (IDC, 1998)
  • Important to over 80% of data warehousing users
• Explosive data growth (Meta Group, 1997)
  • Data warehouses double in size every 12 to 18 months
• Scalability, CPU performance and I/O bandwidth (The Data Warehousing Institute, 1998)
  • Most important factors in selecting a data mining or data warehouse hardware platform
How does MineSet™ fit into Business Intelligence Solutions?
[Diagram: three layers. IT Infrastructure Layer: Operational System(s), ECTL Processes, Enterprise Data Warehouse, Mart 1, Mart 2, ... Mart N. Analytical Layer: Descriptive, Predictive, Visual. Action/Feedback Layer.]
• Visualizations for Decision Makers
• Visualization & Analytic Model Development for Business Analysts
Data Mining Discovery Process
[Diagram labels: Data Warehouse or Business Data; Selection; Denormalized Data Subset; Transformations; MineSet Client GUI; MineSet Server; Visual Data Mining; Analytical Data Mining; API to OLAP and Other Mining Algorithms; Business Insights & Understanding]
Data Mining Discovery Process with MineSet™
• MineSet Clients
  • Windows
  • SGI IRIX
  • Launch via
    • COM
    • ActiveX
• Tool Manager (Controls, Visualizations)
• MineSet Servers
  • NT, Linux
    • 32-bit
    • Single-threaded
  • SGI IRIX
    • 64-bit
    • Parallel
• RDBMS Connections
  • Oracle
  • Sybase
  • Informix
  • ODBC
• Flat Files
[Diagram labels: Analytical Data Mining Engine; Data Transformations; API to OLAP & Other Mining Algorithms; Data Warehouse & Business Data]
MineSet™ 3.1 Key Features
• Powerful Visual Data Mining
  • Visualizations launched from MineSet™ clients, any Windows application, or a Web browser
• Insightful Analytic Data Mining
  • Classification, Regression, Association and Clustering data mining model development
• Software Development Toolkit (SDK)
  • Plug-in APIs for Analytics, Visual, Transformations and Functions
• Application Toolkit Extensions
  • APIs to facilitate the writing of Web-enabled applications
Visual Data Mining with MineSet™
• Statistics Visualizer: mean, min/max, std. dev. analysis
• Histogram Visualizer: distinct count & range analysis
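The per-column figures that the Statistics and Histogram Visualizers display can be illustrated with a small sketch. This is not MineSet code: the function names, the `ages` column, and the fixed-width binning are assumptions made only for illustration, using the Python standard library.

```python
import statistics
from collections import Counter


def summarize(values):
    """Return mean, min, max, sample std. dev. and distinct count for a numeric column."""
    return {
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
        "stdev": statistics.stdev(values),
        "distinct": len(set(values)),
    }


def histogram(values, bin_width):
    """Count values per fixed-width bin; each key is the lower edge of its bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical column of customer ages.
ages = [23, 35, 31, 47, 52, 38, 29, 41]
stats = summarize(ages)
bins = histogram(ages, 10)
print(stats)
print(bins)
```

A visual tool would draw these numbers as bars and whiskers; the sketch only shows which quantities sit behind such a display.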
Visual Data Mining with MineSet™
• Scatter Visualizer: multi-dimensional analysis
• Splatter Visualizer: multi-dimensional analysis for large data sets
Visual Data Mining with MineSet™
• Map Visualizer: spatial trend analysis
• Tree Visualizer: hierarchical trend analysis
Analytic Data Mining with MineSet™
Learning
• Unsupervised Mining (descriptive & unlabeled columns)
  • Clustering
    • Segmentations
      • k-means
      • iterative k-means
  • Association
    • Correlations
      • one-to-one
      • multi-way
• Supervised Mining (predictive & labeled columns)
  • Classification
    • Discrete columns
      • Decision trees *
      • Evidence *
      • Option trees
      • Decision tables
  • Regression
    • Continuous columns
      • Regression trees
  • Column Importance (column selection)
* boosting option
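As a rough illustration of the k-means segmentation listed under clustering, here is a minimal textbook sketch in plain Python. It is not MineSet's implementation: the function name, the choice to seed centroids with the first k points, and the sample data are all assumptions.

```python
def kmeans(points, k, iterations=20):
    """Textbook k-means on tuples of floats; centroids start at the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        assignment = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, assignment


# Hypothetical two-column data with two obvious groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5)]
centroids, assignment = kmeans(points, k=2)
print(assignment)
print(centroids)
```

Seeding with the first k points keeps the sketch deterministic; practical tools usually use randomized or smarter seeding.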
Visualizing Analytic Data Mining Models with MineSet™
• Decision Tree Visualizer for Decision Tree data mining analysis
• Regression Tree Visualizer for Regression Tree data mining analysis
Visualizing Analytic Data Mining Models with MineSet™
• Evidence Visualizer for Naïve-Bayes data mining results, interactive scoring & analysis
• Decision Table Visualizer for Decision Table data mining analysis
Visualizing Analytic Data Mining Models with MineSet™
• Cluster Visualizer for Cluster data mining results analysis
• Association Visualizer for LHS/RHS association, i.e., "market basket" analysis
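For readers unfamiliar with LHS/RHS association rules, the sketch below computes the two standard measures for a rule "LHS implies RHS": support (the share of baskets containing both sides) and confidence (the share of LHS baskets that also contain the RHS). The function name and the basket data are hypothetical, and this is a generic illustration rather than MineSet's algorithm.

```python
def rule_metrics(transactions, lhs, rhs):
    """Return (support, confidence) for the association rule lhs -> rhs."""
    lhs, rhs = set(lhs), set(rhs)
    total = len(transactions)
    # Baskets containing every LHS item.
    lhs_count = sum(1 for t in transactions if lhs <= set(t))
    # Baskets containing every LHS item and every RHS item.
    both_count = sum(1 for t in transactions if (lhs | rhs) <= set(t))
    support = both_count / total if total else 0.0
    confidence = both_count / lhs_count if lhs_count else 0.0
    return support, confidence


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]
support, confidence = rule_metrics(baskets, lhs={"bread"}, rhs={"milk"})
print(support, confidence)
```

Here bread appears in four of five baskets and three of those also contain milk, so the rule has support 3/5 and confidence 3/4.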
MineSet™ Tool Manager -- Integrating it all together
• Data Sources
  • MineSet Server & DB connections
• Data Transformations
  • Remove, add, change or bin columns
  • Filter, aggregate or sample columns
  • Apply Model or Plug-in Data Source
• Data Destinations
  • Visualization Tools
  • Mining Tools
  • Data export
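The kinds of transformations listed for the Tool Manager (binning, filtering, aggregating) can be sketched generically on rows held as dictionaries. None of the names below come from MineSet; the helper functions, column names, and bin edges are assumptions for illustration only.

```python
from collections import defaultdict


def bin_column(rows, column, edges, new_column):
    """Add new_column with the index of the bin that row[column] falls into."""
    binned = []
    for row in rows:
        # Bin index = number of edges at or below the value.
        index = sum(1 for edge in edges if row[column] >= edge)
        binned.append({**row, new_column: index})
    return binned


def filter_rows(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def aggregate(rows, key, value):
    """Sum the value column for each distinct entry in the key column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


# Hypothetical sales rows.
rows = [
    {"region": "west", "amount": 120.0},
    {"region": "east", "amount": 40.0},
    {"region": "west", "amount": 75.0},
    {"region": "east", "amount": 310.0},
]
binned = bin_column(rows, "amount", edges=[50, 100], new_column="amount_bin")
large = filter_rows(binned, lambda row: row["amount"] >= 50)
totals = aggregate(large, key="region", value="amount")
print(totals)
```

Chaining the three steps mirrors the source-to-transformation-to-destination flow on the slide: the output of each step is an ordinary row set that the next step, or a visualization, can consume.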
Visualization Deployment Options with MineSet™
• Visualizations can be launched from
  • Any MineSet™ client on Windows or SGI IRIX
  • Any Windows application, through the COM protocol as ActiveX components
  • Web browsers, through the COM protocol as ActiveX components
  • Web browsers, by recording the visualization in a web-media format or saving a snapshot
Data Mining Wave -- Business Intelligence Solutions
[Timeline axis: 1985, 1990, 1995, 2000, 2005]
• Data Warehousing Wave
  • Internally focused
  • Reactive problem solving
  • Build data repositories to understand the past
  • Re-engineer infrastructures to support business operations and process workflow
• Data Mining Wave
  • Customer focused
  • Proactive solutions & new market development
  • Predict customer behavior, market trends, and competitive environments
  • Leverage IT infrastructure with Business Intelligence Solutions
Closed-Loop Business Model
[Diagram labels: Obtain Data; Identify Business Indicators; Measure Results; Investigate & Drill Down; Implement Action Plan; Develop Visual/Analytical Models & Business Scenarios; Approve Action Plan; Create Action Plan]
[Diagram labels, continued: Data Mining Pilot/Project; MineSet; SGI PSO or Data Mining Partner]
MineSet™ Business Intelligence Solution Examples
• State of Texas Medicaid Fraud & Abuse Detection System
• Situation Analysis
  • About 25% of the Texas state budget goes to medical welfare programs
  • An estimated 10% of the $7.3B in Medicaid transactions is fraudulent
  • The previous system, the Surveillance and Utilization Review Subsystem (SURS), detected only 14% of fraudulent providers
• The Requirements
  • More accurate fraud detection and higher prosecution rates
State of Texas Welfare Fraud Management
• The Business Intelligence Solution
  • ITC Fraud Spotlight data mining and fraud case management application
  • MineSet™ software for visualization
  • EDS system integration and management
  • SGI Origin 2000 high-performance IRIX server running Oracle 8i, ITC & MineSet™ Server
• The Results
  • Detection rate of suspected fraud doubled to 38%
  • Solution paid for itself in less than 6 months
  • The fraud detection application is being leveraged into other state programs
Clustering and Data Visualization with MineSet™ • Data on 14,000 providers analyzed by unsupervised neural networks • Neural networks clustered providers based on 100+ columns • Visualization tool displays clustering, showing known fraudulent providers • Subset of 100 providers with similar patterns investigated: Hit rate > 70% Source: ITC 9-22-98
State of Texas Welfare Fraud Management
• "These savings could help provide preventive care for several hundred thousand additional children"
  - Robin Herskowitz, Senior Policy Analyst with the Texas Office of the Comptroller of Public Accounts
MineSet™ Business Intelligence Solution Examples • Bioinformatics scientists are using MineSet visual and analytical data mining to gain insights and understanding into genetic data. • Business Intelligence Solutions Examples • Analysis of Gene Expression Chip Data at Roche • Visualizing Gel Electrophoresis Data at the University of Michigan • Visualizing Sequence Comparisons at the EBI • Predicting Splice Junction Points
Analysis of Gene Expression Chip Data at Roche • Researchers at Roche use SGI MineSet to analyze and understand gene expression chip data. • Visualization: Gene expression chip data displayed using MineSet Scatter Visualizer. Image Courtesy Roche Group.
Visualizing Gel Electrophoresis Data at the University of Michigan • Professor Philip Andrews & Peter Ulintz, Biological Chemistry Dept., are using the MineSet Splat Visualizer to view large amounts of electrophoresis data. • Visualization: Electrophoresis data displayed in two dimensions using MineSet Scatter Visualizer. Image courtesy of the University of Michigan.
Visualizing Sequence Comparisons at the EBI • The European Bioinformatics Institute uses MineSet to visualize the partial results of a segment-wise "all-against-all" FASTA sequence comparison between two completed genomes. • Visualization: Genome comparisons using MineSet Map Visualizer. Image courtesy the EBI.
Predicting Splice Junction Points from the GenBank Primate DB • Using splice junction information, the MineSet Decision Tree classifier is used to predict and identify splice junction points in other unknown primate sequences • Visualization: Decision Tree Visualization of DNA splice junction data.
Predicting Splice Junction Points from the GenBank Primate DB • Using splice junction information, the MineSet Evidence Classifier is used to predict and identify splice junction points in other unknown primate sequences • Visualization: Evidence Visualization of DNA splice junction data.
Sample list of Business Intelligence Solutions using MineSet™ SGI Proprietary and Confidential
MineSet™ 3.1 Summary
• Powerful Visual Data Mining
  • Visualizations launched from MineSet™ clients or any Windows application
• Insightful Analytic Data Mining
  • Classification, Regression, Association and Clustering data mining model development
• Scalable Client/Server Architecture
  • Windows and SGI IRIX MineSet™ clients
  • NT/Linux MineSet™ servers (single-threaded, 32-bit)
  • IRIX MineSet™ servers (parallel, 64-bit)
• Software Development Environment
  • APIs, a plug-in interface, and integration into other OLAP tools
MineSet™ 4.X Client Enhancements
• Future Client Enhancements
  • Visualization Environment
    • 2D Visualizations
  • Data Mining Control Environment
    • Data Mining Project and Model management
    • Tool Manager Wizards for common Data Mining tasks
  • Control API
    • Client API that provides access to all MineSet™ capabilities
  • Web enabling of server function
    • Enhanced configuration and launching of Data Visualization tools in Web browser environments
  • Application Authoring Support
MineSet™ 4.X Server Enhancements • Future Server Enhancements • Data Mining Analytics • New Analytics • Time Sequenced Data (e.g., Association sequences) • Source Code export of MineSet™ Data Mining Models • Performance • Parallel Clustering • Parallel RDBMS, ODBC, Flat File, etc. to MineSet™ operations • Connectivity • Enhanced Direct RDBMS connectivity on NT, Linux and SGI IRIX MineSet™ Servers
MineSet™ Client/Server Platforms from SGI
• SGI Series 2000: SGI IRIX Servers
• SGI Series 1000: NT & Linux Servers