
Data Mining I: KnowledgeSEEKER

Presentation Transcript


  1. Data Mining I: KnowledgeSEEKER Jennifer Davis Kelly Davis Saurabh Gupta Chris Mathews Shantea Stanford

  2. Overview of Presentation • Introduction to Data Mining Methods and Products • Tutorial: How to Use KnowledgeSEEKER • Exercises: How much did you learn?

  3. What is Data Mining? • Filtering large amounts of data • Searching for hidden patterns and/or trends • Predicting future results • Creating a competitive advantage and improving decision making • Data mining is a form of artificial intelligence, but it differs from other BI tools such as query, reporting and OLAP • Discovery versus verification: data mining discovers patterns the user did not ask about, while other BI tools verify hypotheses the user already has

  4. What Sparked Data Mining? • “Motivated by business need, large amounts of available data, and humans’ limited cognitive processing abilities • Enabled by data warehousing, parallel processing, and data mining algorithms” Source: Dr. Hugh Watson

  5. Popular Data Mining Methods • Neural networks – learning from data patterns and predicting new data • Genetic algorithms – optimization techniques • Decision trees – rules for classifying data • Regression analysis – statistical modeling of relationships between variables • K-nearest neighbor – classification and clustering technique based on weighting of selected variables • Data visualization – showing patterns visually
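The k-nearest neighbor entry above can be sketched in a few lines of Python. This is a minimal illustration, not how any product implements it: the toy records, the variable names (age, cigarettes per day) and the weights are all hypothetical, and "weighting of selected variables" is modeled as a weighted Euclidean distance.

```python
import math
from collections import Counter

def weighted_distance(a, b, weights):
    """Euclidean distance with each squared difference scaled by a chosen weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def knn_classify(training, query, weights, k=3):
    """Label `query` by majority vote of its k nearest training rows.

    training: list of (feature_tuple, label) pairs.
    """
    neighbours = sorted(training, key=lambda row: weighted_distance(row[0], query, weights))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (age, cigarettes_per_day) -> hypertension label
training = [
    ((35, 0), "normal"),
    ((40, 2), "normal"),
    ((58, 20), "high"),
    ((61, 15), "high"),
    ((55, 0), "normal"),
]
weights = (1.0, 0.5)  # assumed weights: squared age differences count twice as much

print(knn_classify(training, (60, 18), weights, k=3))  # prints high
```

Changing the weights changes which neighbors are "near", which is why the choice of weighted variables matters as much as k.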

  6. Types of Data Mining • Association – identifies relationships • Sequential pattern – identifies sequencing • Classification – identifies potential outcomes for predetermined categories • Clustering – identifies categories that are not defined in advance • Prediction – estimates future values or forecasts
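To make the association type concrete, the sketch below computes the two standard measures of an association rule, support and confidence, for the rule {diapers} → {beer}. The item names and transactions are invented for illustration; working from integer counts keeps the ratios exact.

```python
def count_containing(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))

def support(transactions, itemset):
    """Share of all transactions that contain `itemset`."""
    return count_containing(transactions, itemset) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Share of transactions with `antecedent` that also contain `consequent`."""
    base = count_containing(transactions, antecedent)
    if base == 0:
        return 0.0
    return count_containing(transactions, set(antecedent) | set(consequent)) / base

# Hypothetical market-basket data
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

print(support(transactions, {"diapers", "beer"}))       # prints 0.6
print(confidence(transactions, {"diapers"}, {"beer"}))  # prints 0.75
```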

  7. Data Mining Process • “Requires personnel with domain, data warehousing, and data mining expertise • Requires data selection, data extraction, data cleansing, and data transformation • Most data mining tools work with highly granular flat files • Is an iterative and interactive process” Source: Dr. Hugh Watson
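The cleansing and transformation steps listed above can be illustrated with Python's standard `csv` module. This is a sketch under assumptions: the raw field names (`age`, `smoking`) are hypothetical, and the age brackets are borrowed from the exercise questions in this presentation rather than from any documented KnowledgeSEEKER requirement.

```python
import csv
import io

# Assumed cut points, borrowed from the age brackets used in the exercises
BRACKETS = [(32, 50), (51, 62), (63, 72)]

def age_bracket(age):
    """Transformation: bin a numeric age into a labelled bracket (None if outside)."""
    for low, high in BRACKETS:
        if low <= age <= high:
            return f"{low}-{high}"
    return None

def clean_record(raw):
    """Cleansing: trim whitespace, normalise case, treat blank or non-numeric ages as missing."""
    age_text = (raw.get("age") or "").strip()
    age = int(age_text) if age_text.isdigit() else None
    smoking = (raw.get("smoking") or "").strip().lower() or None
    return {
        "age_bracket": age_bracket(age) if age is not None else None,
        "smoking": smoking,
    }

def to_flat_file(raw_records):
    """Write one cleaned record per line as CSV text; missing values become empty fields."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["age_bracket", "smoking"], lineterminator="\n")
    writer.writeheader()
    for raw in raw_records:
        writer.writerow(clean_record(raw))
    return buf.getvalue()

raw = [{"age": " 55 ", "smoking": "Regular "}, {"age": "", "smoking": "never"}]
print(to_flat_file(raw))
```

The result is one granular row per record, which is the flat-file shape the slide says most mining tools expect.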

  8. How Is Data Mining Used? • CRM: Research, churn and promotional management • Process management: Reduce operational delays • Analysis: Develop forecasting models and prevent fraud • Predictive capabilities: Develop rules for queries and expert systems; oil exploration • Health care: Medical research and trends • Banking: Identify bank locations • Sports: Guide the movement of players

  9. Data Mining Products • See product list, http://www.xore.com/prodtable.html • According to Jackie Sweeney, International Data Corporation, “Data mining has matured, producing fortunes for the Big Three vendors - SPSS, IBM and SAS Institute - and robust revenues for a number of smaller vendors who market solutions tailored to vertical markets.”

  10. Data Mining Products • Off-the-shelf applications and bundling are becoming more common • Wide range of pricing • SAS Institute’s Enterprise Miner ~ $80k • IBM Intelligent Miner ~ $60k • Angoss KnowledgeSEEKER = $4,750 per license, including upgrades and unlimited tech support for 1 year. Annual license renewal fees are 20% of the list price. • Desktop products start at a few hundred dollars
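The KnowledgeSEEKER pricing terms imply a simple multi-year cost. The sketch below assumes, since the slide does not say, that the $4,750 list price covers year one and that the 20% renewal fee ($950) is paid for every later year, per license. The function name is hypothetical.

```python
def total_license_cost(years, list_price=4750, renewal_pct=20):
    """Cumulative cost in dollars of one license held for `years` years.

    Assumption: the list price covers year 1; each later year adds a renewal
    fee of renewal_pct percent of the list price.
    """
    if years < 1:
        raise ValueError("years must be at least 1")
    renewal_fee = list_price * renewal_pct / 100  # 4750 * 20 / 100 = 950.0
    return list_price + (years - 1) * renewal_fee

for years in (1, 2, 3):
    print(years, total_license_cost(years))
```

Under these assumptions, one, two and three years cost $4,750, $5,700 and $6,650.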

  11. Selection Process – Questions to Ask • Are the data and variables currently available? • Will mining involve numerical and nominal data? • Can the tool build models, predict outcomes and verify results? • Can it process the amount of data required? • Can the tool handle incomplete data? • Can the tool process noisy data? • Can it provide the degree of granularity desired? • How much technical knowledge is required?

  12. KnowledgeSEEKER by Angoss • Angoss Software Corp.: a Canadian public company specializing in data mining solutions • Decision tree modeling • Fully scalable and easy to use • Specifications • Operating systems: Unix, Windows 3.1, 95, 98 and NT • Databases: Access, dBase II, III and IV, ODBC, SAS, SPSS
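The slides do not describe the statistic KnowledgeSEEKER uses to choose splits. As an assumption, the sketch below ranks candidate predictors the way CHAID-style trees do, by the Pearson chi-square statistic of each predictor against the dependent variable. Real CHAID also merges categories and compares Bonferroni-adjusted p-values, which this simplification omits. The predictor chosen for the first split is what the exercises call the "most important variable". The records are hypothetical.

```python
from collections import Counter

def chi_square(rows, predictor, target):
    """Pearson chi-square statistic for the predictor x target contingency table."""
    n = len(rows)
    joint = Counter((r[predictor], r[target]) for r in rows)
    p_totals = Counter(r[predictor] for r in rows)
    t_totals = Counter(r[target] for r in rows)
    stat = 0.0
    for p_val, p_count in p_totals.items():
        for t_val, t_count in t_totals.items():
            expected = p_count * t_count / n  # count expected if independent
            observed = joint[(p_val, t_val)]
            stat += (observed - expected) ** 2 / expected
    return stat

def best_split(rows, predictors, target):
    """Return the predictor most strongly associated with the target."""
    return max(predictors, key=lambda p: chi_square(rows, p, target))

# Hypothetical records: (smoking, gender, hypertension)
records = [
    ("regular", "M", "high"), ("regular", "F", "high"),
    ("regular", "M", "high"), ("regular", "F", "normal"),
    ("never", "F", "high"), ("never", "M", "normal"),
    ("never", "F", "normal"), ("never", "M", "normal"),
]
rows = [dict(zip(("smoking", "gender", "hypertension"), r)) for r in records]

print(best_split(rows, ["smoking", "gender"], "hypertension"))  # prints smoking
```

In this toy data gender is independent of hypertension (statistic 0.0) while smoking is not (statistic 2.0), so smoking wins the split.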

  13. Users of KnowledgeSEEKER • IRS – fraud detection • University of Rochester – cancer research • Hewlett-Packard – process and quality control • Reader’s Digest – market segmentation • MGM Grand – survey analysis

  14. Sources • Angoss whitepaper: http://www.angoss.com/ProdServ/AnalyticalTools/kseeker/whitepaper.html • “Data Mining for Golden Opportunities”, Smart Computing, January 2000 • “Your Business Intelligence Arsenal”, Telephony, Chicago, Apr 24, 2000, Douglas Hackney • Examples and testimonials: http://www.data-mining-software.com/data_mining_examples.htm • Data Management, Richard T. Watson, 2002 • http://www.xore.com/prodtable.html (Data Mining Products) • Dr. Hugh Watson’s slide • “Data Mining Gets Real”, Enterprise Systems Journal, April 1999, Jon William Toigo • http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm (examples of data mining uses)

  15. KnowledgeSEEKER Tutorial

  16. KnowledgeSEEKER Exercises • According to KnowledgeSEEKER, which is the most important variable influencing hypertension for those between the ages of 51-62 who are “regular” or “occasional” smokers?  Answer - Cheese last week

  17. KnowledgeSEEKER Exercises • What is the total number of 51-62 year olds who have identified themselves as “former/never smokers” and have an eating pattern that includes “a lot/moderate” salt? Answer – 32

  18. KnowledgeSEEKER Exercises • What percent of women between the ages of 32-50 who occasionally drink have high hypertension?  Answer - 28.6%

  19. KnowledgeSEEKER Exercises • What percentage of people in income groups 4, 5, 7 and 8, age bracket 32-50, have high hypertension? Answer - 11.8%

  20. KnowledgeSEEKER Exercises • In the sample data, how many people have never smoked?  Answer - 94

  21. KnowledgeSEEKER Exercises • According to KnowledgeSEEKER, what is the most important factor contributing to hypertension for those in the 51-62 age bracket? Answer - Smoking • Next, right-click and select “Go to Split” to find the 4th most important factor in the table.   Answer - Deep fried last week

  22. KnowledgeSEEKER Exercises • What is the percentage of males who are “regular” smokers among all male participants?  Answer - 30.8%
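Each number the exercises ask for is a count or a percentage within one node of the tree, which amounts to filtering the underlying table. The sketch shows that computation on a few hypothetical rows (not the course dataset); the field names `gender` and `smoking` and the helper names are assumptions.

```python
def count_where(rows, condition):
    """Number of rows satisfying `condition` (a node's record count)."""
    return sum(1 for r in rows if condition(r))

def percent_within(rows, condition, within=lambda r: True):
    """Percentage, to one decimal place, of rows in the `within` group that also meet `condition`."""
    group = [r for r in rows if within(r)]
    if not group:
        return 0.0
    return round(100 * count_where(group, condition) / len(group), 1)

# Hypothetical rows, not the course dataset
rows = [
    {"gender": "M", "smoking": "regular"},
    {"gender": "M", "smoking": "former/never"},
    {"gender": "M", "smoking": "occasional"},
    {"gender": "F", "smoking": "regular"},
]

def is_male(r):
    return r["gender"] == "M"

def is_regular(r):
    return r["smoking"] == "regular"

print(count_where(rows, is_male))                 # prints 3
print(percent_within(rows, is_regular, is_male))  # prints 33.3
```

The `within` filter plays the role of the parent node (all males), and `condition` picks out the child category (regular smokers).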

  23. KnowledgeSEEKER Exercises • Create a graph of the distribution of smoking males.

  24. KnowledgeSEEKER Exercises • Complete the following steps: set the dependent variable to Hypertension, then click Grow / Automatic. What is the total number of males between the ages of 63-72 who had fish last week? Answer – 24

  25. KnowledgeSEEKER Exercises • According to KnowledgeSEEKER, which split after age has the highest effect on hypertension?  Answer - Height

  26. KnowledgeSEEKER Exercises • Among 32-50 year olds who report a drink pattern of former/never, how many have high hypertension?  Answer - 0

  27. KnowledgeSEEKER Exercises • According to KnowledgeSEEKER, what is the most important variable influencing hypertension for women between the ages of 51-62? How does this differ for men aged 51-62? Answer - Women: weight; men: drinking pattern
