
DATA MINING: DEFINITIONS AND DECISION TREE EXAMPLES

Emily Thomas, Director of Planning and Institutional Research



  1. DATA MINING: DEFINITIONS AND DECISION TREE EXAMPLES
     Emily Thomas, Director of Planning and Institutional Research

  2. WHAT IS DATA MINING?
     • Data mining is the discovery of hidden knowledge, unexpected patterns and new rules in large databases.
     • Data mining is exploratory. Its results lack the protection against spurious conclusions that theory-based, hypothesis-driven statistics provide.
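The risk of spurious conclusions in exploratory work can be illustrated with a small simulation. The sketch below is not from the presentation: it assumes a made-up binary outcome and a set of purely random binary predictors, and counts how many of them a chi-squared test (1 degree of freedom) flags as significant at the 5% level. All function names and parameters are hypothetical.

```python
import math
import random


def p_value_2x2(xs, ys):
    """p-value of Pearson's chi-squared test (df = 1) for two 0/1 lists."""
    n = len(xs)
    stat = 0.0
    for x in (0, 1):
        for y in (0, 1):
            observed = sum(1 for a, b in zip(xs, ys) if a == x and b == y)
            expected = xs.count(x) * ys.count(y) / n
            if expected == 0:
                return 1.0  # a constant column cannot show an association
            stat += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi2 > stat) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2))


def count_chance_findings(n_students=500, n_predictors=400, alpha=0.05, seed=0):
    """Count random predictors that look 'significant' purely by chance."""
    rng = random.Random(seed)
    outcome = [rng.randint(0, 1) for _ in range(n_students)]
    hits = 0
    for _ in range(n_predictors):
        noise = [rng.randint(0, 1) for _ in range(n_students)]
        if p_value_2x2(noise, outcome) < alpha:
            hits += 1
    return hits


if __name__ == "__main__":
    print(count_chance_findings())
```

None of the predictors is related to the outcome, yet roughly alpha × n_predictors of them (on the order of 20 here) can be expected to pass the test. Screening many variables without a prior hypothesis produces "patterns" of this kind.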

  3. WHY USE DATA MINING? In the corporate world:
     • Large amounts of data are captured in enterprise databases.
     • These databases are too large for traditional statistical techniques.
     • Identifying patterns in the data can help target profitable (or unprofitable) customers.

  4. WHY USE DATA MINING? In institutional research:
     • Large numbers of variables
     • We have insufficient time/resources to investigate all the relationships that might be informative.
     • Identifying data patterns can shed light on student behavior.

  5. WHY DATA MINING NOW?
     • Development of large, integrated enterprise databases
     • Development of data mining techniques and software
     • Development of simplified user interfaces

  6. DATA MINING TECHNIQUES
     • Decision trees
     • Rule induction
     • Nearest neighbors
     • Neural networks
     • Clustering
     • Genetic algorithms
     • Exploratory factor analysis
     • Stepwise regression

  7. DECISION TREE ANALYSIS
     CHAID: Chi-squared Automatic Interaction Detector (SPSS AnswerTree)
     • Select significant independent variables
     • Identify category groupings or interval breaks to create groups most different with respect to the dependent variable
     • Select as the primary independent variable the one identifying groups with the most different values of the dependent variable
     • Select additional variables to extend each branch if there are further significant differences
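The split-selection steps above can be sketched in a few lines of Python. This is a simplified illustration, not AnswerTree's implementation: it assumes binary predictors and a binary dependent variable (so each test has 1 degree of freedom), and it omits the category merging and Bonferroni adjustments that full CHAID performs. The field names (`on_campus`, `honors`, `returned`) and the counts are invented toy data.

```python
import math
from collections import Counter


def chi_squared_2x2(rows, predictor, outcome):
    """Pearson chi-squared statistic and p-value (df = 1) for two boolean fields."""
    n = len(rows)
    if n == 0:
        return 0.0, 1.0
    counts = Counter((r[predictor], r[outcome]) for r in rows)
    stat = 0.0
    for p in (False, True):
        for o in (False, True):
            row_total = counts[(p, False)] + counts[(p, True)]
            col_total = counts[(False, o)] + counts[(True, o)]
            expected = row_total * col_total / n
            if expected == 0:
                return 0.0, 1.0  # constant field: no split possible
            stat += (counts[(p, o)] - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))


def best_split(rows, predictors, outcome, alpha=0.05):
    """Most significant predictor, or None if none reaches alpha."""
    scored = [(chi_squared_2x2(rows, p, outcome)[1], p) for p in predictors]
    p_value, name = min(scored)
    return name if p_value < alpha else None


def grow(rows, predictors, outcome, alpha=0.05):
    """Recursively extend each branch while significant differences remain."""
    node = {"n": len(rows), "rate": sum(r[outcome] for r in rows) / len(rows)}
    split = best_split(rows, predictors, outcome, alpha) if predictors else None
    if split is not None:
        rest = [p for p in predictors if p != split]
        node["split"] = split
        node["branches"] = {
            value: grow([r for r in rows if r[split] == value], rest, outcome, alpha)
            for value in (False, True)
        }
    return node


# Toy data (invented): (on_campus, honors, returned, number of students)
cells = [
    (True, True, True, 20), (True, False, True, 20),
    (True, True, False, 5), (True, False, False, 5),
    (False, True, True, 10), (False, False, True, 10),
    (False, True, False, 15), (False, False, False, 15),
]
rows = [
    {"on_campus": c, "honors": h, "returned": r}
    for c, h, r, count in cells
    for _ in range(count)
]

if __name__ == "__main__":
    tree = grow(rows, ["honors", "on_campus"], "returned")
    print(tree["split"], tree["branches"][True]["rate"], tree["branches"][False]["rate"])
```

In this toy data set the return rate differs sharply by `on_campus` (80% versus 40%) and not at all by `honors`, so the sketch splits once on `on_campus` and stops, mirroring the "primary variable first, extend only where differences remain significant" logic of the slide.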

  8. TRANSFER RETENTION RATES Percent of new full-time Fall 2002 transfers returning in Spring 2003

  9. TRANSFER RETENTION RATES FALL 2002-SPRING 2003

  10. SOS 2000: SATISFACTION WITH THE QUALITY OF EDUCATION

  11. VERY LARGE INTELLECTUAL GROWTH 19% of students

  12. LARGE INTELLECTUAL GROWTH 41% of students

  13. LOW OR MODERATE INTELLECTUAL GROWTH 40% of students

  14. SOS 2000: SATISFACTION WITH “THIS COLLEGE IN GENERAL”

  15. DECISION TREE ADVANTAGES AND DISADVANTAGES
     Advantages:
     • Discover unexpected relationships
     • Identify subgroup differences
     • Use categorical or continuous data
     • Accommodate missing data
     Disadvantages:
     • Possibly spurious relationships
     • Presentation difficulties

  16. BIBLIOGRAPHY
     • AnswerTree 2.0: User’s Guide. SPSS, 1998.
     • Adriaans, P and D Zantinge (1996). Data Mining. Harlow, England and elsewhere: Addison-Wesley.
     • Bordon, VMH (1995). Segmenting Student Markets with a Student Satisfaction and Priorities Survey. Research in Higher Education 16:2, 115-138.
     • Neville, PG (1999). “Decision Trees for Predictive Modeling,” SAS Technical Report, The SAS Institute.
     • Thomas, EH and N Galambos. What Satisfies Students? Mining Student-Opinion Data with Regression and Decision Tree Analysis. Forthcoming in Research in Higher Education, May 2004.
