Overview of Methods

Data mining techniques: what the techniques do, examples, advantages and disadvantages

History
• Statistics
• AI: genetic algorithms and neural networks (analogies with biology)
• Memory-based reasoning
• Link analysis (from graph theory)

Techniques
• Statistical
  • Market-basket analysis: finds groups of items that occur together
  • Memory-based reasoning: case-based
  • Cluster detection: undirected (a quantitative form of market-basket analysis)
• Artificial intelligence
  • Link analysis: e.g., MCI's Friends & Family
  • Decision trees and rule induction: produce production rules
  • Neural networks: automatic pattern detection
  • Genetic algorithms: keep the best parameters
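Market-basket analysis is usually summarized with two measures: the support of a rule (the share of baskets containing both items) and its confidence (how often the right-hand item appears when the left-hand item does). The sketch below computes both for item pairs only; the function name `pair_rules`, the `min_support` cutoff, and the sample baskets are hypothetical and not taken from the slides. Full association-rule miners such as Apriori extend the same counting to larger item sets.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.5):
    """Return {(a, b): (support, confidence)} for pair rules a -> b.

    support(a, b)    = fraction of baskets containing both a and b
    confidence(a->b) = count(a and b) / count(a)
    """
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(set(basket)), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            # The key (x, y) stands for the rule "x -> y"; emit both directions.
            rules[(a, b)] = (support, count / item_counts[a])
            rules[(b, a)] = (support, count / item_counts[b])
    return rules

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
rules = pair_rules(baskets, min_support=0.5)
print(rules[("butter", "bread")])  # (0.5, 1.0)
```

Here every basket with butter also has bread (confidence 1.0), while only two of the three bread baskets have butter, so the rule is stronger in one direction than the other.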

Models
• Regression: Y = a + bX
• Classification: assign a new record to a class
• Prediction: assign a value to a new record
• Clustering: form groups in the data
• Time series: assign a future value
• Links: find patterns in the data
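For the regression model Y = a + bX, ordinary least squares gives b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄. A minimal sketch follows; the function name `fit_line` and the sample points are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates of a and b in Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # lies exactly on Y = 1 + 2X
a, b = fit_line(xs, ys)
print(a, b)       # 1.0 2.0
print(a + b * 5)  # 11.0, the predicted value for a new record with X = 5
```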

Fitting
• Underfitting: not enough detail
  • important variables are left out
• Overfitting: too much detail
  • the model memorizes the training set but doesn't help with new data
  • more likely when the data set is too small or the data contain redundancy
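The extreme case of overfitting is a model that just memorizes the training set. The sketch below uses a hypothetical `Memorizer` class (not a standard library class) that stores each training record and falls back to the most common label for anything unseen. It scores perfectly on the training data but poorly on new records. The toy (age, income) records are made up for illustration.

```python
from collections import Counter

class Memorizer:
    """Hypothetical 'model' that simply stores every training record."""

    def fit(self, records, labels):
        self.table = dict(zip(records, labels))
        # Unseen records get the most common training label.
        self.default = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, record):
        return self.table.get(record, self.default)

def accuracy(model, records, labels):
    hits = sum(model.predict(r) == y for r, y in zip(records, labels))
    return hits / len(labels)

# Toy (age, income) records; values are made up for illustration.
train_x = [(25, 20), (40, 90), (35, 60), (50, 80)]
train_y = ["late", "on-time", "on-time", "on-time"]
test_x = [(26, 22), (45, 85), (30, 18)]
test_y = ["late", "on-time", "late"]

model = Memorizer().fit(train_x, train_y)
print(accuracy(model, train_x, train_y))  # 1.0
print(accuracy(model, test_x, test_y))    # 0.3333333333333333
```

Comparing accuracy on a held-out set, rather than on the training set, is what exposes the problem.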

Comparison of Features

Data Mining Functions
• Classification: identify categories in data
• Prediction: a formula to predict future observations
• Association: rules based on relationships among entities
• Detection: anomalies and irregularities (e.g., fraud detection)
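One simple way to detect anomalies is to flag values that lie far from the mean in units of standard deviation (a z-score rule). This is only one possible detection technique, chosen here for brevity. The function name `flag_anomalies`, the threshold of 2.0, and the claim amounts are assumptions.

```python
from statistics import mean, pstdev

def flag_anomalies(values, threshold=2.0):
    """Return indices of values whose |z-score| exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical claim amounts; the last one is unusually large.
claims = [120, 135, 110, 128, 140, 125, 132, 900]
print(flag_anomalies(claims))  # [7]
```

A single extreme value also inflates the mean and standard deviation, so robust alternatives (e.g., median-based rules) are often preferred on small samples.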

Financial Applications

Telecom Applications

Marketing Applications

Web Applications

Other Applications

Data Sets
• Loan applications: classification
• Job applications: classification
• Insurance fraud: detection
• Expenditure data: prediction

Loan Data
• 650 observations
• Outcomes (binary):
  • On-time: cost of error $300
  • Late (default): cost of error $2,000
• Variables: Age, Income, Assets, Debts, Want, Credit
  • Credit is ordinal
  • Transform: Assets, Debts, and Want → Risk
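Because the two kinds of error carry different costs, models on the loan data should be compared by total error cost rather than by error count. The sketch assumes that the $300 applies when an on-time applicant is predicted to be late and the $2,000 applies when a defaulter is predicted to be on time. That reading of the slide, the function name `error_cost`, and the sample predictions are all hypothetical.

```python
# Assumed reading of the slide's costs.
COST_ON_TIME_ERROR = 300   # actual on-time, predicted late
COST_LATE_ERROR = 2_000    # actual late (default), predicted on-time

def error_cost(actual, predicted):
    """Total misclassification cost for binary loan outcomes."""
    cost = 0
    for truth, guess in zip(actual, predicted):
        if truth == "on-time" and guess == "late":
            cost += COST_ON_TIME_ERROR
        elif truth == "late" and guess == "on-time":
            cost += COST_LATE_ERROR
    return cost

actual = ["on-time"] * 8 + ["late"] * 2
model_a = ["late"] * 3 + ["on-time"] * 5 + ["late"] * 2  # 3 errors, all cautious
model_b = ["on-time"] * 8 + ["on-time", "late"]          # 1 error, a missed default
print(error_cost(actual, model_a))  # 900
print(error_cost(actual, model_b))  # 2000
```

Model B makes fewer mistakes but costs more, because its one mistake is the expensive kind.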

Job Application Data
• 500 observations
• Outcomes (ordinal): Unacceptable, Minimal, Acceptable, Excellent
• Variables: Age, State, Degree, Major, Experience
  • State is nominal; Degree and Major are ordinal
  • State is superfluous
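Ordinal and nominal variables are usually encoded differently. Ordinal levels such as the four job outcomes can be mapped to integer ranks that keep their order. Nominal values such as State have no order, so one indicator column per category is the common choice. The names below and the list of state codes are hypothetical.

```python
OUTCOME_LEVELS = ["Unacceptable", "Minimal", "Acceptable", "Excellent"]
# Ordinal: integer ranks preserve the ordering of the outcome levels.
OUTCOME_RANK = {level: rank for rank, level in enumerate(OUTCOME_LEVELS)}

def one_hot(value, categories):
    """Nominal encoding: one 0/1 indicator per category, no implied order."""
    return [1 if value == c else 0 for c in categories]

states = ["CA", "NE", "TX"]  # hypothetical category list
print(OUTCOME_RANK["Acceptable"])  # 2
print(one_hot("NE", states))       # [0, 1, 0]
```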

Insurance Claim Data
• 5,000 observations
• Outcomes (binary):
  • OK: cost of error $500
  • Fraudulent: cost of error $2,500
• Variables: Age, Gender, Claim, Tickets, Prior claims, Attorney
  • Gender and Attorney are nominal; Tickets and Prior claims are categorical
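With unequal error costs, a claim should be flagged whenever the expected cost of passing it exceeds the expected cost of flagging it. That happens when p × $2,500 > (1 − p) × $500, i.e. when p > 500 / 3,000 ≈ 0.167. This sketch assumes the $2,500 is the cost of passing a fraudulent claim and the $500 the cost of flagging a legitimate one. It also assumes some model supplies the fraud probability `p_fraud`. Both are assumptions, and `flag_as_fraud` is a hypothetical name.

```python
# Assumed reading of the slide's costs.
COST_MISSED_FRAUD = 2_500  # actual fraudulent, labelled OK
COST_FALSE_ALARM = 500     # actual OK, labelled fraudulent

def flag_as_fraud(p_fraud):
    """Flag when passing the claim has a higher expected cost than flagging it."""
    cost_if_passed = p_fraud * COST_MISSED_FRAUD
    cost_if_flagged = (1 - p_fraud) * COST_FALSE_ALARM
    return cost_if_passed > cost_if_flagged

threshold = COST_FALSE_ALARM / (COST_FALSE_ALARM + COST_MISSED_FRAUD)
print(round(threshold, 4))                       # 0.1667
print(flag_as_fraud(0.10), flag_as_fraud(0.25))  # False True
```

The cutoff is far below 0.5 because a missed fraud costs five times as much as a false alarm.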

Expenditure Data
• 10,000 observations
• Outcomes: could predict response in a number of categories (and others)
• Variables: Age, Gender, Marital, Dependents, Income, Job years, Town years, Education years, Driver's license, Own home, Number of credit cards
  • Churn; proportion of income spent on seven categories
