Data Mining Application: CART
CART: • Binary recursive decision tree program from Salford Systems • www.salford-systems.com • 30-day evaluation copy from • http://www.salford-systems.com/evals/cartreg.html • Company: Villanova University • Department: Computer Sciences
CART Binary Recursive Trees • One target variable • Splits data into a number of classes on the target variable (a settable input parameter) • Many predictor variables • At each recursion, CART determines one yes-no (binary) question based on one predictor variable • Various splitting criteria are available. The default (Gini) measures how well a rule separates the classes in the parent node
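The recursion described on this slide can be sketched in a few lines of Python. This is a minimal illustration of Gini-based binary splitting, not Salford Systems' implementation: the function names (`gini`, `best_split`, `build_tree`), the `max_depth` stopping rule, and the four-row toy data set are all assumptions made for the example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Search every predictor for the yes/no question 'row[feature] <= threshold'
    that gives the largest drop in impurity relative to the parent node.
    Returns (gain, feature, threshold), or None if no split is possible."""
    parent = gini(labels)
    n = len(rows)
    best = None
    for feature in range(len(rows[0])):
        values = sorted({row[feature] for row in rows})
        # Candidate thresholds are midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            yes = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            no = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            weighted = (len(yes) * gini(yes) + len(no) * gini(no)) / n
            gain = parent - weighted
            if best is None or gain > best[0]:
                best = (gain, feature, threshold)
    return best


def build_tree(rows, labels, max_depth=3):
    """Recursively split until a node is pure, unsplittable, or max_depth is hit.
    Leaves are the majority class; internal nodes are dicts."""
    split = best_split(rows, labels) if max_depth > 0 else None
    if split is None or split[0] <= 0:
        return Counter(labels).most_common(1)[0][0]
    _, feature, threshold = split
    yes_idx = [i for i, row in enumerate(rows) if row[feature] <= threshold]
    no_idx = [i for i, row in enumerate(rows) if row[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "yes": build_tree([rows[i] for i in yes_idx],
                          [labels[i] for i in yes_idx], max_depth - 1),
        "no": build_tree([rows[i] for i in no_idx],
                         [labels[i] for i in no_idx], max_depth - 1),
    }


# Made-up toy data: two predictors per row, target classes 1 and 2.
rows = [[0, 1], [1, 0], [4, 1], [5, 0]]
labels = [1, 1, 2, 2]
tree = build_tree(rows, labels)
print(tree)
```

On this toy data the first predictor separates the two classes cleanly, so the sketch stops after a single question. A real CART run adds pruning, alternative splitting criteria, and handling of categorical predictors, none of which are shown here.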
CART Tutorial • We have defined three market segments, numbered 1, 2, and 3. They represent “profitability”, broadly defined as “how much money did we make from this person in the last year”. • We are interested in questions that distinguish these segments, so that we know how to better target future marketing.
CART Gym Data Tutorial: Variables • SEGMENT Member's market segment (coded 1, 2, or 3) • ANYRAQT Racquetball usage (binary indicator coded 0, 1) • TANNING Number of visits to tanning salon • PERSTRN Personal trainer (binary indicator coded 0, 1) • ONAER Number of on-peak aerobics classes attended • OFFAER Number of off-peak aerobics classes attended • ANYPOOL Pool usage (binary indicator coded 0, 1) • CLASSES Number of classes taken • NSUPPS Number of supplements/vitamins/frozen dinners purchased • SMALLBUS Small business discount (binary indicator coded 0, 1) • OFFER Terms of offer • FIT Fitness score • NFAMMEN Number of family members • HOME Home ownership (binary indicator coded 0, 1)
Potential Data Sources • CART uses data in the Systat for Windows format, extension .syd. (Systat is a very popular statistical package.) www.spssscience.com/systat • The downloaded version includes a dynamic link to a program called DBMS/COPY, which also allows you to use other data formats such as ASCII, Excel, etc. www.conceptual.com/dbmscopy.htm
Summary: CART • Good for generating decision trees; provides many alternatives and a lot of information. • The rules it creates and the resulting data can also be used as input to additional tools. • Produces far more information than you will want to look at if you don’t know what you’re looking for.
CART Assignment: • Three pieces: • Download and install CART. • Work through the tutorial yourself and write a brief report. • Analyze a new set of data and answer some questions about it. • I am in the process of getting descriptions for the sample data in the download and will prepare questions based on one of those data sets. • Or, if you have data in an appropriate format, you may use your own data and questions.