DSCI 4520/5240: Data Mining, Fall 2013 – Dr. Nick Evangelopoulos. DONOR_RAW: Data Description. Some slide material taken from SAS Education. Original artwork © Nick Evangelopoulos, 2013.
DONOR_RAW: Data Overview • Determine who is likely to donate to a non-profit organization's campaign, and target those people for donation solicitation. • The scenario is the same as the one that produced the data set MYRAW. • This time, the data are somewhat different.
DONOR_RAW: Nonprofit donation solicitation scenario • In 1997, a non-profit organization related to U.S. military veterans had a regular donation solicitation campaign called 97NK. For each person targeted by the campaign, certain information (at a personal or at a demographic level) was known beforehand. Solicitation response (whether they donated and, if yes, what amount) was recorded. • In 1998, the organization offered the full dataset to analysts (under certain conditions). The particular data set DONOR_RAW is a subset that includes 50 variables and about 19,400 observations.
The Charity Donation Project • Business: A national veterans' organization • Objective: From a population of lapsing donors, identify individuals worth continued solicitation • Source: 1998 KDD-Cup Competition via UCI KDD Archive
Data Preparation: Donor Master, Demographics, and Transaction Detail data are combined into the Raw Analysis Data (95,412 records, 481 fields).
Additional Data Preparation: The Raw Analysis Data (95,412 records, 481 fields) is reduced to the Final Analysis Data, DONOR_RAW (19,372 records, 50 fields).
Analysis Data Definition: Donor master data • CONTROL_NUMBER: Unique donor ID • MONTHS_SINCE_ORIGIN: Elapsed time since first donation • IN_HOUSE: 1 = given to In House program, 0 = not an In House donor
Analysis Data Definition: Demographic and other overlay data • OVERLAY_SOURCE: M = Metromail, P = Polk, B = both • DONOR_AGE: Age as of June 1997 • DONOR_GENDER: Actual or inferred gender • PUBLISHED_PHONE: Published telephone listing • HOME_OWNER: H = homeowner, U = unknown • MOR_HIT: Mail order response hit rate
Analysis Data Definition: Demographic and other overlay data • CLUSTER_CODE: 54 socio-economic cluster codes • SES: 5 socio-economic cluster codes • INCOME_GROUP: 7 income group levels • MED_HOUSEHOLD_INCOME: Median household income in $100s • PER_CAPITA_INCOME: Income per capita in dollars • WEALTH_RATING: 10 wealth rating groups
Analysis Data Definition: Demographic and other overlay data • MED_HOME_VALUE: Median home value in $100s • PCT_OWNER_OCCUPIED: Percent owner-occupied housing • URBANICITY: U = urban, C = city, S = suburban, T = town, R = rural, ? = unknown
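The coded overlay variables above are easier to explore once their codes are decoded and their units made explicit. The Python sketch below is a hypothetical illustration, not part of the SAS Enterprise Miner workflow: the function names and the sample record are invented, and treating the "?" level of URBANICITY as missing is an assumption that mirrors the PR1 step of replacing unknown levels.

```python
from typing import Dict, Optional

# URBANICITY levels as listed on the slide; '?' (unknown) is handled separately.
URBANICITY_LEVELS: Dict[str, str] = {
    "U": "urban",
    "C": "city",
    "S": "suburban",
    "T": "town",
    "R": "rural",
}


def decode_urbanicity(code: str) -> Optional[str]:
    """Map an URBANICITY code to its label; the unknown level '?' becomes None."""
    if code == "?":
        return None  # assumption: treat the unknown level as a missing value
    return URBANICITY_LEVELS[code]  # raises KeyError for an unlisted code


def hundreds_to_dollars(value: Optional[float]) -> Optional[float]:
    """Convert a value recorded in $100s (MED_HOUSEHOLD_INCOME, MED_HOME_VALUE)
    to dollars, passing missing values (None) through unchanged."""
    return None if value is None else value * 100


# Hypothetical record, for illustration only.
record = {"URBANICITY": "?", "MED_HOME_VALUE": 554.0}
print(decode_urbanicity(record["URBANICITY"]))        # None
print(hundreds_to_dollars(record["MED_HOME_VALUE"]))  # 55400.0
```

Raising a KeyError on an unlisted code is a deliberate choice here: it surfaces unexpected values instead of silently passing them through.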
Analysis Data Definition: Census overlay data • PCT_MALE_MILITARY: Percent male military in block • PCT_MALE_VETERANS: Percent male veterans in block • PCT_VIETNAM_VETERANS: Percent Vietnam veterans in block • PCT_WWII_VETERANS: Percent WWII veterans in block
Analysis Data Definition: Transaction detail data • NUMBER_PROM_12: Number of promotions in the last 12 months • CARD_PROM_12: Number of card promotions in the last 12 months (Timeline, 1994–1998: 97NK campaign marked)
Analysis Data Definition: Transaction detail data • FREQ_STATUS_97NK: Frequency status, June 1997 • RECENCY_STATUS_96NK: Recency status, June 1996 • MONTHS_SINCE_LAST: Months since last donation • LAST_GIFT_AMT: Amount of most recent donation (Timeline, 1994–1998: 96NK and 97NK campaigns marked)
Analysis Data Definition: RECENT transaction detail data • RESPONSE_PROP: Response proportion since June 1994 • RESPONSE_COUNT: Response count since June 1994 • AVG_GIFT_AMT: Average gift amount since June 1994 • RECENT_STAR_STATUS: STAR status (1, 0) since June 1994 (Timeline, 1994–1998: 94NK and 96NK campaigns marked)
Analysis Data Definition: RECENT transaction detail data • CARD_RESPONSE_PROP: Card response proportion since June 1994 • CARD_RESPONSE_COUNT: Card response count since June 1994 • CARD_AVG_GIFT_AMT: Average card gift amount since June 1994 (Timeline, 1994–1998: 94NK and 96NK campaigns marked)
Analysis Data Definition: LIFETIME transaction detail data • PROM: Total number of promotions ever • GIFT_COUNT: Total number of donations ever • AVG_GIFT_AMT: Overall average gift amount • PEP_STAR: STAR status ever (1 = yes, 0 = no) (Timeline, 1994–1998: 94NK and 96NK campaigns marked)
Analysis Data Definition: LIFETIME transaction detail data • GIFT_AMOUNT: Total gift amount ever • GIFT_COUNT: Total number of donations ever • MAX_GIFT: Maximum gift amount • GIFT_RANGE: Maximum less minimum gift amount (Timeline, 1994–1998: 94NK and 96NK campaigns marked)
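Assuming the raw transaction detail were available as a list of gift amounts per donor, the LIFETIME summaries on the two slides above could be derived as in this minimal Python sketch. The input format, the class, and the function name are hypothetical; the slides do not say exactly how these fields were computed, and under this assumption the overall average gift amount is simply GIFT_AMOUNT divided by GIFT_COUNT.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LifetimeSummary:
    gift_count: int      # GIFT_COUNT
    gift_amount: float   # GIFT_AMOUNT
    avg_gift_amt: float  # AVG_GIFT_AMT (overall average)
    max_gift: float      # MAX_GIFT
    gift_range: float    # GIFT_RANGE


def summarize_gifts(gifts: List[float]) -> LifetimeSummary:
    """Derive lifetime summaries from one donor's gift amounts (hypothetical input)."""
    if not gifts:
        raise ValueError("donor has no recorded gifts")
    total = sum(gifts)
    return LifetimeSummary(
        gift_count=len(gifts),
        gift_amount=total,
        avg_gift_amt=total / len(gifts),
        max_gift=max(gifts),
        gift_range=max(gifts) - min(gifts),  # maximum less minimum gift amount
    )


summary = summarize_gifts([10.0, 25.0, 15.0, 30.0])
print(summary)
# LifetimeSummary(gift_count=4, gift_amount=80.0, avg_gift_amt=20.0, max_gift=30.0, gift_range=20.0)
```

A donor with a single gift gets a GIFT_RANGE of 0, since the maximum and minimum coincide.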
Analysis Data Definition: KDD-supplied LIFETIME transaction detail data • FILE_AVG_GIFT: Average gift from raw data • FILE_CARD_GIFT: Average card gift from raw data • MONTHS_SINCE_FIRST: Months from first donation to June 1997 • MONTHS_SINCE_LAST: Months from last donation to June 1997 (Timeline, 1994–1998: 94NK and 96NK campaigns marked)
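MONTHS_SINCE_FIRST and MONTHS_SINCE_LAST count months back from June 1997. A minimal Python sketch of that arithmetic follows, assuming whole calendar months with the day of the month ignored (the exact convention behind the KDD-supplied fields is not stated on the slide). The function name and the donation dates are hypothetical.

```python
from datetime import date

# June 1997 reference point from the slide; the day is ignored by the arithmetic.
REFERENCE = date(1997, 6, 1)


def months_before_reference(d: date, ref: date = REFERENCE) -> int:
    """Whole calendar months from d to ref (assumption: day of month ignored)."""
    return (ref.year - d.year) * 12 + (ref.month - d.month)


first_gift = date(1989, 3, 15)  # hypothetical first donation
last_gift = date(1996, 11, 2)   # hypothetical most recent donation
print(months_before_reference(first_gift))  # 99
print(months_before_reference(last_gift))   # 7
```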
Analysis Data Definition: Transaction detail data, target definition • TARGET_B: Response to 97NK solicitation (1 = yes, 0 = no) • TARGET_D: Response amount to 97NK solicitation (missing if no response) (Timeline, 1994–1998: 97NK campaign marked)
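Because TARGET_D is defined as missing when there is no response, the two targets should agree row by row. This Python sketch checks that TARGET_D is missing exactly when TARGET_B is 0, treating both None and NaN as missing. The function names and the sample rows are invented for illustration.

```python
import math
from typing import List, Optional, Tuple


def is_missing(value: Optional[float]) -> bool:
    """Treat None and NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def targets_consistent(target_b: int, target_d: Optional[float]) -> bool:
    """True if TARGET_D is missing exactly when TARGET_B is 0 (no response)."""
    if target_b not in (0, 1):
        return False  # TARGET_B is defined as a 1/0 flag
    if target_b == 0:
        return is_missing(target_d)
    return not is_missing(target_d)


# Invented (TARGET_B, TARGET_D) pairs, for illustration only.
rows: List[Tuple[int, Optional[float]]] = [(1, 20.0), (0, None), (0, 15.0), (1, float("nan"))]
print([targets_consistent(b, d) for b, d in rows])  # [True, True, False, False]
```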
PR1 assignment • Use the DONOR_RAW data table, found in the C:\4520data folder of the SAS EM 5.3 server. • Following analysis steps similar to those shown in the Getting Started with SAS Enterprise Miner 5.3 text, pp. 23–44, start a new analysis project and make a Data Source called DONOR_RAW available. • Then follow pp. 45–60: generate descriptive statistics (pp. 46–51), create exploratory plots (pp. 51–53), partition the raw data (pp. 54–55), explore missing values (pp. 55–58), and replace observations with unknown levels (pp. 58–60). Handout PR1 lists exactly what you need to turn in.
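In the assignment, two of the PR1 steps, partitioning the raw data and exploring missing values, are performed through the SAS Enterprise Miner interface. As a rough plain-Python analogue only, the sketch below shows a simple random split and per-variable missing-value counts. The 50/50 split fraction, the seed, the function names, and the sample rows are all assumptions for illustration, not the settings used in the Getting Started text.

```python
import random
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[float]]


def partition(rows: List[Row], train_frac: float = 0.5,
              seed: int = 12345) -> Tuple[List[Row], List[Row]]:
    """Simple random split into training and validation sets (not stratified)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def missing_counts(rows: List[Row]) -> Dict[str, int]:
    """Count missing (None) values for each variable."""
    counts: Dict[str, int] = {}
    for row in rows:
        for name, value in row.items():
            counts[name] = counts.get(name, 0) + (1 if value is None else 0)
    return counts


# Invented rows, for illustration only.
rows: List[Row] = [
    {"DONOR_AGE": 62, "INCOME_GROUP": None, "TARGET_B": 1},
    {"DONOR_AGE": None, "INCOME_GROUP": 4, "TARGET_B": 0},
    {"DONOR_AGE": None, "INCOME_GROUP": 2, "TARGET_B": 0},
    {"DONOR_AGE": 47, "INCOME_GROUP": 6, "TARGET_B": 1},
]
train, valid = partition(rows)
print(len(train), len(valid))  # 2 2
print(missing_counts(rows))    # {'DONOR_AGE': 2, 'INCOME_GROUP': 1, 'TARGET_B': 0}
```

A plain random split is the simplest choice. With a binary target such as TARGET_B, stratifying the split on the target is a common alternative, but it is not shown here.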