
DSCI 4520/5240: Data Mining Fall 2013 – Dr. Nick Evangelopoulos

DSCI 4520/5240: Data Mining Fall 2013 – Dr. Nick Evangelopoulos. DONOR_RAW: Data Description. Some slide material taken from: SAS Education Original artwork © Nick Evangelopoulos, 2013. DonorRAW: Data Overview.


Presentation Transcript


  1. DSCI 4520/5240: Data Mining, Fall 2013 – Dr. Nick Evangelopoulos
     DONOR_RAW: Data Description
     Some slide material taken from: SAS Education. Original artwork © Nick Evangelopoulos, 2013

  2. DonorRAW: Data Overview
     • Determine who is likely to donate to a non-profit organization campaign and target them for donation solicitation.
     • The scenario is the same as the one that produced the data set MYRAW.
     • This time we have somewhat different data.

  3. DONOR_RAW: Nonprofit donation solicitation scenario
     • In 1997, a non-profit organization related to U.S. military veterans had a regular donation solicitation campaign called 97NK. For each person targeted by the campaign, certain information (at a personal or at a demographic level) was known beforehand. The solicitation response (whether they donated and, if so, what amount) was recorded.
     • In 1998, the organization offered the full dataset to analysts (under certain conditions). The particular data set DONOR_RAW is a subset that includes 50 variables and about 19,400 observations.

  4. The Charity Donation Project
     Business: A national veterans' organization
     Objective: From the population of lapsing donors, identify individuals worth continued solicitation.
     Source: 1998 KDD-Cup Competition via UCI KDD Archive

  5. Data Preparation
     Sources: Donor Master, Demographics, Transaction Detail
     Result: Raw Analysis Data (95,412 records, 481 fields)

  6. Additional Data Preparation
     From: Raw Analysis Data (95,412 records, 481 fields)
     To: Final Analysis Data, DONOR_RAW (19,372 records, 50 fields)

  7. Analysis Data Definition: Donor master data
     CONTROL_NUMBER        Unique donor ID
     MONTHS_SINCE_ORIGIN   Elapsed time since first donation
     IN_HOUSE              1=Given to In House program, 0=Not In House donor

  8. Analysis Data Definition: Demographic and other overlay data
     OVERLAY_SOURCE        M=Metromail, P=Polk, B=both
     DONOR_AGE             Age as of June 1997
     DONOR_GENDER          Actual or inferred gender
     PUBLISHED_PHONE       Published telephone listing
     HOME_OWNER            H=homeowner, U=unknown
     MOR_HIT               Mail order response hit rate

  9. Analysis Data Definition: Demographic and other overlay data
     CLUSTER_CODE          54 socio-economic cluster codes
     SES                   5 socio-economic cluster codes
     INCOME_GROUP          7 income group levels
     MED_HOUSEHOLD_INCOME  Median income in $100's
     PER_CAPITA_INCOME     Income per capita in dollars
     WEALTH_RATING         10 wealth rating groups

  10. Analysis Data Definition: Demographic and other overlay data
     MED_HOME_VALUE        Median home value in $100's
     PCT_OWNER_OCCUPIED    Percent owner-occupied housing
     URBANICITY            U=urban, C=city, S=suburban, T=town, R=rural, ?=unknown

  11. Analysis Data Definition: Census overlay data
     PCT_MALE_MILITARY     Percent male military in block
     PCT_MALE_VETERANS     Percent male veterans in block
     PCT_VIETNAM_VETERANS  Percent Vietnam veterans in block
     PCT_WWII_VETERANS     Percent WWII veterans in block

  12. Analysis Data Definition: Transaction detail data
     NUMBER_PROM_12        Number of promotions in the last 12 months
     CARD_PROM_12          Number of card promotions in the last 12 months
     [Timeline figure: 97NK marked on a time axis from '94 to '98]

  13. Analysis Data Definition: Transaction detail data
     FREQ_STATUS_97NK      Frequency status, June '97
     RECENCY_STATUS_96NK   Recency status, June '96
     MONTHS_SINCE_LAST     Months since last donation
     LAST_GIFT_AMT         Amount of most recent donation
     [Timeline figure: 96NK and 97NK marked on a time axis from '94 to '98]

  14. Analysis Data Definition: RECENT transaction detail data
     RESPONSE_PROP         Response proportion since June '94
     RESPONSE_COUNT        Response count since June '94
     AVG_GIFT_AMT          Average gift amount since June '94
     RECENT_STAR_STATUS    STAR (1, 0) status since June '94
     [Timeline figure: 94NK and 96NK marked on a time axis from '94 to '98]

  15. Analysis Data Definition: RECENT transaction detail data
     CARD_RESPONSE_PROP    Response proportion since June '94
     CARD_RESPONSE_COUNT   Response count since June '94
     CARD_AVG_GIFT_AMT     Average gift amount since June '94
     [Timeline figure: 94NK and 96NK marked on a time axis from '94 to '98]

  16. Analysis Data Definition: LIFETIME transaction detail data
     PROM                  Total number of promotions ever
     GIFT_COUNT            Total number of donations ever
     AVG_GIFT_AMT          Overall average gift amount
     PEP_STAR              STAR status ever (1=yes, 0=no)
     [Timeline figure: 94NK and 96NK marked on a time axis from '94 to '98]

  17. Analysis Data Definition: LIFETIME transaction detail data
     GIFT_AMOUNT           Total gift amount ever
     GIFT_COUNT            Total number of donations ever
     MAX_GIFT              Maximum gift amount
     GIFT_RANGE            Maximum less minimum gift amount
     [Timeline figure: 94NK and 96NK marked on a time axis from '94 to '98]
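
As a small illustration of how the lifetime fields relate to one another, the Python sketch below derives them from one donor's list of gift amounts. The gift history and the function name `lifetime_summary` are invented for this example, and the slides do not say how the fields were actually computed, so treat the arithmetic as a plausible reading of the definitions rather than the data provider's method.

```python
def lifetime_summary(gifts):
    """Summarize one donor's gift history (a non-empty list of amounts)."""
    if not gifts:
        raise ValueError("donor has no recorded gifts")
    return {
        "GIFT_COUNT": len(gifts),                # total number of donations ever
        "GIFT_AMOUNT": sum(gifts),               # total gift amount ever
        "MAX_GIFT": max(gifts),                  # maximum gift amount
        "GIFT_RANGE": max(gifts) - min(gifts),   # maximum less minimum gift
        "AVG_GIFT_AMT": sum(gifts) / len(gifts)  # overall average gift (assumed: total / count)
    }

# Hypothetical gift history, for illustration only.
example = lifetime_summary([10.0, 25.0, 15.0])
print(example)
```

A donor who has given only once gets a GIFT_RANGE of 0 under this reading, which may be worth keeping in mind when exploring that variable's distribution.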

  18. Analysis Data Definition: KDD supplied LIFETIME transaction detail data
     FILE_AVG_GIFT         Average gift from raw data
     FILE_CARD_GIFT        Average card gift from raw data
     MONTHS_SINCE_FIRST    First donation date from June '97
     MONTHS_SINCE_LAST     Last donation date from June '97
     [Timeline figure: 94NK and 96NK marked on a time axis from '94 to '98]

  19. Analysis Data Definition: Transaction detail data, target definition
     TARGET_B              Response to 97NK solicitation (1=yes, 0=no)
     TARGET_D              Response amount to 97NK solicitation (missing if no response)
     [Timeline figure: 97NK marked on a time axis from '94 to '98]
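
The two targets are linked: TARGET_D is recorded only when TARGET_B is 1. The Python sketch below illustrates that relationship with a handful of made-up records (not DONOR_RAW values). It computes the response rate from TARGET_B and the average donation among responders from TARGET_D. Representing a missing TARGET_D as `None`, and the helper names, are assumptions of this sketch.

```python
# Hypothetical records; values are illustrative only, not taken from DONOR_RAW.
records = [
    {"TARGET_B": 1, "TARGET_D": 20.0},
    {"TARGET_B": 0, "TARGET_D": None},
    {"TARGET_B": 0, "TARGET_D": None},
    {"TARGET_B": 1, "TARGET_D": 10.0},
]

def targets_consistent(rows):
    """True when TARGET_D is present exactly for the rows with TARGET_B == 1."""
    return all((r["TARGET_B"] == 1) == (r["TARGET_D"] is not None) for r in rows)

def response_rate(rows):
    """Share of solicited donors who responded (mean of TARGET_B)."""
    return sum(r["TARGET_B"] for r in rows) / len(rows)

def mean_gift_among_responders(rows):
    """Average TARGET_D over responders only; None if nobody responded."""
    gifts = [r["TARGET_D"] for r in rows if r["TARGET_B"] == 1]
    return sum(gifts) / len(gifts) if gifts else None

print(targets_consistent(records), response_rate(records),
      mean_gift_among_responders(records))
```

Because TARGET_D is missing for non-responders, any average of it describes responders only. It should not be read as the expected donation per solicited person.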

  20. PR1 assignment
     • Use the DONOR_RAW data table, found in the C:\4520data folder of the SAS EM5.3 server.
     • Follow analysis steps similar to those shown in the Getting Started with SAS Enterprise Miner 5.3 text, pp. 23-44, to start a new analysis project and make a Data Source called DONOR_RAW available.
     • Then follow pp. 45-60. Generate descriptive statistics (pp. 46-51), create exploratory plots (pp. 51-53), partition the raw data (pp. 54-55), explore missing values (pp. 55-58), and replace observations with unknown levels (pp. 58-60). Handout PR1 lists exactly what you need to turn in.
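
The assignment itself is carried out in SAS Enterprise Miner. Purely to make three of the listed steps concrete (replacing unknown levels, counting missing values, partitioning), here is a rough Python analogue on a few made-up rows. It does not reproduce the Enterprise Miner nodes or the steps in the Getting Started text. The rows, the helper names, the 60/40 split, and the use of `None` for missing values are all assumptions of this sketch.

```python
import random

UNKNOWN = "?"  # the URBANICITY definition lists "?" as the code for unknown

# Made-up rows for illustration only; not taken from DONOR_RAW.
rows = [
    {"DONOR_AGE": 62, "URBANICITY": "S", "TARGET_B": 0},
    {"DONOR_AGE": None, "URBANICITY": "?", "TARGET_B": 1},
    {"DONOR_AGE": 45, "URBANICITY": "R", "TARGET_B": 0},
    {"DONOR_AGE": None, "URBANICITY": "C", "TARGET_B": 0},
    {"DONOR_AGE": 71, "URBANICITY": "?", "TARGET_B": 1},
]

def replace_unknown(rows, column, unknown=UNKNOWN):
    """Return copies of the rows with the unknown level recoded as missing (None)."""
    return [{**r, column: None if r[column] == unknown else r[column]} for r in rows]

def missing_counts(rows):
    """Count missing (None) values per variable."""
    counts = {name: 0 for name in rows[0]}
    for r in rows:
        for name, value in r.items():
            if value is None:
                counts[name] += 1
    return counts

def partition(rows, train_fraction=0.6, seed=0):
    """Shuffle with a fixed seed, then split into training and validation lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

cleaned = replace_unknown(rows, "URBANICITY")
counts = missing_counts(cleaned)
train, valid = partition(cleaned)
print(counts, len(train), len(valid))
```

Recoding "?" before counting matters here: until the unknown level is treated as missing, URBANICITY would appear to have no missing values at all.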
