Mining Rules from Surveys and Questionnaires Scott Burton and Richard Morris CS 676 Presentation 12 April 2011
Surveys and Questionnaires
• Frequently used
• Problems for data mining
  • Rarity
  • Related and dependent questions
  • Ordinal / Likert scale
Association Rule Mining
• Market basket analysis
• Cookies -> Milk
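The Cookies -> Milk rule can be made concrete with the two standard quantities behind association rules, support and confidence. The following is a minimal sketch; the basket data and the function names are invented for illustration and are not from the presentation.

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = frozenset(itemset)
    return sum(1 for t in transactions if wanted <= t) / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Estimate of P(consequent | antecedent) from the transactions."""
    base = support(antecedent, transactions)
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / base if base else 0.0


# Hypothetical market baskets (illustrative data only).
baskets = [frozenset(b) for b in (
    {"cookies", "milk"},
    {"cookies", "milk", "bread"},
    {"cookies"},
    {"milk", "bread"},
    {"bread"},
)]

rule_support = support({"cookies", "milk"}, baskets)
rule_confidence = confidence({"cookies"}, {"milk"}, baskets)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

For survey data, each "item" would be a question=answer pair rather than a product.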
Our Goal: Improve Precision
Standard Algorithms/Approaches
• Apriori, MS-Apriori
• Too many rules
• Rules are not “interesting” or actionable
• Finding the needle in the haystack
Our goal
• Improve precision
• How do you measure “interestingness”?
Interestingness Measures
• Mostly based on support or confidence
• Considered about 40 different metrics
• All seemed to favor the wrong types of rules
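To illustrate one way a confidence-based measure can rank an uninformative rule highly, the sketch below compares confidence with two other well-known measures, lift and leverage. The probabilities are made up, and the slides do not say which of the roughly 40 metrics were examined, so this is only an example of the general issue.

```python
def confidence(p_a: float, p_ab: float) -> float:
    """P(B | A) for a rule A -> B."""
    return p_ab / p_a


def lift(p_a: float, p_b: float, p_ab: float) -> float:
    """Ratio of observed co-occurrence to what independence would predict."""
    return p_ab / (p_a * p_b)


def leverage(p_a: float, p_b: float, p_ab: float) -> float:
    """Difference between observed co-occurrence and the independent baseline."""
    return p_ab - p_a * p_b


# Hypothetical survey: 95% of respondents answer "no" to a smoking question (B),
# and the antecedent answer A is statistically independent of it.
p_a, p_b = 0.5, 0.95
p_ab = p_a * p_b

rule_conf = confidence(p_a, p_ab)       # high, only because B is common
rule_lift = lift(p_a, p_b, p_ab)        # 1.0 means no association
rule_lev = leverage(p_a, p_b, p_ab)     # 0.0 means no association
print(rule_conf, rule_lift, rule_lev)
```

A rule like this has high confidence yet tells the analyst nothing, which is the kind of result that makes raw confidence hard to use on skewed survey answers.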
Our Datasets
• Smoking habits of middle school students in Mexico
  • Global Youth Tobacco Survey for the Pan American Health Organization (GYTSPAHO)
  • ~65 questions and 13,000 responses
• HINTS (Health Information National Trends Survey)
  • hints.cancer.gov
  • 2007 response data had ~475 questions and 8,000 responses
  • We focused on a subset of ~100 questions
Apriori vs. MS-Apriori
• Apriori (Figure 1)
• MS-Apriori (Figure 2)
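The key difference between the two algorithms is that MS-Apriori gives each item its own minimum item support (MIS) and holds an itemset to the lowest MIS among its items, so rare answers are not discarded by a single global threshold. The sketch below shows only that acceptance criterion using brute-force enumeration; it is not the level-wise MS-Apriori algorithm, and the data, `beta`, and `least_support` values are arbitrary choices for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def item_mis(transactions: List[FrozenSet[str]], beta: float,
             least_support: float) -> Dict[str, float]:
    """MIS(item) = max(beta * frequency(item), least_support)."""
    n = len(transactions)
    counts = Counter(item for t in transactions for item in t)
    return {item: max(beta * c / n, least_support) for item, c in counts.items()}


def frequent_itemsets(transactions: List[FrozenSet[str]], mis: Dict[str, float],
                      max_size: int = 2) -> Dict[Tuple[str, ...], float]:
    """Brute-force search: keep itemsets whose support meets the lowest MIS of their items."""
    n = len(transactions)
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(sorted(mis), size):
            sup = sum(1 for t in transactions if set(combo) <= t) / n
            if sup >= min(mis[i] for i in combo):
                result[combo] = sup
    return result


# Hypothetical responses: smoking is rare, as in many health surveys.
responses = (
    [frozenset({"smoke=yes", "friends=yes"})] * 2
    + [frozenset({"smoke=no", "friends=yes"})]
    + [frozenset({"smoke=no", "friends=no"})] * 7
)

multi = frequent_itemsets(responses, item_mis(responses, beta=0.5, least_support=0.05))
uniform_mis = {item: 0.3 for item in item_mis(responses, 1.0, 0.0)}
single = frequent_itemsets(responses, uniform_mis)

rare_pair = ("friends=yes", "smoke=yes")
print(rare_pair in multi, rare_pair in single)
```

With a single threshold of 0.3 the rare pair is lost; with per-item thresholds it survives, at the cost of more candidate rules overall.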
Related and Dependent Questions
True but worthless rules
• Do you smoke=no -> Did you smoke last week=no
Our approach
• Cluster similar questions
• Remove any intra-cluster rules
(Diagram: questions 1-9 grouped into clusters)
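The pruning step above can be sketched as a simple filter. This sketch assumes a rule counts as intra-cluster when any question on its left side shares a cluster with any question on its right side; the slides do not spell out the exact criterion. The question names, cluster ids, and the second rule are hypothetical.

```python
from typing import Dict, List, Tuple

Item = Tuple[str, str]                          # (question, answer)
Rule = Tuple[Tuple[Item, ...], Tuple[Item, ...]]  # (antecedent, consequent)


def is_intra_cluster(rule: Rule, cluster_of: Dict[str, int]) -> bool:
    """True if the antecedent and consequent draw on a common question cluster."""
    antecedent, consequent = rule
    left = {cluster_of[question] for question, _ in antecedent}
    right = {cluster_of[question] for question, _ in consequent}
    return bool(left & right)


def prune_intra_cluster(rules: List[Rule], cluster_of: Dict[str, int]) -> List[Rule]:
    """Drop every rule that only relates questions within one cluster."""
    return [r for r in rules if not is_intra_cluster(r, cluster_of)]


# Hypothetical clustering: the two smoking-status questions belong together.
cluster_of = {"do_you_smoke": 1, "smoked_last_week": 1, "parents_smoke": 2}

rules: List[Rule] = [
    ((("do_you_smoke", "no"),), (("smoked_last_week", "no"),)),   # true but worthless
    ((("parents_smoke", "yes"),), (("do_you_smoke", "yes"),)),    # crosses clusters
]

kept = prune_intra_cluster(rules, cluster_of)
print(kept)
```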
Creating Clusters
• Distance metrics
• Bi-conditional prediction
• Attribute vs. attribute-value pair
• Involving the subject matter expert
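One possible reading of "bi-conditional prediction" as a distance between two questions is sketched below: two questions are close when each one's answers predict the other's well. The scoring rule, function names, and data are assumptions made for illustration, not the authors' definition, and the sketch works at the attribute (whole question) level rather than on attribute-value pairs.

```python
from collections import Counter, defaultdict
from typing import List


def predictability(xs: List[str], ys: List[str]) -> float:
    """Accuracy of guessing y as the most common y seen with each x value."""
    by_x = defaultdict(Counter)
    for x, y in zip(xs, ys):
        by_x[x][y] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_x.values())
    return correct / len(xs)


def biconditional_distance(xs: List[str], ys: List[str]) -> float:
    """Small only when prediction works well in both directions."""
    return 1.0 - min(predictability(xs, ys), predictability(ys, xs))


# Hypothetical answers from six respondents.
smoke = ["no", "no", "yes", "yes", "no", "no"]
last_week = ["no", "no", "yes", "no", "no", "no"]
gender = ["m", "f", "m", "f", "m", "f"]

d_related = biconditional_distance(smoke, last_week)
d_unrelated = biconditional_distance(smoke, gender)
print(d_related, d_unrelated)
```

A caveat of this particular score is that a heavily skewed question is easy to "predict" from anything, so in practice a subject matter expert would likely need to review the resulting clusters.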
A Sample Clustering of Questions (see handout)
Effects of Cluster Pruning
• MS-Apriori (Figure 2)
• After cluster pruning (Figure 3)
Similar Rules
Abstract viewpoint:
• A B -> C D
• A -> C D
• A B -> C
• A B Z -> C D
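The four abstract rules above differ by only one item each, so a set-overlap measure can flag them as near duplicates. The sketch below uses Jaccard similarity and a greedy "keep the best-scoring representative" pass. Both choices, the 0.7 threshold, the scores, and the extra X -> Y rule are assumptions for illustration; the presentation does not state which matching technique was used.

```python
from typing import FrozenSet, List, NamedTuple, Set, Tuple


class Rule(NamedTuple):
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    score: float  # any ranking measure, e.g. confidence


def rule_items(rule: Rule) -> Set[Tuple[str, str]]:
    """Tag items by side so that A -> B and B -> A are not treated as identical."""
    return ({("lhs", i) for i in rule.antecedent}
            | {("rhs", i) for i in rule.consequent})


def jaccard(a: Rule, b: Rule) -> float:
    x, y = rule_items(a), rule_items(b)
    return len(x & y) / len(x | y)


def prune_similar(rules: List[Rule], threshold: float = 0.7) -> List[Rule]:
    """Greedy pass: visit rules by descending score, keep those unlike all kept rules."""
    kept: List[Rule] = []
    for rule in sorted(rules, key=lambda r: r.score, reverse=True):
        if all(jaccard(rule, k) < threshold for k in kept):
            kept.append(rule)
    return kept


def make(lhs: str, rhs: str, score: float) -> Rule:
    return Rule(frozenset(lhs.split()), frozenset(rhs.split()), score)


rules = [
    make("A B", "C D", 0.90),
    make("A", "C D", 0.85),
    make("A B", "C", 0.80),
    make("A B Z", "C D", 0.88),
    make("X", "Y", 0.60),      # unrelated rule, should survive
]

kept = prune_similar(rules)
print([(sorted(r.antecedent), sorted(r.consequent)) for r in kept])
```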
Effects of Similar Rule Pruning
• After cluster pruning (Figure 3)
• After similar rule pruning (Figure 4)
Ordinal and Likert Data
Two approaches
• Pre-process
• Post-process
(Figure panels: Likert, Ordinal)
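The pre-processing approach can be sketched as binning: adjacent points on a Likert scale are merged before rule mining so that each merged answer has more support. The 5-point scale and the particular bin boundaries below are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Dict, List

# Assumed mapping from a 5-point Likert scale to three coarser bins.
LIKERT_BINS: Dict[int, str] = {
    1: "disagree",
    2: "disagree",
    3: "neutral",
    4: "agree",
    5: "agree",
}


def bin_likert(responses: List[int], bins: Dict[int, str] = LIKERT_BINS) -> List[str]:
    """Replace each raw Likert value with its bin label."""
    return [bins[r] for r in responses]


# Hypothetical answers to one question.
raw = [1, 2, 2, 3, 4, 5, 5, 4, 1, 3]
binned = bin_likert(raw)

raw_counts = Counter(raw)        # each raw value is chosen by 2 of 10 respondents
binned_counts = Counter(binned)  # "agree" and "disagree" each cover 4 of 10
print(raw_counts, binned_counts)
```

A post-processing alternative would mine rules on the raw values and merge rules that differ only in adjacent scale points afterwards.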
Effects of Pre-Binning (Figure 5)
Other Examples • HINTS Data (see handout, Figures 6-10)
Conclusions and Future Work
Conclusions
• Increased precision of “interesting” rules
• More work to be done
Future work
• Tuning of existing processes
• Handle numerical data
• Handle questions not asked to everyone
• Handle questions with multiple responses
• Try other record matching techniques for similar rule pruning