Discovering and analyzing income determinants using decision trees
Krzysztof Karpio, Piotr Łukasiewicz, Arkadiusz Orłowski, Tomasz Ząbkowski
Warsaw University of Life Sciences - SGGW
Data
• Household incomes
• Poland
• Years: 2000 – 2010
• „Budżety Gospodarstw Domowych” (Household Budgets) - GUS
• About 36 000 households in each year
• Household income / Number of earners
• Real income (based on prices in 2008)
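The last two bullets describe how the income variable is normalized. A minimal sketch of that calculation follows, assuming a simple price-index deflator; the slide only says "based on prices in 2008", so the function name, the index arguments, and the sample numbers are hypothetical.

```python
def real_income_per_earner(household_income, n_earners, price_index, base_index):
    """Income per earner, expressed in prices of the base year.

    Assumption: real income = nominal income * (base-year index / current index).
    The slides do not state which price index the authors used.
    """
    if n_earners <= 0:
        raise ValueError("a household needs at least one earner")
    if price_index <= 0:
        raise ValueError("price index must be positive")
    nominal_per_earner = household_income / n_earners
    return nominal_per_earner * base_index / price_index


# Made-up example: 30 kPLN earned by 2 earners in a year whose price
# index is 100, re-expressed in base-year prices with index 110.
income_2008_prices = real_income_per_earner(30.0, 2, 100.0, 110.0)
```

Dividing by the number of earners before deflating keeps the two normalizations independent, so either can be swapped out.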
Conditional attributes
• Sex of a family head
• Education of a family head
• Age of a family head
• Economic group of a household
• Family type
• Number of persons in a household
• Number of children
• Number of earners
• Class of place of residence
• Voivodeship
Mean income, selected categories:
• Female: 17.3 kPLN, Male: 20.4 kPLN
• Village: 16.6 kPLN, City: 26.3 kPLN
• Podkarpackie: 14.7 kPLN, Mazowieckie: 23.6 kPLN
Incomes 2008
• Income levels: 8 kPLN, 16 kPLN, 45 kPLN
• LOW: 7%
• AVERAGE: 40%
• MODERATE: 48%
• HIGH: 5%
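The four classes on this slide can be read as intervals of income. The sketch below assumes that 8, 16 and 45 kPLN are the boundaries between LOW, AVERAGE, MODERATE and HIGH, and that a value equal to a boundary falls into the upper class; neither assumption is stated on the slide.

```python
from bisect import bisect_right

# Assumed class boundaries in kPLN (taken from the three values on the slide).
BOUNDARIES_KPLN = [8.0, 16.0, 45.0]
CLASS_LABELS = ["LOW", "AVERAGE", "MODERATE", "HIGH"]


def income_class(income_kpln):
    """Map an income in kPLN to one of the four class labels."""
    # bisect_right counts how many boundaries are <= income,
    # which is exactly the index of the class.
    return CLASS_LABELS[bisect_right(BOUNDARIES_KPLN, income_kpln)]


labels = [income_class(x) for x in (5.0, 12.0, 20.0, 50.0)]
```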
Method
• Decision tree
• Entropy
• Gain
[Portrait: Rudolf Clausius (1822 – 1888)]
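Entropy and gain are the quantities a decision tree uses to choose which attribute to split on. The sketch below computes Shannon entropy of the class labels and the plain information gain of one attribute. It is an illustration only: the slides do not say whether plain gain or the gain ratio of C4.5 was used, and the rows, attribute names and values here are made up.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical toy data, not taken from the GUS survey.
rows = [
    {"education": "higher", "residence": "city", "income": "HIGH"},
    {"education": "higher", "residence": "village", "income": "HIGH"},
    {"education": "primary", "residence": "city", "income": "LOW"},
    {"education": "primary", "residence": "village", "income": "LOW"},
]
gain_education = information_gain(rows, "education", "income")
gain_residence = information_gain(rows, "residence", "income")
```

In this toy set, education separates the two income classes perfectly while residence carries no information, so a tree built this way would split on education first.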
Attributes tree 2008
[Figure: decision tree; legible node labels: at least a secondary, married couple, pensioners]
Tree nodes and leaves
Attributes 2000 - 2010
• Education
• Family type
• Economic group
• Number of earners
• Class of place of residence
Not relevant 2000 - 2010
• Sex of a family head
• Age
• Number of persons
• Number of children
• Voivodeship
Information Gain
[Figure: information gain; marked level: GAIN 0.01]
2-classes (high income)
2-classes (low income)
Efficiency of trees
[Figure: panels High income, Low income]
Summary
• The most important attribute: Education
• Higher education (BA & MA) preferred
• Important attributes: Education, Family Type (marriage), Economic Group (pensioners), Residence (big cities), Number of Earners (1 or 2)
• Evolution of attributes (2000-2010)
  • Education - stable, the most important
  • Number of Earners - decreasing importance
  • Economic Group - increasing importance
  • Family Type - the weakest but noticeable importance
• Lack of relevance of: Sex, Age, Voivodeship
to be continued …..
Thank You
REFERENCES
• Quinlan, J. R., „C4.5: Programs for Machine Learning”, Morgan Kaufmann, Los Altos (1993)
• Kemal Polat, Salih Gunes, „A novel hybrid intelligent method based on C4.5 decision tree classifier and one-against-all approach for multi-class classification problems”, Expert Systems with Applications 36 (2009) 1587