Interesting Association Rules of Household Indicators of Poverty By Nkumbuludzi Ndwapi
Contents • Introduction • Literature Review • Methodology • Analysis • Conclusions & Recommendations
Introduction • One of the major problems troubling the continent of Africa is poverty. • Poverty is the state in which one lacks the means to satisfy one's basic needs. • Defining poverty often leads to the question of how to determine when a household is to be considered poor.
Introduction cont’d • There are various ways in which poverty can be measured: • income poverty • human poverty • capabilities deprivation. • Coleman and Cressay (1990) explained that measures of income poverty are either • Absolute • Relative
Introduction cont’d • The above definitions reflect the multidimensional complexity of poverty. • Mining Association Rules can be used to unravel how certain aspects of human poverty are associated or related.
Statement of Problem • Poverty alleviation has been an issue of major concern to the Government of Botswana. • According to Buthali (1997), the government took the decision to focus on the productive mining sector. • The redistribution of this revenue depends on understanding what makes a household poor.
Statement of Problem cont’d • Characterising a household as poor should not be based only on economic characteristics. • According to Buthali (1997), the analysis of poverty in Botswana has over the years been based solely on poverty baskets and poverty datum lines. • The Human Development programme (1997) reported that figures based on poverty lines are usually converted to dollars and this distorts the real levels of inflation. • Purchasing Power Parity exchange rates that are used to turn the $1/day poverty line into national currencies are inappropriate.
Objectives • To develop association measures that could be used to analyse the multi-factored nature of poverty. • To determine the most common (frequent) types of housing and living conditions based on the HIES 2002/3 data set. • To investigate interesting association measures that exist between different housing and living conditions and that cannot be determined using traditional statistical techniques. • To classify households as poor or not poor using interesting rules.
Literature Review • Income is limited as an indicator of poverty because it does not capture public goods, non-market goods, rationing, and the problem of distorted or imperfect markets (Alkire and Leander, 2005). • Income as the sole indicator of well-being is therefore inappropriate and should be supplemented by other attributes or variables, for example housing, literacy, life expectancy, provision of public goods and so on. • Alkire and Leander (2006) explain that multidimensionality was also advocated by the basic needs approach, as Sen's (1997) capability approach argued that well-being is multidimensional.
Literature Review cont’d • Individuals and households that are able to meet their basic food needs but are unable to provide adequately for basic non-food needs would still be classified as poor based on this criterion (Obuseng and Powder, 2003). • Research by Ngwame et al. (2002) revealed that, as with deprivation in terms of dwelling structure, lack of access to safe water in South Africa is common among the poor. • Ngwame et al. (2002) also explain that there is often a relationship between poor housing and poor sanitation facilities.
Methodology • The data • Obtained from the HIES 2002/3 • Original variables and their categories were as in Table 1. • General Categorization • Categorization of Indicators of poverty • Indicators of poverty as in Table 2.
Mining Association Rules • Mining Association Rules (MAR) is a branch of data mining originally developed to understand the shopping behaviour of supermarket customers: • Which items tend to be bought together? • Which items are bought as substitutes? • Agrawal, Imielinski and Swami (1993) are widely credited with introducing the method of Association Rules at the 1993 International Conference on Management of Data, held in Washington DC, USA. • In the intervening 15 years, “mining association rules” has become one of the best-studied problems of data mining. • As with other data mining techniques, association rules (AR) have mainly been used in market research studies. • MAR aims to answer questions such as: how often does a shopping basket that contains meat also contain wine? • MAR is a form of unsupervised learning.
Mining Association Rules cont’d • Searching the database for interesting patterns is an unsupervised learning process. • Suppose that X and Y represent 2 different sets of potential items in a shopping basket, e.g. X = {Meat, Maize meal} and Y = {Spinach}. • Then an association rule between X and Y is a rule, r: X →Y. • The rule r is interpreted in this case as meaning that • customers who buy meat and maize meal are likely to buy spinach with a certain probability. • With unsupervised learning, one must then define the conditions that make such a rule interesting.
Interesting Measures • The rule r: X→Y, by itself, makes no prediction about Y across the entire database. • In the original paper, Agrawal et al. (1993) introduced the support-confidence framework and the Apriori algorithm for mining the rules. • Hahsler (2005) gives a comprehensive summary of commonly used and recently added measures of interestingness for mined association rules. • Hahsler et al. (2007) provide additional probability-based measures of interestingness as well as an R package, arules, for mining association rules, which uses two of the most popular algorithms: Apriori and Eclat.
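To make the level-wise idea behind Apriori concrete, the following is a minimal Python sketch, not the arules implementation and not the code used in this study. The function name `apriori_frequent_itemsets` and the household records are hypothetical and purely illustrative; they are not HIES data. It relies on the fact that an itemset can only be frequent if all of its subsets are frequent.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    all_items = {item for t in transactions for item in t}
    current = {}
    for item in all_items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical household records (illustrative only, not HIES data).
households = [
    {"mud_walls", "pit_latrine", "communal_tap"},
    {"mud_walls", "pit_latrine"},
    {"brick_walls", "flush_toilet", "piped_indoors"},
    {"mud_walls", "pit_latrine", "communal_tap"},
    {"brick_walls", "flush_toilet"},
]
frequent = apriori_frequent_itemsets(households, min_support=0.4)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

Rules are then generated from the frequent itemsets by splitting each one into a left-hand side and a right-hand side and keeping the splits that meet a minimum confidence.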
Interesting Measures cont’d • The support of a rule r: X →Y is the likelihood of finding a transaction containing all items in X and Y in the database. It is estimated by • the proportion of transactions in which X and Y are both present. • The confidence of a rule r: X →Y is the likelihood of finding Y among all transactions that contain X. It is estimated by • the proportion of transactions containing X and Y among transactions containing X. • This is a conditional probability: given that a transaction contains X, what is the likelihood that it also contains Y? • Lift (initially called interest) of r: X →Y compares the likelihood of finding both X and Y in a transaction to the probability of their joint occurrence if they occurred independently, i.e. if customers bought the two itemsets independently.
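The three definitions above can be written compactly as follows. The notation is an assumption introduced here for illustration: n is the total number of transactions, n_X and n_Y are the numbers of transactions containing all items of X and of Y respectively, and n_XY is the number containing all items of both.

```latex
\[
\operatorname{supp}(X \rightarrow Y) = \frac{n_{XY}}{n},
\qquad
\operatorname{conf}(X \rightarrow Y) = \frac{n_{XY}}{n_{X}},
\qquad
\operatorname{lift}(X \rightarrow Y)
  = \frac{n_{XY}/n}{(n_{X}/n)\,(n_{Y}/n)}
  = \frac{n \, n_{XY}}{n_{X} \, n_{Y}}
\]
```

A lift of 1 corresponds to independence of X and Y; values above 1 indicate that X and Y occur together more often than independence would predict, and values below 1 indicate the opposite.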
Interesting Measures cont’d • "chiSquare" (see Liu et al. 1999). The chi-square statistic tests for independence between the LHS (X) and RHS (Y) of the rule. The critical value of the chi-square distribution with 1 degree of freedom (2x2 contingency table) at alpha = 0.05 is 3.84; chi-square values above this indicate that the LHS (X) and the RHS (Y) are not independent. • "oddsRatio" (see Tan et al. 2004). The odds of finding X in transactions which contain Y divided by the odds of finding X in transactions which do not contain Y. Range: 0 to infinity, where 1 indicates that Y is not associated with X.
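As a rough illustration of these two measures, the sketch below computes them from the four cells of a 2x2 contingency table. The function names and the cell counts are hypothetical, chosen only to show the arithmetic, and the chi-square shown is the plain Pearson statistic without continuity correction, which may differ from what a particular software package reports.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 table (no continuity correction).

    a: transactions with X and Y      b: with X but not Y
    c: with Y but not X               d: with neither
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return float("nan")  # a row or column total is zero
    return n * (a * d - b * c) ** 2 / denom


def odds_ratio_2x2(a, b, c, d):
    """Odds of X among transactions with Y divided by odds of X among those without Y."""
    if b * c == 0:
        return float("inf") if a * d > 0 else float("nan")
    return (a * d) / (b * c)


# Hypothetical counts for a rule X -> Y over 100 transactions.
a, b, c, d = 30, 10, 20, 40
chi2 = chi_square_2x2(a, b, c, d)
odds = odds_ratio_2x2(a, b, c, d)
print(f"chi-square = {chi2:.2f}, exceeds 3.84: {chi2 > 3.84}")
print(f"odds ratio = {odds:.1f}")
```

With these counts the chi-square statistic is well above 3.84 and the odds ratio is well above 1, so X and Y would be judged not independent and positively associated.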
Interesting Measures cont’d • Consider the following transactional matrix [matrix not reproduced] • The various measures of interestingness can be computed as follows [equations not reproduced]
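The slide's matrix and equations are not available in this text, so the following sketch uses a small hypothetical 0/1 transactional matrix instead, with the shopping-basket rule {meat, maize meal} → {spinach} used earlier. The item names, the matrix values and the helper `support` are assumptions for illustration only.

```python
# Hypothetical transactional matrix: one row per basket, one column per item,
# 1 = item present, 0 = item absent.
items = ["meat", "maize_meal", "spinach"]
matrix = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
    [1, 1, 1],
    [0, 1, 0],
]


def support(matrix, items, itemset):
    """Proportion of rows in which every item of `itemset` is present."""
    cols = [items.index(name) for name in itemset]
    hits = sum(all(row[c] == 1 for c in cols) for row in matrix)
    return hits / len(matrix)


X = ["meat", "maize_meal"]
Y = ["spinach"]

supp_rule = support(matrix, items, X + Y)            # P(X and Y)
conf_rule = supp_rule / support(matrix, items, X)    # P(Y | X)
lift_rule = conf_rule / support(matrix, items, Y)    # P(X and Y) / (P(X) P(Y))

print(f"support    = {supp_rule:.3f}")
print(f"confidence = {conf_rule:.3f}")
print(f"lift       = {lift_rule:.3f}")
```

In this made-up matrix, 3 of the 8 baskets contain all three items and 4 contain both meat and maize meal, so the rule has support 0.375 and confidence 0.75. Spinach appears in 5 of 8 baskets, which gives a lift of 1.2.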
Analysis • Descriptive statistics • Mining Association Rules • General Rules • Rules of Poverty • Cluster Analysis
Conclusions • Considering the rules presented above, mining association rules is a technique that could be used to analyse household poverty and how certain household characteristics relate. • For mining association rules to bring out the best rules, surveys that are deliberately intended to capture the poor must be explored. • Rules resulting from such surveys will be the best at defining and explaining household poverty.
Further research and Limitations • For further research, an ordinal regression model is proposed based on the clusters presented in this paper, and the possibility of clustering the rules themselves is worth exploring. • Considering the results of the paper, the objectives of the research have been satisfied. • A limitation of this paper is that it does not take into account the different localities of households; indicators of poverty in urban and rural areas are expected to differ.