Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach Yiping Ke, James Cheng, Wilfred Ng Presented By: Chibuike Muoh
Presentation Outline:
• Contributions of the paper
• Introduction
• What are QCPs?
• Definitions
• Background: Information Theory (entropy, MI, NMI)
• Mining QCPs
• All-confidence
• Discretization problem (interval combining)
• Attribute-level pruning
• Interval-level pruning
• QCoMine algorithm
Contributions of the paper
• Presents a new algorithm for mining patterns in databases, based on concepts borrowed from information theory: entropy and mutual information
• Discretizes attribute domains using supervised interval combining, which preserves the dependency between attributes
Introduction
• Similar in principle to association rule mining, but evaluating association rules can be too expensive on VLDBs
• Association rules can produce trivial result sets, e.g. {pregnant} → {edema} and {pregnant, female} → {edema}
• They can also produce unproductive rules as a result of co-occurrence effects, e.g. {pregnant, dataminer} → {edema}
• So are occupation and the edema condition related?
• Unlike association mining, QCP mining considers the dependency among the attribute sets of the database to generate highly correlated patterns
• Similar to generating "maximal informative k-itemsets", but here we consider dependency in the attribute sets
Introduction…contd.
• The idea behind mining QCPs:
• Evaluate the attribute set and look for "strong" dependencies between attributes
• Next, find correlated interval sets in the dependent attributes and generate patterns from them
• Thus, QCPs are not restricted to frequently co-occurring attributes
Definitions: Quantitative Database
• A pattern X is defined over a set of attributes (random variables) {x1, x2, x3, …, xm} whose outcomes can be categorical or quantitative; each value vx of an attribute x occurs with probability p(vx)
• An attribute x appears in a pattern as an item x[lx, ux], where [lx, ux] is an interval within dom(x)
• The item is categorical if lx = ux, and quantitative if lx <= ux
• A pattern X is called a k-pattern if |attr(X)| = k
• A quantitative database D is a set of transactions. Each transaction T in D is a vector of values <v1, v2, v3, …, vm>, where vi ∈ dom(xi) for 1 <= i <= m
Definition…contd.
• A transaction T supports a pattern X if, for every item x[lx, ux] in X, the value of x in T lies within [lx, ux]
• The frequency of a pattern X in D, freq(X), is the number of transactions in D that support X
• The support of X is supp(X) = freq(X)/|D|, which is the probability that a transaction T in D supports X
Example
• The employee database table (Table 1) consists of six attributes, of which three are quantitative {age, salary, service years} and two are categorical {gender, married}
• The last column records the support of each transaction
• E.g. for the pattern X = age[4,5]gender[1,1], supp(X) = 0.25 + 0.19 = 0.44
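The definitions of freq(X) and supp(X) can be sketched in a few lines of Python. This is only an illustration: the four-row database below is made up (it is not Table 1), and representing a pattern as a dict from attribute index to a closed interval is an assumption of this sketch.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: attribute index -> closed interval (l, u).
Pattern = Dict[int, Tuple[float, float]]

def supports(transaction: List[float], pattern: Pattern) -> bool:
    """True if every item's interval contains the transaction's value."""
    return all(l <= transaction[i] <= u for i, (l, u) in pattern.items())

def freq(db: List[List[float]], pattern: Pattern) -> int:
    """Number of transactions in db that support the pattern."""
    return sum(supports(t, pattern) for t in db)

def supp(db: List[List[float]], pattern: Pattern) -> float:
    """Fraction of transactions in db that support the pattern."""
    return freq(db, pattern) / len(db)

# Illustrative data only: columns are (age code, gender code).
db = [[4, 1], [5, 1], [3, 1], [4, 2]]
x = {0: (4, 5), 1: (1, 1)}  # age[4,5]gender[1,1]
```

Here only the first two rows fall inside both intervals, so two of the four transactions support the pattern.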
Background: Information Theory
• Mining QCPs makes use of fundamental concepts in information theory
• Entropy measures the information content (uncertainty) of a random variable x:
  H(x) = − Σ_{vx ∈ dom(x)} p(vx) log p(vx)
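As a small illustration of entropy, the sketch below estimates H(x) from a list of observed values using empirical frequencies. The choice of base-2 logarithms (bits) is an assumption; the slides do not state the base.

```python
import math
from collections import Counter

def entropy(values):
    """Empirical entropy of a sequence of observed values, in bits (assumed base 2)."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())
```

A fair two-valued attribute such as `[0, 1, 0, 1]` has one bit of entropy, while a constant attribute has none.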
Background: Information Theory…contd.
• Mutual information (MI) measures the average reduction in uncertainty about a random variable x, given knowledge of y (or vice versa):
  I(x; y) = Σ_{vx ∈ dom(x)} Σ_{vy ∈ dom(y)} p(vx, vy) log [ p(vx, vy) / (p(vx) p(vy)) ]
• MI is a symmetric measure; the greater the value of I(x; y), the more information x and y tell about each other
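Mutual information can be estimated from paired observations in the same way. The following is a minimal sketch, again assuming base-2 logarithms and empirical frequencies; the function name is illustrative.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical I(x; y) in bits from two equal-length sequences of observations."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    # Sum only over value pairs that actually occur (zero-probability terms vanish).
    return sum(
        (c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )
```

When y is a copy of a fair binary x the result is 1 bit, and when the two are independent it is 0, which matches the reading of MI as the reduction in uncertainty.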
Example
• Consider the attribute pair (age, married) from Table 1: we can compute I(age; married) = 0.47
• This shows that knowing age reduces the uncertainty of married by 0.47
• Similarly, as an exercise, we can compute I(gender; education) = 0.40
Normalized Mutual Information
• But how much does x actually tell us about y?
• The entropies of different attributes vary greatly, so MI only gives an absolute value, which is not so helpful in our case
• We can normalize the MI among our set of attributes to get a global relative measure:
  Ĩ(x; y) = I(x; y) / max{H(x), H(y)}
NMI…contd.
• Normalizing the MI measure among the attribute sets gives the minimum percentage of reduction in the uncertainty of one attribute given knowledge of another
Example 2
• From the previous example we can compute Ĩ(age; married) = 0.47 / 2.19 ≈ 0.21
• We can also determine Ĩ(gender; education) = 0.40 / 1.34 ≈ 0.30
• Note that although I(age; married) > I(gender; education), its NMI is lower. This can be attributed to the high entropy of age: H(age) = 2.19 > H(education) = 1.34
• In other words, knowing age removes a larger absolute amount of uncertainty, but a smaller relative amount
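The normalization can be checked with the figures quoted in these slides. The sketch below assumes that NMI divides MI by the larger of the two entropies, that entropies are in bits, and that married and gender are two-valued, so their entropies are at most 1 bit and cannot exceed H(age) or H(education). The constant `H_BINARY_MAX` is a stand-in for those two unstated entropies, not a value from the paper.

```python
def nmi(mi: float, h_x: float, h_y: float) -> float:
    """Normalized MI: MI divided by the larger of the two entropies."""
    return mi / max(h_x, h_y)

# Assumption: a two-valued attribute has entropy of at most 1 bit.
H_BINARY_MAX = 1.0

nmi_age_married = nmi(0.47, 2.19, H_BINARY_MAX)       # I(age; married), H(age)
nmi_gender_education = nmi(0.40, H_BINARY_MAX, 1.34)  # I(gender; education), H(education)
```

Because the larger entropy is the denominator, the exact entropy of the binary attribute does not affect either result.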
Definition: Quantitative Correlated Pattern
• Given a minimum information threshold (μ) and a minimum all-confidence threshold (ς), a quantitative pattern X is a QCP if:
• Ĩ(xi; xj) >= μ for every pair of attributes xi, xj in attr(X), and
• allconf(X) >= ς
• Thus a QCP has strong co-dependency between its attributes and a high confidence level in the dataset
allconf(X)
• All-confidence is a correlation measure that gives the minimum confidence of the association rules that can be derived from a given pattern
• For a quantitative pattern, allconf(X) is defined as:
  allconf(X) = supp(X) / max{ supp(x[lx, ux]) : x[lx, ux] ∈ X }
• This is different from association rule mining, where conf(X → Y) only indicates an implication from the set on the left to the set on the right
allconf(X)…contd.
• All-confidence has the downward closure property: if a pattern has all-confidence no less than ς, so do all its sub-patterns
Example
• For X = gender[1,1]education[1,1], allconf(X) = supp(X) / max{supp(gender[1,1]), supp(education[1,1])}
• Similarly, allconf(gender[1,1]married[1,1]) = 0.9
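A minimal sketch of all-confidence, assuming the usual definition (pattern support divided by the largest support of any single item in the pattern). The small database and the dict-based pattern representation are illustrative and are not taken from Table 1.

```python
def supp(db, items):
    """Fraction of transactions whose values fall inside every item's interval."""
    hits = sum(all(l <= t[a] <= u for a, (l, u) in items.items()) for t in db)
    return hits / len(db)

def allconf(db, pattern):
    """supp(pattern) divided by the largest single-item support in the pattern."""
    largest = max(supp(db, {a: iv}) for a, iv in pattern.items())
    return supp(db, pattern) / largest

# Illustrative data only: two categorical attributes coded as integers.
db = [[1, 1], [1, 1], [1, 2], [2, 1]]
y = {0: (1, 1), 1: (1, 1)}
```

In this toy data the pattern is supported by 2 of 4 rows and each item by 3 of 4 rows, so the all-confidence is 0.5 / 0.75.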
allconf(X)
• A caveat about allconf: since it is applied at fine granularity, to the intervals of attributes, it cannot be used on its own as a measure of correlated patterns
• Quantitative attributes can span huge intervals, creating a co-occurrence problem
• These points explain the need to first perform pruning at the attribute level
Example
• For the employee database in the previous example, we set μ = 0.2 and ς = 0.5
• The pattern Y = gender[1,1]married[1,1] is not a QCP because Ĩ(gender; married) = 0 < μ, although allconf(Y) = 0.9
• This is because gender and married are independent of each other, yet p(gender[1,1]) and p(married[1,1]) are both very high
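The two-part test can be written as a short predicate. This is a sketch under the assumption that a pattern qualifies only if every pairwise NMI among its attributes reaches μ and its all-confidence reaches ς; the function and argument names are illustrative.

```python
def is_qcp(pairwise_nmi, allconf_value, mu, sigma):
    """True only if all attribute pairs reach mu and the pattern reaches sigma."""
    return all(v >= mu for v in pairwise_nmi) and allconf_value >= sigma

# Figures quoted above: NMI(gender; married) = 0, allconf(Y) = 0.9.
y_is_qcp = is_qcp([0.0], 0.9, mu=0.2, sigma=0.5)
```

The pattern Y fails on the attribute-level condition alone, regardless of its high all-confidence.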
QCP Mining
• Problem description:
• Given a quantitative database D, a minimum information threshold μ, and a minimum all-confidence threshold ς, the mining problem is to find all QCPs from D
QCP Mining: Process Outline
• Quantitative Database → QCoMine Algorithm (Interval Combining/Discretization → Attribute pruning → Interval pruning)
- Attribute pruning finds dependent attribute sets
- Interval pruning generates correlated patterns
Interval Combining
• When dealing with quantitative data (continuous attributes), we need to discretize the intervals of each attribute
• Challenges:
• Preventing the intervals from being too trivial, e.g. age[0,2] vs. age[0,0], age[1,1], age[2,2]
• Considering the dependency of the attributes when combining their intervals, e.g. the pattern (age, gender) can produce different intervals than (age, married)
Interval combining…contd.
• Interval combining for quantitative patterns can be considered an optimization problem for an objective function Φ:
• The goal for this stage is:
• Given two attributes x and y, where x is quantitative and y can be either quantitative or categorical, obtain the optimal combined intervals of x with respect to y
• Note that since this optimization is performed locally (between pairs of attributes), we use MI instead of NMI
Interval combining: Algorithm
• To prevent the intervals from becoming too trivial, a termination condition is set as a specified minimum value for the interval
• Let Φ[ix1,ix2](x,y) denote the value of Φ(x,y) when ix1 and ix2 are combined with respect to y
• At each step, pairs of consecutive intervals ix1 and ix2 are considered for combination. The idea is to pick, at each step, the maximum Φ[ix[j],ix[j+1]](x,y) among all pairs of consecutive intervals ix[j] and ix[j+1], and to combine the corresponding ix[j] and ix[j+1] into xj'
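The greedy merging loop can be sketched as follows. Two things here are assumptions, since the slides do not reproduce them: the objective Φ is taken to be the mutual information between the merged x and y (so the merge that retains the most MI is chosen), and the termination condition is a hypothetical `min_intervals` count. The paper's actual Φ and stopping rule may differ.

```python
import math
from collections import Counter

def mi(xs, ys):
    """Empirical mutual information in bits."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def label(v, intervals):
    """Index of the closed interval that contains v."""
    return next(k for k, (lo, hi) in enumerate(intervals) if lo <= v <= hi)

def combine_intervals(xs, ys, base_intervals, min_intervals):
    """Greedily merge consecutive intervals of x, supervised by y."""
    intervals = list(base_intervals)
    while len(intervals) > min_intervals:
        best_score, best_merge = None, None
        for j in range(len(intervals) - 1):
            merged = (intervals[:j]
                      + [(intervals[j][0], intervals[j + 1][1])]
                      + intervals[j + 2:])
            # Assumed objective: MI between the re-discretized x and y.
            score = mi([label(v, merged) for v in xs], ys)
            if best_score is None or score > best_score:
                best_score, best_merge = score, merged
        intervals = best_merge
    return intervals

xs = [0, 1, 2, 3]
ys = ["a", "a", "b", "b"]
combined = combine_intervals(xs, ys, [(0, 0), (1, 1), (2, 2), (3, 3)], min_intervals=2)
```

On this toy input the merges that keep x fully informative about y are preferred, so the result groups the values by their y label.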
Attribute-level pruning
• At this stage, pruning is performed at the attribute level so that every pair of attributes in a pattern has NMI of at least μ
• The NMI-graph G treats attributes as vertices, with an edge between two attributes whose NMI is at least μ; the attribute sets of QCPs are cliques in this graph
Attribute-level pruning…contd.
• From the previous definition, the attribute sets of QCPs are cliques in the NMI-graph, whose edges have NMI >= μ
• Without pruning at the attribute level (i.e. μ = 0), the search space for cliques in the graph becomes much larger
• Enumerating cliques in a graph can be an exhaustive process
• The authors of the paper introduce a prefix tree structure for prefixing correlated attributes: the attribute prefix tree, Tattr
• Clique enumeration in the NMI-graph is done using the prefix tree
• The only extra action required when enumerating cliques using the prefix tree is to check whether (u,v) is an edge in G
Prefix tree construction
To create the prefix tree:
1. First, a root node is created at level 0 of Tattr
2. Then, at level 1, we create a node for each attribute as a child of the root
3. For each node u at level k (k >= 1) and for each right sibling v of u, if (u,v) is an edge in G, we create a child node for u with the same attribute label as that of v
4. Repeat step 3 for u's children at level k+1
Step 3 of the construction creates the prefix tree in a depth-first manner
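The construction steps above can be sketched recursively. For brevity, this sketch does not build explicit node objects; it returns each root-to-node path as a tuple of attribute labels, which is an implementation choice of the example rather than the paper's data structure. The graph used is made up.

```python
def build_prefix_tree(attrs, edges):
    """Return every root-to-node path of the attribute prefix tree, depth-first."""
    edge_set = {frozenset(e) for e in edges}
    paths = []

    def expand(prefix, siblings):
        # `siblings` are the ordered labels of the nodes sharing this prefix.
        for i, u in enumerate(siblings):
            path = prefix + (u,)
            paths.append(path)
            # A right sibling v becomes a child of u only if (u, v) is an edge in G.
            children = [v for v in siblings[i + 1:] if frozenset((u, v)) in edge_set]
            expand(path, children)

    expand((), list(attrs))
    return paths

# Hypothetical NMI-graph: an edge means the pair's NMI reached the threshold.
paths = build_prefix_tree(
    ["a", "b", "c", "d"],
    [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")],
)
cliques = [p for p in paths if len(p) >= 2]
```

Because a node's children are drawn only from its right siblings, every attribute on a path is already adjacent to all earlier ones, so each path of length two or more is a clique and the single edge check per step is sufficient.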
Interval-level pruning
• Even though the cliques found using the NMI-graph have high NMI, they differ in the intervals of their continuous attributes
• Since intervals are combined in a supervised way, the same attribute may have different sets of combined intervals with respect to different attributes
• Thus patterns with low all-confidence may still be generated from correlated attributes
• The interval-level pruning process uses all-confidence to ensure that only high-confidence patterns are generated from a pattern X and all its super-patterns
• This follows from the downward closure property of all-confidence
Interval-level pruning…contd.
• An easy way to perform pruning at the interval level for a (k+1)-pattern is to compute the intersection of the prefixing (k-1) intervals of the two k-patterns
• Example: given age[30,40]married[1,1] and age[25,35]salary[2000,3000], intersect the intervals of age to obtain the new pattern age[30,35]married[1,1]salary[2000,3000]
• However, producing a new (k+1)-pattern by intersection violates the downward closure property of all-confidence
• Shrinking the intervals in the (k+1)-pattern may greatly decrease the support of a single item, so the all-confidence of the (k+1)-pattern may be higher than that of its constituent k-patterns
Interval-level pruning…contd.
• We can avoid intersection in interval pruning by enumerating all sub-intervals of the combined intervals Sx and Sy of the attribute set {x,y} at level 2 of Tattr, and pruning at that level before generating a pattern
• We need to consider all pairs of sub-intervals of x and y, as each of them represents a pattern
• Thus, for each interval set {i'x, i'y}, where i'x is a sub-interval from Sx and i'y is a sub-interval from Sy:
• We create a QCP X = x[i'x]y[i'y] if allconf(X) >= ς
• Evaluating all possible sub-interval combinations at the 2-pattern level ensures downward closure for all k-patterns generated from them
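The level-2 enumeration can be sketched as below. It assumes that a sub-interval is any contiguous run of base intervals, that all-confidence is pattern support over the larger item support, and it uses a made-up database; attribute-level pruning is not shown, so the attribute pair is assumed to have already passed the NMI test.

```python
from itertools import product

def sub_intervals(base):
    """All contiguous runs of the ordered base intervals, as (lo, hi) pairs."""
    return [(base[i][0], base[j][1])
            for i in range(len(base)) for j in range(i, len(base))]

def supp(db, items):
    """Fraction of transactions whose values fall inside every item's interval."""
    return sum(all(l <= t[a] <= u for a, (l, u) in items.items()) for t in db) / len(db)

def level2_patterns(db, x, base_x, y, base_y, sigma):
    """Keep each pair of sub-intervals whose 2-pattern reaches all-confidence sigma."""
    kept = []
    for ix, iy in product(sub_intervals(base_x), sub_intervals(base_y)):
        largest = max(supp(db, {x: ix}), supp(db, {y: iy}))
        if largest > 0 and supp(db, {x: ix, y: iy}) / largest >= sigma:
            kept.append((ix, iy))
    return kept

# Illustrative data only: attribute 0 takes values 1..2, attribute 1 takes 10 or 20.
db = [(1, 10), (1, 10), (2, 20), (2, 10)]
kept = level2_patterns(db, 0, [(1, 1), (2, 2)], 1, [(10, 10), (20, 20)], sigma=0.6)
```

Note that the pair covering both full ranges also passes, since it is supported by every transaction. That is the co-occurrence effect which motivates pruning at the attribute level first.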
QCoMine Algorithm
• First, combine the base intervals of each quantitative attribute with respect to another attribute
• Steps 2-4 construct the NMI-graph G and use it to guide the construction of the attribute prefix tree Tattr, which performs attribute pruning
• Steps 5-13 construct level 2 of Tattr and also perform interval pruning (steps 10-13), which produces all 2-pattern QCPs
• Twinterval is an interval-prefix tree that keeps the interval sets of all patterns generated by a node u in Tattr; it is used for memoization, giving a speedup and saving space
• Steps 14-15 invoke RecurMine on the child nodes of u in G to generate all k-QCPs for k > 2
QCoMine Algorithm…contd.
• The steps in the RecurMine algorithm continue to build the prefix tree Tattr for k > 2
• Interval pruning is aided by the interval-prefix tree, which speeds up joins of two k-patterns
• At step 6 of the algorithm, when two k-patterns are combined, it is ensured that all their prefixing (k-1)-intervals are the same in both patterns, to avoid performing interval combining
Performance of QCoMine
• Performance tests of the QCoMine algorithm were carried out to measure the efficiency of its three major components:
  - Supervised interval combining
  - Attribute-level pruning by NMI
  - Interval-level pruning by all-confidence
• Three variants of the algorithm were created:
  - QCoMine, which performs all operations as described originally in the paper
  - QCoMine-0, a control variant that performs the interval combining process but sets μ = 0
  - QCoMine-1, another control variant that does not perform the interval combining process but uses μ as described originally in the paper
• The tests were performed with all-confidence thresholds from ς = 60% to 100%
Performance of QCoMine…contd.
• When interval combining is not applied, results on the dataset can only be obtained when ς = 100%. In all other cases the algorithm runs out of memory
• This is because QCoMine-1 is inefficient: it allows the interval of an item to become too trivial, so patterns easily gain all-confidence > ς simply by co-occurrence
Performance of QCoMine…contd.
• The running time of both QCoMine and QCoMine-0 increases only slightly for smaller ς, because the majority of the time is spent computing the 2-patterns
• No matter the value of ς, we need to test every 2-pattern to determine whether it is a QCP before we can use the downward closure property of all-confidence to prune
References
• Y. Ke, J. Cheng, W. Ng. Mining quantitative correlated patterns using an information-theoretic approach. Proceedings of the 12th ACM SIGKDD international conference, 2006
• G. I. Webb. Discovering significant rules. Proceedings of the 12th ACM SIGKDD international conference, 2006
• A. J. Knobbe, E. K. Y. Ho. Maximally informative k-itemsets and their efficient discovery. Proceedings of the 12th ACM SIGKDD international conference, 2006