Arif Djunaidy, Rully Soelaiman, Daning Tyaspamadya

MINING ASSOCIATION RULES FROM LARGE DATABASES USING THE LATTICE-BASED APPROACH AND HYBRID SEARCH METHOD. Arif Djunaidy, Rully Soelaiman, Daning Tyaspamadya. Faculty of Information Technology, ITS - Surabaya.


Presentation Transcript


  1. MINING ASSOCIATION RULES FROM LARGE DATABASES USING THE LATTICE-BASED APPROACH AND HYBRID SEARCH METHOD • Arif Djunaidy, Rully Soelaiman, Daning Tyaspamadya • Faculty of Information Technology, ITS - Surabaya

  2. Background - 1 • In data mining, association rules represent relationships that may exist among items in transactional databases • Because exploitable association rules may reflect customer behavior, identifying the frequent itemsets and forming the conditional implication rules among items are of paramount importance • However, for large databases, extracting a set of meaningful association rules may require substantial memory and repeated database scanning, which in turn increases the overall computing time of the mining process • Efficient algorithms capable of reducing those overheads are therefore required

  3. Background - 2 • The task of discovering all frequent associations in very large databases is quite challenging • The search space is exponential in the number of database attributes • With millions of database objects, the problem of I/O minimization becomes paramount • Most current approaches are iterative in nature, requiring multiple database scans • Most approaches use very complicated internal data structures, which have poor locality and add space and computation overheads

  4. Key Features of Our Approach • All frequent itemsets are enumerated via simple “tid-list” intersections • A lattice-theoretic approach is used to decompose the original search space (lattice) into smaller pieces (sub-lattices) that can be processed independently and more easily • A hybrid search strategy is used to enumerate the frequent itemsets within each sub-lattice • Our approach is designed to involve only a few database scans to minimize the I/O costs
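The slides do not say how the lattice is split into sub-lattices. One common scheme, assumed here purely for illustration, groups itemsets that share a common prefix so that each group can be processed on its own. The function name `prefix_classes` and the sample itemsets are hypothetical.

```python
from collections import defaultdict

def prefix_classes(itemsets, prefix_len=1):
    """Group itemsets by their first `prefix_len` items (in sorted order).

    Each group stands for one sub-lattice that could be mined independently.
    Assumption: prefix-based grouping; the slides do not confirm this scheme.
    """
    classes = defaultdict(list)
    for itemset in itemsets:
        ordered = tuple(sorted(itemset))
        classes[ordered[:prefix_len]].append(ordered)
    # Sort members so the result does not depend on input order.
    return {prefix: sorted(members) for prefix, members in classes.items()}

# Hypothetical frequent 2-itemsets.
pairs = [{"A", "C"}, {"A", "W"}, {"C", "D"}, {"C", "W"}]
classes = prefix_classes(pairs)
```

With these pairs, the itemsets starting with A and those starting with C end up in two separate groups.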

  5. Problem Statement - 1 • An association rule can be written as A → B, where • A is an itemset called the antecedent or left-hand side (LHS), and • B is an itemset called the consequent or right-hand side (RHS) • The association mining task is to discover a set of association rules among a large number of objects in a given database

  6. Problem Statement - 2 • The fundamental task of association rule mining is to generate all association rules X → Y (X, Y are itemsets) that can be extracted from the database. These rules must satisfy both the support and confidence constraints • Support constraint: Sup(X ∪ Y) ≥ MinSup • Confidence constraint: Sup(X ∪ Y) / Sup(X) ≥ MinCof • The support of an itemset X, Sup(X), is defined as the number of transactions that contain X as a subset • An itemset is categorized as a frequent itemset if its support is at least the minimum support (MinSup) supplied by the user • The confidence factor represents the conditional probability that a transaction contains Y, given that the transaction contains X • An association rule is said to be confident if its confidence factor is at least the minimum confidence (MinCof) supplied by the user
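The support and confidence definitions above can be sketched in a few lines. The toy transactions and the function names are hypothetical and chosen only to illustrate the formulas.

```python
# Hypothetical toy database: each transaction is a set of items.
TRANSACTIONS = [
    frozenset({"cereal", "milk", "bread"}),
    frozenset({"cereal", "milk"}),
    frozenset({"bread", "milk"}),
    frozenset({"cereal", "bread"}),
]

def support(itemset, transactions):
    """Sup(X): number of transactions that contain `itemset` as a subset."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(lhs, rhs, transactions):
    """Confidence of the rule lhs -> rhs, i.e. Sup(X u Y) / Sup(X)."""
    sup_lhs = support(lhs, transactions)
    if sup_lhs == 0:
        raise ValueError("the antecedent never occurs in the database")
    return support(lhs | rhs, transactions) / sup_lhs

cereal = frozenset({"cereal"})
milk = frozenset({"milk"})
sup_cereal = support(cereal, TRANSACTIONS)          # 3 of the 4 transactions
conf_rule = confidence(cereal, milk, TRANSACTIONS)  # 2 / 3
```

Here the rule cereal → milk holds in two of the three transactions that contain cereal, so it would pass a MinCof of 60% but not one of 100%.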

  7. Simple Example - 1 • Consider the sales database of a food store, where the objects represent customers and the itemsets represent food items • In this example, the discovered patterns are the sets of food items frequently bought together by the customers • An example pattern could be that “60 percent of the customers who buy cereal also buy milk” • The store can then use this knowledge for shelf placement, stock control, etc. • There are many potential application areas for association rule technology, including catalog design, customer segmentation, store layout, and so on

  8. Simple Example - 2 MinSup = 50% MinCof = 100%

  9. The Lattice-Based Approach - 1 • We use the lattice-theoretic approach to: • Identify all frequent itemsets • Count the support of association rules • Prerequisite: construct the “tid-list” of each item from the transaction database
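A minimal sketch of the prerequisite step, assuming a tid-list is simply the set of transaction identifiers in which an item occurs. The database contents and the name `build_tidlists` are hypothetical.

```python
from collections import defaultdict

def build_tidlists(transactions):
    """Turn {tid: items} into {item: set of tids}, reading each transaction once."""
    tidlists = defaultdict(set)
    for tid, items in transactions.items():
        for item in items:
            tidlists[item].add(tid)
    return dict(tidlists)

# Hypothetical transaction database keyed by transaction id (tid).
DB = {
    1: {"A", "C", "W"},
    2: {"C", "D", "W"},
    3: {"A", "C", "W"},
    4: {"A", "C", "D"},
}
tidlists = build_tidlists(DB)
```

The loop touches each transaction once, so the tid-lists can be produced in a single pass over the data.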

  10. The Lattice-Based Approach - 2 • Construct the “powerset” lattice P(I) • (Figure: maximal frequent itemsets, MinSup = 50%)
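To make the powerset lattice and the notion of maximal frequent itemsets concrete, the sketch below enumerates P(I) by brute force on a hypothetical four-transaction database. Exhaustive enumeration is used here only for illustration and is not the search method of the presentation; it is feasible only for a handful of items.

```python
from itertools import combinations

def powerset(items):
    """Yield every non-empty subset of `items` as a frozenset."""
    ordered = sorted(items)
    for k in range(1, len(ordered) + 1):
        for combo in combinations(ordered, k):
            yield frozenset(combo)

def frequent_itemsets(transactions, min_count):
    """Brute force: keep each subset contained in at least `min_count` transactions."""
    items = set().union(*transactions)
    result = {}
    for candidate in powerset(items):
        count = sum(1 for t in transactions if candidate <= t)
        if count >= min_count:
            result[candidate] = count
    return result

def maximal(frequent):
    """Frequent itemsets that have no frequent proper superset."""
    return {s for s in frequent if not any(s < other for other in frequent)}

# Hypothetical database; MinSup = 50% of 4 transactions = 2.
DB = [{"A", "C", "W"}, {"C", "D", "W"}, {"A", "C", "W"}, {"A", "C", "D"}]
frequent = frequent_itemsets(DB, min_count=2)
maximal_sets = maximal(frequent)
```

On this data the maximal frequent itemsets are {A, C, W} and {C, D}; every other frequent itemset is a subset of one of them.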

  11. The Lattice-Based Approach - 3 • Compute the support of itemsets via tid-list intersections
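A short sketch of support counting by tid-list intersection, assuming tid-lists are stored as Python sets. The tid-lists and the helper names are hypothetical.

```python
# Hypothetical tid-lists: item -> set of transaction ids containing it.
TIDLISTS = {
    "A": {1, 3, 4},
    "C": {1, 2, 3, 4},
    "D": {2, 4},
    "W": {1, 2, 3},
}

def tidlist_of(itemset, tidlists):
    """Tid-list of a non-empty itemset: intersection of its items' tid-lists."""
    items = list(itemset)
    if not items:
        raise ValueError("itemset must not be empty")
    tids = set(tidlists[items[0]])
    for item in items[1:]:
        tids &= tidlists[item]
    return tids

def support(itemset, tidlists):
    """Support equals the size of the intersected tid-list."""
    return len(tidlist_of(itemset, tidlists))

acw_tids = tidlist_of({"A", "C", "W"}, TIDLISTS)  # transactions 1 and 3
```

Once the tid-lists exist, no further pass over the original transactions is needed to count the support of any itemset.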

  12. Hybrid Search for Freq. Itemsets - 1 • Hybrid search is used to quickly enumerate all frequent itemsets • Hybrid search combines the top-down and bottom-up search strategies and is based on the intuition that the greater the support of a frequent itemset, the more likely it is to be part of a longer frequent itemset • The hybrid approach is divided into two main steps: • An initial phase that rearranges the atoms, and • The hybrid process itself, which generates all frequent itemsets. In the second step, the recursion is repeated until no more frequent itemsets can be generated

  13. Hybrid Search for Freq. Itemsets - 2 • The first step simply rearranges the atoms in descending order of their supports, using a sorting algorithm • The second step intersects the atoms one at a time • The intersection process starts with the pair of atoms that have the largest supports, in order to produce a longer frequent itemset • The process stops when an extension becomes infrequent (i.e., an itemset that does not satisfy the minimum support requirement) • The second, bottom-up phase is then entered
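The two steps can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the slides do not specify how the bottom-up phase reuses the result of the greedy extension, so here the long "seed" itemset is used only to skip the minimum-support test for its subsets, which are guaranteed to be frequent. The tid-lists and all names are hypothetical.

```python
def hybrid_search(tidlists, min_count):
    """Return ({frequent itemset: support}, seed itemset) using tid-list intersections."""
    # Step 1: keep frequent atoms, sorted by descending support (ties by name).
    atoms = sorted(
        (a for a in tidlists if len(tidlists[a]) >= min_count),
        key=lambda a: (-len(tidlists[a]), a),
    )
    frequent = {}
    if not atoms:
        return frequent, frozenset()

    # Step 2a: extend one long itemset atom by atom, starting from the atoms
    # with the largest supports; stop at the first infrequent extension.
    seed = {atoms[0]}
    seed_tids = set(tidlists[atoms[0]])
    for atom in atoms[1:]:
        extended = seed_tids & tidlists[atom]
        if len(extended) < min_count:
            break
        seed.add(atom)
        seed_tids = extended
    seed = frozenset(seed)

    # Step 2b: bottom-up recursion over all atoms. Subsets of the seed are
    # known to be frequent, so only their support value is computed.
    def bottom_up(prefix, prefix_tids, candidates):
        for i, atom in enumerate(candidates):
            if prefix_tids is None:
                tids = set(tidlists[atom])
            else:
                tids = prefix_tids & tidlists[atom]
            itemset = prefix | {atom}
            if itemset <= seed or len(tids) >= min_count:
                frequent[itemset] = len(tids)
                bottom_up(itemset, tids, candidates[i + 1:])

    bottom_up(frozenset(), None, atoms)
    return frequent, seed

# Hypothetical tid-lists; MinSup = 50% of 4 transactions = 2.
TIDLISTS = {"A": {1, 3, 4}, "C": {1, 2, 3, 4}, "D": {2, 4}, "W": {1, 2, 3}}
frequent, seed = hybrid_search(TIDLISTS, min_count=2)
```

On this data the atoms are ordered C, A, W, D; the greedy extension builds {C, A, W} and stops when adding D leaves an empty tid-list, after which the bottom-up phase picks up the remaining itemsets such as {C, D}.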

  14. Hybrid Search for Freq. Itemsets - 3 • (Figure: infrequent itemsets, MinSup = 50%)

  15. Design of Application

  16. Test Data Statistics of Test Data

  17. Experimental Results - 1 Number of k-itemsets

  18. Experimental Results - 2 Number of Association Rules

  19. Experimental Results - 3 Computing Time

  20. Experimental Results - 4 Support Counting Performance

  21. Experimental Results - 5 Comparison Results

  22. Conclusions • Experimental results show that this approach, together with the hybrid search method, reduces the computing time compared to both apriori-based algorithms and the similar lattice-based approach that uses the bottom-up search strategy • Another advantage of the lattice-based algorithm concerns the time spent scanning the database. The lattice-based algorithms require only a single database scan, so the I/O overhead is kept to a minimum • As far as computing speed is concerned, substantial computing time is still required to process large databases. Although the lattice-based approach is relatively powerful, this indicates that other computing methodologies, such as parallel algorithms in distributed computing environments, need to be considered to solve the computing speed problem
