Integration of association rules into WUM Bastian Germershaus
contents • introduction • WUM web usage miner in general • association rules – a brief example • association rules – the theory • association rules in WUM • association rules in WUM – a demo
the problem • we have a large amount of data (e.g. from the server log of our website) • we would like to know if there are any rules in the behavior of our customers • we could use these rules later on to optimize our business
what Amazon.com does • looking at their sales data • retrieving all orders by customer and current item • building rules: if a customer bought book A → he also bought books X and/or Y and/or Z
WUM - web usage miner (1) • main goal: navigation pattern discovery • sequence of pages through the website • typical patterns • optimization of site navigation • three steps • log file cleaning • pattern analysis • visualization
WUM – web usage miner (2) Source: Myra Spiliopoulou: “Web Usage Mining for Web Site Evaluation” in Communications of the ACM, August 2000, Vol. 43
WUM – web usage miner (3) • special requirements • miner should understand abstract pattern descriptions • ‘MINT’ (SQL-like query language) • usage patterns should be more than a sequence of frequently accessed pages • integration of statistics about the routes connecting pages frequently accessed together
WUM – web usage miner (4) Source: Myra Spiliopoulou: “Web Usage Mining for Web Site Evaluation” in Communications of the ACM, August 2000, Vol. 43
WUM – web usage miner (5) • evaluation of discovered patterns is needed • statistical testing • semantic evaluation • discovered navigation patterns may help to restructure the site • redesigning pages, inserting links • restructuring may confuse some users
association rules (1) • example • we sell cell-phones, gadgets and accessories
association rules (2) • possible association rule • 200 different orders in database • G121 & A111 (30) → C212 (13) • Support: 0.065 (6.5 %) • Confidence: 0.433 (43.3 %) • 13 of the 30 users who bought a Compaq iPAQ and a hands-free kit for Nokia phones also bought a Nokia 8110.
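The two figures on this slide follow directly from the three counts it gives. A minimal sketch of the arithmetic in Python; the variable names are chosen here for illustration and do not come from WUM:

```python
# Counts taken from the slide: 200 orders in total, 30 of them contain
# both G121 and A111, and 13 of those 30 also contain C212.
total_orders = 200
antecedent_orders = 30   # orders with G121 and A111
rule_orders = 13         # orders with G121, A111 and C212

# Support: share of ALL orders in which the whole rule holds.
support = rule_orders / total_orders

# Confidence: share of the orders matching the "if" part
# that also match the "then" part.
confidence = rule_orders / antecedent_orders

print(f"support    = {support:.3f}")     # 0.065
print(f"confidence = {confidence:.3f}")  # 0.433
```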
association rules (3) • sequence of pages does NOT matter • ‘if – then – condition’ • parameters (support, confidence) • useful rules • apply reasonably often (support) • are unusually reliable (confidence) • make interesting predictions
association rules in general (1) • “if a customer came to our website through a banner and it is not his first visit then he buys an article” • this object has three attributes: • came through banner • at least second visit • buys an article
association rules in general (2) • binary attributes (0 or 1; yes or no) • rules should have the form: if attribute X then attribute Y (X → Y) • attributes should be disjoint (X ∩ Y = Ø)
association rules in general (3) • parameters for association rules: • confidence: “in 60% of the cases where attribute X is true, attribute Y is also true” • support: “in 40% of the cases where attribute X is true, attribute Y is also true, and X and Y are true together in 10% of all cases in the database” (confidence 40%, support 10%)
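The two parameters can be computed from records of binary attributes as follows. This is only a sketch: the five visit records are invented for illustration, and the attribute names `banner`, `repeat_visit` and `buys` are hypothetical labels for the three attributes of the banner example, not identifiers from WUM.

```python
def support(records, attributes):
    """Fraction of all records in which every listed attribute is true."""
    hits = sum(1 for r in records if all(r[a] for a in attributes))
    return hits / len(records)


def confidence(records, x, y):
    """Fraction of the records satisfying x that also satisfy y.

    Assumes at least one record satisfies x (otherwise this divides by zero).
    """
    return support(records, x + y) / support(records, x)


# Invented example data: one dict of binary attributes per visit.
records = [
    {"banner": 1, "repeat_visit": 1, "buys": 1},
    {"banner": 1, "repeat_visit": 1, "buys": 0},
    {"banner": 1, "repeat_visit": 0, "buys": 0},
    {"banner": 0, "repeat_visit": 1, "buys": 1},
    {"banner": 0, "repeat_visit": 0, "buys": 0},
]

x = ["banner", "repeat_visit"]   # "if" part of the rule
y = ["buys"]                     # "then" part of the rule

print("support   :", support(records, x + y))
print("confidence:", confidence(records, x, y))
```

In this invented data the rule holds in 1 of 5 visits and in 1 of the 2 visits that match the "if" part.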
association rules in general (4) • the goal of the Apriori algorithm used here is: “find all rules that satisfy minimum support and minimum confidence” • two iterative steps • find ‘large item sets’ with minimum support • candidate-generating and pruning
association rules in general (5) • find large item sets • support calculation for every candidate (support means the number of occurrences of the candidate relative to the total number of objects) • remove every candidate whose support is smaller than ‘minimum support’ • save candidates with high incidence
association rules in general (6) • candidate-generating and pruning • temporary candidates: for two sets X, Y of cardinality n, which have n-1 attributes in common, build a temporary candidate X ∪ Y • pruning: eliminate every temporary candidate that contains a subset of cardinality n whose support is lower than min. support
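The two iterative steps (finding large item sets, then generating and pruning candidates) can be sketched as below. This is a simplified illustration of the Apriori idea, not the code used in WUM. The function names and the five example orders are assumptions; the item codes are reused from the earlier cell-phone example.

```python
from itertools import combinations


def support(transactions, itemset):
    """Occurrences of the item set relative to the total number of objects."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def generate_candidates(large_sets, n):
    """Join large sets of cardinality n that share n-1 items, then prune."""
    large = set(large_sets)
    candidates = set()
    for x, y in combinations(large, 2):
        union = x | y
        if len(union) != n + 1:      # x and y do not share exactly n-1 items
            continue
        # Prune: every subset of cardinality n must itself be large.
        if all(frozenset(s) in large for s in combinations(union, n)):
            candidates.add(union)
    return candidates


def large_item_sets(transactions, min_support):
    """Return all item sets whose support reaches min_support."""
    singles = {frozenset([item]) for t in transactions for item in t}
    current = {c for c in singles if support(transactions, c) >= min_support}
    result = set(current)
    n = 1
    while current:
        candidates = generate_candidates(current, n)
        current = {c for c in candidates
                   if support(transactions, c) >= min_support}
        result |= current
        n += 1
    return result


# Invented example orders using the item codes from the earlier slide.
transactions = [
    {"G121", "A111", "C212"},
    {"G121", "A111", "C212"},
    {"G121", "A111"},
    {"G121", "C212"},
    {"A111"},
]

result = large_item_sets(transactions, min_support=0.5)
for item_set in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(item_set), support(transactions, item_set))
```

With a minimum support of 0.5, the pair {A111, C212} is not large, so the three-item candidate is pruned without its support ever being counted. Rules would then be derived from the large item sets by checking minimum confidence.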