390 likes | 419 Views
Comparing between machine learning methods for a remote monitoring system. Ronit Zrahia Final Project Tel-Aviv University. Overview. The remote monitoring system The project database Machine learning methods: Decision of Association Rules Inductive Logic Programming Decision Tree
E N D
Comparing between machine learning methods for a remote monitoring system. Ronit Zrahia Final Project Tel-Aviv University
Overview • The remote monitoring system • The project database • Machine learning methods: • Decision of Association Rules • Inductive Logic Programming • Decision Tree • Applying the methods for project database and comparing the results
Remote Monitoring System - Description • Support Center has ongoing information on customer’s equipment • Support Center can, in some situations, know that customer is going to be in trouble • Support Center initiates a call to the customer • Specialist connects to site from remote and tries to eliminate problem before it has influence
Gateway Product TCP/IP [FTP] AIX/NT AIX/NT/95 Modem TCP/IP [Mail/FTP] Modem Support Server Customer Remote Monitoring System - Description
Remote Monitoring System - Technique • One of the machines on site, the Gateway, is able to initiate a PPP connection to the support server or to ISP • All the Products on site have a TCP/IP connection to the Gateway • Background tasks on each Product collect relevant information • The data collected from all Products is transferred to the Gateway via ftp • The Gateway automatically dials to the support server or ISP, and sends the data to the subsidiary • The received data is then imported to database
Project Database • 12 columns, 300 records • Each record includes failure information of one product at a specific customer site • The columns are: record no., date, IP address, operating system, customer ID, product, release, product ID, category of application, application, severity, type of service contract
Project Goals • Discover valuable information from database • Improve the products marketing and the customer support of the company • Learn different learning methods, and use them for the project database • Compare the different methods, based on the results
The Learning Methods • Discovery of Association Rules • Inductive Logic Programming • Decision Tree
Discovery of Association Rules - Goals • Finding relations between products which are bought by the customers • Impacts on product marketing • Finding relations between failures in a specific product • Impacts on customer support (failures can be predicted and handled before influences)
Discovery of Association Rules - Definition • A technique developed specifically for data mining • Given • A dataset of customer transactions • A transaction is a collection of items • Find • Correlations between items as rules • Example • Supermarket baskets
Determining Interesting Association Rules • Rules have confidence and support • IF x and y THEN z with confidence c • If x and y are in the basket, then so is z in c% of cases • IF x and y THEN z with support s • The rule holds in s% of all transactions
Discovery of Association Rules - Example • Input Parameters: confidence=50%; support=50% • If A then C: c=66.6% s=50% • If C then A: c=100% s=50%
Itemsets are Basis of Algorithm • Rule A => C • s=s(A, C) = 50% • c=s(A, C)/s(A) = 66.6%
Algorithm Outline • Find all large itemsets • Sets of items with at least minimum support • Apriori algorithm • Generate rules from large itemsets • For ABCD and AB in large itemset the rule AB=>CD holds if ratio s(ABCD)/s(AB) is large enough • This ratio is the confidence of the rule
Inductive Logic Programming - Goals • Finding the preferred customers, based on: • The number of products bought by the customer • The failures types (i.e severity level) occurred in the products
Inductive Logic Programming - Definition • Inductive construction of first-order clausal theories from examples and background knowledge • The aim is to discover, from a given set of pre-classified examples, a set of classification rules with high predictive power • Examples: • IF Outlook=Sunny AND Humidity=High THEN PlayTennis=No
Horn clause induction Given: P: ground facts to be entailed (positive examples); N: ground facts not to be entailed (negative examples); B: a set of predicate definitions (background theory); L: the hypothesis language; Finda predicate definition (hypothesis) such that • for every (completeness) • for every (consistency)
Inductive Logic Programming - Example • Learning about the relationships between people in a family circle
Algorithm Outline • A space of candidate solutions and an acceptance criterion characterizing solutions to an ILP problem • The search space is typically structured by means of the dual notions of generalization (induction) and specialization (deduction) • A deductive inference rule maps a conjunction of clauses G onto a conjunction of clauses S such that G is more general than S • An inductive inference rule maps a conjunction of clauses S onto a conjunction of clauses G such that G is more general than S. • Pruning Principle: • When B and H don’t include positive example, then specializations of H can be pruned from the search • When B and H include negative example, then generalizations of H can be pruned from the search
The preferred customers If ( Total_Products_Types( Customer ) > 5 ) and ( All_Severity(Customer) < 3 ) then Preferred_Customer
Decision Trees - Goals • Finding the preferred customers • Finding relations between products which are bought by the customers • Finding relations between failures in a specific product • Compare the Decision Tree results to the previous algorithms results.
Decision Trees - Definition • Decision tree representation: • Each internal node tests an attribute • Each branch corresponds to attribute value • Each leaf node assigns a classification • Occam’s razor: prefer the shortest hypothesis that fits the data • Examples: • Equipment or medical diagnosis • Credit risk analysis
Algorithm outline • A the “best” decision attribute for next node • Assign A as decision attribute for node • For each value of A, create new descendant of node • Sort training examples to leaf nodes • If training examples perfectly classified, Then STOP, Else iterate over new leaf nodes
Information Measure • Entropy measures the impurity of the sample of training examples S : • is the probability of making a particular decision • There are c possible decisions • The entropy is the amount of information needed to identify class of an object in S • Maximized when all are equal • Minimized (0) when all but one is 0 (the remaining is 1)
Information Measure • Estimate the gain in information from a particular partitioning of the dataset • Gain(S, A) = expected reduction in entropy due to sorting on A • The information that is gained by partitioning S is then: • The gain criterion can then be used to select the partition which maximizes information gain
S: [9+,5-] E=0.940 S: [9+,5-] E=0.940 humidity wind high normal weak strong N [6+,1-] E=0.592 P [3+,3-] E=1.00 [3+,4-] E=0.985 [6+,2-] E=0.811 Gain (S, Humidity) = .940 - (7/14).985 - (7/14).592 = .151 Gain (S, Wind) = .940 - (8/14).811 - (6/14)1.0 = .048 Decision Tree - Example (Continue) Which attribute is the best classifier? Gain(S, Outlook) = 0.246 Gain(S, Temperature) = 0.029
{D1, D2, …, D14} [9+,5-] outlook sunny overcast rain {D3,D7,D12,D13} [4+,0-] {D1,D2,D8,D9,D11} [2+,3-] {D4,D5,D6,D10,D14} [3+,2-] ? ? Yes Decision Tree Example – (Continue) Ssunny = {D1,D2,D8,D9,D11} Gain(Ssunny, Humidity) = .970 – (3/5)0.0 – (2/5)0.0 = .970 Gain(Ssunny, Temperature) = .970 – (2/5)0.0 – (2/5)1.0 – (1/5)0.0 = .570 Gain(Ssunny, Wind) = .970 – (2/5)1.0 – (3/5).918 = .019
outlook sunny overcast rain humidity Yes wind high normal strong weak No Yes No Yes Decision Tree Example – (Continue)
Overfitting • The tree may not be generally applicable called overfitting • How can we avoid overfitting? • Stop growing when data split not statistically significant • Grow full tree, then post-prun • The post-pruning approach is more common • How to select “best” tree: • Measure performance over training data • Measure performance over separate validation data set
Reduced-Error Pruning • Split data into training and validation set • Do until further pruning is harmful: • Evaluate impact on validation set of pruning each possible node (plus those below it) • Greedily remove the one that most improves validation set accuracy • Produces smallest version of most accurate sub-tree
MaxSev < 2.5 >= 2.5 NO: 7 YES: 0 NoOfProducts >= 4.5 < 4.5 NO: 3 YES: 8 NO: 0 YES: 3 The Preferred Customer Target attribute is TypeOfServiceContract
Product6 0 1 NO: 0 YES: 15 Product2 0 1 NO: 0 YES: 1 Product9 0 1 NO: 4 YES: 0 NO: 0 YES: 1 Relations Between Products Target attribute is Product3
Application10 0 1 NO: 0 YES: 11 Application8 0 1 NO: 2 YES: 2 Application2 0 1 NO: 1 YES: 0 NO: 5 YES: 1 Relations Between Failures Target attribute is Application5