Implementing Hoeffding Decision Trees in DB2
CS240A, Winter 2003
Carlo Zaniolo
Decision Tree Learning Algorithms
• Decision Tree Example
[Figure: a small decision tree testing Temp at the root, with branches Mild, Cool, and Hot leading to Yes/No leaf classes]
New Application Challenges
• Classical learning: in-memory data, all data available at the beginning
• New scenario: very large data, arriving as a stream
• SOLUTION: incremental learning
Incremental Decision Tree Construction
[Figure: two node expansions: a node V1 branching on values A and B, and a node V2 branching on values X and Y, each leading to classes C1 and C2]
Intuitively, a small number of samples is sufficient to choose the best attribute to test at each node.
Hoeffding Decision Tree
• Hoeffding Bound: Given a random variable r with range R, we make n independent observations and compute their mean $\bar{r}$. The Hoeffding bound states that, with probability $1 - \delta$, the true mean of the variable is at least $\bar{r} - \epsilon$, where
$\epsilon = \sqrt{\frac{R^2 \ln(1/\delta)}{2n}}$
• Hoeffding Tree:
• random variable r: the difference in information gain between the best and the second-best split attribute.
• observations: the training samples that have fallen into the node so far.
• goal: choose the best attribute with confidence $1 - \delta$.
• mechanics: maintain a distribution table (attr, attr_val, class, # samples).
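For a sense of scale: with two classes the information gain has range $R = \log_2 2 = 1$, so with $\delta = 0.01$ and $n = 1000$ samples the bound gives $\epsilon = \sqrt{\ln(100)/2000} \approx 0.048$; once the observed gain difference exceeds this, the split is safe with 99% confidence. As one possible (not prescribed) DB2 layout for the distribution table, assuming categorical attributes, with all table and column names below being illustrative:

-- A minimal sketch of the per-node distribution table;
-- names (nodestats, 'Temp', etc.) are illustrative assumptions.
CREATE TABLE nodestats (
  node_id  INTEGER     NOT NULL,  -- growing leaf the sample reached
  attr     VARCHAR(20) NOT NULL,  -- candidate split attribute
  attr_val VARCHAR(20) NOT NULL,  -- observed value of that attribute
  class    VARCHAR(20) NOT NULL,  -- class label of the sample
  cnt      INTEGER     NOT NULL   -- number of such samples so far
);

-- Counts needed for the information gain of one candidate attribute
-- (here 'Temp') at one leaf (here node 1):
SELECT attr_val, class, SUM(cnt) AS n
FROM nodestats
WHERE node_id = 1 AND attr = 'Temp'
GROUP BY attr_val, class;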
Make the Best Out of DB2
• Input: training and testing data, both in DB2 tables.
• Output: a decision tree stored in a DB2 table.
• DB2 utilities you may consider (see the sketch below):
• UDFs for learning,
• recursive SQL for prediction.
Have fun!
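As a hedged sketch of the prediction side only: one possible (not prescribed) encoding stores the tree as an edge table and unpivots each test record into (record, attribute, value) triples, so that attribute names can be joined as data. Every table and column name below is hypothetical.

-- Hypothetical edge-list encoding of the learned tree:
CREATE TABLE dtree (
  node_id  INTEGER,      -- node where the test is made (0 = root)
  attr     VARCHAR(20),  -- attribute tested at node_id
  attr_val VARCHAR(20),  -- branch label
  child_id INTEGER,      -- node reached along this branch
  class    VARCHAR(20)   -- class label if child_id is a leaf, else NULL
);

-- Test samples unpivoted into attribute/value triples:
CREATE TABLE testavp (
  rec_id   INTEGER,
  attr     VARCHAR(20),
  attr_val VARCHAR(20)
);

-- Walk each test record down the tree with a recursive common table
-- expression (DB2 recursion uses WITH ... UNION ALL):
WITH walk (rec_id, node_id, class) AS (
  SELECT a.rec_id, d.child_id, d.class
  FROM testavp a, dtree d
  WHERE d.node_id = 0
    AND d.attr = a.attr AND d.attr_val = a.attr_val
  UNION ALL
  SELECT a.rec_id, d.child_id, d.class
  FROM walk w, testavp a, dtree d
  WHERE w.class IS NULL            -- not at a leaf yet
    AND d.node_id = w.node_id
    AND a.rec_id = w.rec_id
    AND d.attr = a.attr AND d.attr_val = a.attr_val
)
SELECT rec_id, class
FROM walk
WHERE class IS NOT NULL;           -- one predicted class per record

With the earlier Temp example loaded into dtree, a record whose triple is (1, 'Temp', 'Mild') would match the root edge labeled Mild and immediately reach its leaf class.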
References:
• Pedro Domingos and Geoff Hulten. Mining High-Speed Data Streams. In Proc. ACM SIGKDD, 2000.
• Don Chamberlin. A Complete Guide to DB2 Universal Database. Morgan Kaufmann, 1998.