Entropy Estimation and Applications to Decision Trees
Estimation
• Distribution over K=8 classes, true entropy H=1.289
• Repeat 50,000 times: generate N samples, estimate the entropy from the samples
• [Figure: histograms of the entropy estimates for N=10, N=100, N=50000]
Estimation
Estimating the true entropy. Goals:
1. Consistency: with large N, the estimate converges to the true entropy
2. Low variance: the variation of the estimates should be small
3. Low bias: the expected estimate should equal the true entropy
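A minimal sketch of the simulation described above, assuming the plugin (maximum-likelihood) entropy estimator; the class distribution, the number of repetitions, and all names are illustrative rather than taken from the slide.

```python
import numpy as np

def plugin_entropy(counts):
    """Plugin (maximum-likelihood) entropy estimate from class counts, in nats."""
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(0)
K = 8
true_p = rng.dirichlet(np.ones(K))            # some fixed distribution over K classes
true_H = -np.sum(true_p * np.log(true_p))

for N in (10, 100, 50000):
    estimates = []
    for _ in range(2000):                     # fewer repetitions than the 50,000 on the slide
        samples = rng.choice(K, size=N, p=true_p)
        counts = np.bincount(samples, minlength=K)
        estimates.append(plugin_entropy(counts))
    estimates = np.asarray(estimates)
    print(f"N={N}: bias={estimates.mean() - true_H:+.4f}, std={estimates.std():.4f}")
```

The plugin estimator systematically underestimates the entropy for small N; the shrinking bias and variance as N grows is the pattern the histograms on the slide illustrate.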
Experimental Results
• UCI classification data sets
• Accuracy on the test set
• Plugin vs. Grassberger entropy estimators
• Better decision trees
Source: [Nowozin, “Improved Information Gain Estimates for Decision Tree Induction”, ICML 2012]
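For reference, a sketch of one common form of the Grassberger (2003) bias-corrected estimator that the Plugin vs. Grassberger comparison refers to; this is my reading of the correction term, not code from the cited paper, and the demo counts are made up.

```python
import numpy as np
from scipy.special import digamma

def grassberger_entropy(counts):
    """Grassberger-style bias-corrected entropy estimate from class counts, in nats."""
    counts = np.asarray(counts, dtype=float)
    counts = counts[counts > 0]               # empty classes contribute nothing
    n = counts.sum()
    # Correction term G(n_k) = psi(n_k) + (1/2) * (-1)^{n_k} * (psi((n_k+1)/2) - psi(n_k/2))
    g = digamma(counts) + 0.5 * (-1.0) ** counts * (
        digamma((counts + 1.0) / 2.0) - digamma(counts / 2.0))
    return np.log(n) - np.sum(counts * g) / n

# Tiny demo on made-up counts; it can be swapped in for plugin_entropy in the sketch above.
print(grassberger_entropy([3, 1, 1, 0, 2, 0, 1, 2]))
```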
Differential Entropy Estimation
• In regression, the differential entropy $H(q) = -\int q(y)\,\log q(y)\,dy$
  • measures the remaining uncertainty about y
  • is a function of the distribution q
• Problem: q is not from a parametric family
• Solution 1: project onto a parametric family
• Solution 2: non-parametric entropy estimation
Solution 1: parametric family
• Multivariate Normal distribution
• Estimate the covariance matrix $\hat{\Sigma}$ of all y vectors
• Plugin estimate of the entropy: $\hat{H} = \tfrac{1}{2}\log\!\big((2\pi e)^d\,|\hat{\Sigma}|\big)$
• Uniform minimum variance unbiased estimator (UMVUE)
[Ahmed, Gokhale, “Entropy expressions and their estimators for multivariate distributions”, IEEE Trans. Inf. Theory, 1989]
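A minimal sketch of the Gaussian plugin estimate described above (the sample covariance plugged into the closed-form Gaussian entropy); the UMVUE correction of Ahmed & Gokhale is not included, and the function name is made up.

```python
import numpy as np

def gaussian_plugin_entropy(Y):
    """Plugin differential entropy estimate in nats, assuming Y is multivariate normal.

    Y is an (N, d) array of regression targets; the sample covariance is plugged into
    H = 0.5 * log((2*pi*e)^d * det(Sigma)).
    """
    Y = np.atleast_2d(Y)
    n, d = Y.shape
    cov = np.cov(Y, rowvar=False).reshape(d, d)   # sample covariance matrix
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2.0 * np.pi * np.e) + logdet)

# Demo: a standard 2-D Gaussian has differential entropy log(2*pi*e) ≈ 2.838 nats.
rng = np.random.default_rng(0)
print(gaussian_plugin_entropy(rng.standard_normal((1000, 2))))
```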
Solution 2: Non-parametric entropy estimation
• Minimal assumptions on the distribution
• Nearest-neighbour estimate: $\hat{H} = \tfrac{d}{N}\sum_{i=1}^{N}\log\rho_i + \log V_d + \gamma + \log(N-1)$
  • $\rho_i$: nearest-neighbour distance of sample $i$
  • $\gamma$: Euler-Mascheroni constant
  • $V_d$: volume of the d-dimensional unit hypersphere
• Other estimators: KDE, spanning tree, k-NN, etc.
[Kozachenko, Leonenko, “Sample estimate of the entropy of a random vector”, Probl. Peredachi Inf., 1987]
[Beirlant, Dudewicz, Győrfi, van der Meulen, “Nonparametric entropy estimation: An overview”, 2001]
[Wang, Kulkarni, Verdú, “Universal estimation of information measures for analog sources”, FnT Comm. Inf. Th., 2009]
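A sketch of the 1-nearest-neighbour estimator in the Kozachenko-Leonenko style described above; natural-log units, the function name is made up, and duplicate points (zero nearest-neighbour distance) are not handled.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import gammaln

def nn_entropy(Y):
    """1-NN differential entropy estimate in nats, Kozachenko-Leonenko style."""
    Y = np.atleast_2d(Y)
    n, d = Y.shape
    # Distance from each point to its nearest neighbour (excluding the point itself).
    dist, _ = cKDTree(Y).query(Y, k=2)
    rho = dist[:, 1]                                              # assumes rho > 0
    log_vd = (d / 2.0) * np.log(np.pi) - gammaln(d / 2.0 + 1.0)   # log volume of unit d-ball
    euler_gamma = 0.5772156649015329                              # Euler-Mascheroni constant
    return d * np.mean(np.log(rho)) + log_vd + euler_gamma + np.log(n - 1)

# Demo: a standard 2-D Gaussian has differential entropy log(2*pi*e) ≈ 2.838 nats.
rng = np.random.default_rng(0)
print(nn_entropy(rng.standard_normal((2000, 2))))
```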
Experimental Results [Nowozin, “Improved Information Gain Estimates for Decision Tree Induction”, ICML 2012]
Streaming Data
• “Infinite data” setting
• 10 possible splits and their scores
• When to stop and make a decision?
Streaming Decision Trees
• Score splits on a subset of the samples only
• Domingos/Hulten (Hoeffding Trees), 2000:
  • Compute the sample count n needed for a given precision (see the sketch below)
  • Streaming decision tree induction
  • Confidence intervals are not strictly valid, but work well in practice
• Jin/Agrawal, 2003:
  • Tighter confidence interval, asymptotic derivation using the delta method
• Loh/Nowozin, 2013:
  • Racing algorithm (bad splits are removed early)
  • Finite-sample confidence intervals for entropy and Gini
[Domingos, Hulten, “Mining High-Speed Data Streams”, KDD 2000]
[Jin, Agrawal, “Efficient Decision Tree Construction on Streaming Data”, KDD 2003]
[Loh, Nowozin, “Faster Hoeffding racing: Bernstein races via jackknife estimates”, ALT 2013]
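A sketch of the Hoeffding-style stopping rule referenced above: after n samples, the gap between the best and second-best split score is compared to the Hoeffding bound for a quantity with range R at confidence 1 - delta. The function names are made up, and (as the slide notes) the resulting intervals are not strictly valid for the information gain, though they work well in practice.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Hoeffding epsilon: with prob. >= 1 - delta, a sample mean of a quantity
    with the given range deviates from its true mean by less than epsilon."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_score, second_best_score, value_range, delta, n):
    """Commit to the current best split once the observed score gap exceeds epsilon."""
    return (best_score - second_best_score) > hoeffding_bound(value_range, delta, n)

# Example: the entropy over 8 classes has range log2(8) = 3 bits.
print(should_split(0.42, 0.35, value_range=3.0, delta=1e-6, n=5000))
```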
Multivariate Delta Method
Theorem. Let $\{T_n\}$ be a sequence of $k$-dimensional random vectors such that $\sqrt{n}\,(T_n - \theta) \xrightarrow{d} N(0, \Sigma)$. Let $g: \mathbb{R}^k \to \mathbb{R}^m$ be once differentiable at $\theta$ with gradient matrix $\nabla g(\theta)$. Then
$\sqrt{n}\,\big(g(T_n) - g(\theta)\big) \xrightarrow{d} N\!\big(0,\; \nabla g(\theta)^\top \Sigma\, \nabla g(\theta)\big)$.
[DasGupta, “Asymptotic Theory of Statistics and Probability”, Springer, 2008]
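A one-dimensional instance of the theorem, worked out as an illustration (the plugin binary entropy of a Bernoulli proportion); this example is added here and does not appear on the slides.

```latex
\[
  \sqrt{n}\,(\hat{p} - p) \xrightarrow{d} N\!\big(0,\; p(1-p)\big), \qquad
  g(p) = -p\log p - (1-p)\log(1-p), \qquad
  g'(p) = \log\tfrac{1-p}{p},
\]
\[
  \text{so}\quad
  \sqrt{n}\,\big(g(\hat{p}) - g(p)\big) \xrightarrow{d}
  N\!\Big(0,\; p(1-p)\,\log^{2}\tfrac{1-p}{p}\Big).
\]
```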
Delta Method for the Information Gain
• 8 classes, 2 choices (left/right)
• $\pi_{s,i}$: probability of choice $s$, class $i$
• Multivariate delta method: for $g: \mathbb{R}^{16} \to \mathbb{R}$ we have that $\sqrt{n}\,\big(g(\hat{\pi}) - g(\pi)\big) \xrightarrow{d} N\!\big(0,\; \nabla g(\pi)^\top \Sigma\, \nabla g(\pi)\big)$
• $g(\pi) = I(S; C)$, the mutual information between split choice and class (infogain)
• Derivation lengthy but not difficult; a slight generalization of Jin & Agrawal
[Small, “Expansions and Asymptotics for Statistics”, CRC, 2010]
[DasGupta, “Asymptotic Theory of Statistics and Probability”, Springer, 2008]
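As a simpler, self-contained check of the same technique (the plugin entropy of a single multinomial rather than the full 16-dimensional information gain), the sketch below compares the delta-method variance prediction with a Monte Carlo estimate; the distribution and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.4, 0.3, 0.2, 0.1])        # an arbitrary illustrative class distribution
H = -np.sum(p * np.log(p))
n = 1000

# Delta method: the gradient of H w.r.t. p_i is -(log p_i + 1) and the covariance of the
# sample proportions is (diag(p) - p p^T) / n, which simplifies to the variance below.
delta_var = (np.sum(p * np.log(p) ** 2) - H ** 2) / n

estimates = []
for _ in range(20000):
    counts = rng.multinomial(n, p)
    q = counts[counts > 0] / n
    estimates.append(-np.sum(q * np.log(q)))

print("Monte Carlo variance:", np.var(estimates))
print("Delta-method variance:", delta_var)
```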
Delta Method Example
As $N \to \infty$ (the true distribution is fixed). [Figure]
Conclusion on Entropy Estimation
• Entropy estimation is a statistical estimation problem
• A large body of literature exists on entropy estimation
• Better estimators yield better decision trees
• The distribution of the estimate is relevant in the streaming setting