How to solve a classification problem with 45 class levels using Random Forests
Nicholas L. Crookston and Gerald E. Rehfeldt
US Forest Service, Rocky Mountain Research Station, Moscow, ID
Western Mensurationists, Missoula, MT, June 20-22, 2010
Contents
Problem (we have 45 class levels, that's a lot)
Solution (we broke the problem into many subsets and formed an ensemble classifier)
Results (very good, and we have a measure of extrapolation)
Discussion
Problem
We desire to predict the biotic community as a function of climate. There are 45 biotic communities of interest.
Brown, D.E., F. Reichenbacher, and S.E. Franson. 1998. A classification of North American biotic communities. University of Utah Press, Salt Lake City. 141 pp.
Problem
In a 2006 effort on a subset of these communities, we had great results using:
Breiman, L. 2001. Random Forests. Machine Learning 45:5-32.
These results were published in:
Rehfeldt, G.E., N.L. Crookston, M.V. Warwell, and J.S. Evans. 2006. Empirical analyses of plant-climate relationships for the western United States. Int. J. Plant Sci. 167:1123-1150.
Random Forests
A Random Forest (RF) is a set of classification or regression trees (CART). RF builds many trees; each one minimizes the classification error on a bootstrap sample of the training data. Up to 32 class levels are supported, and when there are more than 10, a sampling scheme is used for each tree.
Random Forests -- continued
To classify a new observation, RF puts the observation down each of the trees in the forest. Each tree gives a classification, and that classification counts as a vote. The forest chooses the class having the most votes over all the trees.
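To make the voting concrete, here is a minimal sketch in R using the randomForest package the talk is based on. The iris data merely stands in for the climate training data; this is not the authors' script.

```r
# Minimal sketch, not the authors' script: fit one random forest and
# classify new observations by majority vote across its trees.
library(randomForest)

set.seed(1)
rf <- randomForest(Species ~ ., data = iris, ntree = 500)

# predict() drops each new case down all 500 trees; the returned class
# is the one with the most votes. type = "vote" exposes the vote shares.
predict(rf, newdata = iris[c(1, 51, 101), ])
predict(rf, newdata = iris[c(1, 51, 101), ], type = "vote")
```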
Problem -- continued
We have 45 class levels, over the 32-level limit of the randomForest package. We desire to make predictions using future climates, but RF might predict nonsense answers for future climatic conditions that are unique with respect to the training data. These are extrapolations we need to detect.
Solution -- Steps
Training data: ~1.6 million observations, 35 climate variables from the Moscow climate model.
We created 100 Random Forests. To create one of the forests (sketched in code below):
Sample 9 of the 45 class levels (without replacement).
Make a copy of the training data.
Recode the biotic community in this copy: keep it as is if its code is one of the 9 in the sample; otherwise change the observed class to "other".
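A hedged sketch of this step in R, assuming a data frame `train` with a 45-level factor column `community` plus the 35 climate predictors; the data frame, column name, and helper function are illustrative, not the authors' code. Note that sampling 9 classes keeps each sub-problem at 10 levels (9 plus "other").

```r
# Sketch of building one member of the ensemble of 100 forests.
# `train` is a hypothetical data frame: a factor `community` (45 levels)
# plus 35 climate predictor columns.
library(randomForest)

make_sub_forest <- function(train, n_keep = 9, ntree = 100) {
  keep <- sample(levels(train$community), n_keep)     # 9 of 45, without replacement
  sub  <- train                                       # copy of the training data
  sub$community <- as.character(sub$community)
  sub$community[!sub$community %in% keep] <- "other"  # recode the rest
  sub$community <- factor(sub$community)              # 10 levels: the 9 + "other"
  list(keep = keep,
       rf   = randomForest(community ~ ., data = sub, ntree = ntree))
}

forests <- replicate(100, make_sub_forest(train), simplify = FALSE)
```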
Steps -- continued
Fit each of the 100 RFs. To make a prediction:
Put the new case down all 100 RFs, giving a vector of 100 predictions for the case.
Count the predictions by biotic community code, including "other". This gives a table of codes and counts with 46 rows (one for each of the 45 community codes plus "other").
Steps -- continued
Divide the count for each code by the number of RFs that contained that code. The ensemble classification is the class value corresponding to the maximum of these quotients. (See the sketch below.)
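A hedged sketch of this ensemble vote in R, continuing from the hypothetical `forests` list above. The division matters because each class appears in only some of the 100 forests: with 9 of 45 classes drawn per forest, a given class is expected to appear in about 100 × 9/45 = 20 of them, while "other" appears in all 100.

```r
# Sketch of the ensemble classification for one new case `x`,
# a one-row data frame with the same predictor columns as `train`.
classify_case <- function(forests, x) {
  votes  <- sapply(forests, function(f) as.character(predict(f$rf, x)))
  counts <- table(votes)                              # votes per code, incl. "other"
  # number of forests whose 9-class sample contained each voted code
  avail  <- sapply(names(counts), function(code) {
    if (code == "other") length(forests)              # every forest has "other"
    else sum(sapply(forests, function(f) code %in% f$keep))
  })
  names(which.max(counts / avail))                    # class with the largest quotient
}
```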
Results
We interpret predictions of "other" as indicating extrapolation. For this work, extrapolation means there is no biotic community in our study area that corresponds to the (new) climate. It is not a perfect indicator of extrapolation.
Results
Application to Brown's biotic communities:
All of North America
Prediction of community as a function of climatic metrics
Mapped at 0.0083333 arc degrees (~1 km²)
[Maps: no-analog (extrapolation) predictions for 2090 under three climate models: Canadian, Princeton, Hadley]
Discussion / Conclusion
The method can be used on larger problems, and perhaps with CART-based methods other than Random Forests. One could add training samples that actually are "other", that is, not any of the communities of interest. Random Forests remains a very important tool in our tool set.