Distinguish Wild Mushrooms with Decision Tree. Shiqin Yan.
Objective • Use the existing mushroom database to build a decision tree that helps determine whether a mushroom is poisonous.
DataSet
• Records drawn from The Audubon Society Field Guide to North American Mushrooms (1981), G. H. Lincoff (Pres.), New York: Alfred A. Knopf
• Number of instances: 8124 (each classified as either edible or poisonous)
• Number of attributes: 22
• Split: Training 5416, Tuning 1354, Testing 1354
• Missing attribute values: 2480 (denoted by “?”), all for attribute 11
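The split sizes above sum to the full 8124 records. The slides do not say how the split was made, so the following is only a minimal sketch that assumes a seeded random shuffle; the function name `split_dataset` and the placeholder `rows` are hypothetical.

```python
import random

def split_dataset(rows, n_train=5416, n_tune=1354, n_test=1354, seed=0):
    """Shuffle the rows and cut them into training, tuning, and test sets."""
    if n_train + n_tune + n_test != len(rows):
        raise ValueError("split sizes must sum to the number of rows")
    shuffled = list(rows)                      # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)      # assumed: random, seeded split
    train = shuffled[:n_train]
    tune = shuffled[n_train:n_train + n_tune]
    test = shuffled[n_train + n_tune:]
    return train, tune, test

# Placeholder records: integers stand in for the 8124 mushroom rows.
rows = list(range(8124))
train, tune, test = split_dataset(rows)
```

The tuning set is kept separate from the test set so that choices such as tree depth or pruning can be made without touching the data used for the final accuracy figure.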
Mushroom Features
• 1. cap-shape: bell=b, conical=c, convex=x, flat=f, knobbed=k, sunken=s
• 2. cap-surface: fibrous=f, grooves=g, scaly=y, smooth=s
• 3. cap-color: brown=n, buff=b, cinnamon=c, gray=g, green=r, pink=p, purple=u, red=e, white=w, yellow=y
• 4. bruises?: bruises=t, no=f
• 5. odor: almond=a, anise=l, creosote=c, fishy=y, foul=f
• …
Approach
• Use mutual information to choose the feature for each split of the tree.
• Mutual information: $I(Y;X) = H(Y) - H(Y \mid X)$
• Y: label, X: feature
• Choose the feature X that maximizes I(Y;X)
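The selection rule above can be sketched in a few lines. This is an illustrative sketch, not the author's implementation: it uses the standard definition I(Y;X) = H(Y) - H(Y|X) with entropy in bits, and the function names and the four toy records are made up for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(Y) in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mutual_information(rows, labels, feature):
    """I(Y;X) = H(Y) - H(Y|X) for one categorical feature X."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feature], []).append(y)
    # H(Y|X): entropy of each group, weighted by the group's share of the rows.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def best_feature(rows, labels, features):
    """Return the feature with the largest mutual information with the label."""
    return max(features, key=lambda f: mutual_information(rows, labels, f))

# Toy records (hypothetical): odor separates the classes, cap-shape does not.
rows = [
    {"odor": "a", "cap-shape": "x"},
    {"odor": "f", "cap-shape": "x"},
    {"odor": "a", "cap-shape": "b"},
    {"odor": "f", "cap-shape": "b"},
]
labels = ["e", "p", "e", "p"]
chosen = best_feature(rows, labels, ["odor", "cap-shape"])
```

In a full tree builder this choice would be repeated at every node on the subset of records that reaches it, stopping when a node is pure or no features remain.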
Most informative features extracted from the decision tree:
• odor
• spore-print-color
• habitat
• population
Prior Research by Wlodzislaw Duch, Department of Computer Methods, Nicholas Copernicus University
Future
• Add cross-validation to improve accuracy
• Prune the tree to avoid overfitting
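As one possible starting point for the cross-validation item, the sketch below generates k-fold index splits. The helper name `k_fold_indices` and the choice of k=5 are assumptions for illustration; the rows are assumed to have been shuffled beforehand, and training and scoring a tree on each fold is left out.

```python
def k_fold_indices(n_rows, k=5):
    """Yield (train_idx, val_idx) pairs; each row is in exactly one validation fold."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_rows))
        yield train_idx, val_idx
        start += size

# 8124 is the number of instances in the data set.
folds = list(k_fold_indices(8124, k=5))
```

Averaging the validation accuracy over the folds gives a steadier estimate than a single tuning split, which could then guide how aggressively the tree is pruned.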