Bigelow: Plankton Classification CMPSCI: 570/670 Spring 2006 Marwan (Moe) Mattar www.cs.umass.edu/~mmattar mmattar@cs.umass.edu
meet the folks • Collaboration between: • Computer Vision Lab, UMass, Amherst, MA • Machine Learning Lab, UMass, Amherst, MA • Bigelow Laboratory for Ocean Sciences, Boothbay Harbor, ME • Coastal Fisheries Institute, LSU, Baton Rouge, LA
overview • Automatic classification of plankton (phyto- and zoo-) collected in-situ • Why is this important? • Understanding of global ecology • Early detection of harmful algal blooms • Bio-terrorism countermeasures
phyto-plankton • What are phyto-plankton? • They are microscopic plants that live in the sea, sometimes called the grasses of the sea • Since phyto-plankton depend upon certain conditions for growth, they are a good indicator of change in their environment • They consume carbon dioxide and produce oxygen, and hence affect average temperature • First link of the food chain for all marine creatures, so their survival is of great importance • Can be imaged using a Flow Cytometer And Microscope (FlowCAM) • Data collection
collecting images • At least a 3-4 day process • One day is spent preparing for your trip, packing and travelling to your point of departure • All of the next day is spent out at sea collecting data and then driving your samples back to the lab • At least another day or two is spent hand-labelling a very, very small number of the phyto-plankton images • We would like to relieve marine biologists of the third step. • An active marine biologist has more data than they can hand-label in their lifetime.
data set • 982 training images belonging to 13 classes • Initial set had many more images from a lot more classes
segmentation • Step 1: Perform segmentation
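The slides do not say which segmentation method was used. As a minimal sketch only, assume the organism is brighter than the background and that a fixed intensity threshold (a hypothetical choice) separates the two. The connected components (CC) of the resulting mask are what the CC-based shape features in Step 2 would be computed from. The function names and the toy image are made up for illustration.

```python
from collections import deque

def segment(image, threshold):
    """Binary mask: 1 where the pixel intensity exceeds the threshold."""
    return [[1 if px > threshold else 0 for px in row] for row in image]

def connected_components(mask):
    """Group foreground pixels into 4-connected components (breadth-first search)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 0 or seen[r][c]:
                continue
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components

# Toy grayscale image with made-up intensities.
image = [
    [10, 12, 11, 10, 200],
    [10, 180, 190, 10, 210],
    [11, 185, 195, 12, 10],
    [10, 10, 11, 10, 10],
]
mask = segment(image, threshold=100)
components = connected_components(mask)
component_areas = sorted(len(pixels) for pixels in components)
```

Real FlowCAM images are noisy and low resolution, so a fixed threshold is likely too crude in practice; the sketch only shows where the mask and its components come from.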
feature extraction • Step 2: Compute features • Simple Shape (9): area, perimeter, compactness, convexity, eigenratio, rectangularity, number of connected components (CC), mean area of the CCs and standard deviation of the area of the CCs • Moments-based (12): mean, variance, skewness, kurtosis and entropy of the intensity distribution, and 7 moment invariants • Texture features?? • N.B. Almost all the features are invariant to scale and rotation. Which ones are not?
classifier • Step 3: Train a Support Vector Machine classifier • 10-fold cross validation • Stratified cross validation?? • Polynomial kernel performed the best • 2nd degree polynomial performed better than a linear classifier • 3rd degree polynomial over-fit • Overall best result: 66% using 21 features
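A few of the features named in Step 2 can be sketched as follows. The exact definitions used in the project are not given in the slides, so the ones below are assumptions: perimeter is counted as the number of pixel edges facing background, compactness is taken as 4π·area / perimeter², and rectangularity uses the axis-aligned bounding box. With these definitions, area and perimeter change with scale while the two ratios do not, and the axis-aligned box makes this version of rectangularity sensitive to rotation.

```python
import math
from collections import Counter

def shape_features(mask):
    """Area, perimeter, compactness and rectangularity of a binary mask."""
    rows, cols = len(mask), len(mask[0])
    fg = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c]]
    if not fg:
        raise ValueError("mask has no foreground pixels")
    area = len(fg)
    # Count pixel edges that face background or the image border.
    perimeter = 0
    for r, c in fg:
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if not inside or not mask[nr][nc]:
                perimeter += 1
    ys = [r for r, _ in fg]
    xs = [c for _, c in fg]
    bbox_area = (max(ys) - min(ys) + 1) * (max(xs) - min(xs) + 1)
    return {
        "area": area,
        "perimeter": perimeter,
        "compactness": 4 * math.pi * area / perimeter ** 2,  # assumed definition
        "rectangularity": area / bbox_area,  # axis-aligned box (assumption)
    }

def intensity_features(pixels):
    """Mean, variance, skewness, kurtosis and entropy (bits) of intensities."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    if var == 0:
        skew = kurt = 0.0  # constant region: higher moments set to 0 by convention
    else:
        skew = sum((p - mean) ** 3 for p in pixels) / (n * var ** 1.5)
        kurt = sum((p - mean) ** 4 for p in pixels) / (n * var ** 2)
    counts = Counter(pixels)
    entropy = -sum((k / n) * math.log2(k / n) for k in counts.values())
    return {"mean": mean, "variance": var, "skewness": skew,
            "kurtosis": kurt, "entropy": entropy}

# Toy inputs (made up): a 2x3 rectangle and four distinct intensities.
toy_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
shape = shape_features(toy_mask)
stats = intensity_features([1, 2, 3, 4])
```

Convexity, eigenratio and the 7 moment invariants are omitted here to keep the sketch short.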
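On the "stratified cross validation??" bullet: with 982 images spread over 13 classes, a plain random split can leave a small class nearly absent from some folds. A stratified split deals each class out across the folds separately. The sketch below shows one way to do that, plus the polynomial kernel form k(x, z) = (x·z + c)^d. The function names, the seed and the offset c (`coef0`) are illustrative assumptions, not the project's actual settings.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    start = 0  # running offset keeps the overall fold sizes balanced
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for j, index in enumerate(indices):
            folds[(start + j) % k].append(index)
        start += len(indices)
    return folds

def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial kernel (x . z + coef0) ** degree."""
    return (sum(a * b for a, b in zip(x, z)) + coef0) ** degree

# Toy labels (made up): 20 samples of class "a", 10 of class "b".
labels = ["a"] * 20 + ["b"] * 10
folds = stratified_folds(labels, k=10)
per_fold_counts = [
    (sum(labels[i] == "a" for i in fold), sum(labels[i] == "b" for i in fold))
    for fold in folds
]
k_quadratic = polynomial_kernel([1, 2], [3, 4], degree=2)
k_linear = polynomial_kernel([1, 2], [3, 4], degree=1, coef0=0.0)
```

Each fold is used once as the held-out set while the classifier is trained on the other nine. Setting `degree=1` with `coef0=0.0` reduces the kernel to the plain dot product, which corresponds to the linear classifier the slide compares against.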
issues in real-world problems • Errors in labelling • Noisy images at low resolution • FlowCAM is very efficient and has a wide field of view • Test-time speed • Not a 0-1 loss • Test data are not sampled IID • Null-class classification
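The "not a 0-1 loss" and "null-class" points can be illustrated together. When mistakes have unequal costs, the decision that minimises expected cost can differ from the most probable class, and a "reject" action is one common way to leave an image unlabelled. This is only a sketch: the class names, probabilities and costs below are made up, and the slides do not say how the project handled either issue.

```python
def min_expected_cost_decision(posteriors, cost):
    """Pick the action with the lowest expected cost.

    posteriors: {true_class: probability}
    cost: {action: {true_class: cost of taking that action}}
    """
    expected = {
        action: sum(cost[action][cls] * p for cls, p in posteriors.items())
        for action in cost
    }
    best_action = min(expected, key=expected.get)
    return best_action, expected

# Hypothetical classifier output for one image.
posteriors = {"harmful": 0.2, "benign": 0.8}

# Hypothetical costs: missing a harmful species is 10x worse than a false alarm,
# and rejecting (sending the image to a human) has a small fixed cost.
cost = {
    "harmful": {"harmful": 0.0, "benign": 1.0},
    "benign": {"harmful": 10.0, "benign": 0.0},
    "reject": {"harmful": 0.5, "benign": 0.5},
}

action, expected = min_expected_cost_decision(posteriors, cost)
most_probable = max(posteriors, key=posteriors.get)
```

Under a 0-1 loss the image would simply be labelled with its most probable class, "benign". With the asymmetric costs above, rejecting is cheaper in expectation.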
zoo-plankton • Larger marine animals • Feed on phyto-plankton • Can be imaged using Video Plankton Recorder (VPR) • Data set contains 1826 images from 14 classes • Full set contained a lot more images from more classes • Images!!
object recognition • Other variants of the problem include: • Object of interest is in a cluttered background • More than one object present in an image; either detect presence or count the objects • Look at standard data sets that the vision community uses to evaluate algorithms • MIT Object Database • Caltech-101 • ETH-80 • COIL-100 (old but still useful for some aspects)
Thank You! Questions?