WOW. World of Walkover-weight. “My God, it’s full of cows!” (David Bowman, 2001). Can walkover-weight suggest a cow needs attention?. Join with breeding information …. Position at the outset …. Obstacle: No health information!!!
WOW World of Walkover-weight “My God, it’s full of cows!” (David Bowman, 2001)
Position at the outset … • Obstacle: No health information!!! • Suggested: Milking order (i.e. where a cow is in the herd/line-up) is hierarchical and affected by health issues • Proposed goal: to predict a drop in milking order using WOW and other facts
Assumptions … deck of cards • Same cows come in for milking each time • Cows are well-behaved (e.g. arrive in a nice queue) • Data is in good shape (e.g. one reading per cow per milking)
Data problems • Multiple entries for cows (e.g. four entries for 22719193 in QBH2005) • Delete duplicate weights (SQL problem?) • Cow skipped and recycled back into order • Use average if more than one value
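A minimal sketch of the "use average if more than one value" rule, assuming each reading is a `(cow_id, milking, weight)` tuple. That layout, the weights, and the second cow ID are illustrative assumptions, not taken from the QBH data.

```python
from collections import defaultdict
from statistics import mean

def average_duplicate_weights(readings):
    """Collapse repeated (cow_id, milking) readings into one mean weight."""
    grouped = defaultdict(list)
    for cow_id, milking, weight in readings:
        grouped[(cow_id, milking)].append(weight)
    return {key: mean(weights) for key, weights in grouped.items()}

# Hypothetical readings: four entries for one cow at the same milking.
readings = [
    ("22719193", 1, 500.0),
    ("22719193", 1, 504.0),
    ("22719193", 1, 502.0),
    ("22719193", 1, 498.0),
    ("10000001", 1, 450.0),
]
deduped = average_duplicate_weights(readings)
```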
“zero” problems • Differentiate between a missing cow, a missing weight and a “zero” weight • Ignore missing cows • Cow skipped and recycled back into order • Time-based interpolation • Can be problematic if cow has been missing for a while • Add flag to indicate weight was “guessed”
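One way the time-based interpolation with a "guessed" flag might look, assuming a cow's readings are `(time, weight)` pairs sorted by time with `None` marking a missing weight. Carrying the nearest known value across gaps at either end is an assumption of this sketch, and how a "zero" weight should be treated is left as a separate decision.

```python
def fill_missing_weights(series):
    """Fill None weights by linear interpolation in time.

    Returns (time, weight, guessed) triples; guessed is True for filled values.
    """
    known = [(t, w) for t, w in series if w is not None]
    filled = []
    for t, w in series:
        if w is not None:
            filled.append((t, w, False))
            continue
        before = [(kt, kw) for kt, kw in known if kt < t]
        after = [(kt, kw) for kt, kw in known if kt > t]
        if before and after:
            (t0, w0), (t1, w1) = before[-1], after[0]
            guess = w0 + (w1 - w0) * (t - t0) / (t1 - t0)
        elif before:
            guess = before[-1][1]  # assumption: carry last known weight forward
        elif after:
            guess = after[0][1]    # assumption: carry first known weight back
        else:
            guess = None           # nothing to interpolate from
        filled.append((t, guess, guess is not None))
    return filled

# Hypothetical readings, times in hours.
filled = fill_missing_weights([(0, 500.0), (12, None), (24, 506.0)])
```

The longer the gap between `t0` and `t1`, the less trustworthy the guess, which is the "missing for a while" problem noted on the slide.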
other issues in data preparation • Change milking date to milk index • Change birthdate to age in months • Change parturition date to days since last calved • Additional derivatives • milking index – cow’s position in milk order • ∆-index – change in index for a cow over various time periods (1, 3 and 7 days) • mu-weight – average weight over varying-length periods (3, 7, 14, 21 and 28 milkings) • ∆-mu-weight – change in mu-weight for a cow (1, 3 and 7 days)
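The derived attributes could be computed roughly as follows. This sketch assumes one value per milking in time order and measures lags in milkings rather than days; the function names and sample values are hypothetical.

```python
def rolling_mean(values, window):
    """Mean of up to `window` most recent values at each position (mu-weight)."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def delta(values, lag):
    """Change relative to `lag` positions earlier; None where no earlier value exists."""
    return [None if i < lag else values[i] - values[i - lag]
            for i in range(len(values))]

# Hypothetical per-milking data for one cow.
weights = [500.0, 503.0, 506.0, 509.0]
index = [10, 12, 9, 15]

mu_weight = rolling_mean(weights, 3)   # 3-milking mu-weight
delta_index = delta(index, 1)          # change in milking index
delta_mu_weight = delta(mu_weight, 1)  # change in mu-weight
```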
Correlation coefficients QBH2006 (dense) • WOW to index == 0.12 • WOW to 14-day mu-weight == 0.93 • Index to 10-day mu-weight == 0.14 • 3-day ∆-order to ∆-weight == 0.045
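The slide does not say which coefficient was used. Assuming it is the Pearson correlation, a minimal sketch of the calculation is below; the sample values are made up.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical WOW readings and the matching mu-weights.
wow = [500.0, 510.0, 495.0, 520.0]
mu_weight = [501.0, 506.0, 499.0, 512.0]
r = pearson(wow, mu_weight)
```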
Predict change in milking order • Use M5P to predict how the milking order will change for a cow at the next milking • Approx. 205,000 QBH2006 samples (with fewer than 5/25 missing attributes) • 2/3 training 1/3 testing
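The 2/3 training, 1/3 testing split might be done as below. A shuffled random split with a fixed seed is an assumption here; the slide does not say how the samples were divided.

```python
import random

def split_train_test(samples, train_fraction=2 / 3, seed=0):
    """Shuffle a copy of the samples and cut it into train and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Nine hypothetical sample ids stand in for the QBH2006 instances.
train, test = split_train_test(range(9))
```

Splitting by cow rather than by record may be worth considering, since records from the same cow in both parts can make test accuracy look better than it is.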
<missing results go here when available> Re-running took too long … but … you’ve all seen it before, where accuracy was 51.89% (discrimination 0.527) and the model tree was hugely ugly (65 nodes, 33 leaves). Also tried predicting cow’s index as decile and as ratio to herdsize.
Where to? … • Data must still be scrubbed so that milking order makes sense (if milking order is going to be relevant) • Perhaps cow order needs to be described in completely different terms (e.g. cow buddies) • Easy visualization of herds/cows/breeds/dates/trends is needed (this segued into another area of the project)
Can WOW predict onset of illness? • Combine original attributes and derivatives with health judgments • Cows with unknown health are considered healthy • Need equal number of positive and negative instances
Not so much health data • 1613 recorded instances of health • 913 different cows with health info • 2540 cows with milking info • 788 milked cows with health data • 7 broad categories of illness: • Calving disorder • Metabolic disorder • Udder disorder (only one with >50 in herd) • Reproductive disorder • Lameness • Infectious diseases • Other ailments
Data sparseness QBH2006 • 75 instances out of 324,291 have health information • 63 udder disorder • 10 metabolic disorder • 2 lameness • Only about 0.02% positives → will never be isolated → must subsample negatives • Random selection of 75 negatives → data sparseness → over-fitting likely
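A sketch of the subsampling step: keep every positive and draw an equal number of negatives at random. The label list is a small stand-in for the real instances; only the counts 75 and 324,291 come from the slide.

```python
import random

def subsample_negatives(labels, seed=0):
    """Return indices of all positives plus an equal-size random draw of negatives."""
    positives = [i for i, y in enumerate(labels) if y]
    negatives = [i for i, y in enumerate(labels) if not y]
    rng = random.Random(seed)
    chosen = rng.sample(negatives, min(len(positives), len(negatives)))
    return sorted(positives + chosen)

# Share of positives in QBH2006, from the counts on the slide (about 0.023 percent).
positive_rate = 75 / 324_291

# Hypothetical stand-in data: 3 positives among 100 instances.
labels = [True] * 3 + [False] * 97
balanced = subsample_negatives(labels)
```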
Data sparseness QBH2006 • 36 cows have illness at some time, so just learn those? • 11,966 records for those cows, 76 of which have illness (still <1% positive) • Random selection of 1% as negatives (about 120)
Refinements to approach QBH2006 • Restrict target objective to UDDER DISORDER • Randomly select equal number of negatives from cows who have health problem at some point • Goal: differentiate between healthy and unhealthy state
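Restricting the negatives to cows that are ill at some point could look like this. The `(cow_id, is_ill)` record layout and the cow names are assumptions made for illustration.

```python
import random

def negatives_from_affected_cows(records, n, seed=0):
    """Draw up to n healthy records, only from cows that are ill at some point."""
    affected = {cow for cow, ill in records if ill}
    pool = [(cow, ill) for cow, ill in records if not ill and cow in affected]
    return random.Random(seed).sample(pool, min(n, len(pool)))

# Hypothetical records: cows A and C are ill at some milking, cow B never is.
records = [("A", True), ("A", False), ("A", False),
           ("B", False), ("C", True), ("C", False)]
positives = [r for r in records if r[1]]
negatives = negatives_from_affected_cows(records, len(positives))
```

Drawing negatives this way means the learner compares a cow's healthy and unhealthy states instead of comparing sick cows with cows that never fall ill.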
Detecting mastitis amidst random normal cows QBH2006 • Restrict learning objective to UDDER DISORDER • Randomly select equal number of negatives from all cows that have been milked (63+,63-)
When is a cow sick? • So far, attempted to predict health label at point of milking, but .. • … when was the health label attached? before, during or after the current milking? • Goal: predict whether cow needs attention at the next milking (i.e. time series)
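Framing the goal as a next-milking prediction amounts to pairing the attributes seen at one milking with the label recorded at the following one. A minimal sketch, assuming one cow's history is a time-ordered list of `(features, needs_attention)` pairs with made-up values:

```python
def next_milking_examples(history):
    """Pair each milking's features with the label from the following milking."""
    return [(features, history[i + 1][1])
            for i, (features, _) in enumerate(history[:-1])]

# Hypothetical history for one cow.
history = [
    ({"wow": 500.0}, False),
    ({"wow": 492.0}, False),
    ({"wow": 480.0}, True),
]
examples = next_milking_examples(history)
```

The last milking yields no example because its following label is not yet known.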
=== Summary ===
Correctly Classified Instances          90       70.3125 %
Incorrectly Classified Instances        38       29.6875 %
Kappa statistic                          0.4026
Mean absolute error                      0.3446
Root mean squared error                  0.4532
Relative absolute error                 68.8933 %
Root relative squared error             90.5974 %
Total Number of Instances              128
=== Detailed Accuracy By Class ===
TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
0.508     0.108     0.821       0.508    0.627       0.707      UDDER DISORDER
0.892     0.492     0.652       0.892    0.753       0.707      NONE
=== Confusion Matrix ===
  a  b   <-- classified as
 32 31 |  a = UDDER DISORDER
  7 58 |  b = NONE
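The headline figures follow directly from the confusion matrix. The sketch below recomputes accuracy, kappa, and the UDDER DISORDER precision, recall, and F-measure using the standard definitions; the function name is hypothetical.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard metrics for the positive class of a 2x2 confusion matrix."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }

# Counts from the confusion matrix, with UDDER DISORDER as the positive class.
m = binary_metrics(tp=32, fn=31, fp=7, tn=58)
```

Recall of about 0.51 means roughly half of the udder disorder instances are missed, even though the alarms that are raised are mostly correct.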
Agenda • Replace quantified attributes with simpler (e.g. boolean, nominal) ones • Characterise exceptions • Below average weight for cow/herd/breed/age • Dropped decile/>50 in order • Broad statistical measures • How many std.devs. from mean • z-score (probability of variation) • Choose negative instances more carefully (select fewer interpolates) • Spend more time with people who know cows
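The "how many std.devs. from mean" idea can be sketched as a z-score of the latest weight against a cow's recent history, then reduced to a boolean attribute. Using the sample standard deviation and the cow's own history as the reference are assumptions; herd, breed, or age groups could be substituted.

```python
from statistics import mean, stdev

def z_score(value, history):
    """Number of sample standard deviations `value` lies from the history mean.

    Needs at least two history values with some variation between them.
    """
    return (value - mean(history)) / stdev(history)

# Hypothetical recent weights for one cow, then a new reading.
history = [498.0, 500.0, 502.0]
z = z_score(494.0, history)
below_average = z < 0  # simple boolean attribute derived from the z-score
```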