Making the best use of predictive microbiology (PM) data and models in food safety risk assessment
David Vose, Director, Vose Software
www.vosesoftware.com
david@vosesoftware.com
What we do…
• Government and academia
• Industry
Spoiler
Risk management applied for a rambling talker
• Risk assessment models are too complicated
  • Clients ask for them
  • Risk assessors say they can do them
  • Risk assessors aren’t trained programmers, don’t have the debugging tools
• Risk assessments don’t deliver
  • Too many assumptions
  • Too few data
  • Too much uncertainty
  • Results carry too many caveats
• Simpler, more focused analyses often possible
• Lab-based PM data (e.g. ComBase) good enough already in risk assessment context
  • Compared with all the other uncertainties
• A focus change in PM could help answer, or even outright answer, many risk questions
  • Mechanical removal
  • Location and importance of pathogens in carcasses
  • Help rank pathogen concentrations in food in terms of risk
What is food safety risk assessment?
• The analytical component of food safety risk management
• Attempt to quantify the risk and uncertainty in a food safety-related problem
• Give managers a better understanding of the impact of the different decision options they have available
  • Quantification of risk (e.g. there is a 1% chance of X occurring) is potentially much more useful than saying “the risk of X is very low”
• Based on mathematical models
  • A simplified representation of how the system is assumed to behave both now and after any interventions under consideration
  • Simplified implies that our probability values are approximate
  • Assumed implies that the numbers generated would only be true if the assumptions all turned out to be correct
  • The more numerous and tentative the assumptions are, the less useful the numerical results will be
• Components of uncertainty, which we should try to minimise
  • Assumptions
  • Randomness
  • Imprecise statistical inference from data
  • Bad data
Designing a risk assessment
• Should be a creative process
  • Figure out the real problem with managers
  • Find the quantitative information that managers will be able to use
  • Do we need a model? Will a simple data analysis suffice?
• Should be pragmatic
  • Focused on answering the most important questions
  • Based on available data
  • Adheres to constraints
  • Achievable within a budget, timeframe
  • Understandable, adaptable, auditable
• Should be believable (the hardest part)
  • Often so many assumptions
  • Loads of maths few people can understand
  • Experts tend to be defensive, stick to what they know/believe
  • Models deal with national level issues, whilst data almost never have the same coverage
Example of Farm-to-Fork model
Campylobacter in poultry
[Flow diagram: Broiler house; Transport; Slaughter house (Hanging, Scalding, Defeathering, Evisceration, Washing, Chilling); Export / Import; Chicken parts / Whole chickens; Frozen / Chilled; Further processing; Catering / Retail (Cross-contamination, Heat treatment); Consumer (Cross-contamination, Heat treatment); Dose response; Risk estimation]
From a PM viewpoint, a much ‘simpler’ problem than usual since there are no growth or reservoir considerations outside the host animal for Campylobacter.
But we still have a lot of variation to consider:
• Between farms
• Between slaughter plants
• Between CP strains
• Between food products and their preparation
• Between consumer handling
• Between consumer vulnerabilities
From: Draft report 2001, Institute of Food Safety and Toxicology, Division of Microbiological Safety, Danish Veterinary and Food Administration
But the problem is more complicated …
Campylobacter in poultry
“[P]reparation and consumption of broiler meat may account for 20% to 30% of human cases of campylobacteriosis, while 50% to 80% may be attributed to the chicken reservoir as a whole.” (EFSA)
Source of exposure? Could be:
• Poultry
• Cattle (meat, milk)
• Sheep (meat, milk)
• Goats (meat, milk)
• Pigs
• Ducks
• Wild birds
• Dogs, cats (from meat?)
And their faeces in:
• Lakes
• Streams
• Vegetables
• Mud
• Fertilizer
And in some countries:
• Poultry litter fed to cattle
How many people get ill?
“the true number of cases of illness is likely to be 10-100 times higher than the reported number” (EFSA)
In summary: there is a lot of uncertainty about the cause and pathway, and even more about how many people get ill. That makes it difficult to calibrate the model.
Actually, it’s even more complicated …
Campylobacter in poultry
• Consumer behaviour: mostly a black box
• Dose response: from a limited data set (young adults, in water), the 90% confidence interval for the number of Campylobacter (#CP) needed to give a 50% probability of illness is [1, ~50,000]
• Risk estimation
With all this uncertainty, are fancy models justified? I think we have to look at another approach.
Big model example
Salmonella in pigs
Modelled:
• Farm-to-consumption of pigs
• Accounts for variability between and within Member States
Very large model:
• Three groups involved, experienced risk analysts
• 100,000 lines of code in Matlab
• 150 parameters for each Member State + generic parameters
• An estimated 900-1000 parameters in total
Checking:
• “[E]very effort was made in order to minimise the risk of … errors occurring and a long process of review was carried out”
• Reached model version 27
• “The validation of the intervention analysis is particularly difficult as there are no validation data with which to compare the model results. In addition, with such a complex and nonlinear model, it is only really possible to assess whether the resulting trend is reasonable, rather than the absolute reduction”
• i.e. they had no way to check the numbers that came out
Big model example
Salmonella in pigs
Struggling with data:
• Didn’t use the EFSA baseline survey data as required (not possible with simulation anyway)
• Used data from other countries
• Large farm/small farm management from one MS
• Used expert estimates to fill in gaps
• Used other bacteria for increase in bacteria during polishing
• Used chicken data for transfer during belly opening
• Small slaughterhouse parameters estimated from one Dutch slaughterhouse
• Don’t have representative machinery data for slaughter plant so “variability and uncertainty … is expected to be much larger”
• Meat production selection (cuts, minced, fermented) not representative
• No sensitivity analysis for the dose-response model
• Data on transport between farms and to slaughter are scarce
• Need data on attachment/removal of bacteria to/from surfaces
• Assumes Salmonella acts like E. coli in the scalding stage
• Used D-value (10-fold reduction time) from chicken
• Used transfer from steel surfaces to sponges and roasted chicken as surrogate for pig to knife
• Assumes even distribution of bacteria all over carcass
• Time and temperature from retail to home missing
• Assumed same human susceptibility for all MSs
• Dose-response data not representative for young, old, pregnant, immunocompromised, and data from much higher doses than modelled
• Ignored trade between MSs
• …
Conclusion:
• “There are data gaps and critical assumptions of the model, and these should be considered when interpreting the results of the model.”
• How?
Big model example
Salmonella in pigs
Response to quantitative questions:
Big model example
Salmonella in pigs
The error:
• For MS #4, consumer travel time was modelled in hours, not minutes (60x too big)
• Salmonella cases were estimated as 29,901, later corrected to 2,686
• Unfortunately, the first adopted report used MS #4 as the representative MS
Conclusion:
• “The Scientific Opinion (EFSA, 2010b) focused on the intervention analysis. Therefore the conclusions of the Scientific Opinion are unaffected by this error.”
• “although the quantitative conclusions of the intervention analysis do change the qualitative conclusions regarding the effect of interventions do not change, as the relative reductions are similar to those presented in the original report”
• So did we need the model?
Typical coding error rates:
• “Mistakes are probably inevitable in a model of this complexity”
• They report 0.01 errors/kLOC (thousand lines of code), which is very, very low
• Microsoft: 0.5 / kLOC on release
• Industry average: 10 / kLOC
• Vose Software: 1.2 / kLOC
• Clean room: 0.1 / kLOC
• Space shuttle: 0 in 500 kLOC (so they were close to NASA)
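The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: it assumes the reported 0.01 errors/kLOC corresponds to the one acknowledged error in the 100,000-line model, and the function and variable names are hypothetical. The rates are the ones quoted on the slide.

```python
# Defect densities quoted on the slide, in errors per thousand lines of code (kLOC).
RATES_PER_KLOC = {
    "Industry average": 10.0,
    "Vose Software": 1.2,
    "Microsoft (on release)": 0.5,
    "Clean room": 0.1,
}

MODEL_KLOC = 100  # the pig model is described as 100,000 lines of Matlab


def errors_per_kloc(errors: int, lines_of_code: int) -> float:
    """Defect density implied by a known number of errors."""
    return errors / (lines_of_code / 1000)


def expected_errors(rate_per_kloc: float, kloc: float) -> float:
    """Number of errors expected in a code base of the given size."""
    return rate_per_kloc * kloc


# Assumption: one acknowledged error in 100,000 lines.
reported_rate = errors_per_kloc(1, 100_000)

# How many errors each benchmark rate would imply for a model of this size.
expected = {
    name: expected_errors(rate, MODEL_KLOC)
    for name, rate in RATES_PER_KLOC.items()
}

# Effect of the hours-for-minutes mistake on the case estimate for MS #4.
inflation = 29_901 / 2_686

for name, count in expected.items():
    print(f"{name}: about {count:.0f} errors expected in {MODEL_KLOC} kLOC")
print(f"Reported rate: {reported_rate} errors/kLOC")
print(f"Case estimate inflated by a factor of about {inflation:.1f}")
```

Under these assumptions even the clean-room rate would imply several undiscovered errors in a model of this size. Note also that a 60-fold input error produced roughly an 11-fold change in the output, which is the kind of nonlinearity that makes such mistakes hard to spot by eye.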
Why big models tend to fail
More errors
Simulation models are stochastic:
• We can’t easily check the numbers being produced
Big models have more variables:
• Which means greater data needs, so scratch around for data, less chance of being kept up-to-date
• More assumptions, so hard to know how realistic the model is
• Simpler models may seem less ‘realistic’, but at least we know it
Few people are competent to provide an external check:
• Internal checks have a very poor success rate
Better to start differently:
• What can we say without a model, or with a very simple one?
• How complete are the data?
• What are the uncertainties?
What do we usually (not) know?
• We have some idea of pathogen prevalence
  • Maybe at the farm
  • Usually at the slaughter plant (pre-processing)
• Some idea of load
  • Some samples of skin, occasionally an organ
  • Maybe enumerated for individuals, maybe for pooled samples
  • Maybe whole carcass rinses
  • Often just presence/absence
  • Almost always at the slaughter plant
• Maybe some idea of strain
  • But it’s quite rare to have enumeration by strain, just presence/absence
• Often some idea of the dose-response relationship
  • But not very statistically accurate
• In summary
  • Focus on simpler models
  • Get better information from the data we habitually collect
Consider this problem
Chicken neck skin samples
• Procedure: Ukmeat.org (based on (EC) No 2073/200)
  • Collect samples from carcasses after they have been chilled for at least 1.5 hours
  • Select a bird with a long neck skin for sampling (green arrows)
  • Grab the neck skin through the bag (photo) and cut at least 10g (photo)
  • Collect 2 more samples in the same way to make 3 in total inside the bag
  • “A bag containing 3 skins and a combined weight of more than 30g (roughly 1 oz) is classed as a single sample.”
  • Salmonella test results are reported as either positively detected or absent
It’s a HACCP plan, so it doesn’t give us much load information for food safety risk assessment.
Things a risk analyst would love to know:
• How many cfus are on the carcass
• Where they are located
• Whether the location affects survivability and the probability of exposure
• What the attenuation rates are for different processes, by location on the carcass
Consider this problem
Red meat carcass samples
• Procedure: Ukmeat.org ((EC) No 2073/200)
  • A sponge sample must be taken and tested for Salmonella. The sponge should have an area of at least 50 cm². The width of the sponge should be no larger than 10 cm.
  • Wet the sponge (photo), massage inside bag, grasp sponge through bag (photo)
  • Swab carcass post inspection, prior to chilling, following pattern (photos A: cattle; B: sheep; C: pig)
  • Weekly, 5 carcasses / session / species
  • Salmonella test results are reported as either positively detected or absent.
Same problem: HACCP based, little load information.
Some research suggests that sponge swabbing recovers about 20% of the load obtained by incision.
Moment-based modelling
A work in progress …
[Flow diagram: Broiler house; Transport; Slaughter house (Hanging, Scalding, Defeathering, Evisceration, Washing, Chilling); Chicken parts / Whole chickens; Frozen / Chilled; Catering / Retail (Cross-contamination, Heat treatment); Consumer (Cross-contamination, Heat treatment); Dose response; Risk estimation]
• Lets us anchor to the data where we have them, e.g.
  • Prevalence at farm
  • Load and prevalence at chiller
  • Estimated people getting sick
• Then we use PM data to fill in the gaps
  • Change in prevalence
  • Change in log load
Moment-based modelling
A work in progress …
• Collected data tend to be at the slaughter plant
  • It’s a communal point, regulated, can be consistent
• But a lot has happened before this stage that could be controlled
  • Farm (fly nets, biosecurity, feed, etc.), transport, cross-contamination during slaughter, mechanical and chemical removal
• Log load change data are often not Normally distributed
  • So shape is important (e.g. skewness, kurtosis)
  • This makes it impossible to ‘back-calculate’ loads at previous stages in the process using Monte Carlo
  • Which means we have trouble estimating the effects of interventions
• Possible solution is moment-based estimates
  • Probability maths lets us estimate moments (mean, variance, skewness, kurtosis) even if we cannot know the distributional form
• How PM can help
  • For log load changes, provide at least the first three moments (AVERAGE, VAR, SKEW, and maybe KURT, in Excel) for your raw data or, better still, make the raw data available
  • For prevalence changes, provide s/n (number positive / number tested) before and after
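As a rough illustration of what reporting and using moments could look like, the Python sketch below computes the four summary statistics with the same sample formulas as Excel's AVERAGE, VAR, SKEW and KURT, then combines two process steps. The combination rule is an assumption made for this sketch, not necessarily the method behind the talk: it treats the log load changes of successive steps as independent and additive, so their cumulants add. The names `Moments`, `summarise` and `combine`, and the data values, are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Moments:
    mean: float
    variance: float
    skewness: float
    excess_kurtosis: float


def summarise(data: list[float]) -> Moments:
    """Sample moments using the same formulas as Excel AVERAGE, VAR, SKEW, KURT."""
    n = len(data)
    if n < 4:
        raise ValueError("need at least 4 observations for KURT")
    mean = sum(data) / n
    dev = [x - mean for x in data]
    variance = sum(d * d for d in dev) / (n - 1)
    if variance == 0:
        raise ValueError("skewness and kurtosis are undefined for constant data")
    s = sqrt(variance)
    skew = n / ((n - 1) * (n - 2)) * sum((d / s) ** 3 for d in dev)
    kurt = (
        n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum((d / s) ** 4 for d in dev)
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    )
    return Moments(mean, variance, skew, kurt)


def combine(steps: list[Moments]) -> Moments:
    """Moments of the total log change over independent, additive steps.

    Assumption of this sketch: steps are independent, so the first four
    cumulants (mean, variance, third and fourth cumulant) simply add.
    """
    mean = sum(m.mean for m in steps)
    k2 = sum(m.variance for m in steps)
    k3 = sum(m.skewness * m.variance ** 1.5 for m in steps)
    k4 = sum(m.excess_kurtosis * m.variance ** 2 for m in steps)
    return Moments(mean, k2, k3 / k2 ** 1.5, k4 / k2 ** 2)


# Hypothetical log10 load changes measured for two process steps.
scalding = summarise([-2.1, -1.8, -2.5, -1.2, -3.0, -1.9])
washing = summarise([-0.4, -0.9, -0.2, -1.5, -0.6, -0.3])
total = combine([scalding, washing])
print(total)
```

No distributional form is needed at any point, which is the appeal of the approach. The independence assumption is the weak link and would need checking against paired data where they exist.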
Source attribution model
Developed from Hald et al.
Hald, T., Vose, D., Wegener, H.C., Koupeev, T., 2004. A Bayesian approach to quantify the contribution of animal-food sources to human salmonellosis. Risk Anal. 24, 255-269.
Tries to determine which food source causes infections
• Matches data on prevalence in food types by serovar
• With data on human illness rates by serovar
• Good for Salmonella, not Campylobacter (insufficient typing ability)
• 240 lines of code
Notation:
• i = serovar index
• j = food type index
• k = consuming country index
• a = producing country index
• M_{jka} is the amount of food type j that is consumed in country k but originates from country a
• p_{jai} is the prevalence of infection/contamination with serovar i in food type j coming from country a
• a_j relates to the general way the food type is handled (stored, cooked) and can be country-specific
• q_i relates to the serovar. It is a relative global measure of the serovar’s ability to survive, grow and cause infection. It would be great to be able to pin these down better, e.g. by looking at relative rates of growth and toxin production averaged over the naturally occurring range of conditions found in the food products.
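To show how these quantities could fit together, here is a minimal Python sketch. It assumes the multiplicative form used in Hald et al. (2004), extended with the country indices defined above, so that the expected number of cases is M_{jka} × p_{jai} × a_j × q_i. The slide text does not state the equation, so treat that form as an assumption. All numbers, names and function signatures are hypothetical, and the sketch only runs the forward calculation; in the published approach a_j and q_i are estimated by fitting to observed human cases.

```python
from collections import defaultdict


def expected_cases(amount, prevalence, food_factor, serovar_factor):
    """Expected cases per (serovar i, food j, consuming country k, producing country a).

    amount[(j, k, a)]     -> M_{jka}, amount of food j eaten in k, produced in a
    prevalence[(j, a, i)] -> p_{jai}, prevalence of serovar i in food j from a
    food_factor[j]        -> a_j
    serovar_factor[i]     -> q_i
    Assumed form (not stated on the slide): M * p * a_j * q_i.
    """
    cases = {}
    for (j, k, a), m in amount.items():
        for (j2, a2, i), p in prevalence.items():
            if (j2, a2) == (j, a):
                cases[(i, j, k, a)] = m * p * food_factor[j] * serovar_factor[i]
    return cases


def attribute_by_food(cases, country):
    """Share of expected cases in one consuming country, by food type."""
    totals = defaultdict(float)
    for (i, j, k, a), lam in cases.items():
        if k == country:
            totals[j] += lam
    grand_total = sum(totals.values())
    return {j: t / grand_total for j, t in totals.items()}


# Purely illustrative inputs.
amount = {("chicken", "DK", "DK"): 100.0, ("pork", "DK", "DK"): 200.0}
prevalence = {
    ("chicken", "DK", "Enteritidis"): 0.02,
    ("pork", "DK", "Typhimurium"): 0.01,
}
food_factor = {"chicken": 1.0, "pork": 0.5}
serovar_factor = {"Enteritidis": 1.0, "Typhimurium": 2.0}

cases = expected_cases(amount, prevalence, food_factor, serovar_factor)
shares = attribute_by_food(cases, "DK")
print(shares)
```

Because the food factor and the serovar factor enter only as multipliers, better PM data on relative growth and survival would translate directly into tighter estimates of q_i.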
Contact details
David Vose
david@vosesoftware.com
Tel: +32 932 406 23
Iepenstraat 98, Gent 9000, Belgium
This presentation: www.vosesoftware.com/icpmf7.php