One-out all-out principle or Bayesian integration?
Sakari Kuikka: University of Helsinki
Seppo Rekolainen: Finnish Environmental Institute
Mikko Mukula: University of Helsinki
Jouni Tammi: University of Helsinki
Laura Uusitalo: University of Helsinki
FEM research group at the University of Helsinki
1 professor, 3 postdoctoral researchers, 6 postgraduate researchers, 2 graduate students
2 locations: Helsinki and Kotka
Research interests:
• Decision analysis of renewable resources
• Integrating different sources of data and other knowledge: Bayesian analysis
• Identification and quantification of risks in the use of natural resources
• Analysis of the management of natural resources in the face of risks and uncertainty in the information and control
=> The user of the information plays an essential role
The aim of data collection and data analysis is to increase the probability of correct decision making.
Correct? = achieving the aim with high probability, or avoiding a problem with high probability (such as "points of no return")
Objectives of the talk
• To briefly discuss the sources of uncertainty
• To briefly present Bayes' theorem
• To present a classification model based on Bayes' rule
• To compare the results to the "one-out all-out" principle
Number of elements and the chance of misclassification
(EU CIS Ecostat Guidance 2003)
Risk: e.g. the probability of being, or going, above a critical threshold?
• Probabilistic calculus may be needed for a correct decision (a dioxin or P load "of no return")
Risk
Risk = probability * loss
Two alternative coin games:
A) 0.5 * 1000 euros and 0.5 * (-1000 euros), or
B) 0.5 * 10 000 euros and 0.5 * (-10 000 euros)
I would pay at least 500 – 2 000 euros to get the first game instead of the second.
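Both games have the same expected value (zero), so the 500 – 2 000 euros I would pay to swap reflects risk aversion rather than expected loss. A minimal sketch of that reasoning in Python; the exponential utility function and its risk-aversion coefficient are illustrative assumptions, not part of the talk:

```python
# Minimal sketch: both coin games have the same expected value, but a
# risk-averse (concave) utility makes the smaller-stakes game preferable.
# The exponential (CARA) utility and its risk-aversion coefficient are
# illustrative assumptions, not taken from the presentation.
import math

def expected_utility(outcomes, risk_aversion=0.0002):
    """Expected CARA utility of a list of (probability, payoff) pairs."""
    return sum(p * (1 - math.exp(-risk_aversion * x)) for p, x in outcomes)

game_a = [(0.5, 1_000), (0.5, -1_000)]     # small stakes
game_b = [(0.5, 10_000), (0.5, -10_000)]   # large stakes

print(sum(p * x for p, x in game_a), sum(p * x for p, x in game_b))   # both 0.0
print(expected_utility(game_a) > expected_utility(game_b))            # True: prefer game A
```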
Sources of uncertainty
• Variability over time, space and measurements, and uncertainty in model selection
• E.g. several visits to the same lake can produce different measurements/assessment values
• E.g. a lake can naturally have poor benthos (e.g. due to high fish predation?) => causalities are not always deterministic
Uncertainties
So, there are uncertainties:
1) In the measurements (mostly these in this talk)
2) In the causal relationships of nature
It is difficult to separate these in a data analysis!
Bayes rule
P(b|a) = P(a|b) P(b) / P(a)
a: data, observations, etc.
b: a parameter value, or a hypothesis
Note: all argumentation is based on probability distributions, not on single values!
Likelihood
• P(measurement | correct value)
• E.g. if the correct value is 10, we may have:

Measurement   Probability
12            0.2
10            0.6
8             0.2

So, the measurement 12 can be linked to several real values of the lake!
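To make the link concrete, here is a minimal Python sketch of how such a likelihood table enters Bayes' rule. Only the row for a true value of 10 comes from the slide; the likelihoods for the other true values and the uniform prior are hypothetical additions for the example:

```python
# Minimal sketch: posterior over the true value after observing measurement 12.
# Likelihood rows for true values 8 and 12 and the uniform prior are hypothetical
# (only the row for a true value of 10 appears on the slide).
likelihood = {  # P(measurement | true value)
    8:  {6: 0.2, 8: 0.6, 10: 0.2},
    10: {8: 0.2, 10: 0.6, 12: 0.2},
    12: {10: 0.2, 12: 0.6, 14: 0.2},
}
prior = {8: 1/3, 10: 1/3, 12: 1/3}   # assumed uniform prior over the true values

measurement = 12
unnormalised = {v: prior[v] * likelihood[v].get(measurement, 0.0) for v in prior}
total = sum(unnormalised.values())
posterior = {v: p / total for v, p in unnormalised.items()}
print(posterior)   # measurement 12 is consistent with true values 10 and 12
```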
Bayes rule: probabilistic dependencies
[Diagram: the real number of fish (B) generates the observations/data (A) through P(A|B); inference reverses the arrow, going from the observations back to the real number of fish through P(B|A).]
Bayesian inference: P(N | data) ∝ P(data | N) P(N)
Discretization
Applying Bayes rule
Several uncertain but mutually supporting information sources increase the total evidence (= decrease uncertainty).
In the WFD, the posterior probability of a certain classification result, obtained from the probabilistic assessment of the first quality element (e.g. fish), could be used as the prior for the analysis of the next element.
And it also should be used this way = all quality elements have their own role => the learning process of science
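A minimal sketch of that element-by-element updating in Python, where the posterior after one quality element becomes the prior for the next; the likelihood values below are placeholders, not the study's estimates:

```python
# Minimal sketch of chaining quality elements: the posterior from one element
# becomes the prior for the next. The likelihood values below are placeholders,
# not the estimates of this study.
def update(prior_ok, p_obs_given_ok, p_obs_given_restore):
    """Return P(lake = OK | this observation), given the prior and two likelihoods."""
    numerator = prior_ok * p_obs_given_ok
    return numerator / (numerator + (1.0 - prior_ok) * p_obs_given_restore)

p_ok = 0.5                          # non-informative starting prior
p_ok = update(p_ok, 0.90, 0.25)     # first element observed as "OK" (placeholder likelihoods)
p_ok = update(p_ok, 0.80, 0.30)     # second element observed as "OK" (placeholder likelihoods)
print(round(p_ok, 2))               # evidence accumulates across the elements
```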
Model structure + submodels (naive nets) under each element!
Submodels: naive Bayesian nets
[Diagram: a Class node with child nodes Sp_1, Sp_2, Sp_3, Sp_4, Sp_5.]
Generally speaking, the best methodology for classification
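To illustrate what such a naive-net submodel computes, here is a generic naive Bayes sketch in Python; it is not the authors' weka model, and the species-presence probabilities are invented for illustration:

```python
# Minimal sketch of a naive Bayes submodel: P(class | species observations),
# assuming the species indicators are conditionally independent given the class.
# The conditional probabilities below are invented for illustration only.
from math import prod

# P(species present | class) for species Sp_1 ... Sp_5 (hypothetical values)
p_present = {
    "OK":      [0.8, 0.7, 0.6, 0.4, 0.3],
    "Restore": [0.3, 0.4, 0.5, 0.7, 0.8],
}
prior = {"OK": 0.5, "Restore": 0.5}

def posterior(observed):
    """observed: list of 1/0 presence indicators, one per species."""
    unnorm = {cls: prior[cls] * prod(p if obs else 1 - p for p, obs in zip(probs, observed))
              for cls, probs in p_present.items()}
    total = sum(unnorm.values())
    return {cls: value / total for cls, value in unnorm.items()}

print(posterior([1, 1, 1, 0, 0]))   # a species composition typical of an "OK" lake
```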
Data in this analysis: input to the naive nets
Only one lake type
• Fish stock data: 80 lakes, gillnet
• Phytoplankton: 1330 samples
• Benthos: 71 samples (22 lakes)
• Macrophytes: 70 surveys (47 lakes)
"Truth" needed to test the method = an arbitrary value of phosphorus was selected as the classifier for the lake class
Analysis of data
Classes:
• OK (high or good) = < 30 µg TP/l
• Restore (moderate or less) = > 30 µg TP/l
Probability of correct classification: leaving out one data point at a time from the parameter estimation, and using the biological information of that data point to classify (the phosphorus of) that lake (weka software)
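The study ran this procedure in weka; purely to illustrate the leave-one-out idea, here is a sketch in Python with scikit-learn, using randomly generated stand-in data instead of the real lake observations:

```python
# Illustrative leave-one-out cross-validation of a naive Bayes classifier.
# The data are randomly generated stand-ins; the study itself used weka and
# the real fish/phytoplankton/benthos/macrophyte observations.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
n_lakes = 80
X = rng.normal(size=(n_lakes, 5))                   # stand-in biological metrics
tp = 20 + 15 * rng.random(n_lakes) + 5 * X[:, 0]    # stand-in total phosphorus (µg/l)
y = (tp > 30).astype(int)                           # 0 = OK, 1 = Restore (30 µg TP/l threshold)

scores = cross_val_score(GaussianNB(), X, y, cv=LeaveOneOut())
print("Probability of correct classification:", scores.mean())
```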
Model assumptions
• "One out – all out": the total assessment is "restore" if one of the components goes to "restore"
• The same model was used to test how the Bayes rule works in classification
• Each element was analysed with a separate, specific model (a naive Bayes net). This "meta-model" uses the likelihoods estimated by those (also integrating) submodels
Results 1: Likelihoods (probabilities of correct/incorrect classifications)
Estimated by the naive submodels for each element

Truth     Assessment   Fish   Macroph.   Benthos   Phytopl.
OK        OK           0.92   0.93       0.75      0.91
OK        Restore      0.08   0.07       0.25      0.09
Restore   Restore      0.77   0.69       0.65      0.79
Restore   OK           0.23   0.31       0.35      0.21

The results of the last line are problematic!
Results 2: one-out all-out
Applying one-out, all-out:
• If the lake is restore, P(assessment = restore) = 0.99
• If the lake is OK, P(assessment = restore) = 0.37! (or even higher, depending on some details)
= Potential for misclassification, i.e. a lot of mismanagement!
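To see where numbers of this magnitude come from, here is a sketch that computes the one-out all-out probabilities directly from the likelihood table in Results 1, under the simplifying assumption that the four elements are conditionally independent given the true lake class; the slide's own figures come from the full analysis and differ slightly:

```python
# One-out all-out under a conditional-independence assumption:
# the total assessment is "restore" if at least one element says "restore".
from math import prod

# P(element assessed as OK | true class), from the likelihood table (Results 1)
p_ok_given_ok      = [0.92, 0.93, 0.75, 0.91]   # fish, macrophytes, benthos, phytoplankton
p_ok_given_restore = [0.23, 0.31, 0.35, 0.21]

p_false_alarm = 1 - prod(p_ok_given_ok)         # lake is OK, but assessed as "restore"
p_hit         = 1 - prod(p_ok_given_restore)    # lake is restore, assessed as "restore"
print(round(p_false_alarm, 2), round(p_hit, 2)) # roughly 0.42 and 0.99
```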
Results 3: Bayes rule / 1
Applying Bayes rule to a single observation & naive-net assessment (starting from a prior of 0.5):
• obs: macr = OK; P(lake = OK) = 0.68
• obs: fish = OK; P(lake = OK) = 0.80
Bayes rule for 2 joined observations:
• obs: macr = OK, fish = OK; P(lake = OK) = 0.89
• obs: benth = OK, phyt = OK; P(lake = OK) = 0.87
• obs: macr = restore, fish = restore; P(lake = restore) = 0.99
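The same conditional-independence assumption gives a minimal sketch of how such joint posteriors arise from the element likelihoods in Results 1; the slide's values come from the naive-net submodels, so the sketch reproduces their magnitude rather than the exact numbers:

```python
# Minimal sketch: combining element-level evidence with Bayes' rule, assuming
# the elements are conditionally independent given the true lake class.
# Likelihoods are taken from the Results 1 table; the slide's own posteriors
# come from the naive-net submodels and differ slightly from these.
from math import prod

# P(element assessed as OK | true class) for fish, macrophytes, benthos, phytoplankton
P_OK      = {"fish": 0.92, "macr": 0.93, "benth": 0.75, "phyt": 0.91}
P_RESTORE = {"fish": 0.23, "macr": 0.31, "benth": 0.35, "phyt": 0.21}

def p_lake_ok(observations, prior_ok=0.5):
    """observations: dict mapping element name to its assessed status, 'OK' or 'restore'."""
    lik_ok = prod(P_OK[e] if s == "OK" else 1 - P_OK[e] for e, s in observations.items())
    lik_re = prod(P_RESTORE[e] if s == "OK" else 1 - P_RESTORE[e] for e, s in observations.items())
    return prior_ok * lik_ok / (prior_ok * lik_ok + (1 - prior_ok) * lik_re)

print(round(p_lake_ok({"fish": "OK"}), 2))                                  # ~0.80
print(round(p_lake_ok({"macr": "OK", "fish": "OK"}), 2))                    # ~0.92 (slide: 0.89)
print(round(1 - p_lake_ok({"macr": "restore", "fish": "restore"}), 2))      # ~0.99
```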
Conclusions I: "One out all out"
• The problem of the "one out all out" principle lies in the relatively high uncertainty between the real state of nature and the assessment result, i.e. in the likelihood functions (especially benthos in this data set)
• The more uncertain elements there are, the more likely a "false alarm" becomes
Conclusions II: Bayes model
• Bayes rule helps to integrate uncertain evidence from several sources
• The assessment result "restore" is likely to be correct with a Bayesian model
• The assessment result "OK" is more uncertain, as it may come from a "restore" lake (see the likelihood relationships)
• Bayesian models are an easier and cheaper way to decrease uncertainty than increased monitoring effort
Conclusions III: Management
• There is clearly a need to link management decisions (the programme of measures) to the classification: they would give content to the uncertainty in the classification (= the probability of misallocating money?)
• We suggest that the acceptable probability of misclassification is a policy issue, not a scientific issue
• Classification models may have an impact on the interest to collect/improve data?
Way forward I: Risk assessment and risk management
[Figure: CHL or "P level of no return" plotted against pressure, with three threshold levels A, B and C:
A = point-estimate level
B = risk-averse attitude in the threshold only
C = implementation uncertainty included]
Conclusions IV
• Risk assessment and risk management must be separated (cf. the Scientific, Technical and Economic Committee for Fisheries)
• Framework directive = should the risk attitude be country specific? On which values of society should it be based?
• Does the number of people per lake have an impact on the management conclusions? (public participation = a mechanism to bring in values)
Conclusions V
• Bayesian network methodology is easy: one week of education to get started with your data
• The conceptual part is more difficult, but far easier than understanding the real information content of test statistics in "classical statistics"
• Bayesian parameter estimation (in some areas "the most correct way to do it") with e.g. WinBUGS software is more difficult, but achievable in 6 – 8 months of work
• Education!!!! = Marie Curie activities, join with fisheries?
Way forward II: Multiobjective valuation
[Figure: an example of a value tree. The goal "improved lake" branches into objectives (weights 0 – 1): ecological status and recreational interests (fishing, swimming, boating); these are measured by criteria such as kg/ha, CPUE, fish and macrophytes; the alternatives are Lake 1, Lake 2 and Lake 3.]
By: Anne-Marie Hagman and Mika Marttunen, SYKE
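As a sketch of how such a value tree is typically evaluated, here is a simple additive weighted-sum model in Python; the objective weights and criterion scores are invented for illustration and are not from the SYKE work:

```python
# Minimal sketch of an additive value-tree evaluation: each lake gets a weighted
# sum of its normalised criterion scores. Weights and scores are invented.
weights = {"ecological_status": 0.5, "fishing": 0.2, "swimming": 0.2, "boating": 0.1}

lakes = {
    "Lake 1": {"ecological_status": 0.4, "fishing": 0.7, "swimming": 0.5, "boating": 0.6},
    "Lake 2": {"ecological_status": 0.8, "fishing": 0.3, "swimming": 0.9, "boating": 0.4},
    "Lake 3": {"ecological_status": 0.6, "fishing": 0.6, "swimming": 0.4, "boating": 0.8},
}

def overall_value(scores):
    return sum(weights[c] * scores[c] for c in weights)

ranking = sorted(lakes, key=lambda lake: overall_value(lakes[lake]), reverse=True)
for lake in ranking:                     # highest preference value first
    print(lake, round(overall_value(lakes[lake]), 2))
```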
Way forward II: Example of ranking
[Figure: preference values (0 – 0.9) of seven lakes — Isojärvi, Sahajärvi, Hunttijärvi, Venunjärvi, Sääksjärvi, Ahvenlampi and Iso-Vuotava — broken down into number of inhabitants, attractiveness, attainment of ecological status, cottages and swimming.]
The higher the preference value, the higher the lake sits on the "action list".
Several publications; work related to the WFD is starting.
By: Anne-Marie Hagman and Mika Marttunen, SYKE