Deciding, Estimating, Computing, Checking: How are Bayesian posteriors used, computed and validated?
Fundamentalist Bayes: The posterior is ALL knowledge you have about the state • Use in decision making: take the action maximizing your utility. You must know the cost of deciding the state is A when it is B (engaging a target as Bomber when it is Civilian, as Civilian when it is Bomber, or waiting for more data) • Estimation: cost L(θ′, θ) of deciding the state is θ′ when it is θ
Loss functions • L(x,y) = (x−y)^2, squared error: the optimal estimator is the mean • L(x,y) = |x−y|, absolute error: the optimal estimate is the median • L(x,y) = −δ(x−y), Dirac (0-1) loss: the optimal estimate is the mode
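The first two bullets can be checked numerically: minimizing the expected squared error over a grid of candidate estimates recovers the sample mean, and minimizing the expected absolute error recovers the median. A minimal sketch (the exponential sample is illustrative; 0-1 loss is omitted since it degenerates on a continuous grid):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(size=1000)   # skewed sample, so mean and median differ

grid = np.linspace(0.0, 5.0, 2001)
sq_loss = np.array([np.mean((x - g) ** 2) for g in grid])
abs_loss = np.array([np.mean(np.abs(x - g)) for g in grid])

best_sq = grid[np.argmin(sq_loss)]    # minimizer of squared error
best_abs = grid[np.argmin(abs_loss)]  # minimizer of absolute error

print(best_sq, np.mean(x))    # agree up to grid resolution
print(best_abs, np.median(x))
```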
Loss functions, pros and cons • Mean: easy to compute, necessary for estimating probabilities, sensitive to outliers • Median: robust, scale-invariant, only applicable in 1D • Mode (Maximum A Posteriori): necessary for discrete unordered state spaces, very non-robust otherwise
Computing Posteriors • Finite state space: easy • Discretized state space: easy: post = prior.*likelihood; post = post/sum(post) • Analytical prior conjugate wrt likelihood: easy • High-dimensional state space (e.g. a 3D image): difficult, use MCMC
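The discretized-state-space recipe is a two-liner in any array language. A Python sketch with an illustrative Bernoulli example (flat prior over a success probability, 7 heads in 10 trials):

```python
import numpy as np

# Grid over the unknown success probability
theta = np.linspace(0.0, 1.0, 101)
prior = np.ones_like(theta)
prior /= prior.sum()

# Likelihood of 7 heads in 10 Bernoulli trials at each grid point
k, n = 7, 10
likelihood = theta**k * (1.0 - theta)**(n - k)

# Posterior = prior x likelihood, renormalized (the slide's recipe)
post = prior * likelihood
post /= post.sum()

print(theta[np.argmax(post)])  # posterior mode at k/n = 0.7
```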
Conjugate families • Normal prior N(mu, s2) • Normal likelihood N(mu′, s2′) • Then the posterior is normal N(mup, s2p), where (x−mu)^2/s2 + (x−mu′)^2/s2′ = (x−mup)^2/s2p + c • i.e., 1/s2 + 1/s2′ = 1/s2p and mu/s2 + mu′/s2′ = mup/s2p • Unknown variances are more difficult …
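The two relations say that precisions add and that the posterior mean is the precision-weighted average. A sketch with illustrative numbers (prior N(0, 4), likelihood N(3, 1)):

```python
# Prior N(mu, s2) and likelihood N(mu', s2'), numbers chosen for illustration
mu, s2 = 0.0, 4.0
mu2, s22 = 3.0, 1.0

# The slide's two relations, read off by matching quadratic exponents:
s2p = 1.0 / (1.0 / s2 + 1.0 / s22)   # 1/s2p = 1/s2 + 1/s2'
mup = s2p * (mu / s2 + mu2 / s22)    # mup/s2p = mu/s2 + mu'/s2'

print(mup, s2p)  # 2.4, 0.8: mean pulled toward the data, variance shrinks
```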
Conjugate families • Beta conjugate wrt Bernoulli trials • Dirichlet conjugate wrt the discrete (categorical) distribution • Wishart conjugate wrt the multivariate normal (precision matrix) • Fairly complete table in Wikipedia
MCMC and mixing [Figure: target distribution, with chains using a small, a good, and a large proposal step]
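The figure's point can be reproduced with a few lines of random-walk Metropolis: tiny proposal steps are almost always accepted but explore slowly, huge steps are almost always rejected, and an intermediate step mixes well. A sketch targeting N(0,1) (the step sizes are illustrative):

```python
import numpy as np

def metropolis(n, step, rng):
    """Random-walk Metropolis chain targeting N(0,1); returns acceptance rate."""
    x, accepted = 0.0, 0
    for _ in range(n):
        prop = x + step * rng.standard_normal()
        # log acceptance ratio for a standard normal target
        if np.log(rng.random()) < 0.5 * (x**2 - prop**2):
            x, accepted = prop, accepted + 1
    return accepted / n

rng = np.random.default_rng(1)
rates = {step: metropolis(5000, step, rng) for step in (0.05, 2.4, 50.0)}
print(rates)  # acceptance near 1 for tiny steps, near 0 for huge steps
```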
Testing and Cournot’s Principle • Standard Bayesian analysis does not reject a model: it selects the best of those considered. • Cournot’s principle: an event with small probability will not happen. • Assume a model M for an experiment and a low-probability event R in the result data. • Perform the experiment. If R happened, something was wrong, and the assumed model M is the obvious suspect. • Thus, the assumption that M was right is rejected.
Test statistic • Define the model to test, the null hypothesis H • Define a real-valued function t(D) on the data space • Find the distribution of t(D) induced by H • Define a rejection region R such that P(t(D) ∈ R) is low (1% or 5%) • R is typically the tails of the distribution, t(D) < l or t(D) > u, where [l,u] is a high-probability interval • If t(D) is in the rejection region, the null hypothesis H has been rejected at significance level P(t(D) ∈ R) (1% or 5%)
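A minimal worked instance of this recipe, with illustrative numbers: H says the data are N(0,1), the statistic t(D) is the sample mean (distributed N(0, 1/n) under H), and R is the two 2.5% tails:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 25

# Under H, t(D) = sample mean ~ N(0, 1/n); two-tailed region at level 5%:
u = 1.96 / np.sqrt(n)   # 1.96 = 97.5% standard normal quantile

data = rng.standard_normal(n) + 1.0   # data actually centred at 1, so H is false
t = data.mean()
print(t, abs(t) > u)    # t lands near 1, far outside [-0.392, 0.392]
```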
Kolmogorov-Smirnov test Is the sample from a given distribution? The test statistic d is the maximum deviation of the empirical cumulative distribution from the theoretical one. If d*sqrt(n) > 2.5, the sample is (probably) not from the target distribution.
Kolmogorov-Smirnov test
>> rn = randn(10,1);      % sample of 10 standard normals
>> jj = (1:10)/10;        % empirical CDF levels
>> KS(sort(rn), jj)       % KS is a course-local helper, not built-in MATLAB
ans =
    1.4142
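The statistic itself is easy to compute from scratch. A Python sketch (illustrative sample size; the empirical CDF must be compared against the theoretical CDF from both sides):

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

rng = np.random.default_rng(0)
n = 200
sample = np.sort(rng.standard_normal(n))

# Max deviation between empirical and theoretical CDFs (check both sides
# of each step in the empirical CDF)
cdf = np.array([norm_cdf(v) for v in sample])
ecdf_hi = np.arange(1, n + 1) / n
ecdf_lo = np.arange(0, n) / n
d = max(np.max(ecdf_hi - cdf), np.max(cdf - ecdf_lo))

print(d, sqrt(n) * d)   # slide's rule of thumb: reject if sqrt(n)*d > 2.5
```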
Combining Bayesian and frequentist inference • Compute the posterior for the parameter • Generate a testing set by simulating from the fitted model (posterior predictive checking, Gelman et al., 2003)
Graphical posterior predictive model checking takes first place in the authoritative book. The left column is a 0-1 coding of a logistic regression of six subjects' responses (rows) to stimuli (columns). Replications using the posterior and likelihood distribution are in the right six columns. There is clear micro-structure in the left column not present in the right ones. Thus, the fitting appears to have been done with an inappropriate (invalid) model.
Cumulative counts of real coal-mining disasters (lower red curve), compared with 100 scenarios of the same number of simulated disasters occurring at random times: the real data cannot reasonably have been produced by a constant-intensity process.
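The simulated scenarios are cheap to generate: under a constant-intensity (Poisson) process, the n event times, conditioned on their number, are uniform on the observation window. A sketch with illustrative numbers (not the real disaster data):

```python
import numpy as np

rng = np.random.default_rng(0)

# n events over T years; under constant intensity, conditioned on n,
# the event times are iid uniform on [0, T].
n, T, n_scenarios = 190, 110, 100

sims = np.sort(rng.uniform(0.0, T, size=(n_scenarios, n)), axis=1)

# Envelope of the simulated cumulative-count curves: the time at which
# the k-th event occurs, across scenarios.  A real curve leaving this
# band suggests the intensity is not constant.
lo = sims.min(axis=0)
hi = sims.max(axis=0)
print(lo[n // 2], hi[n // 2])   # spread of times at which half the events occurred
```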
Multiple testing • The probability of rejecting a true null hypothesis at the 99% level is 1%. • Thus, if you repeat the test 100 times, each time with new data, you will reject at least once with probability 1 − 0.99^100 ≈ 0.63 • Bonferroni correction, FWE control: to reach significance level 1% in an experiment involving 1000 tests, each test should be checked at significance 1/1000 %
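Both numbers on this slide are one-line computations:

```python
# Probability of at least one false rejection in 100 independent tests at 1%
p_any = 1.0 - 0.99**100
print(round(p_any, 2))   # 0.63

# Bonferroni: family-wise error 1% across 1000 tests => per-test level
per_test = 0.01 / 1000   # = 1e-5, i.e. 1/1000 of a percent
print(per_test)
```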
Fiducial Inference R. A. Fisher (1890-1962). In his paper Inverse Probability, he rejected Bayesian analysis on the grounds of its dependence on priors and scaling. He launched an alternative concept, 'fiducial analysis'. Although this concept was not much developed after Fisher's time, the standard definition of confidence intervals has a similar flavor. The fiducial argument was apparently the starting point for Dempster in developing evidence theory.
Fiducial inference • Fiducial inference is fairly undeveloped, and also controversial. It is similar in idea to Neyman's confidence interval, which is used a lot despite philosophical problems and a lack of general understanding. • The objective is to find a region in which a distribution's parameters lie, with confidence c. • The region is given by an algorithm: if the stated probabilistic assumptions hold, the region contains the parameters with probability c. • However, this guarantee holds before the data has been seen, and the estimator is not a sufficient statistic. Somewhat scruffy.
Hedged prediction scheme (Vovk/Gammerman) • Given a sequence z1=(x1,y1), z2=(x2,y2), …, zn=(xn,yn) AND a new x(n+1), predict y(n+1) • xi is typically a (high-dimensional) feature vector • yi is discrete (classification) or real (regression) • Predict y(n+1) ∈ Y with (say) 95% confidence, or • Predict y(n+1) precisely and state the confidence (classification only) • Predict the y(n+1) giving the sequence 'maximum randomness', using a computable approximation to Kolmogorov randomness • Can be based on the SVM method
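A sketch of the flavour of such a scheme, using the split-conformal variant of this idea rather than the transductive SVM-based method the slide refers to (data, model, and all names are illustrative): absolute residuals on a held-out calibration set play the role of nonconformity scores, and their 95% quantile gives a hedged prediction interval.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = 2x + noise
x = rng.uniform(0.0, 10.0, 200)
y = 2.0 * x + rng.standard_normal(200)

# Split: fit a simple model on one half, calibrate on the other
fit_x, fit_y = x[:100], y[:100]
cal_x, cal_y = x[100:], y[100:]
slope = np.sum(fit_x * fit_y) / np.sum(fit_x**2)  # least squares through origin

# Nonconformity scores: absolute residuals on the calibration set
scores = np.abs(cal_y - slope * cal_x)
q = np.quantile(scores, 0.95)

# Hedged prediction for a new x: an interval intended to cover y ~95% of the time
x_new = 5.0
lo, hi = slope * x_new - q, slope * x_new + q
print(lo, hi)   # interval around 2 * x_new = 10
```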