Deciding, Estimating, Computing, Checking: How are Bayesian posteriors used, computed and validated?
Fundamentalist Bayes: The posterior is ALL knowledge you have about the state • Use in decision making: take the action maximizing your utility. You must know the cost of deciding the state is A when it is B (engaging a target as Bomber when it is Civilian, as Civilian when it is Bomber, or waiting for more data) • Estimation: cost L(θ′, θ) of deciding the state is θ′ when it is θ
Loss functions • L(x,y) = (x−y)^2, squared error: the optimal estimator is the mean • L(x,y) = |x−y|, absolute error: the optimal estimate is the median • L(x,y) = −δ(x−y), Dirac (0-1) loss: the optimal estimate is the mode
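The first two bullets can be checked numerically: minimizing the expected squared error over a grid of candidate estimates recovers the sample mean, and minimizing the expected absolute error recovers the median. A minimal sketch (the exponential sample is illustrative; 0-1 loss is omitted since it degenerates on a continuous grid):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(size=1000)   # skewed sample, so mean and median differ

grid = np.linspace(0.0, 5.0, 2001)
sq_loss = np.array([np.mean((x - g) ** 2) for g in grid])
abs_loss = np.array([np.mean(np.abs(x - g)) for g in grid])

best_sq = grid[np.argmin(sq_loss)]    # minimizer of squared error
best_abs = grid[np.argmin(abs_loss)]  # minimizer of absolute error

print(best_sq, np.mean(x))    # agree up to grid resolution
print(best_abs, np.median(x))
```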
Loss functions, pros and cons • Mean: easy to compute, necessary for estimating probabilities, sensitive to outliers • Median: robust, scale-invariant, only applicable in 1D • Mode (Maximum A Posteriori): necessary for discrete unordered state spaces, very non-robust otherwise
Computing Posteriors • Finite state space: easy • Discretized state space: easy: post = prior.*likelihood; post = post/sum(post) • Analytical prior conjugate wrt likelihood: easy • High-dimensional state space (e.g. a 3D image): difficult, use MCMC
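The discretized-state-space recipe is a two-liner in any array language. A Python sketch with an illustrative Bernoulli example (flat prior over a success probability, 7 heads in 10 trials):

```python
import numpy as np

# Grid over the unknown success probability
theta = np.linspace(0.0, 1.0, 101)
prior = np.ones_like(theta)
prior /= prior.sum()

# Likelihood of 7 heads in 10 Bernoulli trials at each grid point
k, n = 7, 10
likelihood = theta**k * (1.0 - theta)**(n - k)

# Posterior = prior x likelihood, renormalized (the slide's recipe)
post = prior * likelihood
post /= post.sum()

print(theta[np.argmax(post)])  # posterior mode at k/n = 0.7
```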
Conjugate families • Normal prior N(mu, s2) • Normal likelihood N(mu′, s2′) • Then the posterior is normal N(mup, s2p), where (x−mu)^2/s2 + (x−mu′)^2/s2′ = (x−mup)^2/s2p + c • i.e., 1/s2 + 1/s2′ = 1/s2p and mu/s2 + mu′/s2′ = mup/s2p • Unknown variances are more difficult …
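The two relations say that precisions add and that the posterior mean is the precision-weighted average. A sketch with illustrative numbers (prior N(0, 4), likelihood N(3, 1)):

```python
# Prior N(mu, s2) and likelihood N(mu', s2'), numbers chosen for illustration
mu, s2 = 0.0, 4.0
mu2, s22 = 3.0, 1.0

# The slide's two relations, read off by matching quadratic exponents:
s2p = 1.0 / (1.0 / s2 + 1.0 / s22)   # 1/s2p = 1/s2 + 1/s2'
mup = s2p * (mu / s2 + mu2 / s22)    # mup/s2p = mu/s2 + mu'/s2'

print(mup, s2p)  # 2.4, 0.8: mean pulled toward the data, variance shrinks
```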
Conjugate families • Beta conjugate wrt Bernoulli trials • Dirichlet conjugate wrt the discrete (categorical) distribution • Wishart conjugate wrt the multivariate normal (precision matrix) • Fairly complete table in Wikipedia
MCMC and mixing [Figure: target distribution, with chains using a small, a good, and a large proposal step]
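The figure's point can be reproduced with a few lines of random-walk Metropolis: tiny proposal steps are almost always accepted but explore slowly, huge steps are almost always rejected, and an intermediate step mixes well. A sketch targeting N(0,1) (the step sizes are illustrative):

```python
import numpy as np

def metropolis(n, step, rng):
    """Random-walk Metropolis chain targeting N(0,1); returns acceptance rate."""
    x, accepted = 0.0, 0
    for _ in range(n):
        prop = x + step * rng.standard_normal()
        # log acceptance ratio for a standard normal target
        if np.log(rng.random()) < 0.5 * (x**2 - prop**2):
            x, accepted = prop, accepted + 1
    return accepted / n

rng = np.random.default_rng(1)
rates = {step: metropolis(5000, step, rng) for step in (0.05, 2.4, 50.0)}
print(rates)  # acceptance near 1 for tiny steps, near 0 for huge steps
```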
Testing and Cournot’s Principle • Standard Bayesian analysis does not reject a model: it selects the best of those considered. • Cournot’s principle: an event with small probability will not happen. • Assume a model M for an experiment and a low-probability event R in the result data. • Perform the experiment. If R happened, something was wrong, and the assumed model M is the obvious suspect. • Thus, the assumption that M was right is rejected.
Test statistic • Define the model to test, the null hypothesis H • Define a real-valued function t(D) on the data space • Find the distribution of t(D) induced by H • Define a rejection region R such that P(t(D) ∈ R) is low (1% or 5%) • R is typically the tails of the distribution, t(D) < l or t(D) > u, where [l,u] is a high-probability interval • If t(D) is in the rejection region, the null hypothesis H has been rejected at significance level P(t(D) ∈ R) (1% or 5%)
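A minimal worked instance of this recipe, with illustrative numbers: H says the data are N(0,1), the statistic t(D) is the sample mean (distributed N(0, 1/n) under H), and R is the two 2.5% tails:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 25

# Under H, t(D) = sample mean ~ N(0, 1/n); two-tailed region at level 5%:
u = 1.96 / np.sqrt(n)   # 1.96 = 97.5% standard normal quantile

data = rng.standard_normal(n) + 1.0   # data actually centred at 1, so H is false
t = data.mean()
print(t, abs(t) > u)    # t lands near 1, far outside [-0.392, 0.392]
```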
Kolmogorov-Smirnov test Is the sample from a given distribution? The test statistic d is the maximum deviation of the empirical cumulative distribution from the theoretical one. If d*sqrt(n) > 2.5, the sample is (probably) not from the target distribution.
Kolmogorov-Smirnov test
>> rn = randn(10,1);      % sample of 10 standard normals
>> jj = (1:10)/10;        % empirical CDF levels
>> KS(sort(rn), jj)       % KS is a course-local helper, not built-in MATLAB
ans =
    1.4142
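The statistic itself is easy to compute from scratch. A Python sketch (illustrative sample size; the empirical CDF must be compared against the theoretical CDF from both sides):

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

rng = np.random.default_rng(0)
n = 200
sample = np.sort(rng.standard_normal(n))

# Max deviation between empirical and theoretical CDFs (check both sides
# of each step in the empirical CDF)
cdf = np.array([norm_cdf(v) for v in sample])
ecdf_hi = np.arange(1, n + 1) / n
ecdf_lo = np.arange(0, n) / n
d = max(np.max(ecdf_hi - cdf), np.max(cdf - ecdf_lo))

print(d, sqrt(n) * d)   # slide's rule of thumb: reject if sqrt(n)*d > 2.5
```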
Combining Bayesian and frequentist inference • Compute the posterior for the parameter • Generate a testing set by simulating from the fitted model (posterior predictive checking, Gelman et al., 2003)
Graphical posterior predictive model checking takes first place in the authoritative book. The left column is a 0-1 coding of a logistic regression of six subjects' responses (rows) to stimuli (columns). Replications using the posterior and likelihood distribution are in the right six columns. There is clear micro-structure in the left column not present in the right ones. Thus, the fitting appears to have been done with an inappropriate (invalid) model.
Cumulative counts of real coal-mining disasters (lower red curve), compared with 100 scenarios of the same number of simulated disasters occurring at random times: the real data cannot reasonably have been produced by a constant-intensity process.
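The simulated scenarios are cheap to generate: under a constant-intensity (Poisson) process, the n event times, conditioned on their number, are uniform on the observation window. A sketch with illustrative numbers (not the real disaster data):

```python
import numpy as np

rng = np.random.default_rng(0)

# n events over T years; under constant intensity, conditioned on n,
# the event times are iid uniform on [0, T].
n, T, n_scenarios = 190, 110, 100

sims = np.sort(rng.uniform(0.0, T, size=(n_scenarios, n)), axis=1)

# Envelope of the simulated cumulative-count curves: the time at which
# the k-th event occurs, across scenarios.  A real curve leaving this
# band suggests the intensity is not constant.
lo = sims.min(axis=0)
hi = sims.max(axis=0)
print(lo[n // 2], hi[n // 2])   # spread of times at which half the events occurred
```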
Multiple testing • The probability of rejecting a true null hypothesis at the 99% level is 1%. • Thus, if you repeat the test 100 times, each time with new data, you will reject at least once with probability 1 − 0.99^100 ≈ 0.63 • Bonferroni correction, FWE control: to reach significance level 1% in an experiment involving 1000 tests, each test should be checked at significance 1/1000 %
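Both numbers on this slide are one-line computations:

```python
# Probability of at least one false rejection in 100 independent tests at 1%
p_any = 1.0 - 0.99**100
print(round(p_any, 2))   # 0.63

# Bonferroni: family-wise error 1% across 1000 tests => per-test level
per_test = 0.01 / 1000   # = 1e-5, i.e. 1/1000 of a percent
print(per_test)
```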
Fiducial Inference R. A. Fisher (1890-1962). In his paper Inverse Probability, he rejected Bayesian analysis on the grounds of its dependence on priors and scaling. He launched an alternative concept, 'fiducial analysis'. Although this concept was not much developed after Fisher's time, the standard definition of confidence intervals has a similar flavor. The fiducial argument was apparently the starting point for Dempster in developing evidence theory.
Fiducial inference • Fiducial inference is fairly undeveloped, and also controversial. It is similar in idea to Neyman's confidence interval, which is used a lot despite philosophical problems and a lack of general understanding. • The objective is to find a region in which a distribution's parameters lie, with confidence c. • The region is given by an algorithm: if the stated probabilistic assumptions hold, the region contains the parameters with probability c. • However, this guarantee holds before the data has been seen, and the estimator is not a sufficient statistic. Somewhat scruffy.
Hedged prediction scheme (Vovk/Gammerman) • Given a sequence z1=(x1,y1), z2=(x2,y2), …, zn=(xn,yn) AND a new x(n+1), predict y(n+1) • xi is typically a (high-dimensional) feature vector • yi is discrete (classification) or real (regression) • Predict y(n+1) ∈ Y with (say) 95% confidence, or • Predict y(n+1) precisely and state the confidence (classification only) • Predict the y(n+1) giving the sequence 'maximum randomness', using a computable approximation to Kolmogorov randomness • Can be based on the SVM method
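A sketch of the flavour of such a scheme, using the split-conformal variant of this idea rather than the transductive SVM-based method the slide refers to (data, model, and all names are illustrative): absolute residuals on a held-out calibration set play the role of nonconformity scores, and their 95% quantile gives a hedged prediction interval.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = 2x + noise
x = rng.uniform(0.0, 10.0, 200)
y = 2.0 * x + rng.standard_normal(200)

# Split: fit a simple model on one half, calibrate on the other
fit_x, fit_y = x[:100], y[:100]
cal_x, cal_y = x[100:], y[100:]
slope = np.sum(fit_x * fit_y) / np.sum(fit_x**2)  # least squares through origin

# Nonconformity scores: absolute residuals on the calibration set
scores = np.abs(cal_y - slope * cal_x)
q = np.quantile(scores, 0.95)

# Hedged prediction for a new x: an interval intended to cover y ~95% of the time
x_new = 5.0
lo, hi = slope * x_new - q, slope * x_new + q
print(lo, hi)   # interval around 2 * x_new = 10
```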