
Overview

Explore probability theory, statistical inference, regression, multivariate analysis, nonparametric statistics, model selection, and more using the R programming environment. Learn methods such as Bayesian inference and Monte Carlo simulations.

Presentation Transcript


  1. Overview G. Jogesh Babu

  2. Probability theory • Probability is all about the flip of a coin • Conditional probability and Bayes' theorem (Bayesian analysis) • Expectation, variance, and standard deviation (the standard deviation is in the same units as the data) • Density of a continuous random variable (as opposed to density as defined in physics) • Normal (Gaussian) distribution, chi-square distribution (not to be confused with the chi-square statistic) • Probability inequalities and the central limit theorem (CLT)
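To make the coin example and Bayes' theorem concrete, here is a minimal Python sketch (the course itself uses R). The two-coin setup (one fair coin and one two-headed coin, chosen at random) is a hypothetical illustration, and `clt_demo` simply averages Uniform(0, 1) draws to show the CLT's 1/√n shrinkage; neither is taken from the slides.

```python
import math
import random
import statistics


def bayes_posterior(prior, lik_h, lik_not_h):
    """P(H | E) from P(H), P(E | H) and P(E | not H) by Bayes' theorem."""
    evidence = lik_h * prior + lik_not_h * (1.0 - prior)
    return lik_h * prior / evidence


def clt_demo(n=30, reps=5000, seed=1):
    """Mean and standard deviation of `reps` sample means of n Uniform(0, 1) draws."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.random() for _ in range(n)) for _ in range(reps)]
    return statistics.fmean(means), statistics.stdev(means)


# Hypothetical: the two-headed coin is picked with probability 1/2 and one head is seen.
print(bayes_posterior(0.5, 1.0, 0.5))   # P(two-headed | heads) = 2/3
m, s = clt_demo()
print(m, s, math.sqrt(1 / 12 / 30))     # sd of the mean is close to sigma / sqrt(n)
```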

  3. R Programming environment • Introduction to the R programming language • R is an integrated suite of software facilities for data manipulation, calculation and graphical display. • Commonly used techniques, such as graphical description, tabular description, and summary statistics, are illustrated through R.
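The slides illustrate these summaries in R. As a rough Python analogue (an assumption, not the course's lab code), `numeric_summary` mimics the six numbers printed by R's `summary()` for a numeric vector, and `frequency_table` gives a simple count table like R's `table()`. The `"inclusive"` quantile method interpolates the same way as R's default quantile type; the category labels in the example are hypothetical.

```python
import statistics
from collections import Counter


def numeric_summary(data):
    """Min, quartiles, median, mean and max, similar to R's summary() for numbers."""
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"min": min(data), "q1": q1, "median": med,
            "mean": statistics.fmean(data), "q3": q3, "max": max(data)}


def frequency_table(labels):
    """Counts per category, a simple tabular description."""
    return dict(Counter(labels))


print(numeric_summary(list(range(1, 11))))
# {'min': 1, 'q1': 3.25, 'median': 5.5, 'mean': 5.5, 'q3': 7.75, 'max': 10}
print(frequency_table(["E", "S0", "S", "S", "E"]))  # hypothetical morphology labels
```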

  4. Exploratory Data Analysis An approach/philosophy for data analysis that employs a variety of techniques (mostly graphical) to: • maximize insight into a data set • uncover underlying structure • extract important variables • detect outliers and anomalies • formulate hypotheses worth testing • develop parsimonious models • provide a basis for further data collection through surveys or experiments
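One common way to flag outliers during exploratory analysis is the boxplot rule: points lying more than 1.5 interquartile ranges beyond the quartiles. The sketch below assumes that rule with quartiles from `statistics.quantiles`; R's `boxplot()` computes its hinges slightly differently, so borderline points can differ.

```python
import statistics


def tukey_outliers(data, k=1.5):
    """Return the points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]


print(tukey_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50]))  # [50]
```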

  5. Statistical Inference • While Exploratory Data Analysis provides tools to understand what the data show, statistical inference helps in reaching conclusions that extend beyond the immediate data alone. • Statistical inference helps judge whether an observed difference between groups is a dependable one or one that might have happened by chance in a study. • Topics include: • Point estimation • Confidence intervals for unknown parameters • Principles of testing of hypotheses
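As a minimal sketch of a confidence interval, the function below uses the large-sample normal approximation x̄ ± z·s/√n. That choice is an assumption: for small samples a t-based interval (as reported by R's `t.test()`) is the usual alternative.

```python
import math
import statistics
from statistics import NormalDist


def mean_ci(data, level=0.95):
    """Normal-approximation confidence interval for a population mean."""
    n = len(data)
    xbar = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(n)   # estimated standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.96 for a 95% interval
    return xbar - z * se, xbar + z * se


print(mean_ci([float(x) for x in range(1, 101)]))  # roughly (44.81, 56.19)
```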

  6. Maximum Likelihood Estimation • Likelihood differs from probability • Probability refers to the occurrence of future events • while likelihood refers to past events with known outcomes • MLE is used for fitting a mathematical model to data. • Choosing the parameter values that maximize the likelihood of the observed real-world data offers a way of tuning the free parameters of the model to provide a good fit.
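A minimal sketch of MLE, assuming i.i.d. exponential data (a hypothetical model chosen because its MLE has the closed form rate = 1/x̄): the log-likelihood is maximized numerically by golden-section search, and the result can be checked against the closed form.

```python
import math


def exp_loglik(rate, data):
    """Log-likelihood of i.i.d. exponential observations with the given rate."""
    return len(data) * math.log(rate) - rate * sum(data)


def golden_max(f, a, b, tol=1e-10):
    """Maximize a unimodal function f on [a, b] by golden-section search."""
    g = (math.sqrt(5) - 1) / 2
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):          # maximum lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:                    # maximum lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2


data = [0.5, 1.0, 1.5, 2.0]
rate_hat = golden_max(lambda r: exp_loglik(r, data), 1e-6, 10.0)
print(rate_hat, len(data) / sum(data))  # both close to 0.8
```

For models without a closed-form MLE, only `exp_loglik` would change; the numerical maximization step stays the same.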

  7. Regression • Basic Concepts in Regression • Bias-Variance Tradeoff • Linear Regression • Nonparametric Regression • Local Polynomial Regression • Confidence Bands • Splines
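The sketch below contrasts a global straight-line fit with a local linear fit, which is local polynomial regression of degree 1. The Gaussian kernel, the bandwidth `h`, and the curved test data are assumptions for illustration; `h` is where the bias-variance tradeoff shows up, since a small `h` follows the data closely (low bias, high variance) while a large `h` smooths more.

```python
import math


def weighted_line(x, y, w):
    """Weighted least squares; returns (weighted mean of x, weighted mean of y, slope)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    return mx, my, sxy / sxx


def ols_line(x, y):
    """Ordinary least squares fit; returns (intercept, slope)."""
    mx, my, slope = weighted_line(x, y, [1.0] * len(x))
    return my - slope * mx, slope


def local_linear(x, y, x0, h):
    """Local linear estimate of E[Y | X = x0] with Gaussian kernel weights of bandwidth h."""
    w = [math.exp(-0.5 * ((xi - x0) / h) ** 2) for xi in x]
    mx, my, slope = weighted_line(x, y, w)
    return my + slope * (x0 - mx)


xs = [i / 100 for i in range(201)]
ys = [xi ** 2 for xi in xs]               # hypothetical curved data
print(ols_line(xs, ys))                   # one straight line for the whole range
print(local_linear(xs, ys, 1.0, 0.05))    # close to the true value 1.0
```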

  8. Linear regression issues in astronomy • Compares different regression lines used in astronomy • Illustrates them with the Faber-Jackson relation. • Measurement error models are also discussed
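The slide does not list which regression lines are compared. One set often used in astronomy is OLS(Y|X), OLS(X|Y) and their bisector, and the sketch below assumes those three. The bisector slope is the standard formula for the line bisecting the angle between the two OLS lines; the data are hypothetical, not Faber-Jackson measurements.

```python
import math


def regression_slopes(x, y):
    """Slopes of OLS(Y|X), OLS(X|Y) (expressed as dy/dx) and the OLS bisector."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # regress Y on X
    b2 = syy / sxy                      # regress X on Y, inverted to a dy/dx slope
    b3 = (b1 * b2 - 1 + math.sqrt((1 + b1 ** 2) * (1 + b2 ** 2))) / (b1 + b2)
    return b1, b2, b3


x = [1, 2, 3, 4, 5]
y = [1.1, 2.3, 2.8, 4.5, 4.9]            # hypothetical scattered data
print(regression_slopes(x, y))           # b1 < bisector < b2 for positive correlation
```

The three slopes coincide only when the points lie exactly on a line; the more scatter, the further apart OLS(Y|X) and OLS(X|Y) become.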

  9. Multivariate analysis • Analysis of data on two or more attributes (variables) that may depend on each other • Principal components analysis, to reduce the number of variables • Canonical correlation • Tests of hypotheses • Confidence regions • Multivariate regression • Discriminant analysis (supervised learning). • Computational aspects are covered in the lab
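For two variables, principal components can be computed in closed form from the 2×2 sample covariance matrix, which keeps this sketch dependency-free. Analyses of many variables would use a general eigen-decomposition (for example R's `prcomp()`), and standardizing the variables first is advisable when they are measured in different units.

```python
import math


def pca_2d(x, y):
    """Eigenvalues of the 2x2 sample covariance and the first principal direction."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = sum((xi - mx) ** 2 for xi in x) / (n - 1)                     # var(x)
    c = sum((yi - my) ** 2 for yi in y) / (n - 1)                     # var(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)  # cov(x, y)
    mid, rad = (a + c) / 2, math.hypot((a - c) / 2, b)
    theta = 0.5 * math.atan2(2 * b, a - c)    # angle of the major axis
    return (mid + rad, mid - rad), (math.cos(theta), math.sin(theta))


(l1, l2), v1 = pca_2d([1, 2, 3, 4], [1, 2, 3, 4])
print(l1, l2, v1)   # all variance on the direction (0.707, 0.707)
```

The proportion of variance explained by the first component is `l1 / (l1 + l2)`.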

  10. Cluster Analysis • Data mining techniques • Classifying data into clusters • k-means • Model-based clustering • Single linkage (friends of friends) • Complete linkage clustering algorithm
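A minimal k-means sketch (Lloyd's algorithm): alternate between assigning each point to its nearest center and moving each center to the mean of its points. Passing the starting centers explicitly is an assumption made to keep the example deterministic; in practice several random starts are common (R's `kmeans()` has an `nstart` argument).

```python
def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm on tuples of coordinates; returns (centers, labels)."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest center (squared Euclidean distance).
        labels = [min(range(len(centers)),
                      key=lambda j: sum((pi - cj) ** 2 for pi, cj in zip(p, centers[j])))
                  for p in points]
        # Update step: each center moves to the mean of its assigned points.
        new_centers = []
        for j, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centers.append(c)       # keep an empty cluster's center in place
        if new_centers == centers:          # converged: no center moved
            break
        centers = new_centers
    return centers, labels


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(kmeans(pts, [(0, 0), (10, 10)]))
```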

  11. Nonparametric Statistics • These statistical procedures do not assume that the population follows a particular parametric family of probability distributions. • The model structure is not specified a priori but is instead determined from data. • Because nonparametric methods make fewer assumptions, their applicability is much wider • Procedures described include: • Sign test • Mann-Whitney two-sample test • Kruskal-Wallis test for comparing several samples • Density estimation
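Two of the listed procedures reduce to counting, as the sketch below shows: the exact two-sided sign test (values equal to the hypothesized median are dropped, a common convention) and the Mann-Whitney U statistic. Turning U into a p-value needs its null distribution or a normal approximation, which is omitted here; R's `wilcox.test()` provides that.

```python
from math import comb


def sign_test(data, m0=0.0):
    """Exact two-sided sign test p-value for H0: median = m0."""
    plus = sum(1 for x in data if x > m0)
    minus = sum(1 for x in data if x < m0)
    n = plus + minus                   # observations equal to m0 are dropped
    k = min(plus, minus)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n   # P(Bin(n, 1/2) <= k)
    return min(1.0, 2 * tail)


def mann_whitney_u(x, y):
    """U statistic: number of pairs (x_i, y_j) with x_i > y_j, ties counted as 1/2."""
    return sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)


print(sign_test([0.3, 1.2, 0.8, -0.1, 2.0, 0.5, 1.1, 0.9]))  # 0.0703125
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))                   # 0.0
```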

  12. Bootstrap • How to get the most out of repeated use of the data. • The bootstrap is similar to the Monte Carlo method, but the 'simulation' is carried out from the data itself. • A very general, mostly nonparametric procedure that is widely applicable. • Applications to regression, cases where the procedure fails, and cases where it outperforms traditional procedures are also discussed
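A minimal nonparametric bootstrap: resample the data with replacement and use the spread of the recomputed statistic as its standard error. The median as default statistic, the number of resamples `B`, and the seed are arbitrary choices for illustration.

```python
import random
import statistics


def bootstrap_se(data, stat=statistics.median, B=2000, seed=0):
    """Bootstrap standard error of `stat`: sd of the statistic over B resamples."""
    rng = random.Random(seed)
    n = len(data)
    replicates = [stat(rng.choices(data, k=n)) for _ in range(B)]
    return statistics.stdev(replicates)


data = list(range(1, 101))
print(bootstrap_se(data))                          # SE of the median
print(bootstrap_se(data, stat=statistics.fmean))   # close to the plug-in sd / sqrt(n)
```

The same function works for any statistic passed as `stat`, which is what makes the procedure so general.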

  13. Model selection • Chi-square test • Wald Test • Rao's score test • Likelihood ratio test • AIC, BIC
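For models fitted by maximum likelihood, AIC = 2k - 2 log L̂ and BIC = k log n - 2 log L̂, where k is the number of free parameters and n the sample size; the likelihood ratio statistic 2(log L̂₁ - log L̂₀) is compared with a chi-square distribution. The sketch compares a Gaussian model whose mean is fixed at 0 (k = 1) with one whose mean is estimated (k = 2); the models and data are hypothetical.

```python
import math


def aic(loglik, k):
    """Akaike information criterion; smaller is preferred."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion; smaller is preferred."""
    return k * math.log(n) - 2 * loglik


def gaussian_max_loglik(data, mu=None):
    """Maximized Gaussian log-likelihood; the mean is fixed at mu if given, else estimated."""
    n = len(data)
    m = sum(data) / n if mu is None else mu
    s2 = sum((x - m) ** 2 for x in data) / n      # MLE of the variance for that mean
    return -0.5 * n * (math.log(2 * math.pi * s2) + 1)


data = [0.9, 1.1, 1.0, 1.2, 0.8, 1.05]
n = len(data)
ll0, ll1 = gaussian_max_loglik(data, mu=0.0), gaussian_max_loglik(data)
print("AIC:", aic(ll0, 1), aic(ll1, 2))
print("BIC:", bic(ll0, 1, n), bic(ll1, 2, n))
print("LR statistic:", 2 * (ll1 - ll0))   # compare with chi-square, 1 df (3.84 at 5%)
```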

  14. Goodness of Fit • Curve (model) fitting, or goodness of fit, using the bootstrap procedure. • Procedures like the Kolmogorov-Smirnov test do not work in the multidimensional case, or when the parameters of the curve are estimated from the data. • The bootstrap comes to the rescue • Some of these procedures are illustrated using R in a lab session on hypothesis testing and bootstrapping
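A sketch of the idea for the one-dimensional case with estimated parameters: compute the Kolmogorov-Smirnov distance to a normal fitted to the data, then calibrate that distance by a parametric bootstrap that refits the normal to every simulated sample. The normal model, `B`, and the seeds are assumptions; the multidimensional case needs a different distance and is not covered here.

```python
import random
import statistics
from statistics import NormalDist


def ks_stat_normal(data):
    """KS distance between the empirical CDF and a normal CDF fitted to the data."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(statistics.fmean(xs), statistics.stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        d = max(d, (i + 1) / n - cdf, cdf - i / n)   # ECDF jumps from i/n to (i+1)/n at x
    return d


def bootstrap_ks_pvalue(data, B=500, seed=0):
    """Parametric-bootstrap p-value for normality when mean and sd are estimated."""
    rng = random.Random(seed)
    n = len(data)
    d_obs = ks_stat_normal(data)
    mu, sd = statistics.fmean(data), statistics.stdev(data)
    exceed = 0
    for _ in range(B):
        simulated = [rng.gauss(mu, sd) for _ in range(n)]
        if ks_stat_normal(simulated) >= d_obs:        # parameters re-estimated inside
            exceed += 1
    return d_obs, (exceed + 1) / (B + 1)


rng = random.Random(42)
skewed = [rng.expovariate(1.0) for _ in range(200)]   # hypothetical non-normal sample
print(bootstrap_ks_pvalue(skewed, B=200))             # large distance, small p-value
```

Refitting inside the loop is the key step: the standard KS tables assume a fully specified distribution and are too conservative once the parameters come from the same data.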

  15. Bayesian Inference • As evidence accumulates, the degree of belief in a hypothesis ought to change • Bayesian inference takes prior knowledge into account • The quality of a Bayesian analysis depends on how well one can convert the prior information into a mathematical prior probability • Methods for parameter estimation, model assessment, etc. • Illustrations with examples from astronomy
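A minimal conjugate example of belief changing as evidence accumulates, assuming a Beta(a, b) prior on a coin's probability of heads: after observing h heads and t tails the posterior is Beta(a + h, b + t). The posterior probability that the coin favors heads is computed here by a simple midpoint-rule integral; the prior and the counts are hypothetical.

```python
import math


def beta_update(a, b, heads, tails):
    """Posterior Beta parameters after coin flips under a Beta(a, b) prior."""
    return a + heads, b + tails


def beta_pdf(p, a, b):
    """Density of the Beta(a, b) distribution at p in (0, 1)."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(p) + (b - 1) * math.log(1 - p) - log_norm)


def prob_greater(a, b, threshold=0.5, steps=20000):
    """P(p > threshold) under Beta(a, b), by the midpoint rule on [threshold, 1]."""
    h = (1 - threshold) / steps
    return h * sum(beta_pdf(threshold + (i + 0.5) * h, a, b) for i in range(steps))


a, b = beta_update(1, 1, heads=7, tails=3)   # uniform prior, then 7 heads in 10 flips
print(a, b, a / (a + b))                     # Beta(8, 4), posterior mean 2/3
print(prob_greater(a, b))                    # about 0.887
```

Updating flip by flip or all at once gives the same posterior, which is one way to see the "accumulating evidence" point above.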

  16. Markov Chain Monte Carlo • MCMC methods are a collection of techniques that use pseudo-random (computer-simulated) values to estimate solutions to mathematical problems • MCMC for Bayesian inference • Illustration of MCMC for the evaluation of expectations with respect to a distribution • MCMC for estimation of maxima or minima of functions • MCMC procedures are successfully used in the search for extrasolar planets
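A minimal random-walk Metropolis sampler, one of the simplest MCMC algorithms, used here to evaluate an expectation with respect to a distribution. The standard normal target (chosen so that E[X²] = 1 is known), the proposal scale `step`, and the burn-in length are assumptions for illustration.

```python
import math
import random


def metropolis(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis chain for an unnormalized log density on the real line."""
    rng = random.Random(seed)
    x, lp = x0, log_target(x0)
    chain = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)          # symmetric proposal
        lp_prop = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(x)).
        if lp_prop >= lp or rng.random() < math.exp(lp_prop - lp):
            x, lp = proposal, lp_prop
        chain.append(x)                              # a rejection repeats the current state
    return chain


chain = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_samples=50000, step=2.4, seed=1)
kept = chain[5000:]                                  # discard burn-in
print(sum(v * v for v in kept) / len(kept))          # estimate of E[X^2], close to 1
```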

  17. Time Series • Time domain procedures • State space models • Kernel smoothing • Poisson processes • Spectral methods for inference • A brief discussion of the Kalman filter • Illustrations with examples from astronomy
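A minimal Kalman filter for the simplest state space model, the local-level (random walk plus noise) model. The model, its noise variances `q` and `r`, and the diffuse starting variance `p0` are assumptions for illustration.

```python
def kalman_local_level(ys, q, r, m0=0.0, p0=1e6):
    """Filtered state means for the local-level model
    x_t = x_{t-1} + w_t, w_t ~ N(0, q);   y_t = x_t + v_t, v_t ~ N(0, r)."""
    m, p = m0, p0
    filtered = []
    for y in ys:
        p = p + q                 # predict: the state variance grows by q
        k = p / (p + r)           # Kalman gain
        m = m + k * (y - m)       # update the mean with the new observation
        p = (1 - k) * p           # updated state variance
        filtered.append(m)
    return filtered


print(kalman_local_level([1, 2, 3, 4], q=0.0, r=1.0))  # with q = 0, nearly the running mean
```

The ratio q/r controls the smoothing: q = 0 treats the level as constant, while a large q makes the filter follow each observation closely.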

  18. Spatial Statistics • Spatial Point Processes • Gaussian Processes (Inference and computational aspects) • Modeling Lattice Data • Homogeneous and inhomogeneous Poisson processes • Estimation of Ripley's K function (useful for point pattern analysis) • Cox Process (doubly stochastic Poisson Process) • Markov Point Processes
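A naive estimator of Ripley's K function without edge correction, assuming points in a window of known area A: K̂(r) = A/(n(n-1)) × (number of ordered pairs within distance r). Under complete spatial randomness (a homogeneous Poisson process) K(r) = πr², so values well above πr² suggest clustering; ignoring edge effects biases the estimate slightly downward. The simulated pattern is hypothetical.

```python
import math
import random


def ripley_k(points, r, area):
    """Naive Ripley's K estimate (no edge correction) for points in a window of given area."""
    n = len(points)
    close_pairs = sum(1 for i in range(n) for j in range(n)
                      if i != j and math.dist(points[i], points[j]) <= r)
    return area * close_pairs / (n * (n - 1))


rng = random.Random(0)
uniform = [(rng.random(), rng.random()) for _ in range(500)]  # CSR-like pattern, unit square
print(ripley_k(uniform, 0.05, 1.0), math.pi * 0.05 ** 2)      # roughly comparable
```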

  19. Facing Challenges: Complex Theory and Complex Data in Astrostatistics • Complex Theory – Models with “black box” mappings from parameter space • Complex Data – Large quantities of high-dimensional data, spectra, and images, with significant observational limitations • Testing Cosmological Theories • Type Ia Supernovae Analysis • Role of dimension reduction • Role of nonparametric methods
