Overview G. Jogesh Babu
Overview of Astrostatistics • A brief description of modern astronomy & astrophysics. • Many statistical concepts have their roots in astronomy (starting with Hipparchus in the 2nd c. BC) • Relevance of statistics in astronomy today • State of astrostatistics today • Methodological challenges for astrostatistics in the 2000s
Descriptive Statistics • Introduction to the R programming language, an integrated suite of software facilities for data manipulation, calculation and graphical display. • Descriptive statistics help in extracting the basic features of data and provide summaries of the sample and the measures. • Commonly used techniques, such as graphical description, tabular description, and summary statistics, are illustrated through R.
Exploratory Data Analysis An approach/philosophy for data analysis that employs a variety of techniques (mostly graphical) to: • maximize insight into a data set • uncover underlying structure • extract important variables • detect outliers and anomalies • formulate hypotheses worth testing • develop parsimonious models • provide a basis for further data collection through surveys or experiments
Probability theory • Conditional probability & Bayes' theorem (Bayesian analysis) • Expectation, variance, standard deviation (units free estimates) • Density of a continuous random variable (as opposed to density defined in physics) • Normal (Gaussian) distribution, Chi-square distribution (not Chi-square statistic) • Probability inequalities and the CLT
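As a small worked illustration of conditional probability and Bayes' theorem, the Python sketch below (not the R lab code) updates a prior probability after a single observation. The function name `posterior` and all the numbers (a 1% prior, a 95% detection rate, a 5% false-alarm rate) are hypothetical, chosen only to make the arithmetic concrete.

```python
def posterior(prior, p_data_given_h, p_data_given_not_h):
    """Return P(H | D) for a binary hypothesis H via Bayes' theorem."""
    # Total probability of the data under both alternatives (the "evidence").
    evidence = p_data_given_h * prior + p_data_given_not_h * (1.0 - prior)
    return p_data_given_h * prior / evidence


# Hypothetical numbers: 1% of candidates are real sources; the detector
# flags 95% of real sources and 5% of spurious ones.
p_real_given_flag = posterior(prior=0.01, p_data_given_h=0.95,
                              p_data_given_not_h=0.05)
print(f"P(real | flagged) = {p_real_given_flag:.3f}")
```

Even with a sensitive detector, the low prior keeps the posterior well below one half, which is the kind of result that motivates treating the prior explicitly.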
Correlation & Regression • Correlation coefficient • Underlying principles of linear and multiple linear regression • Least squares estimation • Ridge regression • Principal components
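The least squares principle and the correlation coefficient listed above can be sketched in a few lines of Python (not the R lab code). The function names and the sample data are assumptions made for illustration; the formulas are the standard closed-form expressions for a straight-line fit of y on x.

```python
import math


def least_squares_line(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return slope, intercept


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data lying exactly on y = 1 + 2x.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
slope, intercept = least_squares_line(x, y)
r = correlation(x, y)
print(slope, intercept, r)
```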
Linear regression issues in astronomy • Comparison of the different regression lines used in astronomy • Illustrated with the Faber-Jackson relation.
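Which regression lines the lecture compares is not stated here. As an assumption, the Python sketch below contrasts the two simplest candidates: ordinary least squares of Y on X, and of X on Y re-expressed as a slope dy/dx. The data are synthetic, not Faber-Jackson measurements, and the function name is hypothetical.

```python
def two_ols_slopes(x, y):
    """Return (slope of OLS(Y|X), slope of OLS(X|Y) expressed as dy/dx)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope_y_on_x = sxy / sxx   # minimizes vertical residuals
    slope_x_on_y = syy / sxy   # minimizes horizontal residuals
    return slope_y_on_x, slope_x_on_y


# Synthetic, slightly noisy data (illustrative only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]
b_yx, b_xy = two_ols_slopes(x, y)
print(f"OLS(Y|X) slope = {b_yx:.4f}, OLS(X|Y) slope = {b_xy:.4f}")
```

The ratio of the two slopes equals the squared correlation coefficient, so the lines coincide only when the points are perfectly collinear; with scattered data the choice of line changes the fitted slope.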
Statistical Inference • While descriptive statistics provide tools to describe what the data show, statistical inference helps in reaching conclusions that extend beyond the immediate data. • Statistical inference helps in judging whether an observed difference between groups is dependable or one that might have happened by chance in a study. • Topics to be covered include: • Point estimation • Confidence intervals for unknown parameters • Principles of testing of hypotheses
Maximum Likelihood Estimation • Likelihood differs from probability • Probability refers to the occurrence of future events • while likelihood refers to past events with known outcomes • MLE is used for fitting a mathematical model to data. • Modeling real-world data by maximizing the likelihood offers a way of tuning the free parameters of the model to provide a good fit.
MLE Contd. • Thomas Hettmansperger's lecture includes: • Maximum likelihood method for linear regression, an alternative to the least squares method • Cramer-Rao inequality, which sets a lower bound on the error (variance) of an estimator of a parameter. It helps in finding the 'best' estimator. • Analysis of data from two or more different populations involves mixture models. • The likelihood calculations are difficult, so an iterative device called the EM algorithm will be introduced. Computations are illustrated in the Lab.
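The idea of tuning free parameters by maximizing the likelihood can be sketched in Python (not the R lab code) for the simplest case, a Gaussian model with unknown mean and variance, where the maximum has a closed form. The function names and the five data values are assumptions for illustration; the lecture's own examples (regression, mixtures, EM) are not reproduced here.

```python
import math


def gaussian_log_likelihood(data, mu, sigma2):
    """Log-likelihood of i.i.d. Gaussian data with mean mu, variance sigma2."""
    n = len(data)
    squared_error = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - squared_error / (2.0 * sigma2)


def gaussian_mle(data):
    """Closed-form maximum likelihood estimates of mean and variance."""
    n = len(data)
    mu_hat = sum(data) / n
    # The MLE of the variance divides by n, not n - 1.
    sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / n
    return mu_hat, sigma2_hat


# Hypothetical measurements.
data = [4.9, 5.3, 5.1, 4.7, 5.0]
mu_hat, sigma2_hat = gaussian_mle(data)
ll_at_mle = gaussian_log_likelihood(data, mu_hat, sigma2_hat)
ll_shifted = gaussian_log_likelihood(data, mu_hat + 0.1, sigma2_hat)
print(mu_hat, sigma2_hat, ll_at_mle, ll_shifted)
```

Moving either parameter away from its estimate lowers the log-likelihood, which is what "maximum likelihood" means in practice.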
Nonparametric Statistics • These statistical procedures make few or no assumptions about the form of the probability distribution of the population. • The model structure is not specified a priori but is instead determined from data. • As non-parametric methods make fewer assumptions, their applicability is much wider. • Procedures described include: • Sign test • Mann-Whitney two sample test • Kruskal-Wallis test for comparing several samples
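Of the procedures listed, the sign test is the easiest to write down. The Python sketch below (not the R lab code) computes an exact two-sided p-value for the hypothesis that the population median equals a given value, using only the binomial distribution of the signs. The function name, the convention of dropping ties, and the sample data are assumptions for illustration.

```python
from math import comb


def sign_test_pvalue(data, median0):
    """Exact two-sided sign test of H0: population median == median0."""
    positives = sum(1 for x in data if x > median0)
    negatives = sum(1 for x in data if x < median0)
    n = positives + negatives          # observations equal to median0 are dropped
    k = min(positives, negatives)
    # Under H0 each sign is + or - with probability 1/2, so the count is Binomial(n, 1/2).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Hypothetical sample: every value lies above the hypothesized median 0.5.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
p_value = sign_test_pvalue(sample, median0=0.5)
print(p_value)
```

No distributional shape is assumed beyond independence, which is why the test applies so widely.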
Bayesian Inference • As evidence accumulates, the degree of belief in a hypothesis ought to change • Bayesian inference takes prior knowledge into account • The quality of a Bayesian analysis depends on how well one can convert the prior information into a mathematical prior probability • Tom Loredo describes methods for parameter estimation, model assessment, etc. • Illustrates with examples from astronomy
Multivariate analysis • Analysis of data on two or more attributes (variables) that may depend on each other • Principal components analysis, to reduce the number of variables • Canonical correlation • Tests of hypotheses • Confidence regions • Multivariate regression • Discriminant analysis (supervised learning). • Computational aspects are covered in the lab
Bootstrap • How to get the most out of repeated use of the data. • Bootstrap is similar to the Monte Carlo method, but the 'simulation' is carried out from the data itself. • A very general, mostly non-parametric procedure that is widely applicable. • Applications to regression, cases where the procedure fails, and cases where it outperforms traditional procedures will also be discussed
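The phrase "simulation carried out from the data itself" can be made concrete with a Python sketch (not the R lab code) of the basic nonparametric bootstrap: resample the observed data with replacement, recompute the statistic each time, and use the spread of the replicates as a standard error. The function name, the number of replicates, the seed, and the data are all assumptions for illustration.

```python
import math
import random
from statistics import mean, median, stdev


def bootstrap_se(data, statistic, n_boot=2000, seed=0):
    """Bootstrap standard error of `statistic` from resamples of `data`."""
    rng = random.Random(seed)
    n = len(data)
    # Each replicate is the statistic of n draws, with replacement, from the data.
    replicates = [statistic(rng.choices(data, k=n)) for _ in range(n_boot)]
    return stdev(replicates)


# Hypothetical sample of 20 measurements.
data = [3.1, 7.4, 1.2, 9.8, 4.5, 6.3, 2.2, 8.1, 5.0, 10.6,
        12.3, 11.7, 15.2, 13.9, 14.4, 0.8, 17.5, 16.1, 19.0, 18.3]
se_mean = bootstrap_se(data, mean)
se_median = bootstrap_se(data, median)
classical_se_mean = stdev(data) / math.sqrt(len(data))
print(se_mean, classical_se_mean, se_median)
```

For the mean, the bootstrap value should land close to the textbook formula s/sqrt(n); for the median, where no equally simple formula exists, the same code applies unchanged.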
Goodness of Fit • Curve (model) fitting or goodness of fit using the bootstrap procedure. • Procedures like Kolmogorov-Smirnov do not work in the multidimensional case, or when the parameters of the curve are estimated. • Bootstrap comes to the rescue • Some of these procedures are illustrated using R in a lab session on Hypothesis testing and bootstrapping
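One common way the bootstrap handles estimated parameters is a parametric bootstrap of the Kolmogorov-Smirnov statistic: refit the model on every simulated sample so that the reference distribution reflects the estimation step. The Python sketch below (not the R lab code) does this for a one-dimensional normal model. It is an assumption that this matches the lecture's procedure; the function names, sample size, seed, and number of replicates are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev


def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # Check the gap just after and just before each jump of the empirical CDF.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def fit_normal(sample):
    """Normal model with the sample mean and sample standard deviation."""
    return NormalDist(mean(sample), stdev(sample))


def bootstrap_ks_pvalue(sample, n_boot=500, seed=0):
    """KS statistic against the fitted normal and its parametric-bootstrap p-value."""
    rng = random.Random(seed)
    fitted = fit_normal(sample)
    d_obs = ks_statistic(sample, fitted.cdf)
    n = len(sample)
    exceed = 0
    for _ in range(n_boot):
        simulated = [rng.gauss(fitted.mean, fitted.stdev) for _ in range(n)]
        # Refit on each simulated sample, mirroring what was done to the real data.
        d_sim = ks_statistic(simulated, fit_normal(simulated).cdf)
        if d_sim >= d_obs:
            exceed += 1
    return d_obs, exceed / n_boot


# Synthetic sample drawn from a normal distribution (illustrative only).
data_rng = random.Random(1)
sample = [data_rng.gauss(10.0, 2.0) for _ in range(30)]
d_obs, p_value = bootstrap_ks_pvalue(sample)
print(d_obs, p_value)
```

Comparing `d_obs` with standard KS tables instead would give p-values that are too large, because fitting the parameters pulls the model toward the data.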
Model selection, evaluation, and likelihood ratio tests The model selection procedures covered include: • Chi-square test • Rao's score test • Likelihood ratio test • Cross validation
Time Series & Stochastic Processes • Time domain procedures • State space models • Kernel smoothing • Poisson processes • Spectral methods for inference • A brief discussion of the Kalman filter • Illustrations with examples from astronomy
Markov Chain Monte Carlo • MCMC methods are a collection of techniques that use pseudo-random (computer simulated) values to estimate solutions to mathematical problems • MCMC for Bayesian inference • Illustration of MCMC for the evaluation of expectations with respect to a distribution • MCMC for estimation of maxima or minima of functions • MCMC procedures are successfully used in the search for extra-solar planets
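Evaluating an expectation with respect to a distribution can be sketched in Python (not the R lab code) with a random-walk Metropolis sampler, one of the simplest MCMC schemes. It is an assumption that this is among the algorithms the lecture covers. The target (a standard normal, known only up to a constant), the step size, the burn-in length, and the seed are all illustrative choices.

```python
import math
import random


def metropolis(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis chain for a 1-D unnormalized log density."""
    rng = random.Random(seed)
    x, logp = x0, log_target(x0)
    chain = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)      # symmetric proposal
        logp_prop = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(x)).
        accept_prob = math.exp(min(0.0, logp_prop - logp))
        if rng.random() < accept_prob:
            x, logp = proposal, logp_prop
        chain.append(x)                          # a rejected move repeats x
    return chain


def log_std_normal(x):
    """Log density of N(0, 1) up to an additive constant."""
    return -0.5 * x * x


chain = metropolis(log_std_normal, x0=0.0, n_samples=20000, step=2.0)
kept = chain[1000:]                              # discard burn-in
second_moment = sum(v * v for v in kept) / len(kept)
print(second_moment)   # estimates E[x^2], which is 1 for N(0, 1)
```

Only ratios of the target density are needed, which is why the method suits Bayesian posteriors whose normalizing constant is unknown.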
Spatial Statistics • Spatial point processes • Intensity function • Homogeneous and inhomogeneous Poisson processes • Estimation of Ripley's K function (useful for point pattern analysis).
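A minimal Python sketch (not the R lab code) of estimating Ripley's K function follows. It uses the naive estimator with no edge correction, which is an assumption made for brevity: practical estimators add a border or weighting correction, and some variants normalize by n squared rather than n(n - 1). The function name, the points, and the window area are hypothetical.

```python
import math


def ripley_k_naive(points, r, area):
    """Naive estimate of Ripley's K at distance r (no edge correction)."""
    n = len(points)
    pairs_within_r = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            # Count ordered pairs of distinct points closer than r.
            if i != j and math.hypot(xi - xj, yi - yj) <= r:
                pairs_within_r += 1
    return area * pairs_within_r / (n * (n - 1))


# Hypothetical pattern in a 10 x 10 window: three nearby points and one isolated point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
k_hat = ripley_k_naive(points, r=1.0, area=100.0)
print(k_hat, math.pi * 1.0 ** 2)
```

For a homogeneous Poisson process K(r) equals pi r squared, so an estimate far above that value, as in this toy pattern, points to clustering at scale r.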
Cluster Analysis • Data mining techniques • Classifying data into clusters • k-means • Model-based clustering • Single linkage (friends of friends) • Complete linkage clustering algorithm
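Of the methods listed, k-means is the simplest to sketch. The Python code below (not the R lab code) implements the usual alternating scheme, often called Lloyd's algorithm: assign each point to its nearest center, then move each center to the mean of its points. Passing the starting centers explicitly, keeping an empty cluster's center unchanged, and the toy data are all assumptions for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centers, max_iter=100):
    """Alternate assignment and update steps until the centers stop moving."""
    centers = [tuple(c) for c in initial_centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every point.
        labels = [min(range(len(centers)),
                      key=lambda j: squared_distance(p, centers[j]))
                  for p in points]
        # Update step: each center becomes the mean of its assigned points.
        new_centers = []
        for j, old_center in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centers.append(tuple(sum(coord) / len(members)
                                         for coord in zip(*members)))
            else:
                new_centers.append(old_center)   # empty cluster: keep old center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Hypothetical data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(points, initial_centers=[(0.0, 0.0), (10.0, 10.0)])
print(centers, labels)
```

The result depends on the starting centers, so in practice the algorithm is usually run from several random starts and the best partition kept.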