Classification of GAIA data
Overview
• GAIA classification objectives and available data
• Approaches to classification: principles and problems
• Example classification using RVS-like data
• Some specific issues
• Summary
Coryn A.L. Bailer-Jones, Max-Planck-Institut für Astronomie, Heidelberg (calj@mpia.de)
GAIA classification objectives
• discrete classification of objects as star, galaxy, quasar, solar system object, supernova, etc.
• determination of astrophysical parameters (APs) for stars: Teff, logg, [Fe/H], [α/Fe], CNO, A(λ), Vrot, Vrad, activity
• combination with parallax to determine stellar luminosity and radius (and mass and age)
• identification of unresolved binaries (and parametrization of their components where possible)
• efficient identification of new types of objects
Goal: a catalogue of object classifications and astrophysical parameters
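Combining a parallax with an apparent magnitude and the AP estimates gives luminosity and radius through standard relations. The following is a minimal sketch, assuming the apparent magnitude is already bolometric (bolometric correction applied), the parallax is in arcseconds and inverted naively (parallax errors ignored), and the extinction A is known from the AP estimate. The solar reference values M_bol,⊙ = 4.74 and Teff,⊙ = 5772 K are the IAU 2015 nominal values and do not come from the slides. Mass and age would additionally need stellar evolution models and are not shown.

```python
import math

M_BOL_SUN = 4.74    # IAU 2015 nominal solar absolute bolometric magnitude
TEFF_SUN = 5772.0   # IAU 2015 nominal solar effective temperature [K]


def absolute_magnitude(m, parallax_arcsec, extinction=0.0):
    """Distance modulus with parallax: M = m + 5 log10(parallax) + 5 - A."""
    if parallax_arcsec <= 0:
        raise ValueError("parallax must be positive for this simple inversion")
    return m + 5.0 * math.log10(parallax_arcsec) + 5.0 - extinction


def luminosity_solar(m_bol_abs):
    """Luminosity in solar units from the absolute bolometric magnitude."""
    return 10.0 ** (-0.4 * (m_bol_abs - M_BOL_SUN))


def radius_solar(lum_solar, teff):
    """Radius in solar units from L = 4 pi R^2 sigma Teff^4, scaled to the Sun."""
    return math.sqrt(lum_solar) * (TEFF_SUN / teff) ** 2


# Example: a star at 10 pc (parallax 0.1 arcsec) with m_bol = 4.74 and no extinction.
M = absolute_magnitude(4.74, 0.1)
L = luminosity_solar(M)
R = radius_solar(L, 5772.0)
```

The example star comes out with solar luminosity and radius, as expected for a Sun-like star at 10 pc.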
GAIA data
• BBP: 4+ broad-band filters; all objects
• MBP: 10-20 medium-band filters; all objects; object classification and stellar Teff, logg, [Fe/H], A(λ)
• RVS: 849-874 nm spectrum at ~0.04 nm/pixel; G < 17; stellar Vrad, Vrot, specific element abundances
• Astrometry: parallax, kinematics, unresolved binaries
• Time domain: ~50 epochs over 5 years (photometric variability)
Inhomogeneous data: the “redshift” problem. To measure the radial velocity we need the correct spectral-type (SpT) template, but to determine the SpT we may need to know the shift. So use the MBP data to give the SpT, and iterate.
Generally: use the MBP data to give an initial classification of the RVS data.
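The iteration between spectral type and shift can be sketched as follows. This is a minimal illustration, not the GAIA processing: spectra are plain lists on a common pixel grid, the shift is an integer number of pixels found by minimizing the mean squared difference with a template (a simple stand-in for cross-correlation), and the MBP-based starting type is just passed in as `initial_spt`. The templates, the toy line list and all function names are hypothetical.

```python
import math


def spectrum(lines, n=60, width=2.0):
    """Toy normalized spectrum: continuum 1 with Gaussian absorption lines (centre, depth)."""
    return [1.0 - sum(d * math.exp(-((i - c) / width) ** 2) for c, d in lines)
            for i in range(n)]


def msd(obs, tmpl, shift):
    """Mean squared difference between obs[i] and tmpl[i - shift] over the overlap."""
    n = len(obs)
    idx = range(max(0, shift), min(n, n + shift))
    return sum((obs[i] - tmpl[i - shift]) ** 2 for i in idx) / len(idx)


def best_shift(obs, tmpl, max_shift):
    """Integer pixel shift of tmpl that best matches obs."""
    return min(range(-max_shift, max_shift + 1), key=lambda s: msd(obs, tmpl, s))


def iterate_shift_and_type(obs, templates, initial_spt, max_shift=5, max_iter=10):
    """Alternate: shift from the current template, then the best template at that shift."""
    spt = initial_spt
    shift = best_shift(obs, templates[spt], max_shift)
    for _ in range(max_iter):
        new_spt = min(templates, key=lambda t: msd(obs, templates[t], shift))
        if new_spt == spt:
            break  # type and shift are now mutually consistent
        spt = new_spt
        shift = best_shift(obs, templates[spt], max_shift)
    return spt, shift


# Hypothetical templates: same line positions, different relative line depths.
templates = {
    "G": spectrum([(20, 0.6), (40, 0.2)]),
    "K": spectrum([(20, 0.2), (40, 0.6)]),
}
observed = spectrum([(23, 0.2), (43, 0.6)])  # a "K" spectrum shifted by 3 pixels
result = iterate_shift_and_type(observed, templates, initial_spt="G")
```

Starting from the wrong type, the first pass still finds roughly the right shift, which is enough to pick the correct template on the next pass. On a wavelength grid uniform in ln λ, a pixel shift corresponds to a constant radial velocity (v ≈ c times the shift times the ln λ step per pixel); a real implementation would also need sub-pixel shifts and noise weighting.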
Classification principles
“Supervised” approach:
• use pre-classified data (templates) to infer the desired mapping
• apply the mapping to any new data to give APs or classes
But the desired mapping is generally degenerate: different APs can give (nearly) the same data...
Minimum Distance Methods (MDMs)
• search for nearest neighbours (templates) in data space
• assign parameters according to these
• generally interpolate, either in data space, $\theta = f(\mathbf{d}; \mathbf{w})$, or in parameter space, $D = g(\theta; \mathbf{w})$
• need to scale the data dimensions
• e.g. k-nn, χ² minimization, cross-correlation
• a local classification method
[Figure: astrophysical parameter(s) θ against data dimensions d1, d2; D = distance to a template]
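A minimal k-nearest-neighbour sketch of a minimum distance method: each data dimension is scaled by its standard deviation over the template grid, the distance D is Euclidean in the scaled space, and the AP is interpolated as an inverse-distance-weighted mean of the k nearest templates. The scaling rule, the weighting and the toy templates are assumptions chosen for illustration, and a single scalar AP keeps it short.

```python
import math


def scale_factors(templates):
    """Per-dimension standard deviation of the template data (1.0 if a dimension is constant)."""
    n_dim = len(templates[0][0])
    factors = []
    for j in range(n_dim):
        col = [data[j] for data, _ in templates]
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        factors.append(sd if sd > 0 else 1.0)
    return factors


def knn_parametrize(d, templates, k=2):
    """Inverse-distance-weighted mean AP of the k nearest templates in scaled data space."""
    s = scale_factors(templates)
    ranked = sorted(
        (math.sqrt(sum(((a - b) / sj) ** 2 for a, b, sj in zip(d, data, s))), theta)
        for data, theta in templates
    )
    nearest = ranked[:k]
    if nearest[0][0] == 0.0:
        return nearest[0][1]  # exact match with a template
    weights = [1.0 / dist for dist, _ in nearest]
    return sum(w * theta for w, (_, theta) in zip(weights, nearest)) / sum(weights)


# Hypothetical templates: two data dimensions on very different scales.
templates = [((t, 100.0 * t), t) for t in range(11)]
estimate = knn_parametrize((4.5, 450.0), templates)
```

Without the scaling step the second dimension, being 100 times larger, would dominate every distance. Because the answer depends only on the k neighbours, the method is local, which is what makes it vulnerable to degeneracy.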
Classification principles
• selecting just the local neighbours in data space can lead to systematic errors or missed solutions
• need to find the global (forward) mapping and identify degenerate regions
• this is more complex in higher-dimensional spaces (data or parameters)
• the severity of the degeneracy depends on the density of the template grid and on the noise in the data
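A toy illustration of how purely local neighbour selection can mislead. The forward model d = θ² is hypothetical and chosen only because it is exactly degenerate (θ and -θ give the same data). Averaging the nearest templates returns a value consistent with neither solution, whereas grouping the neighbours' APs into separated clusters (the `gap` threshold is an assumption) exposes both.

```python
def forward(theta):
    """Hypothetical degenerate forward model: data = theta squared."""
    return theta ** 2


grid = [0.5 * i for i in range(-6, 7)]          # template APs from -3 to 3
templates = [(forward(t), t) for t in grid]      # (data, AP) pairs


def neighbours(d, templates, k):
    """The k templates closest to d in data space."""
    return sorted(templates, key=lambda p: abs(p[0] - d))[:k]


def naive_estimate(d, templates, k=2):
    """Plain average of the neighbours' APs (ignores degeneracy)."""
    return sum(theta for _, theta in neighbours(d, templates, k)) / k


def solutions(d, templates, k=2, gap=1.0):
    """Cluster neighbour APs separated by more than `gap`; return one mean per cluster."""
    thetas = sorted(theta for _, theta in neighbours(d, templates, k))
    clusters = [[thetas[0]]]
    for theta in thetas[1:]:
        if theta - clusters[-1][-1] > gap:
            clusters.append([theta])
        else:
            clusters[-1].append(theta)
    return [sum(c) / len(c) for c in clusters]


print(naive_estimate(4.0, templates))   # 0.0, yet forward(0.0) = 0, not 4
print(solutions(4.0, templates))        # [-2.0, 2.0]
```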
Artificial Neural Networks (ANNs)
• functional mapping: astrophysical parameters = f(data; weights)
• weights determined by training on pre-classified data (templates): least-squares minimization of the total classification error (numerical methods)
• global interpolation of the data
As with MDMs, degeneracy is a problem.
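A minimal sketch of the ANN approach: a network with one hidden layer of tanh units and a linear output, whose weights are fitted to pre-classified templates by batch gradient descent on the squared error. The layer size, learning rate, number of epochs and the scalar toy mapping θ = d² are arbitrary assumptions. Real inputs would be flux vectors and the outputs several APs.

```python
import math
import random


def train_mlp(data, targets, n_hidden=5, lr=0.05, epochs=2000, seed=0):
    """Fit AP = f(data; weights) with one tanh hidden layer by batch gradient descent."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # input -> hidden
    b1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # hidden -> output
    b2 = [0.0]  # kept in a list so the nested functions see updates

    def forward(x):
        h = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
        return sum(v * hj for v, hj in zip(w2, h)) + b2[0], h

    def mse():
        return sum((forward(x)[0] - y) ** 2 for x, y in zip(data, targets)) / len(data)

    history = [mse()]
    n = len(data)
    for _ in range(epochs):
        g_w1 = [0.0] * n_hidden
        g_b1 = [0.0] * n_hidden
        g_w2 = [0.0] * n_hidden
        g_b2 = 0.0
        for x, y in zip(data, targets):
            out, h = forward(x)
            err = out - y                                # derivative of 0.5 * err**2
            g_b2 += err
            for j in range(n_hidden):
                g_w2[j] += err * h[j]
                back = err * w2[j] * (1.0 - h[j] ** 2)   # back-propagated through tanh
                g_w1[j] += back * x
                g_b1[j] += back
        for j in range(n_hidden):                        # update after all gradients are summed
            w1[j] -= lr * g_w1[j] / n
            b1[j] -= lr * g_b1[j] / n
            w2[j] -= lr * g_w2[j] / n
        b2[0] -= lr * g_b2 / n
        history.append(mse())
    return (lambda x: forward(x)[0]), history


# Hypothetical templates: scalar "data" d in [-1, 1] with AP theta = d**2.
data = [i / 10.0 for i in range(-10, 11)]
targets = [d * d for d in data]
model, history = train_mlp(data, targets)
```

`history` records the mean squared error after each epoch, so the progress of the least-squares minimization can be inspected directly.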
Classification example with high-resolution spectra
• database of 611 real stellar spectra from Cenarro et al. (2001)
• variation over Teff, logg, [Fe/H]
• coverage: 849-874 nm (same as GAIA RVS)
• resolution: 0.15 nm at 0.075 nm/pixel (poorer than GAIA?)
• SNR: median = 70; 90% in the range 20-140
Randomly split the data set into two sets: train a neural network on one set and test its performance on the other.
Distribution over APs in the Cenarro et al. data [figure: blue = training data (300 spectra), red = test data (311 spectra)]
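The random train/test split can be sketched as below. Spectra are represented only by their indices 0-610, `rms_error` is an assumed summary of test-set performance, and the seed is arbitrary, so this reproduces the 300/311 sizes but not the particular split used here.

```python
import math
import random


def random_split(items, n_train, seed=0):
    """Shuffle a copy of items and split it into (training, test) sets."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def rms_error(predicted, true):
    """Root-mean-square difference between predicted and true APs on the test set."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, true)) / len(true))


train_idx, test_idx = random_split(range(611), n_train=300)
```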
Requirements of the classification scheme
• produce both discrete classifications and continuous parametrizations (e.g. star vs. quasar, APs of stars)
• recognise degeneracies in the presence of noise (i.e. recognise multiple classifications for a given data vector)
• robustly handle missing and censored data
• handle different amounts/formats of data (e.g. possible lossy compression of RVS data as a function of magnitude)
• reliably determine parametrization uncertainties
• accommodate ever-improving stellar models
All this for a very wide range of types of object...
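One simple way to make a distance measure tolerate missing and censored measurements, sketched under stated assumptions: a missing value is `None` and is skipped, an upper limit is written as the hypothetical tuple `("upper", limit)` and only penalizes templates that exceed it, and the result is a root-mean-square over the dimensions actually used, so that objects with different amounts of data give comparable distances.

```python
import math


def robust_distance(obs, template):
    """RMS distance over usable dimensions; None = missing, ("upper", u) = upper limit."""
    total, used = 0.0, 0
    for o, t in zip(obs, template):
        if o is None:
            continue                      # missing measurement: ignore this dimension
        if isinstance(o, tuple) and o[0] == "upper":
            excess = max(0.0, t - o[1])   # censored: only exceeding the limit costs
            total += excess ** 2
        else:
            total += (o - t) ** 2
        used += 1
    if used == 0:
        raise ValueError("no usable measurements")
    return math.sqrt(total / used)
```

Such a function could replace the plain Euclidean distance in a minimum distance method; a fuller treatment would weight each dimension by its noise.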
Hierarchical Parallel Classification schemes [figure; key: P = probability, APs = astrophysical parameters]
Model training
Real spectra and synthetic spectra are not identical:
• systematic differences (modelling uncertainties, e.g. opacities)
• increased cosmic scatter in real spectra (unaccounted-for APs)
1. Can synthetic spectra be used to reliably parametrize GAIA data?
2. Are the resulting performances representative of what can be achieved?
3. Do synthetic spectra give the best optimization of the photometric/spectroscopic systems?
Questions 2 and 3 require accurate synthetic spectra (or a large set of real spectra).
The mismatch problem for (1) can be overcome:
• use real GAIA data of pre-selected targets to apply corrections to the synthetic SEDs
• determine the APs of these targets from higher-resolution ground-based spectra
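The correction of synthetic SEDs with real data of pre-selected targets could look like the sketch below, assuming a purely multiplicative, pixel-by-pixel mismatch. For each calibration target, the synthetic spectrum is computed at the APs from the ground-based spectra, and the mean real/synthetic ratio per pixel becomes a correction applied to all synthetic spectra. The functions and numbers are hypothetical; a median, or a smooth fitted function of wavelength, would be more robust choices.

```python
def correction_factors(real_spectra, synthetic_spectra):
    """Mean real/synthetic flux ratio at each pixel over the calibration targets."""
    n_pix = len(real_spectra[0])
    factors = []
    for j in range(n_pix):
        ratios = [real[j] / synth[j] for real, synth in zip(real_spectra, synthetic_spectra)]
        factors.append(sum(ratios) / len(ratios))
    return factors


def apply_correction(synthetic, factors):
    """Correct one synthetic spectrum pixel by pixel."""
    return [s * f for s, f in zip(synthetic, factors)]


# Hypothetical calibration set: real spectra differ from the models by a fixed pixel-wise factor.
synthetic = [[1.0, 2.0, 4.0], [2.0, 2.0, 2.0]]
true_factor = [1.1, 0.9, 1.0]
real = [[s * f for s, f in zip(spec, true_factor)] for spec in synthetic]
factors = correction_factors(real, synthetic)
```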
Summary
• classification with GAIA data is a challenging problem
• methods used so far in the (astronomical) classification literature are suboptimal for this purpose, so further development of methods is a high priority
• particular problems to overcome are:
  - degeneracy (especially with MBP data and compressed RVS data)
  - inhomogeneous data
• development of classification methods depends heavily on appropriate data (real or synthetic)
  - both of targets of interest
  - and of “contaminating” objects
ICAP: the GAIA classification working group
• WG responsible for addressing classification issues for GAIA
• 14 core members; 17 associate members
GAIA Classification meeting, 2-3 December, Heidelberg, Germany
Anyone interested in classification issues broadly related to GAIA is welcome to attend.
http://www.mpia.de/GAIA/