How to really describe the variable sky

Matthew J. Graham, CACR, Caltech
Hotwiring III, Santa Fe, Nov 15, 2013

The Variable Sky: CRTS DR2
• Covers 33000 sq. deg. (0 < RA < 360, -75 < Dec < 70)
• Calibrated photometry for 500 million objects (> 90 billion data points)
• Depth V = 19 to 21.5
• 100 – 600 observations in most regions
• ~3% LSST

What is characterization?
• Formal: a property P characterizes an object X if X is the only thing that has property P
• Astronomical: an object X is characterized by a set of properties {Pi} if the majority of the class of X have {Pi}
• Quantitative - an RR Lyrae star has:
  • skew < 0.25
  • kurtosis > -0.5
  • period between 0.25 d and 0.75 d
• Qualitative - a galaxy has:
  • complex stellar populations
  • dark matter
  • a satellite stellar system
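The quantitative RR Lyrae criteria can be written as a simple predicate. This is a minimal sketch, not the talk's implementation: the function names are hypothetical, and it assumes population moments and the excess-kurtosis convention (normal distribution = 0), neither of which the slide specifies.

```python
def _central_moment(values, order):
    """Population central moment of the given order."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** order for v in values) / n

def skewness(values):
    """Population skewness: m3 / m2^1.5."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2^2 - 3 (assumed convention)."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3.0

def looks_like_rr_lyrae(mags, period_days):
    """Apply the three thresholds quoted on the slide (hypothetical helper)."""
    return (
        skewness(mags) < 0.25
        and excess_kurtosis(mags) > -0.5
        and 0.25 < period_days < 0.75
    )
```

A light curve passes only if all three thresholds hold, which reflects the "majority of the class" sense of characterization: the cuts describe typical members and do not uniquely define the class.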

Initial efforts 1542 stars (Richards et al. 2011)

Characterizing phenomenology
• Test set of 1.5 million light curves (apply to full 500 million)
• Measuring:
  • Morphology (shape): skew, kurtosis
  • Scale: median absolute deviation, biweight midvariance
  • Variability: Stetson, Abbe, von Neumann
  • Timescale: periodicity, coherence, characteristic
  • Trends: Theil-Sen
  • Autocorrelation: Durbin-Watson
  • Long-term memory: Hurst exponent
  • Nonlinearity: Teräsvirta
  • Chaos: Lyapunov exponent
  • Models: HMM, CAR, Fourier decomposition, wavelets
• Defines high-dimensional (representative) feature space
• Noisy, irregularly sampled data can lead to false features
• Particularly interested in characterizing variable timescales
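Three of the listed features are simple enough to sketch directly: one for scale, one for variability and one for trends. The definitions below are the standard textbook forms and the function names are hypothetical; the talk does not say which exact variants or normalizations were used.

```python
import statistics

def median_absolute_deviation(mags):
    """Scale: median of absolute deviations from the median."""
    med = statistics.median(mags)
    return statistics.median([abs(m - med) for m in mags])

def von_neumann_ratio(mags):
    """Variability: mean squared successive difference over sample variance.

    Values well below 2 suggest smooth, correlated variation;
    values near 2 are expected for uncorrelated noise.
    """
    n = len(mags)
    msd = sum((b - a) ** 2 for a, b in zip(mags, mags[1:])) / (n - 1)
    return msd / statistics.variance(mags)

def theil_sen_slope(times, mags):
    """Trend: median of all pairwise slopes (O(n^2), fine for a sketch)."""
    slopes = [
        (mags[j] - mags[i]) / (times[j] - times[i])
        for i in range(len(times))
        for j in range(i + 1, len(times))
        if times[j] != times[i]
    ]
    return statistics.median(slopes)
```

All three are rank- or difference-based, so they need no regular sampling, which matters for the irregularly sampled light curves discussed here.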

Characterized distributions: Shape, Scale, Variability, Timescale, Trends, Autocorrelation, Long-term memory, Nonlinearity, Chaos

Relevant features
• Symbolic regression: find a function, in symbolic form, that fits a sample of data
• An evolutionary algorithm explores a metric space constructed from numerical partial derivatives of pairs of variables in the data set, looking for the best match to a predicted candidate function
• Produces a small set of final candidate analytical expressions on the accuracy-parsimony Pareto front
• Recast binary classification:
  • f(xi) is an equation for the discriminating hyperplane which separates the two classes in some high-dimensional feature space
  • {xi} are the set of features that characterize the classes
(Graham et al. 2013a)

Periods are important
• Many features used to characterize light curves rely on a derived period
• Dubath et al. (2011) show a 22% misclassification error rate for non-eclipsing variable stars with an incorrect period
• Richards et al. (2011) estimate that periodic feature routines account for 75% of computing time used in feature extraction

Period finding approaches
• Least-squares fit to a set of basis functions:
  • Lomb-Scargle and its variants
  • Wavelets
• Minimize dispersion measure in phase (ti / p – [ti / p]) space, where [x] is the integer part of x:
  • Means (PDM)
  • Variance (AOV)
  • String length
  • Entropy
• Bayesian
• Neural networks
• …
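The phase-space family of methods can be illustrated with a bare-bones binned phase dispersion statistic. This is a sketch in the spirit of PDM, not any specific published implementation: the bin count, trial grid, synthetic light curve and all function names are illustrative assumptions.

```python
import math
import random

def phase_fold(times, period):
    """Phase of each observation: t/p minus its integer part."""
    return [(t / period) % 1.0 for t in times]

def pdm_theta(times, mags, period, n_bins=10):
    """Pooled within-bin variance divided by total variance (small = good)."""
    n = len(mags)
    mean = sum(mags) / n
    total_var = sum((m - mean) ** 2 for m in mags) / (n - 1)
    bins = [[] for _ in range(n_bins)]
    for phi, m in zip(phase_fold(times, period), mags):
        bins[min(int(phi * n_bins), n_bins - 1)].append(m)
    num, dof = 0.0, 0
    for b in bins:
        if len(b) > 1:
            bin_mean = sum(b) / len(b)
            num += sum((m - bin_mean) ** 2 for m in b)
            dof += len(b) - 1
    if dof == 0:
        return float("inf")  # no bin has enough points to estimate scatter
    return (num / dof) / total_var

def best_period(times, mags, trial_periods, n_bins=10):
    """Trial period with the smallest phase dispersion."""
    return min(trial_periods, key=lambda p: pdm_theta(times, mags, p, n_bins))

# Synthetic, irregularly sampled sinusoid (illustrative values only).
rng = random.Random(42)
true_period = 0.5
times = sorted(rng.uniform(0.0, 100.0) for _ in range(300))
mags = [15.0 + 0.3 * math.sin(2 * math.pi * t / true_period) + rng.gauss(0.0, 0.02)
        for t in times]
trials = [0.3 + 0.001 * k for k in range(401)]  # 0.3 d to 0.7 d
found = best_period(times, mags, trials)
```

At the correct period the folded light curve is tight within each phase bin, so the statistic drops well below 1; at wrong periods the bins mix all phases and it stays near 1.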

They disagree… Figure panels: (a) Lomb-Scargle, (b) AOV, (c) Generalized Lomb-Scargle, (d) PDM

Data sets
• CRTS light curves for all objects in SIMBAD and VSX with a quoted period – 15522
• ACVS light curves for MACC classification – 50124
• TSC gold standard of RR Lyrae, EBs and Cepheids – 1500
(Graham et al. 2013b)

Classes
• Eruptive (4194):
  • 2239 T Tauri, 827 red supergiants, 441 RS CVn
• Pulsating (45599):
  • 10418 semiregulars, 9830 RR Lyrae, 4065 Mira, 1719 Delta Scuti, 1312 Cepheids
• Rotating (455):
  • 345 chemically peculiar, 89 BY Dra
• Cataclysmic (386):
  • 140 SU UMa, 88 U Gem, 57 novalike
• Eclipsing (14952):
  • 14855 eclipsing binaries, 76 AM Her
• X-ray (31)
• Other (1358)

Algorithms
• Lomb-Scargle (LS)
• Generalized Lomb-Scargle (GLS)
• Binned analysis of variance (AOV)
• Multiharmonic analysis of variance (AOVMHW)
• Phase dispersion minimization (PDM)
• Improved phase dispersion minimization (PDM2)
• FastChi (FC)
• String length (STR)
• Conditional entropy (CE)
• Supersmoother (SS)*
• Correntropy kernel periodogram (CKP)*

Entropy-based period finding
• Entropy (Cincotta 1999)
• Conditional entropy (Graham et al. 2013c)
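The slide's equations did not survive in the transcript. As a hedged sketch, the conditional entropy of a folded light curve is commonly written H_c = sum over (i, j) of p(m_i, phi_j) ln[ p(phi_j) / p(m_i, phi_j) ], where p(m_i, phi_j) is the fraction of points in magnitude bin i and phase bin j, and p(phi_j) is the fraction in phase bin j. The bin counts, synthetic data and function name below are assumptions for illustration, not values from the talk.

```python
import math
import random

def conditional_entropy(times, mags, period, n_phase=10, n_mag=5):
    """H_c of magnitude given phase on an n_mag x n_phase grid (small = good)."""
    lo, hi = min(mags), max(mags)
    span = (hi - lo) or 1.0  # avoid division by zero for constant data
    cell_counts = {}
    phase_counts = [0] * n_phase
    for t, m in zip(times, mags):
        j = min(int(((t / period) % 1.0) * n_phase), n_phase - 1)
        i = min(int((m - lo) / span * n_mag), n_mag - 1)
        cell_counts[(i, j)] = cell_counts.get((i, j), 0) + 1
        phase_counts[j] += 1
    n = len(mags)
    # p(phi_j) / p(m_i, phi_j) reduces to phase_counts[j] / cell count.
    return sum((c / n) * math.log(phase_counts[j] / c)
               for (i, j), c in cell_counts.items())

# Synthetic, irregularly sampled sinusoid (illustrative values only).
rng = random.Random(7)
true_period = 0.5
times = sorted(rng.uniform(0.0, 100.0) for _ in range(300))
mags = [15.0 + 0.3 * math.sin(2 * math.pi * t / true_period) + rng.gauss(0.0, 0.02)
        for t in times]
trials = [0.3 + 0.001 * k for k in range(401)]  # 0.3 d to 0.7 d
found = min(trials, key=lambda p: conditional_entropy(times, mags, p))
```

Conditioning on phase is the key design choice: a fold that merely concentrates points in a few phase bins (as can happen with strongly aliased sampling) is not rewarded, because only the magnitude scatter within each phase bin contributes.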

Results: light curve properties
• All methods are dependent on the quality of the light curve
  • Period recovery declines with lower quality light curves as a consequence of:
    • fewer observations
    • fainter magnitudes
    • noisier data
  • Period recovery increases with higher object variability
• All algorithms are stable with a minimum bin occupancy of ~10 (assuming Δϕ = 0.1)
• A bimodal observing strategy consisting of pairs (or more) of short-Δt observations per night and normal repeat visits is better
• A minimum frequency step of δν = 0.0001 is sufficient
• AOVMHW/CE work well at bright magnitudes (containing saturated values)

Results: classes
• The algorithms work best with pulsating and eclipsing variable classes
• LS/GLS are strongly affected by the half-period issue (eclipsing binaries)
• In terms of overall performance - accuracy vs. time - CE or AOV/PDM are best
• No algorithm does better than ~80% recovery (dp/p ~ 10^-3)
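A recovery test of this kind can be sketched as a relative-error check that also flags the half-period case. Treating dp/p ~ 10^-3 as a pass/fail tolerance is an assumption, as are the function names and the extra "double" category; the talk does not give its exact matching rule.

```python
def period_match(found, true, tol=1e-3):
    """Classify a recovered period against the reference period.

    Returns "exact", "half" or "double" when the relative difference
    to that multiple of the reference is below tol, else "miss".
    """
    for label, factor in (("exact", 1.0), ("half", 0.5), ("double", 2.0)):
        target = factor * true
        if abs(found - target) / target < tol:
            return label
    return "miss"

def recovery_rate(pairs, tol=1e-3):
    """Fraction of (found, true) pairs that match exactly within tol."""
    hits = sum(1 for found, true in pairs if period_match(found, true, tol) == "exact")
    return hits / len(pairs)
```

Counting "half" separately makes the eclipsing-binary failure mode visible: an algorithm that returns half the orbital period has found real structure but would still lower the recovery rate.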

Issues
• Are the quoted periods correct?
  • All data sets have been inspected and periods confirmed visually
• Can a single value characterize temporal behaviour?
  • At least 30% of RR Lyrae show Blazhko behaviour
  • Small amplitude cycle-to-cycle modulations of RRabs
  • Close binaries and long period variables show cyclic period changes over multidecade baselines
  • Multiperiodicity in semi-regular variables
  • Alternate approaches – O-C diagram, wavelets, TFRs – do not reduce to an easy feature
• Effects of object misclassification
  • ~12% of MACC W UMas are ACVS RRCs and vice versa
  • TSC shows similar effect

Slepian wavelet variance
• A time series can be decomposed by applying a set of wavelet filters
• The wavelet variance at a given scale τj gives the contribution to the total variance of the time series due to scale τj
• Slepian wavelets can work with irregular and gappy time series and are optimal approximations to ideal bandpass filters
• Characteristic scales are indicated by peaks or changes of behaviour

Quasars
• Stripe 82: 10293 objects
• CRTS: 7300 objects

Characteristic timescale
• Restframe timescale of ~56 days
• Anti-correlated with abs. magnitude
• Zu et al. (2013) detected hints of ~1-3 months in 50 OGLE light curves
• 4 Kepler AGN indicate breakdown of CAR(1) at smaller timescales
Figure panels: CRTS, Stripe 82
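For reference, the CAR(1) model (a damped random walk) that these results are compared against can be simulated exactly at irregular epochs. This is a generic sketch of the standard process, not the talk's analysis code; the function names are hypothetical and the parameters (mean, tau, sigma) are placeholders, with sigma taken to be the long-run standard deviation.

```python
import math
import random

def car1_step(dt, tau, sigma):
    """Return (decay, innovation_sd) for one CAR(1) step of length dt.

    decay = exp(-dt/tau); the innovation keeps the long-run variance at sigma^2.
    """
    decay = math.exp(-dt / tau)
    innovation_sd = sigma * math.sqrt(1.0 - decay * decay)
    return decay, innovation_sd

def simulate_car1(times, mean, tau, sigma, seed=0):
    """Simulate CAR(1) values at the given (sorted, non-empty) epochs."""
    rng = random.Random(seed)
    values = [mean + rng.gauss(0.0, sigma)]  # start from the stationary distribution
    for t_prev, t_next in zip(times, times[1:]):
        decay, sd = car1_step(t_next - t_prev, tau, sigma)
        values.append(mean + decay * (values[-1] - mean) + rng.gauss(0.0, sd))
    return values
```

Because a pure CAR(1) process has a single damping timescale tau and no preferred scale below it, an additional characteristic timescale in the data is what would signal a deviation from the model.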

Summary
• Exploring set of phenomenology characterizers
  • Outliers
  • Class dependencies
• Periods
  • Important for classification and related work (stellar parameters)
  • Dependencies on number of observations, magnitudes, sampling/cadence, variability, and class
  • Best methods are only ~80% accurate
  • Faster, more accurate methods are needed
• Quasars
  • New variability-based selection method comparable to CAR(1) and SF
  • Identifies a restframe characteristic timescale of ~56 days indicating a deviation from CAR(1) model
• Forthcoming work:
  • Expand analysis to 200000 spectroscopically-confirmed quasars in CRTS
  • Produce a CRTS QSO sample (~1 million variability-selected)
