1 / 10

New Data Mining Techniques for Surveys

Explore the innovative data mining techniques developed by Peter-Christian Zinn at the Astronomical Institute of Ruhr-Universität Bochum, bringing order to chaotic survey data.

jknudtson
Download Presentation

New Data Mining Techniques for Surveys

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 1010101011101010101001011010010100101001110101001000100101011101010110101010111010101010010110100101001010011101010010001001010111010101101010101110101010100101101001010010100111010100100010010101110101011010101011101010101001011010010100101001110101001000100101011101010110101010111010101010010110100101001010011101010010001001010111010101101010101110101010100101101001010010100111010100100010010101110101011010101011101010101001011010010100101001110101001000100101011101010110101010111010101010010110100101001010011101010010001001010111010101101010101110101010100101101001010010100111010100100010010101110101011010101011101010101001011010010100101001110101001000100101011101010110101010111010101010010110100101001010011101010010001001010111010101101010101110101010100101101001010010100111010100100010010101110101011010101011101010101001011010010100101001110101001000100101011101010110101010111010101010010110100101001010011101010010001001010111010101101010101110101010100101101001010010100111010100100010010101110101011010101011101010101001011010010100101001110101001000100101011101010110101010111010101010010110100101001010011101010010001001010111010101101010101110101010100101101001010010100111010100100010010101110101011010101011101010101001011010010100101001110101001000100101011101010110101010111010101010010110100101001010011101010010001001010111010101101010101110101010100101101001010010100111010100100010010101110101011010101011101010101001011010010100101001110101001000100101011101010110101010111010101010010110100101001010011101010010001001010111010101101010101110101010100101101001010010100111010100100010010101110101011010101011101010101001011010010100101001110101001000100101011101010110101010111010101010010110100101001010011101010010001001010111010101101010101110101010100101101001010010100111010100100010010101110101011010101011101010101001011010010100101001110101001000100101011101010110101010111010101010010110100101001010011101010010001001010111010101101010101110101010100101101001010010100111010100100010010101110101011010101011101010101001011010010100101001110101001000100101011101010110101010111010101010010110100101001010011101010010001001010111010101101010101110101010100101101001010010100111010100100010010101110101011010101011101010101001011010010100101001110101001000100101011101010110101010111010101010010110100101001010011101010010001001010111010101101010101110101010100101101001010010100111010100100010101010111010101010010110100101001010011101010010001001010111010101101010101110101010100101101001010010100111010100100010010101110101011010101011101010101001011010010100101001110101001000100101011101010110101010111010101010010110100101001010011101010010001001010111010101101010101110101010100101101001010010100111010100100010010101110101011010101011101010101001011010010100101001110101001000100101011101010110101010111010101010010110100101001010011101010010001001010111010101101010101110101010100101101001010010100111010100100010010101110101011010101011101010101001011010010100101001110101001000100101011101010110101010111010101010010110100101001010011101010010001001010111010101101010101110101010100101101001010010100111010100100010010101110101011010101011101010101001011010010100101001110101001000100101011101010110101010111010101010010110100101001010011101010010001001010111010101101010101110101010100101101001010010100111010100100010010101110101011010101011101010101001011010010100101001110101001000100101011101010110101010111010101010010110100101001010011101010010001001010111010101101010101110101010100101101001010010100111010100100010010101110101011010101011101010101001011010010100101001110101001000100101011101010110101010111010101010010110100101001010011101010010001001010111010101101010101110101010100101101001010010100111010100100010010101110101011010101011101010101001011010010100101001110101001000100101011101010110101010111010101010010110100101001010011101010010001001010111010101101010101110101010100101101001010010100111010100100010010101110101011010101011101010101001011010010100101001110101001000100101011101010110101010111010101010010110100101001010011101010010001001010111010101101010101110101010100101101001010010100111010100100010010101110101011010101011101010101001011010010100101001110101001000100101011101010110101010111010101010010110100101001010011101010010001001010111010101101010101110101010100101101001010010100111010100100010010101110101011010101011101010101001011010010100101001110101001000100101011101010110101010111010101010010110100101001010011101010010001001010111010101101010101110101010100101101001010010100111010100100010010101110101011010101011101010101001011010010100101001110101001000100101011101010110101010111010101010010110100101001010011101010010001001010111010101101010101110101010100101101001010010100111010100100010010101110101011010101011101010101001011010010100101001110101001000 RUHR-UNIVERSITÄT BOCHUM Bringing order to chaos New data-mining techniques for new surveys Peter-Christian Zinn Astronomical Institute of Ruhr-University, Bochum, Germany CSIRO Astronomy & Space Science, Sydney, Australia

  2. ∑~70 million Norris et al. (2011) Why new data handling techniques? • New radio surveys will produce lots of data! • ASKAP/EMU ~ 70 million objects • LOFAR/Tier 1 ~ 7 million objects • WSRT/WODAN ~ 10 million objects • New optical/NIR surveys will produce even more data! • Pan-STARRS/PS1-3π~ 5-30 billion objects • LSST/Galaxy “gold sample“~ 10 billion objects • Astronomers go wild! • Tera-, exa-, petabyte scale Mostly no spectra available LSST Science Book

  3. Implications for survey science • There are no spectroscopic redshifts • Redshift information must be accessed on other ways → photometric (better: statistical) redshifts • There are no spectral classifications • Classification of an object must be inferred on other ways → Flux ratios or SED-fitting (better: kNN classification) becomes more important • There are no spectroscopically derived parameters • Classic parameters such as metallicity must be derived on other ways → scaling relations (better: kNN regression) must be utilized

  4. Fan et al. 2001 Common approaches • Define plain color criteria • Model SEDs • Look for morphology, scaling relations, ... • PROs: • Physically motivated • Easy to reproduce in 2d diagrams • High completeness • CONs: • Global model • Does not work for high dimensions • Many false positive candidates

  5. Our approach: k nearest neighbors 0 0 0 4 • use k-Nearest Neighbours • local model • works fine in high dimensions • does not require physical assumptions • good reference samples available ? mean=1 median=0

  6. Example 1: statistical redshifts • stat-z for ATLAS • ATLAS has spec-z for ~30% of allobjects • Training with 12-band data (ugriz,IRAC,MIPS24,13cm,20cm) Comparison: Cardamone et al. (2010) 14-band photo-z: 0.026 PCZ, Polsterer & Gieseke (subm.) • Advantages of statistical redshifts • No assumptions must be made (no template SEDs, luminosity range, dust reddening, flux homogenization, ...) • Computation much faster than for class. photo-z (tstat-z~ n*log2(n) | tphoto-z~ nα , α>2) PCZ, Polsterer & Gieseke (subm.)

  7. Redshift estimation for SDSS quasars • kNN regression model + selected reference set • 77,000 references reduced to 1,100 objects • optimized for z > 4.8 • 4 colors used Laurino et al. (2011) Polsterer, PCZ & Gieseke (2012)

  8. kNN-based classification of ATLAS test-sample yields combined false classification rate of 9% Smolcic et al. (2008) achieve contamination rates between 15% - 20% using a highly sophisticated photometric method Example 2: object classification • SF / AGN separation • Classical tool: BPT-diagram (requires spectroscopy) • Alternative: MIR color-color selection (not very reliable) • SED fitting (work-intensive) PCZ et al. (in prep)

  9. Example 3: metallicity • Metallicity from L-Z relation • Spectroscopic input: SDSS metallicities as derived by Brinchman et al. (2004) • Lr-Z relation calibrated by the 2dF survey (Lamareille et al. 2004) applied to Galactic extinction-corrected fluxes • No other assumptions made • Metallicity from kNN regression • Spectroscopic input: SDSS metallici-ties derived by Brinchman+ (2004) • kNN regression with respect to the 90 nearest neighbors • No other assumptions made PCZ, Polsterer & Gieseke (subm.) PCZ, Polsterer & Gieseke (subm.)

  10. Summary • We presented the first results of utilizing advanced machine-learning techniques to classify/analyze large data sets. • Dealing with large data sets will become increasingly important due to the enormous amounts of data forthcoming (radio) surveys will produce. • A k nearest neighbor-based approach was tested on available data from ATLAS, COSMOS and the SDSS. • Results for redshifts, object classifications and the regressional computation of astrophysical quantities (e.g. metallicity) all yield promising results. • Data-mining will already play an important role in currently upcoming projects, e.g. ASKAP/EMU.

More Related