Understanding Selection Biases in Astrophysical Surveys

Learn about resolving sources, selection biases, luminosity functions, and catalog cross-matching in astrophysical surveys using Python tools and techniques. Explore detection methods, biases, and challenges in identifying cosmic objects.

boutwell

Presentation Transcript


  1. Lecture 10 • The catalog of sources • Resolved sources • Selection biases • Luminosity (and mass) functions • Volume- vs flux-limited surveys. • Cross-matching two catalogs.

  2. Python and tut oddments • The module cPickle offers a useful way to save Python-generated data of arbitrary format to a disk file. • See http://www.python.org/doc/2.5/lib/module-cPickle.html • This can save you having to run a whole MC again just to check the details of a plot! • I see that pyfits is set up to deliver numpy arrays on the NASSP machines. (It seems to return only Numarray objects on the Astronomy computers.) • Assessment: let us take ‘code must run’ to mean ‘it must run on NASSP machines.’ • If I claim your code won’t run, and you think I am wrong, by all means protest!
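A minimal sketch of the pickling idea, assuming Python 3, where the standard-library module is called `pickle` (the Python 2 `cPickle` named on the slide was folded into it). The file name and the contents of `mc_results` are hypothetical.

```python
import os
import pickle
import tempfile

# Hypothetical Monte Carlo output: any picklable Python object will do.
mc_results = {"n_trials": 1000, "fluxes": [1.2, 0.7, 3.4], "seed": 42}

# Hypothetical file name, placed in the temporary directory for this sketch.
path = os.path.join(tempfile.gettempdir(), "mc_results.pkl")

# Save the object to disk ...
with open(path, "wb") as f:
    pickle.dump(mc_results, f)

# ... and reload it later, eg to re-make a plot without re-running the MC.
with open(path, "rb") as f:
    restored = pickle.load(f)
```

Only unpickle files you created yourself: loading a pickle can execute arbitrary code.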

  3. Detecting resolved sources. • Our earlier assumption that we knew the form of S is no longer true. • Some solutions: • Combine results of several filterings. (Crudely done in XMM.) • But, ‘space’ of possible shapes is large. • Difficult to calculate nett sensitivity. • Wavelet methods.

  4. Wavelet example • Figure (F Damiani et al, 1997): raw data compared with wavelet-smoothed data. • Multi-scale wavelets can be chosen to return best-fit ellipsoids.

  5. Selection biases • Fundamental aim of most surveys is to obtain measurements of an ‘unbiased sample’ of a type of object. • Selection bias happens when the survey is more sensitive to some classes of source than others. • Eg, intrinsically brighter sources, obviously. • Problem is even greater for resolved sources. • Note: ‘resolved’ does not just mean in spatial terms. Eg XMM, or single-dish HI surveys, in which most sources are unresolved spatially but well resolved spectrally.

  6. Examples • Optical surveys of galaxies. Easiest detected are: • The brightest (ie, numerically lowest apparent magnitude). • Edge-on spirals. • HI (ie, 21 cm radio) surveys of galaxies. Easiest detected are: • Those with most HI mass (excludes ellipticals). • Those which don’t ‘fill the beam’ (ie are unresolved). • Note: where sources are resolved, detection sensitivity tends to depend more on surface brightness than total flux.

  7. Full spatial information • Q: We have a low-flux source - how do we tell whether it is a high-luminosity but distant object, or a low-luminosity nearby one? • A: Various distance measures. • Parallax - only for nearby stars – but Gaia will change that. • Special knowledge which lets us estimate luminosity (eg Hertzsprung-Russell diagram). • Redshift => distance via the Hubble relation. This is probably the most widely used method for extragalactic objects.
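As a rough illustration of the redshift route, the low-redshift Hubble relation is d = cz/H0, after which a measured flux S gives a luminosity L = 4πd²S. The sketch below assumes H0 = 70 km/s/Mpc (a value chosen here, not taken from the lecture) and is only sensible for z much less than 1; the function names are hypothetical.

```python
import math

C_KM_S = 299792.458   # speed of light in km/s
H0 = 70.0             # ASSUMED Hubble constant in km/s/Mpc (not from the lecture)
MPC_IN_M = 3.0857e22  # metres per megaparsec

def hubble_distance_mpc(z, h0=H0):
    """Distance in Mpc from the low-redshift Hubble relation d = c z / H0."""
    return C_KM_S * z / h0

def luminosity_watts(flux_w_m2, z, h0=H0):
    """Luminosity L = 4 pi d^2 S for a flux S in W/m^2, ignoring cosmological corrections."""
    d_m = hubble_distance_mpc(z, h0) * MPC_IN_M
    return 4.0 * math.pi * d_m**2 * flux_w_m2

# Two sources with the same (hypothetical) flux but different redshifts:
# the more distant one must be intrinsically more luminous.
L_near = luminosity_watts(1e-15, 0.01)
L_far = luminosity_watts(1e-15, 0.02)
```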

  8. Luminosity function • Frequency distribution of luminosity (luminosity = intrinsic brightness). • The faint end is the hardest to determine. • Stars – how many brown dwarfs? • Galaxies – how many dwarfs? • Distribution for most objects has a long faint-end ‘tail’. • Schechter functions. • Figure credits: P Kroupa (1995); P Schechter (1976).
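For reference, the Schechter (1976) form is φ(L) dL = φ* (L/L*)^α exp(−L/L*) d(L/L*): a power law at the faint end with an exponential cutoff above L*. A minimal sketch follows; the parameter values are hypothetical and chosen only to show the long faint-end tail.

```python
import numpy as np

def schechter(L, phi_star, L_star, alpha):
    """Number density per unit luminosity, phi(L), for a Schechter function."""
    x = np.asarray(L, dtype=float) / L_star
    return (phi_star / L_star) * x**alpha * np.exp(-x)

# Hypothetical parameters for illustration only.
L = np.logspace(-3, 1, 5)   # luminosities in units of L_star
phi = schechter(L, phi_star=1.0, L_star=1.0, alpha=-1.2)
```

With a negative α the density keeps rising towards low luminosity, which is why the faint end dominates the numbers and is the hardest part to pin down.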

  9. HI mass function • Redshift is directly measured. • Flux is proportional to mass of neutral hydrogen (HI), for a given distance. • Hence: usual to talk about HI mass function rather than luminosity function. • FYI, HI is pronounced ‘aitch one’. • Figure credit: S E Schneider (1996).

  10. Relation to logN-logS • Just as flux S is related to luminosity L and distance D by $S \propto L/D^2$, • so is the logN-logS – or, to be more exact, the number density as a function of flux, n(S) – a convolution between the luminosity function n(L) and the true spatial distribution n(D). • BUT… • The luminosity function can change with age – that is, with distance! (And with environment.)
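A small Monte Carlo can make the convolution concrete. The sketch below assumes a static Euclidean universe with sources spread uniformly in volume, and an arbitrary log-uniform luminosity function; both are assumptions made for illustration. For fluxes bright enough that the survey edge does not matter, the cumulative counts N(>S) should then fall as S^(-3/2) whatever the luminosity function.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Distances: uniform in volume inside a sphere of radius 1, so p(D) ~ D^2.
D = rng.uniform(0.0, 1.0, n) ** (1.0 / 3.0)

# ASSUMED luminosity function: log-uniform over one decade, L in [1, 10].
L = 10.0 ** rng.uniform(0.0, 1.0, n)

# Flux from S ~ L / D^2 (the constant 4 pi is dropped).
S = L / D**2

def n_brighter(s_min):
    """Cumulative count N(>S): number of sources with flux above s_min."""
    return np.count_nonzero(S > s_min)

# Slope of logN-logS between two fluxes above L_max / D_max^2 = 10,
# ie where no source is lost beyond the edge of the simulated volume.
s_lo, s_hi = 20.0, 80.0
slope = np.log10(n_brighter(s_hi) / n_brighter(s_lo)) / np.log10(s_hi / s_lo)
```

The slope comes out close to -1.5. Letting the luminosity function vary with D in this simulation is a quick way to see how evolution distorts the counts.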

  11. Volume- vs flux-limited surveys • Information about the distance of sources allows one to set a distance cutoff, within which one estimates the survey is reasonably complete (ie, nearly all the available sources are detected). • Such a survey is called volume-limited. It allows the luminosity (or mass) function to be estimated without significant bias. • However, there may be few bright sources. • Allow everything in, and you have a flux-limited survey. • Many more sources => better stats; but biased (Malmquist bias).
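The trade-off can be seen in a toy simulation. This sketch assumes sources uniform in volume, a log-uniform luminosity function and a hypothetical flux limit `S_lim`; none of these come from the lecture. The volume-limited cut keeps only distances at which even the faintest source is above the flux limit.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

D = rng.uniform(0.0, 1.0, n) ** (1.0 / 3.0)   # uniform in volume, D in [0, 1]
L = 10.0 ** rng.uniform(0.0, 2.0, n)          # ASSUMED log-uniform, L in [1, 100]
S = L / D**2                                  # flux, constant factors dropped

S_lim = 10.0                                  # hypothetical flux limit
L_min = 1.0                                   # faintest luminosity in the population

# Flux-limited sample: everything that is detected.
flux_limited = S > S_lim

# Volume-limited sample: only distances at which even L_min is detectable.
D_cut = np.sqrt(L_min / S_lim)
volume_limited = D < D_cut

mean_true = L.mean()
mean_flux = L[flux_limited].mean()   # biased high (Malmquist bias)
mean_vol = L[volume_limited].mean()  # close to the true mean, but far fewer sources
```

The flux-limited sample is many times larger, but its mean luminosity is pulled upwards because luminous sources are seen out to larger distances.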

  12. Malmquist bias

  13. Malmquist bias Line of constant flux

  14. Malmquist bias Line of constant flux

  15. Catalog cross-matching • It sometimes happens that you have 2 lists of objects, which you want to cross-match. • Maybe the lists are sources observed at different frequencies. • The situation also arises in simulations. • I’ll deal with the simulations situation first, because it is easier. • So: we start with a bunch of simulated sources. Let’s keep it simple and assume they all have the same brightness. • We add noise, then see how many we can find.

  16. Catalog cross-matching • In order to know how well our source-detection machinery is working, we need to match each detection with one of the input sources. • How do we do this? • How do we know the ‘matched’ source is the ‘right one’? CAVEAT: ...I haven’t done a rigorous search of the literature yet – these are just my own ideas.

  17. Catalog cross-matching • Black: simulated sources. • Red: 1 of many detections (with 68% confidence interval). • This case seems clear.

  18. Catalog cross-matching • But what about these cases? • No matches inside confidence interval. • Too many matches inside confidence interval.

  19. Catalog cross-matching • Or these? • Is any a good match? • Which is ‘nearest’?

  20. Catalog cross-matching • My conclusion: • The shape of the confidence intervals affects which source is ‘nearest’. • The size of the confidence intervals has nothing to do with the probability that the ‘nearest’ match is non-random. • ‘Nearest neighbour’ turns out to be a slipperier concept than we at first think. To see this, imagine that we now have 1 spatial dimension and 1 flux dimension:

  21. Catalog cross-matching • Figure (two panels): sources 1 to 9 plotted as flux S against position x. • Which is the best match??? Source 5? Or source 8? • This makes more sense. • Let’s then define r as:
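The slide's own definition of r does not survive in this transcript. One plausible choice, assumed here and not confirmed by the source, is to scale each offset by the detection's 1-sigma uncertainty along that axis, so that r is measured in units of the confidence interval: r² = (Δx/σ_x)² + (ΔS/σ_S)². All numbers below are hypothetical.

```python
import numpy as np

def normalised_distance(det, sources, sigma):
    """ASSUMED definition of r: offsets scaled by the 1-sigma uncertainty on each axis."""
    det = np.asarray(det, dtype=float)
    sources = np.asarray(sources, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return np.sqrt((((sources - det) / sigma) ** 2).sum(axis=1))

# Hypothetical detection at (x, S) = (0, 10), well located in x, poorly in S.
det = (0.0, 10.0)
sigma = (0.1, 2.0)

# Two hypothetical candidates: A is nearer in raw units, B is nearer in sigma units.
candidates = [(0.5, 10.0), (0.0, 11.0)]

offsets = np.asarray(candidates) - np.asarray(det)
raw = np.sqrt((offsets**2).sum(axis=1))           # unscaled distance
r = normalised_distance(det, candidates, sigma)   # scaled distance

nearest_raw = int(np.argmin(raw))
nearest_scaled = int(np.argmin(r))
```

The two measures pick different candidates, which is the sense in which the shape of the confidence interval decides which source is ‘nearest’.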

  22. Catalog cross-matching • As for the probability... well, what is the null hypothesis in this case? • Answer: that the two catalogs have no relation to each other. • So, we want the probability that, with a random distribution of the simulated sources, a source would lie as close or closer to the detected source than $r_\mathrm{nearest}$. • This is given by $P_\mathrm{null} = 1 - \exp(-\rho V)$, where $\rho$ is the expected density of sim sources and $V$ is the volume inside $r_\mathrm{nearest}$.

  23. Catalog cross-matching • So the procedure for matching to a simulated catalog is: • For each detection, find the input source for which r is smallest. • Calculate the probability of the null hypothesis from $P_\mathrm{null} = 1 - \exp(-\rho V)$. • Discard those sources for which $P_\mathrm{null}$ is greater than a pre-decided (low) cutoff. • What about the general situation of matching between different catalogs?
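The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the lecturer's code: positions are 2-dimensional, r is the plain Euclidean distance, the ‘volume’ inside r is therefore the disc area πr², and the density ρ is taken as the number of input sources divided by a known field area. The function name, the cutoff and the coordinates are hypothetical.

```python
import numpy as np

def match_to_inputs(detections, inputs, area, p_cut=0.05):
    """Match each detection to its nearest input source; keep matches with P_null <= p_cut.

    Returns a list of (detection_index, input_index, p_null) tuples.
    """
    detections = np.asarray(detections, dtype=float)
    inputs = np.asarray(inputs, dtype=float)
    rho = len(inputs) / area                 # expected surface density of input sources
    matches = []
    for i, det in enumerate(detections):
        # Step 1: find the input source with the smallest r (Euclidean, an assumption).
        r = np.sqrt(((inputs - det) ** 2).sum(axis=1))
        j = int(np.argmin(r))
        # Step 2: probability of a chance source inside r_nearest, P_null = 1 - exp(-rho V).
        volume = np.pi * r[j] ** 2           # in 2-D the 'volume' is a disc area
        p_null = 1.0 - np.exp(-rho * volume)
        # Step 3: discard matches that chance alone explains too easily.
        if p_null <= p_cut:
            matches.append((i, j, float(p_null)))
    return matches

# Hypothetical example: 3 input sources in a 10 x 10 field, 2 detections.
inputs = [(2.0, 2.0), (5.0, 5.0), (8.0, 3.0)]
detections = [(2.1, 2.0), (5.0, 8.0)]
matches = match_to_inputs(detections, inputs, area=100.0)
```

Here the first detection lies 0.1 from an input source and is kept, while the second has its nearest input 3 units away, giving a null probability above one half, so it is discarded.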

  24. Catalog cross-matching • Figure: ASCA data, M Akiyama et al (2003). • Maybe a Bayesian approach would be best? • Interesting area of research.
