Identification of high energy gamma-ray sources and source populations

Identification of high energy gamma-ray sources and source populations in the era of deep all-sky coverage
Olaf Reimer, Stanford University
Diego F. Torres, Institut de Ciències de l'Espai

Understanding the challenge
[Figure legend: Solar Flare, AGN, Unidentified Source, Local Group Galaxy, Pulsar]
EGRET 3rd Catalog: ~270 sources
Anticipated LAT 1st Catalog: >9000 sources possible
5σ sources in the plane + 4σ sources outside the plane
Four years of pointed observations
The assumptions:
• 3EG catalog sources individually identifiable
• As for the remainder, it's pretty sad: only AGN identifications at high latitudes
• Halo vs. Plane population sources
• LMC by source extension
• PSR by their characteristic periodicity
• AGN by correlated MWL activity, spatial coincidence, and figure-of-merit approach
THE MULTI-MESSENGER APPROACH TO UNIDENTIFIED GAMMA-RAY SOURCES, BARCELONA’06

Understanding the challenge
[Figure: sources marked against exposure time (100 seconds; 95 minutes = 1 orbit; 1 day)]
• GRB940217 (100 sec)
• PKS 1622-287 flare
• 3C279 flare
• Vela Pulsar
• Crab Pulsar
• 3EG 2020+40 (SNR γ Cygni?)
• 3EG 1835+59
• 3C279 lowest 5σ detection
• 3EG 1911-2000 (AGN)
• Mrk 421
• Weakest 5σ EGRET source
• We're confident that LAT will have sufficient sensitivity after one day to detect (5σ) the weakest EGRET sources.
• Anticipated location accuracy will enable individual MWL identifications.

Understanding the challenge
Excursion into numbers:
• Current limit for PSR detections (rather an observationally driven accessibility limit)
• Vastly different numbers [McLaughlin, Gonthier, Harding, KS Cheng ...] but significantly large for established source populations
Dermer ’06

Traditional identification techniques
Current strategy for source classification: Bottom-Up
Concept: the largest class of identified gamma-ray sources is blazars, all of which have radio emission. If a flat-spectrum radio source with strong, compact emission at 5 GHz or above is found in a gamma-ray source error box, it becomes a blazar candidate.
The approach: use radio catalogs to search for flat-spectrum radio sources. If a candidate is found, follow up with other observations to look for other blazar characteristics such as polarization and time variability. The EGRET team used this approach in assigning catalog IDs. Mattox et al. quantified the method based on proximity and radio intensity. Sowards-Emmerd et al. have expanded the number of known blazars with this approach.

Traditional identification techniques
Current strategy for source classification: Top-Down
Concept: at some level, gamma-ray sources will have (non-thermal) X-ray counterparts. If such an X-ray counterpart can be found, the better X-ray position information allows deep searches at longer wavelengths.
The approach: X-ray imaging of an individual gamma-ray source error box; eliminate unlikely X-ray sources based on their X-ray, optical, and radio properties. Look for a non-thermal source with a plausible way to produce gamma rays. The classic example is Geminga. Bignami, Caraveo, Lamb, and Halpern started this search in 1983. The final result appeared in 1992 with the detection of X-ray pulsations from this isolated neutron star. Nowadays we will have adjacent VHE band information for some regions.

Traditional identification techniques
Current strategy for source classification: periodicity/correlated variability
Concept: establish characteristic periodicity or correlated MWL variability, if other MWL facilities are able to provide specific and distinguishable input.
The approach: uniform LAT coverage will enable blind searches and will provide a photon/flux history for any location “on disk”; this is a somewhat different concept from coordinated/contemporaneous observation campaigns.
Current strategy for source classification: source stacking/spatial correlation studies
Concept: find common signatures among the members of a class of established or hypothesized gamma-ray emitters. If found, subsequent individual MWL identification techniques are required.
The approach: governed by physical expectations for gamma-ray emission, a ranking scheme/spatial arrangement will be investigated for ensemble characteristics.

Approaching a limit
Identifying all LAT sources on a case-by-case basis using multiwavelength techniques with ad hoc simultaneous observations is simply not possible due to the number of sources (similar to X-ray astronomy).
• Use of FoM classifiers (e.g., Mattox et al. 2001 or Sowards-Emmerd et al. 2003) can (and will) establish a relative ranking of correctness within the populations we already know to exist among the EGRET γ-ray sources.
• They will work fine (i.e., provide sound identifications) for the brightest of the sources (excellent agreement on EGRET, for instance).
• There will be unavoidable ambiguities for less bright sources, especially for sources along the Galactic plane, with no apparent way of distinguishing between classes.
• AGN SEDs are too varied: there is a lack of reliable templates for AGN SEDs (and in addition the same source may exhibit large spectral variations with time).
• It would be up to the community to decide what to believe about a particular source's nature if there are alternatives with similar observational signatures.

Approaching a limit
Enlarge the catalogs of AGNs and pulsars and correlate these with the LAT detections:
• Problem 1: when is an incomplete catalog complete enough? (E.g., the ongoing problems in the radio/gamma correlation in blazars.)
• Problem 2: discovery of new populations would depend on the identification of their members. (This implies a lack of confidence level for the population as a whole unless extensive multi-year, multi-frequency studies are performed for many members.)
• Problem 3: simultaneity of multi-wavelength studies can be secured only for a selected handful of sources. (We cannot use this technique to completely explore a source-filled gamma-ray sky.)
Without precise knowledge of which AGNs and which pulsars are able to emit gamma-rays, given their respective SEDs (no real veto system), the number of identifications will only be limited by the number of sources in the counterpart catalog considered.

Approaching a limit
Counterparts: what's problematic here? Let's review some examples.
An “average” EGRET source: 3EG J1249-8330 [$\theta_{95} = 0.66^\circ$, $2 \times 10^{-7}$ ph cm$^{-2}$ s$^{-1}$]
1) 4 XMM-EPIC pointings -> 148 X-ray sources
2) Statistical evaluation of counterparts
3) Will computing a counterpart probability $p_c = p_{pos} \times p_{SED}^{(i)} \times p_{var}^{(i)} \times p_{ext}^{(i)} \times \ldots$ yield a source identification here? No, since for N = 94…148 the values of $p_c$ will be numerically indistinguishable within the systematics of their computation.
La Palombara et al.

Approaching a limit
Counterparts: what's problematic here?
LAT will have a better PSF than EGRET, thus an example from VHE gamma-rays:
HESS J1303-631 (α = 13h03m00.4s ± 4.4s and δ = −63°11′55″ ± 31″)
• At least 5 catalog counterparts listed in several counterpart categories
• But the source is extended!

Approaching a limit
Inverting the problem to strike the eye (now we consider a large number of gamma-ray sources instead of a large number of counterparts):
In the last BATSE catalog, if one takes the positional error boxes into account, there was a detection of one or more GRBs for every line of sight of any instrument at any wavelength used to compile any list of possible counterparts.
Correlation analysis potential is basically lost.

Understanding the challenge
How far is LAT from the former?
At low Galactic latitude (no priors):
• 1000 LAT sources in the Galactic Plane (|b| < 10°) with 12′ uncertainty = 20% coverage
At high Galactic latitude (no priors):
• $10^3$ LAT sources out of the Galactic Plane (|b| > 10°) with 12′ uncertainty = 0.3% coverage
• $10^4$ LAT sources out of the Galactic Plane (|b| > 10°) with 30′ uncertainty = 20% coverage
We need a scheme that allows us to classify populations of sources, and to use it before internal relative scales of the goodness of detected individuals are applied (within already known populations), to make sure that we do not “over-identify” up to the point where discovering new populations is no longer possible.
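
The coverage figures above can be approximated with a short calculation. The following is a minimal sketch, not the authors' computation: it assumes each source contributes a non-overlapping circular error region whose radius equals the quoted uncertainty, and that the region of interest is a simple latitude band. The function name `sky_coverage` and its parameters are hypothetical. The result depends on the band width and on how the 12′ or 30′ uncertainty is defined, so it is not guaranteed to reproduce every quoted percentage.

```python
import math

def sky_coverage(n_sources, radius_arcmin, b_cut_deg, in_plane):
    """Fraction of a Galactic latitude region covered by circular error regions.

    Assumptions (not from the slides): regions are circular, do not overlap
    (so this is an upper bound), and the radius equals the quoted uncertainty.
    The region is |b| < b_cut_deg if in_plane is True, else |b| > b_cut_deg.
    """
    radius = math.radians(radius_arcmin / 60.0)
    # Solid angle of one spherical cap, in steradians.
    cap = 2.0 * math.pi * (1.0 - math.cos(radius))
    # Solid angle of the band |b| < b_cut, in steradians.
    band = 4.0 * math.pi * math.sin(math.radians(b_cut_deg))
    region = band if in_plane else 4.0 * math.pi - band
    return min(1.0, n_sources * cap / region)

# The three cases listed on the slide.
for n, r, in_plane in [(1000, 12, True), (1000, 12, False), (10000, 30, False)]:
    label = "|b| < 10 deg" if in_plane else "|b| > 10 deg"
    print(f"{n} sources, {r}' radius, {label}: "
          f"{sky_coverage(n, r, 10.0, in_plane):.2%}")
```

Under these assumptions the coverage fraction is also a rough estimate of the chance that a random direction falls inside some error region.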

An appealing goal
An appealing goal for the first-year all-sky survey should be, in our opinion, to be able to say
• which kinds of populations have been detected in the GLAST sky,
• what the statistical confidence is for the detection of each of them (systematically quantified using the same technique),
• which are the most likely detected individuals of each class, so that multi-frequency observations can proceed with confidence.
This classification naturally should extend beyond what's already known from EGRET (i.e., pulsars and blazars).

A new paradigm in gamma-ray astronomy
We suggest the establishment of an a priori protocol of source population discovery, based on a controlled analysis of positional coincidences. Three parts are involved:
• Theoretical censorship: prohibits executing repeated searches that would likely reduce the statistical significance of any possible positive class correlation.
• Preserving the discovery potential: protects the significance with which one claims the discovery of a number of important population candidates, and gives guidelines as to how to manage the probability budget.
• Common statistical assessment criteria: assign probabilities both in the large and in the small number statistical regime.
Torres & Reimer ’05, ApJ 629, L141

Part 1: Theoretical Censorship
• We request as part of the criterion that predictions (ideally of multiwavelength / multi-messenger character) are available for a subset of the proposed counterpart class.
• This request is made to avoid the blind testing of populations that may or may not produce gamma-rays, but for which nothing other than a positional correlation can be achieved a posteriori.
• If there is no convincing theoretical indication, before making the search, that a population can emit gamma-rays, such a population should not be searched for.

Part 2: Discovery protection
• If one probes a large number of samples, and makes an equally large number of trials with the same instrument detections, one will find positive correlations, at least as a result of statistical fluctuations.
• To claim significance, one would have to check that the penalties that must be paid for such a finding (i.e., the fact that there were a number of trials that led to null results) do not overcome the significance achieved. This may turn out to be practically impossible (if there is no a priori established source selection).
• We are in favor of defining the populations that are to be tested, and the testing protocol, before actual data taking.
Lessons to learn from ultra high energy cosmic ray physics: few events, a large number of claims, many of them plainly wrong (see discussion by Torres, Reimer, et al. ApJ 595, L13, 2003).
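
The penalty for repeated trials mentioned above can be illustrated with the standard trial-factor formula for independent tests. This is a textbook relation used here as an assumption, not something taken from the slides; the function names are hypothetical, and real searches against overlapping catalogs are generally not independent.

```python
def global_chance_probability(p_single, n_trials):
    """Chance of at least one spurious positive among n independent trials,
    each with single-trial chance probability p_single."""
    return 1.0 - (1.0 - p_single) ** n_trials

def single_trial_threshold(p_global, n_trials):
    """Per-trial threshold that keeps the overall chance probability at
    p_global for n independent trials (Sidak-style correction)."""
    return 1.0 - (1.0 - p_global) ** (1.0 / n_trials)

# Illustrative numbers only: 100 catalogs, each tested at the 1% level.
print(global_chance_probability(0.01, 100))
# Per-catalog threshold needed to keep the overall chance at 1%.
print(single_trial_threshold(0.01, 100))
```

With many unplanned trials the overall chance of a spurious correlation quickly approaches unity, which is the motivation for fixing the list of populations in advance.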

Part 2: Simple basis for a protocol
Suppose for definiteness that the total budget is a chance probability equal to $B$, and that we want to test classes A, B, C, ... of different sources. The total budget can then be divided into individual chance probabilities $P_A$, $P_B$, etc., such that $\sum_i P_i = B$.
Population $i$ will be claimed as detected if the a posteriori experimental probability for its random correlation, $P_{exp}(i)$, is less than the a priori assigned $P_i$ (as opposed to being less only than the larger, total budget). Important: this allows different populations to be discovered simultaneously.
We can then manage the budget of probabilities. For some populations we can less confidently agree that they will be detected; for others, the number of members may be low enough that the detection of only a few individuals would be needed to claim a great significance. In these situations we would choose a relatively large $P_i$, so as to make it easier for the test to pass. For others, say AGNs and pulsars, we can assign a relatively small $P_i$, making it harder for the test [whether the inequality $P_{exp}(i) < P_i$ is fulfilled] to pass, so that they take less of the total budget.
If one or more of the tests are passed, the results are individually significant: first, because we protected our search by the a priori establishment of the protocol (it was a blind test), and second, because the overall chance probability is still less than the total budget $B$.

In this example we choose to test High Latitude Molecular Clouds (HLC) and Starbursts with 40% of the high latitude budget each. These would be new populations, if discovered, so we want to privilege the chance of spotting them. If $B = 10^{-4}$, then $P_{exp}(\mathrm{HLC}) < 0.4\,B$ is required for the population to be claimed as discovered (the significance level of that is discussed below).
For others, say classes of AGNs, we can assign a relatively small $P(\mathrm{AGN})$, making it harder for the test [whether the inequality $P_{exp}(\mathrm{AGN}) < P(\mathrm{AGN})$ is fulfilled] to pass, so that they take less of the total budget. In this example, $P(\mathrm{FSRQ}) = 0.1\,B$. FSRQs are not a new population, so we don't want to spend our budget on them: this is exactly the same as requiring a very high confidence level for the discovery of this population.
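
The budget bookkeeping described above can be sketched in a few lines. The shares and the total budget follow the example in the text; the function names and the experimental probabilities in `p_exp` are made up purely for illustration and are not results.

```python
def allocate_budget(total_budget, shares):
    """Turn fractional shares of the budget B into a priori thresholds P_i.

    The shares may sum to less than 1 (part of the budget left unspent),
    but never to more, so that sum(P_i) <= B.
    """
    if sum(shares.values()) > 1.0 + 1e-9:
        raise ValueError("shares must not exceed the total budget")
    return {name: share * total_budget for name, share in shares.items()}

def claimed_detections(thresholds, p_exp):
    """A population is claimed only if P_exp(i) < P_i (not merely < B)."""
    return {name: p_exp[name] < p_i for name, p_i in thresholds.items()}

B = 1e-4  # total chance-probability budget, as in the text
thresholds = allocate_budget(B, {"HLC": 0.4, "Starbursts": 0.4, "FSRQ": 0.1})

# Hypothetical a posteriori chance probabilities, for illustration only.
p_exp = {"HLC": 1e-5, "Starbursts": 6e-5, "FSRQ": 2e-5}
print(claimed_detections(thresholds, p_exp))
```

In this made-up case only HLC passes: the Starbursts and FSRQ values are below the total budget $B$ but not below their individually assigned thresholds, which is the distinction the protocol insists on.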

Part 3: Quality evaluation
• C(A): number of members of the candidate population that coincide with unidentified detections.
• N(A): number of known sources in the particular candidate population A under analysis.
• U: number of detections.
• P: probability that in a random direction of the sky we find a gamma-ray source.
As we have seen earlier, P is not overwhelmingly large (a uniform distribution with no priors gives P less than a few percent for fewer than 10 000 sources). A more careful treatment will reduce the value of P from these simple estimations.
Such low values for P make the product P x N(A) typically in the range 1-100 for all different candidate populations. We can refer to this product as the noise expectation. Then the excess number of coincidences over the noise is: E(A) = C(A) - P x N(A).
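
As a small worked illustration of these definitions, the sketch below evaluates the noise expectation P x N(A) and the excess E(A). The function names, the population sizes, and the coincidence count are assumptions chosen only to show the arithmetic.

```python
def noise_expectation(p_random, n_members):
    """Expected number of chance coincidences, P x N(A)."""
    return p_random * n_members

def excess(coincidences, p_random, n_members):
    """Excess of coincidences over the noise, E(A) = C(A) - P x N(A)."""
    return coincidences - noise_expectation(p_random, n_members)

# Illustrative catalog sizes with P = 1e-2: the noise spans roughly 1 to 100.
for n_members in (100, 1000, 10000):
    print(n_members, noise_expectation(1e-2, n_members))

# Hypothetical case: 30 coincidences for a catalog of 1000 members.
print(excess(30, 1e-2, 1000))
```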

E(A) = C(A) - P x N(A), i.e., Excess = Coincidences - Noise
Pulsars and blazars will present the largest number of positional coincidences. Let us assume that there are 2000 catalogued AGN; with $P \sim 10^{-2}$ or $10^{-3}$, all coincidences in excess of 6-60 are beyond the random expectation. Now, C(AGN) >> P x N(AGN), and thus the number of excesses would be large: we are in the domain of large number statistics, and a probability for the number of excesses to occur by chance, $P_{exp}(\mathrm{AGN})$, could be computed.
When both terms in the expression for E(A) are small quantities (small number statistics), we should test the null hypothesis for a new source population against a reduced random noise. Methods such as Feldman & Cousins (1998) or Gehrels (1986) are useful to assess quality in this case and obtain $P_{exp}(A)$.
An example of a null hypothesis is “X-ray binaries are not LAT sources”. We have 0 predicted signal events (coincidences) and P x N(A) background. With N(A) ~ 200 and $P \sim 3 \times 10^{-3}$, detecting more than 5 coincidences rules out the null hypothesis at 95% CL. If $P_{exp}$(X-ray bin.) is less than the budgeted $P$(X-ray bin.), we have uncovered a new population of sources with 95% CL.
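
For the small number regime, a rough feel for the chance probability can be obtained from a plain Poisson tail. This is a simplified stand-in, not the Feldman & Cousins or Gehrels constructions cited above; the function name `poisson_tail` is hypothetical, and the inputs are the X-ray binary numbers quoted in the text.

```python
import math

def poisson_tail(k_min, mean):
    """P(X >= k_min) for a Poisson-distributed count with the given mean."""
    below = sum(math.exp(-mean) * mean**k / math.factorial(k)
                for k in range(k_min))
    return 1.0 - below

# Background expectation under the null hypothesis: P x N(A).
background = 3e-3 * 200

# "More than 5 coincidences" means 6 or more.
p_exp = poisson_tail(6, background)
print(f"background = {background:.2f}, chance probability = {p_exp:.2e}")
```

Under this simplified treatment the chance probability of six or more coincidences comes out far below 0.05, so the count would comfortably reject the null hypothesis; the cited methods would be needed for a proper confidence interval.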

What has to be considered?
• Which populations are to be tested? How large should the a priori probability be for each of them? How should the random probability P best be computed? How large should the total budget B be? All of these must be answered to completely determine the protocol.
• By researching and ultimately establishing a protocol along these lines, the problem of identifying the classes of gamma-ray sources can be looked at in a sound way, with individually high levels of confidence and a collectively low random probability.
• This would immediately open the possibility of focusing efforts on case-by-case studies, while knowing that the class has been detected with, say, 95% CL.

Variability certainly helps, but...
If there is a previous prediction of a periodic signal in the flux, that alone unambiguously labels the source. OK. But:
• This will happen for only a very, very small fraction of detections, owing to the incompleteness of pulsar timing parameters and the shortage of precise variability predictions for accretion-powered X-ray binaries.
• Even if a theoretically compatible variability timescale appears, if we have not identified the class of sources to which the sought counterpart pertains, that in itself will justify the need for follow-up observational campaigns.
• In any case, most of the sources will either be steady or show no definitive variability timescale. And worse, for most classes of sources, we theoretically expect no variability.

Sensitivity and completeness of catalogs are not always good
• Not having complete catalogs of “identified” populations is not something to fear, but the reflection of a discovery opportunity.
• We know we are already missing one or several new source populations, both at low and at high Galactic latitudes.
• There are strong indications of variable and non-variable, non-periodic, point-like and extended, low latitude sources, as well as of non-variable, high latitude, extended sources, all of which are beyond the expected behavior of pulsars and AGNs.
• If many (all) sources were to be correlated with AGNs, for instance, only a case-by-case analysis could show that the classification by position only is wrong. But remember, GLAST will see >1000 sources!
