Rule-based Cross-matching of Very Large Catalogs. Patrick Ogle and the NED Team IPAC, California Institute of Technology. NASA Extragalactic Database (NED). A fusion of multi-wavelength extragalactic data from journal articles and large catalogs. NED Holdings (October 2014). 2MASS PSC.
Rule-based Cross-matching of Very Large Catalogs Patrick Ogle and the NED Team IPAC, California Institute of Technology
NASA Extragalactic Database (NED) A fusion of multi-wavelength extragalactic data from journal articles and large catalogs
NED Holdings (October 2014) 2MASS PSC And much more, including classifications, notes, images, spectra…
New Cross-matching Algorithm
• Very Large Catalogs (VLCs, $>10^7$ sources)
• Find candidate matches in NED
• Select best match
• Rule-based
• Statistical analysis
• Match data recorded in DB
• Reversible and iterable
[Figure: GALEX ASC (NUV) vs. SDSS DR6 (gri, 6’x6’)]
Cross-match Inputs
• VLC Source and NED Object Positions (RA, Dec, ± uncertainty)
• Source-Object Separation (s, ±σ)
• Source and Object Types (galaxy, galaxy cluster, star, UV source, etc.)
• Background Object Density (measured for each source)
• Instrumental Beam Size
• Other: redshift, photometry, diameters
NED Pipeline for Very Large Catalogs
Pipeline stages: Source Loader → CSearch → MatchEx → Object Loader
• Source Loader
  • Load Very Large Catalog (VLC) source names and positions into NED.
• CSearch (PostgreSQL)
  • Find match candidates with NED near-position search
  • Count background objects
  • Spatial indexing will speed up search (e.g. Q3C, HTM)
• MatchExpert (Python)
  • Select best match from CSearch match candidates
  • Object associations for no-matches
  • Record match statistics for each match
  • Match statistic distributions and integrals
  • Code migration to DBMS for speed
• Object Loader (PostgreSQL)
  • Create NED cross-IDs
  • New objects
  • Associations
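The CSearch step (find candidates near a position, count background objects) can be sketched in a few lines of Python. This is only an illustration, not the NED implementation: the function names, the dictionary layout, and the brute-force loop are assumptions, and the real pipeline runs in PostgreSQL with spatial indexing. The 7.5″ and 46.5″ radii are the GALEX values quoted on the Poisson slide; counting every object inside the background radius (including candidates) is also an assumption.

```python
import math

def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec between two positions given in degrees (haversine)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec**2 + math.cos(dec1) * math.cos(dec2) * sin_dra**2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a)))) * 3600.0

def csearch(source, objects, search_radius, background_radius):
    """Hypothetical helper: return (candidates sorted by separation, background count).

    Radii are in arcsec. The background count includes every object within
    background_radius, candidates included (an assumption).
    """
    candidates = []
    n_background = 0
    for obj in objects:
        s = angular_separation_arcsec(source["ra"], source["dec"], obj["ra"], obj["dec"])
        if s <= background_radius:
            n_background += 1
        if s <= search_radius:
            candidates.append((s, obj["name"]))
    candidates.sort()
    return candidates, n_background

# Made-up positions, for illustration only.
source = {"ra": 150.0, "dec": 2.0}
objects = [
    {"name": "obj_a", "ra": 150.0, "dec": 2.001},  # 3.6 arcsec away
    {"name": "obj_b", "ra": 150.0, "dec": 2.010},  # 36 arcsec away
    {"name": "obj_c", "ra": 150.0, "dec": 2.100},  # 360 arcsec away
]
candidates, n_bg = csearch(source, objects, search_radius=7.5, background_radius=46.5)
```

A brute-force loop like this scales as sources × objects, which is why the slide points to spatial indexing (Q3C, HTM) for catalogs of this size.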
MatchEx Logic
[Flowchart; node labels as extracted: Match List from CSearch; Thresholds; S < Scut; Type Match; Name Prefix Match; P > Pcut; S1/S2 < 0.33; Error Circles Overlap; Single Good Match; Create NED object and associations; No Match; NED dup.; NED Cross-ID Match]
Associations
• Where a match is not made to a nearby object, an association record may be created.
• Association types:
  • Source and object position error circles overlap
  • Object is within the beam (PSF) of the source
[Flowchart: Error Circles Overlap → create Error Overlap Association record; No Match and S < beam → create In Beam Association record]
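The two association tests can be written as a small decision function. This is a sketch under stated assumptions: the function name, the return labels, and the order of the tests are hypothetical, and "error circles overlap" is taken to mean that the separation is smaller than the sum of the two error radii. All quantities are assumed to be in the same angular unit (e.g. arcsec).

```python
def association_type(separation, source_error, object_error, beam_size):
    """Hypothetical helper: classify an unmatched source-object pair.

    Returns "error_overlap" if the two position error circles overlap,
    "in_beam" if the object lies within the source beam, otherwise None.
    """
    # Two circles overlap when their centers are closer than the sum of their radii.
    if separation < source_error + object_error:
        return "error_overlap"
    # Otherwise check whether the object falls inside the instrumental beam (PSF).
    if separation < beam_size:
        return "in_beam"
    return None

# Made-up values, for illustration only.
overlap_case = association_type(1.0, 0.8, 0.5, 5.0)
beam_case = association_type(3.0, 0.8, 0.5, 5.0)
no_association = association_type(10.0, 0.8, 0.5, 5.0)
```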
Application to GALEX ASC Catalog
• GALEX All-Sky Catalog of ~40 million unique NUV sources created by M. Seibert (2012)
• Matched against ~180 million NED objects (2013)
[Figure: GALEX ASC (NUV) vs. NED, showing NED objects, the GALEX search region, and the background region, alongside SDSS DR6 (gri, 6’x6’)]
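Poisson Match Probability
• Search radius: $r_s = 7.5''$ for GALEX
• Background radius: $r_b = 46.5''$ for GALEX
• Density of background NED objects: $n = N/(\pi r_b^2)$
• Expected number inside $s$: $\langle N_s \rangle = N (s/r_b)^2$, where $s$ is the separation
• Poisson probability of $x = k$ objects closer than $s$: $P_s(x=k) = \langle N_s \rangle^k \exp(-\langle N_s \rangle)/k!$
• For $k = 0$, this simplifies to: $P_s(x=0) = \exp(-\langle N_s \rangle) = \exp(-N (s/r_b)^2)$
• False-match probability: $P_f = 1 - P_s(0)$
Example: $N = 4$, $s/r_b = 0.08$ gives $P_s(0) = 0.975$ and $P_f = 0.025$
[Figure: search circle of radius $r_s$ inside background circle of radius $r_b$, with separation $s$]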
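The formulas on this slide translate directly into code. A minimal sketch follows; the function names are hypothetical, and the example reuses the slide's values (N = 4, s/rb = 0.08 with rb = 46.5″, so s = 3.72″).

```python
import math

def expected_closer(n_background, s, r_b):
    """<Ns> = N (s / r_b)^2: expected number of background objects closer than s."""
    return n_background * (s / r_b) ** 2

def poisson_prob(k, mean):
    """Poisson probability of exactly k objects given the expected number `mean`."""
    return mean**k * math.exp(-mean) / math.factorial(k)

def false_match_probability(n_background, s, r_b):
    """Pf = 1 - Ps(0): probability that at least one background object lies within s."""
    return 1.0 - poisson_prob(0, expected_closer(n_background, s, r_b))

# Slide example: N = 4 background objects, s / r_b = 0.08.
r_b = 46.5
s = 0.08 * r_b
p_none = poisson_prob(0, expected_closer(4, s, r_b))
p_false = false_match_probability(4, s, r_b)
print(round(p_none, 3), round(p_false, 3))
```

The rounded values agree with the slide's example, Ps(0) = 0.975 and Pf = 0.025.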
Optimizing Match Selection
• Optimize on 100K subsample in SDSS region
• False-positive rate decreases with increasing Poisson cutoff.
• False-negative rate increases with increasing Poisson cutoff.
• Give 10x weight to false positives: it is worse to make an incorrect match than to miss a match.
• Poisson cutoff value of 90% minimizes the combined, weighted error rate.
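The cutoff selection can be pictured as minimizing a weighted sum of the two error rates. The sketch below assumes the combined error has the form 10 × (false-positive rate) + (false-negative rate); the slide does not give the exact formula. The function name is hypothetical and the rates are made-up toy numbers, not the measured values from the SDSS subsample.

```python
def best_cutoff(cutoffs, fp_rates, fn_rates, fp_weight=10.0):
    """Hypothetical helper: pick the cutoff minimizing fp_weight * fp + fn.

    The weighted-sum form of the combined error is an assumption.
    Returns (best cutoff, its weighted error).
    """
    scores = [fp_weight * fp + fn for fp, fn in zip(fp_rates, fn_rates)]
    best = min(range(len(cutoffs)), key=scores.__getitem__)
    return cutoffs[best], scores[best]

# Toy rates for illustration only: fp falls and fn rises as the cutoff increases.
cutoffs = [0.50, 0.70, 0.90, 0.99]
fp_rates = [0.050, 0.030, 0.010, 0.008]
fn_rates = [0.010, 0.020, 0.050, 0.300]
cutoff, weighted_error = best_cutoff(cutoffs, fp_rates, fn_rates)
```

Because the two rates move in opposite directions, the weighted sum has an interior minimum; raising the false-positive weight pushes that minimum toward stricter cutoffs.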
GALEX ASC Match Results: Totals
• 39,570,031 input GALEX ASC UV sources
• NED (2013) contained ~180 million distinct objects
• 10,595,382 (26.8%) of the ASC sources matched NED objects → Cross-IDs
• 28,974,649 (73.2%) are not matched → new NED objects
• 68.2% of GALEX ASC sources are in blank NED fields
• 5.0% have multiple match candidates
Image credit: GALEX NASA/JPL-Caltech/SSC
GALEX ASC Match Results: Background Rejection and False-Negative Rate
• Uncorrelated background out to 15 arcsec is fit by a straight line: $dN/ds \propto s$
• MatchEx is successful at filtering out this background.
• False-negative rate $f_n = 2.4\%$, estimated by comparison to background-subtracted match candidates (red line).
[Figure: x-axis Separation (arcsec); false negatives marked]
GALEX ASC Results: False Positive Rate
• The false-positive match rate is estimated by summing the Poisson statistic (1-P) over all matches and dividing by the total number of sources: fp = 0.25%
[Figure: y-axis Number]
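The estimate described here is a sum of per-match false-match probabilities divided by the source count. A minimal sketch, with a hypothetical function name and made-up probabilities (not the GALEX values):

```python
def false_positive_rate(match_probabilities, n_sources):
    """Sum (1 - P) over all accepted matches and divide by the total number of sources.

    `match_probabilities` holds the Poisson statistic P for each accepted match.
    """
    return sum(1.0 - p for p in match_probabilities) / n_sources

# Toy input for illustration: three matches among ten sources.
toy_probabilities = [0.99, 0.95, 0.999]
toy_rate = false_positive_rate(toy_probabilities, n_sources=10)
```

Each term 1 - P is the chance that a given match is a background coincidence, so the sum is the expected number of false matches.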
GALEX ASC Results: Position Error Distribution
• The distribution of normalized separation r = s/σ deviates from a Gaussian. The peak is at 0.9 instead of 1.0, and the tail is stronger.
Important Lessons Learned:
Do not assume reported catalog position errors are correct.
Do not assume position error distributions are Gaussian.
A 3.5σ threshold on match separation rejected more candidates than expected.
[Figure: Number vs. r = s/σ, compared with the curve labeled "Derivative of a Gaussian"]
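For reference, if position errors were independent, circular Gaussians with the quoted σ, the normalized separation r = s/σ would follow a Rayleigh distribution, which peaks at r = 1. The sketch below assumes that idealized model (the slide does not state it explicitly) and computes the fraction of true matches a 3.5σ cut would then reject. The function names are hypothetical.

```python
import math

def rayleigh_pdf(r):
    """Density of r = s / sigma for circular Gaussian position errors (unit scale)."""
    return r * math.exp(-0.5 * r * r)

def rayleigh_tail(r_cut):
    """Fraction of true matches with r > r_cut under the same assumption."""
    return math.exp(-0.5 * r_cut * r_cut)

# Expected rejection fraction for a 3.5 sigma threshold in the ideal Gaussian case.
expected_rejected = rayleigh_tail(3.5)
print(f"{100 * expected_rejected:.2f}% of true matches beyond 3.5 sigma")
```

Under this model only about 0.2% of true matches fall beyond 3.5σ, so a noticeably higher rejection rate points to underestimated errors or a non-Gaussian tail, in line with the lessons listed above.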
Comparison to SDSS Photometry
• While no color criteria were used to select matches to GALEX sources, the NUV-g colors of GALEX-SDSS matches were checked: most matches have -7 < NUV-g < 7
• GALEX ASC range: 14 < NUV < 24
• Detection rate falls at NUV > 21.7
Results by Object Type
• Object types ordered by candidate match frequency
• Most GALEX sources matched to galaxies (G) and stars (*)
• QSO, Galactic star (!*), UV excess object (UvES), and WD* matches are overrepresented, as might be expected for a UV-selected catalog.
• Matches to RadioS, XrayS, GGroup, and GPair candidates were disallowed.
GALEX Photometry in NED
• GALEX ASC photometry added to NED spectral energy distribution of 3C 382 (CGCG 173-014)
• Over 145 million GALEX ASC NUV and FUV photometry records added to NED (2 extraction methods per band)
VLCs in NED, now and future
• GALEX ASC: ~40,000,000 UV sources loaded and matched (2013)
• GALEX MSC: ~22,000,000 UV sources loaded and matched (2014)
• Spitzer Source List: ~42,000,000 MIR sources (2014)
• 2MASS PSC: ~471,000,000 NIR sources loaded (2015 finish)
• AllWISE: ~748,000,000 MIR sources (2015 start)
• SDSS DR10: ~469,000,000 Vis sources (2015 start)
• SDSS DR6: ~154,000,000 Vis sources loaded and matched (out of 217M), excluding sources with undesirable flag values (2008)
NED aims to quadruple its object holdings in the next year!