Toward an Intelligent Event Broker: Automated Transient Classification
Przemek Wozniak, Nov 15, 2013
Transient Classification and Follow-up Optimization work at LANL
• RAPTOR/Thinking Telescopes network
• Active machine learning team
• LANL is a member of the iPTF consortium; collaborations with JPL and UC Berkeley
• LSST Transient Science Working Group
Event Broker Architecture
[Architecture diagram; recoverable components:]
• Classification chain: VOEvent clients, VOEvent client registry, VOEvent submission, alert distribution, event translation, transient classification; Event DB, Transient DB, Context DB (context builder, external DBs)
• Follow-up chain: VOAgent clients, VOAgent client registry, VOProfile submission, bid distribution, coalition manager, follow-up optimization; Event DB, Coalition DB
Real/Bogus Classification for iPTF
Computer vision approach: feed pixels directly into classifiers.
[Example postage stamps labeled REAL and BOGUS]
Sparse representations with learned dictionaries
• L = sparsity factor
• Iterate C times over all training data until dictionary convergence
• Sparse coding for each training vector
• Batch dictionary update, given the dictionary and weights
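The code/update loop above can be sketched with NumPy and scikit-learn (the deck's stated stack). This is a minimal illustration, not the LANL code: `learn_dictionary` is a hypothetical name, orthogonal matching pursuit stands in for the sparse-coding step, and the batch update is a plain least-squares fit of the dictionary given the current weights.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def learn_dictionary(X, K=900, L=8, n_passes=10, seed=0):
    """Alternate L-sparse coding and a batch dictionary update.
    X: (n_samples, n_features) training vectors as rows.
    Returns a dictionary D of shape (K, n_features) with unit-norm rows."""
    g = np.random.default_rng(seed)
    # dictionary imprinting: initialize atoms with random training vectors
    D = X[g.choice(len(X), K, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=1, keepdims=True) + 1e-12
    for _ in range(n_passes):
        # sparse coding: at most L non-zero weights per training vector
        W = orthogonal_mp(D.T, X.T, n_nonzero_coefs=L).T  # (n_samples, K)
        # batch update: least-squares dictionary given the sparse weights
        D = np.linalg.lstsq(W, X, rcond=None)[0]
        D /= np.linalg.norm(D, axis=1, keepdims=True) + 1e-12
    return D
```

The deck's "present the same data 10 times" corresponds to `n_passes=10`; shuffling between passes is omitted here for brevity.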
SparseReps algorithm
• Input data: 21 × 21 pixel postage stamps from image triples (new, ref, sub): 3 × 441 numbers per candidate
• Learn 6 dictionaries: {real, bogus} × {new, ref, sub}
• K = 900 dictionary elements (2× overcomplete), L = 8 sparsity
• Present the same data 10 times in random order
• Dictionary imprinting (initialize with a random selection of inputs)
• For each candidate, calculate the nearest subspace distance to each dictionary; use these as 6 new features for a higher-level classifier
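The nearest-subspace-distance features can be sketched as the residual norm of an L-sparse fit against each of the six dictionaries. Function and key names below are illustrative, not the production code:

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def nearest_subspace_distance(x, D, L=8):
    """Residual norm of the best L-sparse approximation of x in D.
    D: (K, n_features) dictionary with unit-norm rows; x: (n_features,)."""
    w = orthogonal_mp(D.T, x, n_nonzero_coefs=L)
    return np.linalg.norm(x - D.T @ w)

def sparse_reps_features(stamps, dictionaries, L=8):
    """stamps: dict with flattened 'new', 'ref', 'sub' cutouts;
    dictionaries: {('new', 'real'): D, ...} for the 6 learned dictionaries.
    Returns the 6 distances used as features by the higher-level classifier."""
    feats = []
    for img in ('new', 'ref', 'sub'):
        for cls in ('real', 'bogus'):
            feats.append(nearest_subspace_distance(
                stamps[img], dictionaries[(img, cls)], L))
    return np.array(feats)
```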
Imprinted dictionaries: even learning, normalized data
[Dictionary atoms shown for REAL and BOGUS in the reference, new image, and difference channels]
RB3 features and training
• Training data based on RB2 (Brink et al. 2012): 77811 candidates (14681 real + 63130 bogus)
• Features from RB2: ccid, seeingnew, extracted, l1, b_image, flag, mag_ref, mag_ref_err, a_ref, b_ref, flux_ratio, ellipticity_ref, nn_dist_renorm, maglim, normalized_fwhm_ref, good_cand_density, min_distance_to_edge_in_new, gauss, amp
• Append new SparseReps features: nsd_new_real, nsd_new_bogus, nsd_ref_real, nsd_ref_bogus, nsd_sub_real, nsd_sub_bogus
Real/Bogus version 3.x
• Two codes: SparseReps + RB3
• Python: NumPy/SciPy + sklearn
• Use SparseReps to learn new features
• Append ML features to “expert” features
• Use a random forest for high-level classification
• Running on the database at NERSC (no coadds)
• Cron job during daytime with an is_processing_flag veto
• Requires a four-way SQL table join: candidate, subtraction, image, features
• Introduces two new tables: sparse_reps_features and realbogus
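A minimal sketch of the high-level step, assuming the RB2 expert features and the six SparseReps distances arrive as NumPy arrays. Function names and shapes are illustrative assumptions, not the production schema:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_rb3(expert_feats, sparse_feats, labels, n_trees=100, seed=0):
    """Append the 6 nearest-subspace distances to the RB2 'expert'
    features and fit a random forest (labels: 1 = real, 0 = bogus)."""
    X = np.hstack([expert_feats, sparse_feats])
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    clf.fit(X, labels)
    return clf

def rb3_scores(clf, expert_feats, sparse_feats):
    """Real/Bogus score = predicted probability of the 'real' class."""
    X = np.hstack([expert_feats, sparse_feats])
    return clf.predict_proba(X)[:, 1]
```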
RB3 deployment
• Running at NERSC since July 1, 2013
• 55 million candidates processed as of Nov 14
• Throughput: SparseReps ~6000 candidates/min; random forest ~1e5 candidates/min
• Scores available through a web interface
Classification of Supernova Spectra from SEDMachine
• Train and test on simulated data sets based on the SNID database (Blondin & Tonry 2007)
• Randomly draw objects from SN types, interpolate in age, degrade resolution to R = 100 between 380 and 920 nm, inject noise
• Take the flux at each wavelength as a feature (100-dimensional feature space); use Support Vector Machines to classify
• Track accuracy and the ability to detect young objects as a function of S/N ratio and number of classes
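A toy version of this pipeline, with smooth sinusoidal templates standing in for SNID spectra; the templates, counts, and noise model here are illustrative assumptions, not the actual simulation:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict

def simulate_spectra(n_per_class, snr, n_classes=4, n_pix=100, seed=0):
    """Draw noisy spectra: one smooth template per class, flux sampled
    at n_pix wavelengths, Gaussian noise at the requested per-pixel S/N."""
    g = np.random.default_rng(seed)
    lam = np.linspace(0.0, 1.0, n_pix)
    X, y = [], []
    for c in range(n_classes):
        template = 1.0 + 0.5 * np.sin(2 * np.pi * (c + 1) * lam)
        X.append(template + g.normal(scale=template / snr,
                                     size=(n_per_class, n_pix)))
        y.append(np.full(n_per_class, c))
    return np.vstack(X), np.concatenate(y)

# cross-validated SVM classification in the 100-dimensional flux space
X, y = simulate_spectra(n_per_class=100, snr=50)
pred = cross_val_predict(SVC(kernel="rbf"), X, y, cv=5)
cm = confusion_matrix(y, pred)   # rows: true class, columns: predicted
acc = (pred == y).mean()
```

Sweeping `snr` and `n_per_class` in this setup reproduces the kind of accuracy-versus-S/N study reported on the following slides, though the numbers here come from toy templates.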
Cross-validation accuracy
[Accuracy curves for two class sets: (Ia, Ib, Ic, II) and (Ia, Ib, Ic, IIP, IIL, IIn, IIb)]
Classifying noisy spectra
Cross-validation results:
• Need ~100 spectra per class with 10 < SNR < 100 to train a classifier capable of 90% accuracy
• Need SNR > 15 per pixel to classify a spectrum with 90% or better accuracy
• Preprocessing (redshift correction, continuum normalization) will be less reliable at low S/N
• Bin in age and treat young SNe as a separate class
Confusion matrix: 4 classes
[Matrices for: N=300, SNR=3, acc=59.6%; N=30, SNR=3, acc=64.5%; N=300, SNR=50, acc=97.1%; N=30, SNR=50, acc=85.5%]
Confusion matrix: 7 classes
[Matrices for: N=30, SNR=3, acc=54.9%; N=300, SNR=3, acc=50.1%; N=30, SNR=50, acc=87.6%; N=300, SNR=50, acc=98.0%]
Select young SNe (<10 days before the peak)
[Matrices for: N=30, SNR=5, acc=70.1%; N=300, SNR=5, acc=68.1%; N=30, SNR=50, acc=86.8%; N=300, SNR=50, acc=93.9%]
Classifying SN Ia subtypes
• SN 1991T-like: overluminous; no Si II; Ca II early on
• SN 1991bg-like: very underluminous; no secondary maximum in the I band; Ti II early on; Na I D after maximum; early nebular phase
• CSM: circumstellar medium signature?
• PEC: peculiar
Summary
• First iteration of event broker architecture and prototyping
• Computer vision approach to Real/Bogus classification; successful application of sparse representations with learned dictionaries yields ~2× better performance
• RB3.x deployed at the NERSC supercomputing center
• SVMs can deliver accurate spectral classification of supernovae observed with the SED Machine and support selection of young events
• Promising results in recognizing SN Ia anomalies
Event Broker: distributed processing
[Diagram of a data processing engine: input from the previous stage arrives via a TCP server into an incoming job queue; the main processing loop feeds an outgoing job queue; a TCP client sends output to the next stage]
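One stage of this engine can be sketched as a worker thread between two queues; in the real design the incoming queue would be fed by the TCP server and the outgoing queue drained by the TCP client. The class and method names below are illustrative, not from the broker code:

```python
import queue
import threading

class ProcessingStage:
    """Minimal sketch of one broker stage: jobs arrive on an incoming
    queue, the main processing loop applies the stage's function, and
    results land on an outgoing queue for the next stage."""

    def __init__(self, process):
        self.process = process
        self.incoming = queue.Queue()
        self.outgoing = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def start(self):
        self._thread.start()

    def _loop(self):
        while True:
            job = self.incoming.get()
            if job is None:          # sentinel: shut the stage down
                break
            self.outgoing.put(self.process(job))

    def stop(self):
        self.incoming.put(None)
        self._thread.join()
```

Chaining stages then amounts to forwarding one stage's outgoing queue into the next stage's incoming queue, over TCP in the distributed case.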