Progetto S.Co.P.E. – WP4: The Virtual Observatory and the PON-SCOPE
The VO-Neural Team
• G. Longo (Principal Investigator)
• M. Brescia (Project Manager)
• S. Cavuoti (applications)
• A. Corazza (models and algorithms)
• R. D’Abrusco (applications)
• G. d’Angelo (documentation, GRID)
• N. Deniskina (GRID – VO interface)
• M. Garofalo (applications)
• O. Laurino (system, applications)
• A. Nocella (UML software engineering)
• G. Riccio (applications)
• S. Pardi
External Members
• C. Donalek (Caltech)
• G. Djorgovski (Caltech)
Summary
• What is the Virtual Observatory & its international background
• Why the V.Obs. is so important for the future of cosmology
• Applications already ported under SCOPE

Astronomy has become an immensely data rich field
• Detector evolution (plates to digital to mosaics)
• Telescope evolution
• Space instruments
From 1 MB/night to 1 TB/night
Heterogeneous Data + Metadata
The VLT Survey Telescope
• 2.6 meter
• 0.021”/pxl
• 16 k x 16 k
[Diagram labels: Secondary Data Providers; Follow-Up Telescopes and Missions; Data Services; Data Mining and Analysis, Target Selection; 100 GB/night; Results; Digital libraries; V.O]
Users: >>1000. Total data: ca. 1 PByte
The Virtual Observatory
• Data Gathering (e.g., from sensor networks, telescopes…)
• Data Farming:
  • Storage/Archiving
  • Indexing, Searchability
  • Data Fusion, Interoperability
• Data Mining (or Knowledge Discovery in Databases):
  • Pattern or correlation search
  • Clustering analysis, automated classification
  • Outlier / anomaly searches
  • Hyperdimensional visualization
• Data understanding
  • Computer aided understanding
  • KDD
  • Etc.
• New Knowledge
[Side labels: Database technologies; Key mathematical issues; Ongoing research]
Data Mining algorithms scale very badly:
• Clustering ~ N log N → N^2, ~ D^2
• Correlations ~ N log N → N^2, ~ D^k (k ≥ 1)
• Likelihood, Bayesian ~ N^m (m ≥ 3), ~ D^k (k ≥ 1)
Cf. isophotal, Petrosian, aperture magnitudes, concentration indexes, shape parameters, etc.
[Figure labels: V.S.T.; Band 1; Band 2; Band 3]
The scientific exploitation of a multi-band, multi-epoch (K epochs) survey requires searching for patterns, trends, etc. among N points in a D×K-dimensional parameter space, with N > 10^9, D >> 100, K > 10.
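To make these scalings concrete, the short Python sketch below counts elementary operations in the N log N and N^2 regimes quoted above, together with the dimensionality of the D×K parameter space. The values of N, D and K are the lower bounds from the slide; the throughput of 10^9 operations per second is a purely illustrative assumption, not a figure from the talk.

```python
import math

def op_counts(n: int) -> dict:
    """Rough operation counts for the two clustering regimes quoted above."""
    return {
        "N log N": n * math.log2(n),
        "N^2": float(n) * float(n),
    }

N = 10**9               # number of objects (lower bound from the slide)
D = 100                 # parameters per epoch (lower bound from the slide)
K = 10                  # number of epochs (lower bound from the slide)
OPS_PER_SECOND = 1e9    # assumption: illustrative throughput of a single core

counts = op_counts(N)
dims = D * K            # dimensionality of the D x K parameter space

for label, ops in counts.items():
    days = ops / OPS_PER_SECOND / 86400
    print(f"{label:8s}: {ops:.2e} operations, about {days:.2e} days")
print(f"parameter space dimensionality: {dims}")
```

Under these assumptions the gap between the two regimes is more than seven orders of magnitude, which is why quadratic algorithms are impractical on a single machine at survey scale.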
Tools in the VONeural
• Middleware
  • Astrogrid Model (Nocella)
  • Interface between Virtual Observatory and GRID computing (GRID-launcher; Deniskina, D’Angelo)
• Models
  • Multi Layer Perceptron (VONeural_MLP; Donalek, Cavuoti, Skordovski)
  • Support Vector Machines (VONeural_SVM; Cavuoti, Russo)
  • Probabilistic Principal Surfaces (VONeural_PPS; Garofalo)
• Tools
  • Segmentation of Astronomical images (VONeural_Ext; Laurino)
Scientific Applications
• Data mining in multiparametric spaces (supervised and unsupervised)
  • Photometric redshifts (MLP, SVM)
  • Search for candidate quasars and AGN (PPS, NEC)
  • Galaxy groups and clusters
• CMB simulations of cosmic string signatures
  • In collaboration with Moscow University
• Extraction of catalogues from astronomical images
  • INAF + Caltech
• VST pipeline for distant clusters
  • INAF + Caltech
Application 1 – VONeural_MLP photometric redshifts
• Photometric redshifts (phot z) are an alternative way to derive the redshifts (i.e. distances) of extragalactic objects: less accurate than spectroscopic redshifts, but much more convenient in terms of computing power and observing time.
Phot z for the SDSS General Galaxy sample
• At least 30 experiments (10-12 h each)
• Training on 350,000 objects, 12 features
• Results for 32,000,000 objects
• SDSS-DR4/5 – GG: training, validation and test sets of 60%, 20%, 20%
[Diagram labels: MLP, 1(5), 1(18); 0.01 < z < 0.25; 0.25 < z < 0.50; 99.6% accuracy; MLP, 1(5), 1(23); MLP, 1(5), 1(24); Interpolation of systematic errors (twice); σ_rob = 0.234; σ_rob = 0.206]
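The 60% / 20% / 20% partition into training, validation and test sets can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_dataset`, the random shuffling, the fixed seed and the use of integer IDs as stand-ins for catalogue objects are all hypothetical, and nothing here reproduces the actual VONeural_MLP pipeline.

```python
import random

def split_dataset(items, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle a copy of items and cut it into training, validation and test sets."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # reproducible shuffle
    n_train = round(len(shuffled) * fractions[0])
    n_valid = round(len(shuffled) * fractions[1])
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]     # the remainder goes to the test set
    return train, valid, test

# Hypothetical stand-in for the 350,000 training objects: integer object IDs.
object_ids = range(350_000)
train, valid, test = split_dataset(object_ids)
print(len(train), len(valid), len(test))
```

Keeping the three sets disjoint matters because the validation set is used while tuning the network, so only the test set gives an unbiased estimate of the final error.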
Photometric redshifts for 30 million SDSS galaxies: σz = 0.02
Two types of compact groups
• Spatial clustering in phot_z space reveals two types of groups:
  • Compact and isolated
  • Loose and not embedded in larger structures
• 95% of SKG have a large fraction of E-type galaxies, f150(E) ≥ 0.5.
Looking for AGN candidates
• Different orientations
• Different parameters become significant
• Different clusters in parameter space
BUT, STILL THE SAME OBJECT!
Dimensionality reduction (classification of correlated, non-linear data)
[Figure panels: 3-D; PCA; PPS]
Negative entropy clustering
NEC: a matter of Gaussians
• Clustering method based on the “neg-entropy” NegE, a measure of the non-Gaussianity of a variable.
• If A is Gaussian, then NegE(A) = 0.
• Given a threshold d: if NegE(A ∪ B) < d, then clusters A and B are replaced by cluster A ∪ B.
[Figure labels: Not replaced!; Replaced!]
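The merging rule can be illustrated with a minimal one-dimensional Python sketch. The slide does not say how NegE is estimated, so the code assumes a common moment-based approximation (squared skewness over 12 plus squared excess kurtosis over 48, computed on the standardized sample), which is close to zero for Gaussian data. The function names, the sample sizes and the threshold d = 0.01 are hypothetical choices for illustration, not the VO-Neural implementation.

```python
import random
import statistics

def negentropy(sample):
    """Approximate negentropy of a 1-D sample.

    Assumption: moment-based approximation skew^2/12 + kurt^2/48 on the
    standardized variable; it is near 0 when the sample is Gaussian.
    """
    mean = statistics.fmean(sample)
    std = statistics.pstdev(sample, mu=mean)
    z = [(x - mean) / std for x in sample]
    skew = statistics.fmean([v**3 for v in z])
    kurt = statistics.fmean([v**4 for v in z]) - 3.0   # excess kurtosis
    return skew**2 / 12.0 + kurt**2 / 48.0

def should_merge(cluster_a, cluster_b, threshold):
    """Rule from the text: merge A and B if NegE(A U B) < threshold."""
    return negentropy(list(cluster_a) + list(cluster_b)) < threshold

rng = random.Random(42)
a = [rng.gauss(0.0, 1.0) for _ in range(2000)]
b = [rng.gauss(0.0, 1.0) for _ in range(2000)]   # drawn from the same Gaussian as a
c = [rng.gauss(6.0, 1.0) for _ in range(2000)]   # well separated from a
d = 0.01                                         # assumption: illustrative threshold

print("merge a and b:", should_merge(a, b, d))
print("merge a and c:", should_merge(a, c, d))
```

The union of two samples from the same Gaussian is still Gaussian, so its approximate neg-entropy stays below the threshold; the union of two well-separated clusters is bimodal and therefore strongly non-Gaussian.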
[Workflow diagram labels: UKIDSS; SDSS; PPS preprocessing; BoK; NEC clustering; dendrogram; labeling; results; cluster optimization]
1 experiment: ca. 11 days
• PPS: we select clusters by associating latent variables on the sphere with sources.
• NEC: the number of clusters after the aggregation is determined by “cluster optimization”.
[Figure: SpecClass axis, bins 0 to 6]
Leads to proper binning of parameter space
Application 2 with SVM
• Best result: 81.5%
• PON-SCOPE GRID infrastructure (110 nodes, PON NA-CA-CT)
[Plot axes: lg2(gamma); lg2(C)]
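The lg2(gamma) and lg2(C) axes suggest a search over SVM hyperparameters on a base-2 logarithmic grid. A minimal Python sketch of such a grid search follows. The exponent ranges, the helper names and in particular `toy_score` (a stand-in for a cross-validated SVM accuracy) are assumptions made for illustration; the slide gives neither the ranges nor the scoring procedure actually used.

```python
from itertools import product

def log2_grid(c_exponents, gamma_exponents):
    """Return all (C, gamma) pairs with C = 2**i and gamma = 2**j."""
    return [(2.0**i, 2.0**j) for i, j in product(c_exponents, gamma_exponents)]

def best_point(grid, score):
    """Evaluate score(C, gamma) on every grid point and return the best pair."""
    return max(grid, key=lambda point: score(*point))

# Assumption: exponent ranges chosen for illustration only.
grid = log2_grid(range(-5, 16, 2), range(-15, 4, 2))

def toy_score(c, gamma):
    """Hypothetical stand-in for a cross-validated SVM accuracy."""
    return -((c - 8.0) ** 2) - (gamma - 0.125) ** 2   # peaks at C = 2^3, gamma = 2^-3

best_c, best_gamma = best_point(grid, toy_score)
print(f"{len(grid)} grid points, best C = {best_c}, best gamma = {best_gamma}")
```

Because every grid point is evaluated independently of the others, a search of this kind parallelizes naturally across computing nodes.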
• SDSS spectroscopic subsample of confirmed QSO (specclass = 4 & 6)
• UKIDSS HO-QSO’s
Colours used for all these experiments were calculated using adjacent bands: u−g, g−r, r−i, i−z for the optical bands, and Y−J, J−H, H−K for the near-infrared ones.
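Computing colours from adjacent bands amounts to differencing consecutive magnitudes. The Python sketch below shows one way to do it; the function name `adjacent_colours` and the magnitudes of the example source are hypothetical and are not taken from the SDSS or UKIDSS catalogues.

```python
OPTICAL_BANDS = ("u", "g", "r", "i", "z")
NEAR_IR_BANDS = ("Y", "J", "H", "K")

def adjacent_colours(magnitudes, bands):
    """Return {"a-b": mag[a] - mag[b]} for each pair of adjacent bands."""
    return {
        f"{blue}-{red}": magnitudes[blue] - magnitudes[red]
        for blue, red in zip(bands, bands[1:])
    }

# Hypothetical magnitudes for a single source (illustrative values only).
source = {"u": 19.5, "g": 19.0, "r": 18.75, "i": 18.5, "z": 18.25,
          "Y": 17.75, "J": 17.5, "H": 17.0, "K": 16.5}

colours = {**adjacent_colours(source, OPTICAL_BANDS),
           **adjacent_colours(source, NEAR_IR_BANDS)}
print(colours)
```

The optical and near-infrared sets are handled separately, so no z−Y colour bridging the two surveys is produced, matching the list of colours above.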
Application 2 with MLP
• The experiments were carried out selecting only the objects in the catalogue of G. Sorrentino et al. (2006) (z between 0.05 and 0.095) that were labelled as Type 1 or Type 2. Only objects that are certainly AGN were selected.
• The dataset consisted of 1570 objects: Type 1 objects were labelled 1 and Type 2 objects were labelled 0.
• The best result obtained was:
  • Total efficiency: e = 99.4%
  • Type 1 efficiency: e_type1 = 98.4%
  • Type 2 efficiency: e_type2 = 100%
  • Type 1 completeness: c_type1 = 100%
  • Type 2 completeness: c_type2 = 98.9%
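The slide does not define efficiency and completeness. The Python sketch below assumes the usual convention: the efficiency of a class is the fraction of objects assigned to it that truly belong to it, the completeness is the fraction of true members that are recovered, and the total efficiency is the overall fraction of correct classifications. The confusion counts are hypothetical and are not the counts from the experiment.

```python
def efficiency_and_completeness(counts):
    """counts[(true_label, predicted_label)] -> number of objects; labels are 1 and 0."""
    classes = (1, 0)
    total = sum(counts.values())
    correct = sum(counts[(k, k)] for k in classes)
    result = {"total_efficiency": correct / total}
    for k in classes:
        predicted_k = sum(counts[(t, k)] for t in classes)   # everything assigned to k
        true_k = sum(counts[(k, p)] for p in classes)        # everything that really is k
        result[f"efficiency_{k}"] = counts[(k, k)] / predicted_k
        result[f"completeness_{k}"] = counts[(k, k)] / true_k
    return result

# Hypothetical confusion counts (Type 1 labelled 1, Type 2 labelled 0).
counts = {(1, 1): 60, (1, 0): 0, (0, 1): 1, (0, 0): 39}
metrics = efficiency_and_completeness(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these definitions a 100% efficiency for one class implies a 100% completeness for the other in a two-class problem, which is the pattern seen in the figures reported above.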
THE END
SCoPE Workshop: Status of the Project and of the Work Packages
Sala Azzurra, Monte Sant’Angelo university campus, 21 February 2008