
Progetto S.Co.P.E. – WP4

Progetto S.Co.P.E. – WP4. The VO-Neural Team G. Longo (Principal Investigator) M. Brescia (Project Manager) S. Cavuoti (applications) A. Corazza (models and algorithms) R. D’Abrusco (applications) G. d’Angelo (documentation, GRID)



Presentation Transcript


  1. Progetto S.Co.P.E. – WP4. The VO-Neural Team: G. Longo (Principal Investigator), M. Brescia (Project Manager), S. Cavuoti (applications), A. Corazza (models and algorithms), R. D’Abrusco (applications), G. d’Angelo (documentation, GRID), N. Deniskina (GRID – VO interface), M. Garofalo (applications), O. Laurino (system, applications), A. Nocella (UML software engineering), G. Riccio (applications), S. Pardi. External members: C. Donalek (Caltech), G. Djorgovski (Caltech). The Virtual Observatory and the PON-SCOPE

  2. Summary • What the Virtual Observatory is & its international background • Why the V.Obs. is so important for the future of cosmology • Applications already ported under SCOPE • Astronomy has become an immensely data-rich field • Detector evolution (plates to digital to mosaics) • Telescope evolution • Space instruments • From 1 MB/night to 1 TB/night • Heterogeneous data + metadata

  3. The VLT Survey Telescope: 2.6 meter, 0.021”/pxl, 16 k x 16 k, 100 GB/night. Diagram labels: Secondary Data Providers; Follow-Up Telescopes and Missions; Data Services; Data Mining and Analysis, Target Selection; Results; Digital libraries; V.O.

  4. The Virtual Observatory. Users: >>1000. Total data: ca. 1 PByte • Data Gathering (e.g., from sensor networks, telescopes…) • Data Farming: • Storage/Archiving • Indexing, Searchability • Data Fusion, Interoperability • Data Mining (or Knowledge Discovery in Databases): • Pattern or correlation search • Clustering analysis, automated classification • Outlier / anomaly searches • Hyperdimensional visualization • Data understanding • Computer aided understanding • KDD • Etc. • New Knowledge. Side labels: Database technologies; Key mathematical issues; Ongoing research

  5. Data Mining algorithms scale very badly: • Clustering ~ N log N → N^2, ~ D^2 • Correlations ~ N log N → N^2, ~ D^k (k ≥ 1) • Likelihood, Bayesian ~ N^m (m ≥ 3), ~ D^k (k ≥ 1). Cf. isophotal, Petrosian, aperture magnitudes, concentration indexes, shape parameters, etc. Figure labels: V.S.T., Band 1, Band 2, Band 3. The scientific exploitation of a multi-band, multi-epoch (K epochs) survey implies searching for patterns, trends, etc. among N points in a D x K dimensional parameter space: N > 10^9, D >> 100, K > 10
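
To give a feeling for the scaling laws on this slide, the following minimal sketch evaluates them at the lower bounds quoted (N = 10^9, D = 100). The function name `scaling_estimates` and the choice m = 3 are illustrative assumptions; the numbers are operation counts up to unknown constant factors, not measured run times.

```python
import math

def scaling_estimates(n, d, m=3):
    """Order-of-magnitude operation counts (constant factors ignored).

    n: number of objects, d: number of features,
    m: exponent assumed for likelihood/Bayesian methods (the slide says m >= 3).
    """
    return {
        "n_log_n": n * math.log2(n),   # best case for clustering / correlations
        "n_squared": n ** 2,           # worst case for clustering / correlations
        "n_power_m": n ** m,           # likelihood, Bayesian methods
        "d_squared": d ** 2,           # dependence of clustering on dimensionality
    }

counts = scaling_estimates(10**9, 100)
for name, value in counts.items():
    print(f"{name:>10}: {value:.3g}")
```

The gap between the N log N and N^2 regimes is a factor of N / log2(N), which is why the choice of algorithm matters more than raw computing power at these sizes.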

  6. Tools in the VONeural • Middleware • Astrogrid model (Nocella) • Interface between Virtual Observatory and GRID computing (GRID-launcher; Deniskina, D’Angelo) • Models • Multi Layer Perceptron (VONeural_MLP; Donalek, Cavuoti, Skordovski) • Support Vector Machines (VONeural_SVM; Cavuoti, Russo) • Probabilistic Principal Surfaces (VONeural_PPS; Garofalo) • Tools • Segmentation of astronomical images (VONeural_Ext; Laurino)

  7. Scientific Applications • Data mining in multiparametric spaces (supervised and unsupervised) • Photometric redshifts (MLP, SVM) • Search for candidate quasars and AGN (PPS, NEC) • Galaxy groups and clusters • CMB simulations of cosmic string signatures • In collaboration with Moscow University • Extraction of catalogues from astronomical images • INAF + Caltech • VST pipeline for distant clusters • INAF + Caltech

  8. Application 1 – VONeural_MLP photometric redshifts • Photometric redshifts (phot-z) are an alternative to spectroscopic redshifts for deriving the redshifts (i.e. distances) of extragalactic objects: less accurate, but much cheaper in computing power and observing time

  9. Phot-z for the SDSS General Galaxy sample: at least 30 experiments (10-12 h each); training on 350,000 objects with 12 features; results for 32,000,000 objects. SDSS-DR4/5 – GG: training, validation and test sets of 60%, 20%, 20%. Diagram labels: MLP, 1(5), 1(18); 0.01 < z < 0.25; 0.25 < z < 0.50; 99.6% accuracy; MLP, 1(5), 1(23); MLP, 1(5), 1(24); interpolation of systematic errors (on both branches); σ_rob = 0.234; σ_rob = 0.206
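
The 60% / 20% / 20% partition into training, validation and test sets can be sketched as below. This is only an illustration: the function name `split_indices`, the fixed seed and the use of a plain random shuffle are assumptions, since the slide does not say how the objects were assigned to each set.

```python
import random

def split_indices(n_objects, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle object indices and cut them into train / validation / test."""
    indices = list(range(n_objects))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    n_train = round(n_objects * fractions[0])
    n_valid = round(n_objects * fractions[1])
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]            # remainder goes to the test set
    return train, valid, test

# 350,000 is the training-sample size quoted on the slide.
train, valid, test = split_indices(350_000)
print(len(train), len(valid), len(test))
```

Keeping the validation set separate from the test set lets the network topology be tuned without touching the objects used for the final accuracy estimate.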

  10. Photometric redshifts for 30 million SDSS galaxies, σz = 0.02

  11. Two types of compact groups • Spatial clustering in phot_z space shows two types of groups: • Compact and isolated • Loose and not embedded in larger structures • 95% of SKG have a large fraction of E-type galaxies, f150(E) ≥ 0.5.

  12. Looking for AGN candidates: different orientations; different parameters become significant; different clusters in parameter space. BUT STILL THE SAME OBJECT!

  13. Dimensionality reduction (classification of correlated non linear data) 3-D PCA PPS

  14. Negative entropy clustering

  15. Negative entropy clustering. NEC: a matter of Gaussians. Clustering method based on the “neg-entropy” NegE, a measure of the non-Gaussianity of a variable. If A is Gaussian, then NegE(A) = 0. Given a threshold d: if NegE(A ∪ B) < d, then clusters A and B are replaced by cluster A ∪ B. Figure labels: Not replaced! Replaced!
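
The merging rule on this slide can be sketched as follows. This is a simplified illustration under stated assumptions, not the VONeural implementation: it works on one-dimensional data, it estimates NegE with the common moment-based approximation (skewness^2 / 12 + excess kurtosis^2 / 48, which vanishes for a Gaussian), and it merges the most Gaussian pair first. The names `negentropy` and `nec_merge` are hypothetical.

```python
import random
from statistics import fmean

def negentropy(values):
    """Moment-based approximation of neg-entropy; close to 0 for Gaussian data."""
    mean = fmean(values)
    var = fmean((x - mean) ** 2 for x in values)
    if var == 0.0:
        return 0.0  # degenerate (constant) sample: treated as Gaussian here
    std = var ** 0.5
    skew = fmean(((x - mean) / std) ** 3 for x in values)
    excess_kurt = fmean(((x - mean) / std) ** 4 for x in values) - 3.0
    return skew ** 2 / 12.0 + excess_kurt ** 2 / 48.0

def nec_merge(clusters, d):
    """Replace A and B by A ∪ B while some pair satisfies NegE(A ∪ B) < d."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = negentropy(clusters[i] + clusters[j])
                if score < d and (best is None or score < best[0]):
                    best = (score, i, j)
        if best is None:
            break  # no remaining pair is Gaussian enough to merge
        _, i, j = best
        union = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(union)
    return clusters

# Toy data: two samples from the same Gaussian and one from a distant Gaussian.
rng = random.Random(0)
a = [rng.gauss(0.0, 1.0) for _ in range(500)]
b = [rng.gauss(0.0, 1.0) for _ in range(500)]
c = [rng.gauss(10.0, 1.0) for _ in range(500)]
merged = nec_merge([a, b, c], d=0.03)
print(sorted(len(cluster) for cluster in merged))
```

In the toy data the union of `a` and `b` is again Gaussian, so it falls below the threshold, while any union involving `c` is bimodal and stays above it. The threshold value 0.03 is an arbitrary choice for this example.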

  16. Workflow diagram labels: UKIDSS, SDSS; PPS preprocessing; BoK; NEC clustering; dendrogram; labeling; results; cluster optimization. 1 experiment: ca. 11 days

  17. SpecClass: 0 | 1 | 2 | 3 | 4 | 5 | 6. PPS: we select clusters by associating latent variables on the sphere with sources. NEC: the number of clusters after the aggregation is determined by “cluster optimization”. Leads to proper binning of parameter space

  18. Application 2 with SVM. Best result: 81.5%. PON-SCOPE GRID Infrastructure (110 nodes PON NA-CA-CT). Plot axes: lg2(gamma), lg2(C)
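
The lg2(gamma) and lg2(C) axes suggest a search over powers of two for the SVM parameters C and gamma. A minimal sketch of such a grid search follows. The exponent ranges, the names `log2_grid` and `grid_search`, and the toy scoring function are all hypothetical; in a real run `score_fn` would train an SVM with the given (C, gamma) and return its validation accuracy.

```python
from itertools import product

def log2_grid(log2_c_values, log2_gamma_values):
    """All (C, gamma) pairs on a grid that is regular in log2 space."""
    return [(2.0 ** lc, 2.0 ** lg)
            for lc, lg in product(log2_c_values, log2_gamma_values)]

def grid_search(score_fn, log2_c_values, log2_gamma_values):
    """Evaluate score_fn(C, gamma) on every grid point and keep the best."""
    scored = [(score_fn(c, g), c, g)
              for c, g in log2_grid(log2_c_values, log2_gamma_values)]
    best_score, best_c, best_gamma = max(scored)
    return best_c, best_gamma, best_score

# Toy stand-in for "train an SVM and measure accuracy": peaks at C=2^3, gamma=2^-5.
def toy_score(c, gamma):
    return -abs(c - 8.0) - abs(gamma - 2.0 ** -5)

best_c, best_gamma, best_score = grid_search(
    toy_score,
    log2_c_values=range(-5, 16, 2),      # hypothetical range of lg2(C)
    log2_gamma_values=range(-15, 4, 2),  # hypothetical range of lg2(gamma)
)
print(best_c, best_gamma)
```

Every grid point is evaluated independently of the others, so the loop over points is a natural unit of work to distribute across GRID nodes.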

  19. SDSS spectroscopic subsample of confirmed QSO (specclass = 4 & 6). UKIDSS HO-QSO’s. Colours used for all these experiments were calculated using adjacent bands: u−g, g−r, r−i, i−z for the optical bands, and Y−J, J−H, H−K for the near-infrared ones
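
Computing colours from adjacent bands amounts to taking differences of consecutive magnitudes. A minimal sketch follows; the function name `adjacent_colours` and the magnitude values are invented for illustration and do not come from the catalogues.

```python
OPTICAL_BANDS = ("u", "g", "r", "i", "z")
NIR_BANDS = ("Y", "J", "H", "K")

def adjacent_colours(magnitudes, bands):
    """Return {"a-b": mag[a] - mag[b]} for each pair of adjacent bands."""
    return {
        f"{first}-{second}": magnitudes[first] - magnitudes[second]
        for first, second in zip(bands, bands[1:])
    }

# Made-up magnitudes for a single object.
mags = {"u": 19.5, "g": 19.0, "r": 18.75, "i": 18.5, "z": 18.25,
        "Y": 17.75, "J": 17.5, "H": 17.0, "K": 16.5}

features = {**adjacent_colours(mags, OPTICAL_BANDS),
            **adjacent_colours(mags, NIR_BANDS)}
print(features)
```

This yields four optical colours and three near-infrared colours per object, seven features in total.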

  20. Application 2 with MLP • The experiments were carried out selecting only the objects in the catalogue of G. Sorrentino et al. (2006) (z between 0.05 and 0.095) that were labelled as Type 1 or Type 2. Only objects that are certainly AGN were selected. • The dataset consisted of 1570 objects: Type 1 objects were labelled 1 and Type 2 objects 0. • The best result obtained was: • Total efficiency e = 99.4% • Type 1 efficiency e_type1 = 98.4% • Type 2 efficiency e_type2 = 100% • Type 1 completeness c_type1 = 100% • Type 2 completeness c_type2 = 98.9%
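
The slide does not define efficiency and completeness. The sketch below assumes the usual reading: the efficiency of a class is the fraction of objects assigned to it that truly belong to it, the completeness is the fraction of its true members that are recovered, and the total efficiency is the overall fraction of correct classifications. The function names and the eight-object example are hypothetical.

```python
def class_metrics(true_labels, predicted_labels, cls):
    """Efficiency and completeness of one class (definitions assumed above)."""
    pairs = list(zip(true_labels, predicted_labels))
    assigned = sum(1 for _, p in pairs if p == cls)   # objects classified as cls
    members = sum(1 for t, _ in pairs if t == cls)    # objects that really are cls
    correct = sum(1 for t, p in pairs if t == cls and p == cls)
    efficiency = correct / assigned if assigned else float("nan")
    completeness = correct / members if members else float("nan")
    return efficiency, completeness

def total_efficiency(true_labels, predicted_labels):
    """Fraction of all objects classified correctly."""
    pairs = list(zip(true_labels, predicted_labels))
    return sum(1 for t, p in pairs if t == p) / len(pairs)

# Invented example with the slide's encoding: 1 = Type 1, 0 = Type 2.
true_labels      = [1, 1, 1, 0, 0, 0, 0, 0]
predicted_labels = [1, 1, 1, 1, 0, 0, 0, 0]

print(class_metrics(true_labels, predicted_labels, cls=1))
print(class_metrics(true_labels, predicted_labels, cls=0))
print(total_efficiency(true_labels, predicted_labels))
```

Under these definitions a single Type 2 object misclassified as Type 1 lowers the Type 1 efficiency and the Type 2 completeness while leaving the other two figures at 100%, which is the same pattern as in the best result quoted on the slide.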

  21. THE END. Workshop SCoPE - Status of the project and of the Work Packages. Sala Azzurra - Complesso universitario Monte Sant’Angelo, 21-2-2008
