DISTRIBUTED DATA MINING ON ASTRONOMY CATALOGS

Data Avalanche in Astronomy
• Astronomy sky surveys (SDSS, 2MASS) observe galaxies, quasars, stars, and serendipitous objects
• Raw data from the telescope is pre-processed
• Each object has hundreds of attributes
• National Virtual Observatory: develops an information technology infrastructure for easy access to distributed astronomy catalogs

Cross Matching: Alignment of Astronomy Catalogs
[Figure: Catalog P, Catalog Q, and the matched catalog]

The Fundamental Plane of Galaxies
• Objective: finding correlations in high-dimensional spaces
• Domain knowledge: for the class of elliptical galaxies, observe the parameters Surface Brightness, Log(Velocity Dispersion), and Log(Radius)
• A 2D plane, called the Fundamental Plane, exists in the observed space of these parameters
[Figure labels: Mass / Luminosity / Radius]

The Distributed Problem
[Figure labels: 2MASS, Mean Surface Brightness (Kmsb), SDSS]
• Build a distributed Principal Component Analysis (PCA) algorithm
• Assumptions:
  1. Build the cross-matched table off-line
  2. Compute indices and send them to the sites
[Figure: The Virtual Table]

Distributed PCA Algorithm
• Data matrices: Site A holds an $n \times p$ matrix $A$; Site B holds an $n \times q$ matrix $B$
• $p + q = m$ (total number of attributes)
• Each site normalizes its own data without any communication
• A central coordination site S sends A and B a seed for random number generation
• A and B each generate the same $l \times n$ random matrix $R$ from that seed (the elements of $R$ are i.i.d. and drawn from any distribution with mean 0 and variance 1)
• A sends $RA$ and B sends $RB$ to S
• S computes $D = (RA)^T (RB) / l$
• $E[D] = E[A^T (R^T R) B] / l = A^T E[R^T R] B / l = A^T B$, since $E[R^T R] = l I_n$; hence $D \approx A^T B$ (Johnson-Lindenstrauss lemma)

Experimental Results
[Figure: axis labels Velocity Dispersion, Surface Brightness]

Work done by Haimonti Dutta, Chris Giannella, Kirk Borne, Ran Wolff, and Hillol Kargupta
NSF Grants: IIS-0329143, IIS-0093353, IIS-0203958; NASA Grant NAS2-37143
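As a rough illustration of the Distributed PCA Algorithm steps listed in the poster, the sketch below simulates sites A, B, and the coordinator S inside a single Python process. Everything specific here is an assumption made for illustration and is not taken from the poster: the function names, the toy data, the sizes n = 1000 and l = 200, the seed value, and the choice of Gaussian entries for R (one admissible mean-0, variance-1 distribution).

```python
import random

def standardize(data):
    """Local step: center each column to mean 0 and scale it to unit variance."""
    cols = []
    for col in zip(*data):
        mean = sum(col) / len(col)
        std = (sum((v - mean) ** 2 for v in col) / len(col)) ** 0.5
        cols.append([(v - mean) / std for v in col])
    return [list(row) for row in zip(*cols)]

def random_matrix(seed, l, n):
    """l x n matrix of i.i.d. N(0, 1) entries, reproducible from the shared seed."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(l)]

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    """Plain-Python matrix product of X (a x b) and Y (b x c)."""
    Yt = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in Yt] for row in X]

def site_projection(data, seed, l):
    """Run at a data site: rebuild R from the seed and return R @ data (l x k)."""
    R = random_matrix(seed, l, len(data))
    return matmul(R, data)

def coordinator_estimate(RA, RB):
    """Run at site S: D = (RA)^T (RB) / l, an estimate of A^T B."""
    l = len(RA)
    return [[v / l for v in row] for row in matmul(transpose(RA), RB)]

# Toy data (made up): n cross-matched objects, p = 2 attributes at A, q = 1 at B.
data_rng = random.Random(0)
n, l, seed = 1000, 200, 42
raw_A = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(n)]
raw_B = [[a[0] + 0.1 * data_rng.gauss(0, 1)] for a in raw_A]  # tracks A's first column

A = standardize(raw_A)  # normalization happens locally at each site
B = standardize(raw_B)

# Each site ships only an l x k projection instead of its n x k data block.
D = coordinator_estimate(site_projection(A, seed, l), site_projection(B, seed, l))
exact = matmul(transpose(A), B)  # computable here only because this is a simulation

for i in range(len(D)):
    print("estimate:", round(D[i][0] / n, 3), " exact:", round(exact[i][0] / n, 3))
```

Because the columns are standardized, dividing D by n gives an estimate of the cross-correlation between attributes held at different sites. The within-site blocks $A^T A$ and $B^T B$ can be computed locally, so under these assumptions the full $m \times m$ matrix needed for PCA could be assembled at S without moving the raw data. The estimate is noisy for small l, so the printed values should agree only approximately.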