
DISTRIBUTED DATA MINING ON ASTRONOMY CATALOGS





Presentation Transcript


  1. DISTRIBUTED DATA MINING ON ASTRONOMY CATALOGS

Data Avalanche in Astronomy
• Astronomy sky surveys (SDSS, 2MASS) observe galaxies, quasars, stars, and serendipity objects
• Raw data from the telescope is pre-processed
• Hundreds of attributes are recorded for each object
• National Virtual Observatory: develop an information technology infrastructure for enabling easy access to distributed astronomy catalogs

Cross Matching: Alignment of Astronomy Catalogs
[Figure labels: Catalog P, Catalog Q, The Matched Catalog]

Distributed PCA Algorithm
• Data matrices: Site A holds A (n × p), Site B holds B (n × q)
• p + q = m (total number of attributes)
• Each site normalizes its own data without any communication
• A central coordination site S sends A and B a random number generation seed
• A and B each generate the same l × n random matrix R from that seed (the elements of R are i.i.d. and drawn from any distribution with mean 0 and variance 1)
• A sends RA and B sends RB to S
• S computes D = (RA)^T (RB) / l
• E[D] = E[A^T (R^T R) B / l] = A^T E[R^T R] B / l = A^T B, because E[R^T R] = l I; hence D ≈ A^T B (Johnson-Lindenstrauss lemma)

The Fundamental Plane of Galaxies
[Figure labels: Mass / Luminosity / Radius]
• Objective: finding correlations in high-dimensional spaces
• Domain knowledge: for the class of elliptical galaxies, observe the parameters surface brightness, log(velocity dispersion), and log(radius)
• A 2D plane, called the Fundamental Plane, exists in the observed space of these parameters

Experimental Results
[Figure labels: Velocity Dispersion, Surface Brightness]

The Distributed Problem
[Figure labels: 2MASS, Mean Surface Brightness (Kmsb), SDSS, The Virtual Table]
• Build a distributed Principal Component Analysis algorithm
• Assumptions:
    1. Build the cross-matched table off-line
    2. Compute indices and send them to the sites

Work done by Haimonti Dutta, Chris Giannella, Kirk Borne, Ran Wolff and Hillol Kargupta
NSF Grants: IIS-0329143, IIS-0093353, IIS-0203958 and NASA Grant NAS2-37143
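
The Distributed PCA Algorithm steps listed above can be illustrated with a small, single-process sketch. This is not the authors' implementation. It only simulates the two sites and the coordinator on synthetic data to show how D = (RA)^T (RB) / l approximates A^T B. The function names (normalize, random_matrix, project, estimate_cross_product, demo), the Gaussian choice for R, and the sizes n = 50 and l = 2000 are all assumptions made for illustration. Matrices are stored as lists of columns, and only the Python standard library is used.

```python
import random


def normalize(cols):
    """Standardize each column to mean 0 and variance 1 (done locally at a site)."""
    out = []
    for col in cols:
        n = len(col)
        mean = sum(col) / n
        std = (sum((x - mean) ** 2 for x in col) / n) ** 0.5
        out.append([(x - mean) / std for x in col])
    return out


def random_matrix(seed, l, n):
    """Build the l x n matrix R from a shared seed.

    Entries are i.i.d. standard normal here (an assumed choice); the slides
    only require mean 0 and variance 1.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(l)]


def project(R, cols):
    """Return R X as a list of projected columns, each of length l."""
    return [[sum(r * x for r, x in zip(row, col)) for row in R] for col in cols]


def estimate_cross_product(RA, RB):
    """Coordinator step: D = (RA)^T (RB) / l."""
    l = len(RA[0])
    return [[sum(u * v for u, v in zip(ra, rb)) / l for rb in RB] for ra in RA]


def exact_cross_product(A, B):
    """A^T B computed centrally, used only to check the estimate."""
    return [[sum(u * v for u, v in zip(a, b)) for b in B] for a in A]


def demo(n=50, l=2000, seed=7):
    """Simulate sites A and B plus coordinator S on synthetic data."""
    data_rng = random.Random(0)
    x = [data_rng.gauss(0.0, 1.0) for _ in range(n)]
    # Site A holds two attributes; site B holds one attribute correlated with x.
    A = normalize([x, [data_rng.gauss(0.0, 1.0) for _ in range(n)]])
    B = normalize([[xi + 0.1 * data_rng.gauss(0.0, 1.0) for xi in x]])
    # Each site rebuilds R independently from the seed sent by S.
    R_at_a = random_matrix(seed, l, n)
    R_at_b = random_matrix(seed, l, n)
    # Only the projections RA and RB travel to S, never A or B themselves.
    D = estimate_cross_product(project(R_at_a, A), project(R_at_b, B))
    return D, exact_cross_product(A, B)


if __name__ == "__main__":
    D, exact = demo()
    for d_row, e_row in zip(D, exact):
        print("estimated:", d_row, "exact:", e_row)
```

Under these assumptions, the covariance blocks within a site (A^T A and B^T B) can be computed locally, and D supplies the cross-site block, so S could assemble an approximate m × m covariance matrix and run an ordinary eigen-decomposition on it. A larger l gives a tighter estimate at the cost of more communication, since each site sends l values per attribute.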
