
Data Mining and Virtual Observatory

Yanxia Zhang, National Astronomical Observatories, CAS. Dec. 2, 2004.


Presentation Transcript


  1. Data Mining and Virtual Observatory • Yanxia Zhang • National Astronomical Observatories, CAS • Dec. 2, 2004

  2. Outline • Why • What • How

  3. Astronomy is Facing a Major “Data Avalanche” • Multi-Terabyte Sky Surveys and Archives (Soon: Multi-Petabyte) • Billions of Detected Sources • Hundreds of Measured Attributes per Source …

  4. Necessity Is the Mother of Invention • Understanding of Complex Astrophysical Phenomena Requires Complex and Information-Rich Data Sets, and the Tools to Explore Them … • … This Will Lead to a Change in the Nature of the Astronomical Discovery Process … (DM) • … Which Requires a New Research Environment for Astronomy: VO

  5. DM: Confluence of Multiple Disciplines • Database systems, data warehouses, OLAP • Statistics • ML & AI • Visualization • Information science • Other disciplines

  6. What is DM? The search for interesting patterns in large databases that were collected for other applications, using machine learning algorithms, high-performance computers and other methods, for science and society!

  7. Data Mining: A KDD Process • Data mining: the core of the knowledge discovery process. • Steps: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation → Knowledge
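The KDD steps on slide 7 can be walked through on a toy example. The sketch below is only an illustration: the two source tables, the attribute names (`mag_r`, `mag_k`) and the colour threshold are all hypothetical and do not come from the talk.

```python
# Toy walk-through of the KDD steps; all records and thresholds are made up.

# Two hypothetical source "databases" with overlapping objects.
optical_db = [
    {"id": 1, "mag_r": 17.2},
    {"id": 2, "mag_r": None},   # missing measurement
    {"id": 3, "mag_r": 21.9},
]
infrared_db = [
    {"id": 1, "mag_k": 14.1},
    {"id": 3, "mag_k": 16.0},
    {"id": 4, "mag_k": 15.5},
]


def clean(rows):
    """Data cleaning: drop rows that contain missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def integrate(left, right):
    """Data integration: join the two sources on the shared id."""
    right_by_id = {r["id"]: r for r in right}
    return [
        {**row, **right_by_id[row["id"]]}
        for row in left
        if row["id"] in right_by_id
    ]


def select(rows):
    """Selection: keep only the task-relevant attribute, here the r - K colour."""
    return [{"id": r["id"], "r_minus_k": r["mag_r"] - r["mag_k"]} for r in rows]


def mine(rows, threshold=4.0):
    """Data mining: flag objects redder than an illustrative threshold."""
    return [r for r in rows if r["r_minus_k"] > threshold]


def evaluate(patterns):
    """Pattern evaluation: here simply report the ids that were flagged."""
    return sorted(p["id"] for p in patterns)


warehouse = integrate(clean(optical_db), clean(infrared_db))
knowledge = evaluate(mine(select(warehouse)))
print(knowledge)
```

Each function stands in for one box of the diagram, so the final two lines read in the same order as the slide: sources, cleaning and integration, selection, mining, evaluation.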

  8. Data Mining: Increasing potential to support decisions • Roles: End User, Scientist, Analyst, Data Analyst, DBA • Layers, top to bottom: Knowledge Discovery; Data Presentation (Visualization Techniques); Data Mining (Information Discovery); Data Exploration (OLAP, MDA, Statistical Analysis, Querying and Reporting); Data Warehouses / Data Marts; Data Sources (Paper, Files, Information Providers, Database Systems, OLTP)

  9. Architecture: Typical Data Mining System • Graphical user interface • Pattern evaluation • Data mining engine • Knowledge-base • Database or data warehouse server • Filtering; data cleaning & data integration • Data Warehouse, Databases

  10. The share of effort in each DM step • Deciding the target • Data preparation • Data mining • Evaluation

  11. DM: On What Kind of Data? • Relational databases • Data warehouses • Transactional databases • Advanced DB systems and information repositories • Object-oriented and object-relational databases • Spatial databases • Time-series data and temporal data • Text databases and multimedia databases • Heterogeneous and legacy databases • WWW

  12. Data Mining Functionality • Concept description • Association • Classification and Prediction • Clustering • Time-series analysis • Other pattern-directed or statistical analysis

  13. Taking a Broader View: The Observable Parameter Space • Axes: Flux, Wavelength, Time, Polarization, Proper motion, Morphology / Surf.Br., RA, Dec, Non-EM … • All astronomical measurements span some volume in this parameter space. • Along each axis the measurements are characterized by the position, extent, sampling and resolution. • What is the coverage? Where are the gaps? Where do we go next?

  14. How and Where are Discoveries Made? • Conceptual Discoveries: e.g., Relativity, QM, Brane World, Inflation … Theoretical, may be inspired by observations • Phenomenological Discoveries: e.g., Dark Matter, QSOs, GRBs, CMBR, Extrasolar Planets, Obscured Universe … Empirical, inspire theories, can be motivated by them • Diagram labels: New Technical Capabilities, Observational Discoveries, Theory, IT/VO • Phenomenological Discoveries (VO): pushing along some parameter space axis → VO useful; making new connections (e.g., multi-wavelength) → VO critical! • Understanding of complex astrophysical phenomena requires complex, information-rich data (and simulations?)

  15. Exploration of observable parameter spaces and searches for rare or new types of objects

  16. But Sometimes You Find a Surprise…

  17. Precision Cosmology and LSS • Better matching of theory and observations • Clustering on a clustered background • Clustering with a nontrivial topology • Figures: LSS Numerical Simulation (VIRGO); DPOSS Clusters (Gal et al.)

  18. Exploration of the Time Domain: Optical Transients • A possible example of an “Orphan Afterglow” (GRB?) discovered in DPOSS: an 18th mag transient associated with a 24.5 mag galaxy. At an estimated z ~ 1, the observed brightness is ~ 100 times that of a SN at the peak. Or is it something else, new? • Images: DPOSS, Keck
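The brightness figures on slide 18 can be checked with the standard magnitude relation, Δm = 2.5 log10(F1/F2). The sketch below only restates the slide's numbers in those terms; the function and variable names are illustrative.

```python
import math


def flux_ratio_to_mag_diff(ratio):
    """Magnitude difference for a given flux ratio (Pogson's relation)."""
    return 2.5 * math.log10(ratio)


def mag_diff_to_flux_ratio(delta_mag):
    """Flux ratio corresponding to a magnitude difference."""
    return 10 ** (0.4 * delta_mag)


transient_mag = 18.0   # from the slide
host_mag = 24.5        # from the slide

# "~100 times brighter than a SN at peak" corresponds to 5 magnitudes,
# so the implied SN peak brightness at that redshift is about mag 23.
sn_offset = flux_ratio_to_mag_diff(100.0)
implied_sn_peak_mag = transient_mag + sn_offset

# The transient outshines its 24.5 mag host by 6.5 mag, a factor of ~400 in flux.
host_ratio = mag_diff_to_flux_ratio(host_mag - transient_mag)

print(sn_offset, implied_sn_peak_mag, round(host_ratio))
```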

  19. Exploration of the Time Domain: Faint, Fast Transients (Tyson et al.)

  20. Exploring the Low Surface Brightness (Low Contrast) Universe • Comparison between H I, Hα, and 100 μm Diffuse Emission (Brunner et al.) • Images: DPOSS red image; IRAS 100 Micron Image

  21. Background Enhancement Technique demonstrated on two known M31 dwarf spheroidals (Brunner et al.)

  22. Data Mining in the Image Domain: Can We Discover New Types of Phenomena Using Automated Pattern Recognition? (Every object detection algorithm has its biases and limitations)

  23. An OLAM Architecture • Mining query in, mining result out • Layer 4, User Interface: User GUI API • Layer 3, OLAP/OLAM: OLAM Engine, OLAP Engine • Data Cube API • Layer 2, MDDB: MDDB, Meta Data • Database API; Filtering & Integration • Layer 1, Data Repository: Databases, Data Warehouse (Data cleaning, Data integration, Filtering)

  24. View of Warehouses and Hierarchies • Importing data • Table browsing • Dimension creation • Dimension browsing • Cube building • Cube browsing

  25. Selecting a Data Mining Task • Major data mining functions: • Summary (Characterization) • Association • Classification • Prediction • Clustering • Time-Series Analysis

  26. Mining Characteristic Rules • Characterization: Data generalization/summarization at high abstraction levels. • An example query: Find a characteristic rule for Cities from the database 'CITYDATA' in relevance to location, capita_income, and the distribution of count% and amount%.
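A minimal sketch of what the characterization query on slide 26 asks for: generalize each attribute to a higher abstraction level, then report the share of rows in each generalized group. The rows, the province-to-region hierarchy and the income cut are all hypothetical, and only count% is computed, since the slide does not say which measure amount% refers to.

```python
from collections import Counter

# Hypothetical rows standing in for a CITYDATA table.
cities = [
    {"city": "A", "province": "P1", "capita_income": 25000},
    {"city": "B", "province": "P2", "capita_income": 30000},
    {"city": "C", "province": "P2", "capita_income": 10000},
    {"city": "D", "province": "P3", "capita_income": 8000},
]

# Hypothetical concept hierarchy for location: province -> region.
REGION_OF = {"P1": "East", "P2": "East", "P3": "West"}


def income_band(income, cut=20000):
    """Generalize a numeric income into a coarse band (illustrative cut)."""
    return "high" if income >= cut else "low"


def characterize(rows):
    """Return count% for each (region, income band) combination."""
    groups = Counter(
        (REGION_OF[r["province"]], income_band(r["capita_income"]))
        for r in rows
    )
    total = sum(groups.values())
    return {key: 100.0 * n / total for key, n in groups.items()}


rule = characterize(cities)
for (region, band), pct in sorted(rule.items()):
    print(f"location={region}, capita_income={band}: count%={pct:.0f}")
```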

  27. Browsing a Data Cube • Powerful visualization • OLAP capabilities • Interactive manipulation

  28. Visualization of Data Dispersion: Boxplot Analysis
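The numbers behind a boxplot are the quartiles, the whisker ends and any outliers. The sketch below computes them with the common 1.5 × IQR fence convention, which is an assumption here; the slide does not state which convention the tool uses, and the sample values are made up.

```python
import statistics


def boxplot_stats(values):
    """Quartiles, whisker ends and outliers using 1.5 * IQR fences."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whiskers": (inside[0], inside[-1]),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]
stats = boxplot_stats(sample)
print(stats)
```

With this sample the value 100 lies far above the upper fence, so it is reported as an outlier rather than stretching the whisker.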

  29. Mining Association Rules ( Table Form )

  30. Association Rule in Plane Form

  31. Association Rule Graph
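Whatever form association rules are displayed in (table, plane or graph), each rule X → Y is usually scored by its support and confidence. The sketch below computes both on a hypothetical set of catalogue records; the property labels are invented for illustration.

```python
# Each record lists the (hypothetical) properties observed for one source.
records = [
    {"radio_loud", "x_ray"},
    {"radio_loud", "x_ray", "variable"},
    {"radio_loud", "variable"},
    {"x_ray"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) as support(X and Y) / support(X)."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


s = support({"radio_loud", "x_ray"}, records)
c = confidence({"radio_loud"}, {"x_ray"}, records)
print(f"radio_loud -> x_ray: support={s:.2f}, confidence={c:.2f}")
```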

  32. Mining Classification Rules

  33. Prediction: Numerical Data
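The slide does not say which prediction method is used. As one common choice for numerical prediction, the sketch below fits a straight line by ordinary least squares and uses it to predict a new value; the data points are made up.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict y for a new x from the fitted line."""
    return slope * x + intercept


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept, predict(5.0, slope, intercept))
```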

  34. Prediction: Categorical Data

  35. DMiner: Architecture • Graphic User Interface • Modules: Characterizer, Cluster Analyzer, Comparator, Associator, Classifier, Future Modules • Database and Cube Server • Databases: Radio DB, Infrared DB, Optical DB, … DB

  36. MultiMediaMiner: A System Prototype for MultiMedia Data Mining (Simon Fraser University) • Components shown: WWW, Image features, Keywords, Metadata, WordNet, Internet Domain Hierarchy, Keyword Hierarchy, Pre-built Concept Hierarchies for colour, texture, format, etc., Data Cubes and Numeric Hierarchies • Pre-processing • Real-time Interaction • Pattern discoveries

  37. MultiMediaMiner Simon Fraser University

  38. WebLogMiner Architecture • The web log is filtered to generate a relational database • A data cube is generated from the database • OLAP is used to drill down and roll up in the cube • OLAM is used for mining interesting knowledge • Pipeline: Web log → (1) Data Cleaning → Database → (2) Data Cube Creation → Data Cube → (3) OLAP → Sliced and diced cube → (4) Data Mining → Knowledge
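The four WebLogMiner stages can be imitated on a few log lines. This is a sketch under stated assumptions, not the WebLogMiner implementation: the log tuples, the choice of dimensions (month, day, page) and the "most requested page" summary are all hypothetical.

```python
from collections import Counter

# Hypothetical raw log entries: (month, day, page, HTTP status).
web_log = [
    ("2004-11", "2004-11-30", "/catalog", 200),
    ("2004-11", "2004-11-30", "/images", 200),
    ("2004-12", "2004-12-01", "/catalog", 200),
    ("2004-12", "2004-12-01", "/catalog", 404),
    ("2004-12", "2004-12-02", "/catalog", 200),
]

# 1. Data cleaning: keep successful requests only.
database = [(m, d, p) for (m, d, p, status) in web_log if status == 200]

# 2. Data cube creation: hit counts per (month, day, page) cell.
cube = Counter(database)


def roll_up(cube, keep):
    """3. OLAP roll-up: sum counts over the dimensions not listed in keep."""
    rolled = Counter()
    for cell, count in cube.items():
        rolled[tuple(cell[i] for i in keep)] += count
    return rolled


def slice_cube(cube, dim, value):
    """3. OLAP slice: keep only cells where dimension dim equals value."""
    return Counter({c: n for c, n in cube.items() if c[dim] == value})


by_month_page = roll_up(cube, keep=(0, 2))           # day rolled up to month
december = slice_cube(cube, dim=0, value="2004-12")  # drill into one month

# 4. Data mining: a trivial summary, the most requested page overall.
top_page, top_hits = roll_up(cube, keep=(2,)).most_common(1)[0]
print(by_month_page, dict(december), top_page, top_hits)
```

Rolling up sums cells along a dimension, while slicing fixes one dimension to a value; drilling down is the reverse of rolling up, going back to the finer (month, day, page) cells.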

  39. VO: Conceptual Architecture • User • Discovery tools • Analysis tools • Gateway • Data Archives

  40. Conclusion ◆ Development and application of DM in astronomy; ◆ Automated DM, visualized DM and audio DM; ◆ Integration of VO and DM. • The next golden age of discovery in astronomy will come earlier!

  41. Q&A? Thank you !!!
