Toward Knowledge Discovery in Databases Attached to Grids
Peter Brezany
Institute for Software Science, University of Vienna
E-mail: brezany@par.univie.ac.at
Media That Radically Influenced Society
- 1500s Printing Press
- 1840s Penny Post
- 1850s Telegraph
- 1920s Telephone
- 1930s Radio
- 1950s TV
- 1990s Web
- 20xx Grid
Talk Outline
- Data Mining on the Grid – Background Information
- Application Examples
- Architecture of a Traditional Data Mining System
- GridMiner – A Framework for Data Mining on the Grid
- GridMiner Architecture
- Functional and Data Access Model
- Conclusions
Data Mining on the Grid
- Data mining on the Grid (DMG): finding unknown data patterns in an environment with geographically distributed data and computation.
- Data may be highly heterogeneous, with a high update frequency.
- A good DMG algorithm analyzes data in a distributed fashion with modest data communication overhead.
- A typical DMG algorithm involves local data analysis followed by the generation of a global data model.
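The "local analysis, then global model" pattern can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not name a concrete algorithm, so the example uses simple itemset counting, and the function names (`local_analysis`, `build_global_model`) and the sample transactions are hypothetical. Each site sends only its counts, never its raw records, which is what keeps communication modest.

```python
from collections import Counter
from itertools import combinations

def local_analysis(transactions):
    """Runs at one site: count single items and item pairs in the local data."""
    counts = Counter()
    for items in transactions:
        unique = sorted(set(items))
        counts.update((item,) for item in unique)   # 1-itemsets
        counts.update(combinations(unique, 2))      # 2-itemsets
    # Only this compact summary leaves the site, not the transactions.
    return {"n": len(transactions), "counts": counts}

def build_global_model(local_summaries, min_support):
    """Runs at the coordinating site: merge summaries, keep frequent itemsets."""
    total = sum(s["n"] for s in local_summaries)
    merged = Counter()
    for s in local_summaries:
        merged.update(s["counts"])
    return {itemset: c / total
            for itemset, c in merged.items()
            if c / total >= min_support}

# Hypothetical data held at two different sites.
site_a = [["bread", "milk"], ["bread", "butter"], ["milk"]]
site_b = [["bread", "milk", "butter"], ["bread", "milk"]]

summaries = [local_analysis(site_a), local_analysis(site_b)]
global_model = build_global_model(summaries, min_support=0.6)
print(global_model)
```

Because every site reports all of its counts, the merged supports are exact. A realistic algorithm would prune the local summaries further to reduce traffic, at the cost of extra care when merging.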
Application Examples
- Finding out how the emergence of hepatitis C depends on weather patterns: this requires access to a large hepatitis C DB at one location and an environmental DB at another location.
- Two major financial organizations want to cooperate. They need to share data patterns relevant to the data mining task, but they do not want to share the data itself, since it is sensitive; combining the databases may not be feasible.
- Federating Brain Data Project: integrating several neuroscience DBs.
- A major multinational corporation wants to analyze customer transaction records to quickly develop successful business strategies.
  - It has thousands of establishments throughout the world.
  - Collecting all the data in a centralized data warehouse, followed by analysis using existing commercial data mining software, takes too long.
Telemedical Applications: AMG – Austrian Medical Grid
[Diagram labels: Database, Raw Medical Data, Derived Medical Data, Database, Reconstructed Medical Data, Web]
Telemedical Collaboration - Example
- A patient living in a remote village has a heart problem.
- An ECG is taken by the local doctor, and all the patient's details are stored in the doctor's PC-based telemedical system.
- MRI and CT scans are taken in different departments of a general hospital and stored in the telemedical DB.
- A consultant compiles a report and saves it in the DB.
- If necessary, a 3D ultrasound scan is taken in a specialized clinic and a further report is compiled.
- If complicated surgery is required, an external specialist using Virtual Reality techniques defines how the surgery should be planned.
- The resulting operation is recorded on video, e.g., for education.
- Data mining support/assistance is needed.
Architecture of a Data Mining System
[Diagram, layers from top to bottom]
- Graphical user interface
- Pattern evaluation
- Data mining engine
- Database or data warehouse server
- Data cleaning, data integration; Filtering
- Database; Data warehouse
[Side component: Knowledge base]
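The components named on this slide can be read as a pipeline from raw sources up to evaluated patterns. The following Python sketch is a toy illustration of that flow, not the architecture's actual implementation: every function name, the record layout, and the "average sales per region" mining task are assumptions introduced here for the example.

```python
from statistics import mean

def integrate(*sources):
    """Data integration: combine records from several sources."""
    return [record for source in sources for record in source]

def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def select(records, predicate):
    """Filtering: keep only the task-relevant records."""
    return [r for r in records if predicate(r)]

def mine(records, group_key, value_key):
    """Data mining engine (toy): average of value_key per group_key."""
    groups = {}
    for r in records:
        groups.setdefault(r[group_key], []).append(r[value_key])
    return {group: mean(values) for group, values in groups.items()}

def evaluate(patterns, threshold):
    """Pattern evaluation: keep patterns that reach an interestingness threshold."""
    return {group: m for group, m in patterns.items() if m >= threshold}

# Hypothetical contents of a database and a data warehouse.
database = [
    {"region": "north", "sales": 10},
    {"region": "north", "sales": None},   # removed by cleaning
    {"region": "south", "sales": 30},
]
warehouse = [
    {"region": "north", "sales": 20},
    {"region": "south", "sales": 50},
    {"region": "west", "sales": 5},       # removed by filtering
]

records = clean(integrate(database, warehouse))
relevant = select(records, lambda r: r["region"] != "west")
patterns = mine(relevant, group_key="region", value_key="sales")
interesting = evaluate(patterns, threshold=20)
print(interesting)
```

In a traditional system all of these stages run against one central store. The Grid setting discussed in this talk distributes the lower stages across sites.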
GridMiner – A Framework for Data Mining on Grids
System requirements:
- Algorithm and data publishing and integration
- Compatibility with Grid infrastructure and Grid awareness
- Openness
- Scalability
- Security and data privacy
Functionality requirements:
- Mining different kinds of knowledge in databases
- Incremental data mining algorithms
- Interactive mining of knowledge at multiple levels of abstraction
GridMiner (Layered) Architecture (based on K.F. Jeffery's idea)
Example: Mining Patterns for Data Classification and Associations

use database dat1, dat2
mine classifications
analyze credit_rating
using g_parsimony
display as tree

use database DBs attributes
mine associations
using method attributes
display as rules
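To make the clause structure of such mining queries concrete, the Python sketch below parses the classification query into a dictionary. It is a hedged illustration only: the slide does not define a grammar, so the keyword set (`use database`, `mine`, `analyze`, `using`, `display as`), the one-clause-per-line layout, and the `parse_query` helper are all assumptions, not GridMiner's query interface.

```python
# Assumed clause keywords, taken from the wording of the example queries.
KEYWORDS = ("use database", "mine", "analyze", "using", "display as")

def parse_query(text):
    """Map each clause keyword to the text that follows it on its line."""
    query = {}
    for line in text.strip().splitlines():
        clause = line.strip()
        for keyword in KEYWORDS:
            if clause == keyword or clause.startswith(keyword + " "):
                query[keyword] = clause[len(keyword):].strip()
                break
        else:
            # No keyword matched this line.
            raise ValueError(f"unrecognized clause: {clause!r}")
    return query

classification_query = """
use database dat1, dat2
mine classifications
analyze credit_rating
using g_parsimony
display as tree
"""

parsed = parse_query(classification_query)
databases = [name.strip() for name in parsed["use database"].split(",")]
print(parsed)
print(databases)
```

A structured form like this is what a mining service would need before it can decide which databases to contact and which algorithm to run.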
Knowledge Grid Architecture Layers
High level layer:
- Data Access Service
- Tools and Algorithms Access Service
- Execution Plan Management
- Result Presentation Service
Core layer:
- Knowledge Directory Service
- Resource Allocation and Execution Management
Generic Grid and Data Grid Services
Conclusions
- Grid data mining is a relevant research topic.
- The GridMiner approach may contribute to this research domain.
- Collaborations are needed.
- IPG (Information Power Grid) is the only Grid project that aims to address knowledge discovery issues.
- We are looking for pilot applications.
- Open issue: which basic Grid technology to build on (Globus, DataGrid, Jini, JXTA?)
Data Storage and the Components
[Diagram]
- Site A, Site B, Site C, Site D: Preprocessing and Local DM at each site
- Site E: Construction of the Global Model, GUI