
Designing Services for Grid-based Knowledge Discovery

Designing Services for Grid-based Knowledge Discovery. A. Congiusta, A. Pugliese, Domenico Talia, P. Trunfio. DEIS, University of Calabria, Italy. talia@deis.unical.it. Future Generation Grids, Dagstuhl Seminar, November 2004.





Presentation Transcript


  1. Designing Services for Grid-based Knowledge Discovery A. Congiusta, A. Pugliese, Domenico Talia, P. Trunfio DEIS University of Calabria ITALY talia@deis.unical.it Future Generation Grids, Dagstuhl Seminar, November 2004

  2. SUMMARY • The use of computers is changing the way we make discoveries and is improving both the speed and the quality of discovery processes. • In this scenario the Grid can provide effective computational support for distributed knowledge discovery from large, distributed data sets. To this end we designed a system called the Knowledge Grid. • This talk discusses how to design distributed knowledge discovery services, according to the OGSA model, by using the Knowledge Grid services: searching Grid resources, composing software and data elements, and executing the resulting application on a Grid.

  3. OUTLINE • MOTIVATIONS • TOWARDS KNOWLEDGE SERVICES • THE KNOWLEDGE GRID • OGSA SERVICES FOR KNOWLEDGE DISCOVERY • A META-LEARNING EXAMPLE • CONCLUSIONS

  4. MOTIVATIONS • Lots of data collected and warehoused. • Data collected and stored at enormous speeds in local databases, from remote sources, or from the sky. • Scientific simulations generating terabytes of data. • Huge data sets are hard to understand. • Traditional techniques are infeasible for raw data. • Computational science is evolving toward data-intensive applications that include • data analysis, • information management, and • knowledge discovery.

  5. MOTIVATIONS • Most data will never be examined by humans; it is analyzed and summarized by computers. • Data analysis is becoming a key element in scientific discovery and in business processes. • Data-intensive applications are those that explore, query, analyze, visualize and, in general, process very large-scale data sets. • Data-intensive applications help • scientists form hypotheses • companies provide better, customized services and support decision making.

  6. TOWARDS KNOWLEDGE SERVICES • Objective: Grid-aware Knowledge Discovery Systems, to support the unification of data management and knowledge discovery systems with Grid technologies for providing knowledge-based Grid services. • This objective can be achieved through • development of techniques and tools for supporting data-intensive applications and • integration of Data and Computation Grids with Information and Knowledge Grids.

  7. THE KNOWLEDGE GRID (PAST) • KNOWLEDGE GRID: a distributed knowledge discovery architecture that integrates data mining techniques and computational Grid resources. • In the KNOWLEDGE GRID architecture, data mining tools are integrated with lower-level Grid mechanisms and services and exploit Data Grid services. • This approach benefits from "standard" Grid services and offers an open architecture that can be configured on top of generic Grid middleware.

  8. KNOWLEDGE GRID ARCHITECTURE (PAST) [Figure: the KNOWLEDGE GRID layer on top of Generic and Data Grid Services.]

  9. THE KNOWLEDGE GRID (PAST, FUTURE) [Figure; surviving label: Service Selection.]

  10. OGSA KNOWLEDGE GRID SERVICES (FUTURE) • The KNOWLEDGE GRID is an abstract service-based Grid architecture that does not limit the user in developing and using service-based knowledge discovery applications. • We are defining a set of Grid Services that export functionality and operations of the KNOWLEDGE GRID. • Each of the KNOWLEDGE GRID services is exposed as a persistent service, using the OGSA conventions and mechanisms.

  11. KNOWLEDGE SERVICES: A Meta-Learning Example • A simple example of a meta-learning process over the KNOWLEDGE GRID. • It shows how the execution of a significant distributed data mining application can benefit from the Knowledge Grid services provided through the OGSA model. • Meta-learning aims to generate a number of independent classifiers by applying learning programs, in parallel, to a collection of distributed data sets. • The classifiers computed by the learning programs are then collected and combined to obtain a global classifier.
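The meta-learning idea on this slide can be illustrated with a minimal, single-machine sketch. Everything in it is an assumption made for illustration: the toy one-dimensional data, the threshold learner, and majority voting as the combining strategy (the slides do not say which learner or combiner the Knowledge Grid uses).

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, int]           # (feature value, class label), toy 1-D data
Classifier = Callable[[float], int]


def train_threshold_learner(data: Sequence[Example]) -> Classifier:
    """Toy learner (assumption): split at the midpoint between the class means.

    Assumes both classes are present and class 1 has the larger mean.
    """
    xs0 = [x for x, y in data if y == 0]
    xs1 = [x for x, y in data if y == 1]
    threshold = (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2
    return lambda x: 1 if x > threshold else 0


def combine_by_majority(classifiers: List[Classifier]) -> Classifier:
    """Combiner (assumption): the global classifier returns the majority vote."""
    def global_classifier(x: float) -> int:
        votes = Counter(c(x) for c in classifiers)
        return votes.most_common(1)[0][0]
    return global_classifier


# Three data subsets standing in for distributed data sets.
partitions = [
    [(0.0, 0), (1.0, 0), (4.0, 1), (5.0, 1)],
    [(0.5, 0), (1.5, 0), (3.5, 1), (4.5, 1)],
    [(1.0, 0), (2.0, 0), (5.0, 1), (6.0, 1)],
]

# Each learner sees only its own subset; the classifiers are then combined.
base_classifiers = [train_threshold_learner(p) for p in partitions]
global_classifier = combine_by_majority(base_classifiers)
```

With an odd number of binary base classifiers the vote can never tie, which keeps the combiner deterministic in this toy setting.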

  12. KNOWLEDGE SERVICES: A Meta-Learning Example

  13. KNOWLEDGE SERVICES: A Meta-Learning Example • A user application interacts with Knowledge Grid nodes to generate a classifier by combining the classifiers built from different subsets of a given data set. • The scenario comprises five nodes: • NU, running the user application that builds the meta-learning application and visualizes the global classifier; • NS, which is used for resource discovery and for steering the execution of the meta-learning application; • NA, which hosts the original data set and provides a data partitioning service; • NC, providing learning services that are performed in parallel over a homogeneous cluster; • NZ, providing a combiner/tester service used to compute the global classifier.

  14. RESOURCE DISCOVERY AND EXECUTION PLANNING [Figure: nodes NU, NS, NA, NC and NZ with their DAS and TAAS services, the User Application, the EPMS, the Database Service, and the Storage Reservation, Resource Reservation, Partitioner, Learner and Combiner factories.] • The user application invokes the DAS and TAAS services on node Ns, specifying the required resources: two nodes providing services for the meta-learning process (a learner and a combiner/tester) and for resource reservation. • The DAS and TAAS services of node Ns invoke the corresponding services on other Knowledge Grid nodes in order to obtain information about the needed resources. • Contacted nodes reply to node Ns, sending meta-information. • On node Ns, the meta-information about nodes Nc and Nz is analyzed, and these nodes are identified as candidates for the computation. • The DAS and TAAS services on node Ns send this information to the user application. • The application builds an execution plan for the meta-learning process, specifying strategies for data movement and algorithm execution. • The execution plan is submitted to the EPMS of node Ns.
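One way to picture an execution plan that specifies data movement and algorithm execution is as an ordered list of steps. The sketch below is hypothetical: the `PlanStep` type, its fields, and the `build_meta_learning_plan` helper are invented for illustration and are not the Knowledge Grid plan format. The sequence of steps follows the execution described in the remaining slides of the talk.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PlanStep:
    """One step of a hypothetical execution plan."""
    action: str                        # "run" or "transfer"
    node: str                          # node on which the step starts
    what: str                          # service to run or data to move
    destination: Optional[str] = None  # set only for transfers


def build_meta_learning_plan(n_learners: int) -> List[PlanStep]:
    """Lay out the meta-learning steps in the order they must run."""
    return [
        PlanStep("run", "NA", "partitioner"),
        PlanStep("transfer", "NA", "training sets", "NC"),
        PlanStep("transfer", "NA", "testing and validation sets", "NZ"),
        *[PlanStep("run", "NC", f"learner-{i}") for i in range(n_learners)],
        PlanStep("transfer", "NC", "partial classifiers", "NZ"),
        PlanStep("run", "NZ", "combiner/tester"),
        PlanStep("transfer", "NZ", "global classifier", "NU"),
    ]


plan = build_meta_learning_plan(n_learners=3)
for step in plan:
    target = f" -> {step.destination}" if step.destination else ""
    print(f"{step.action:8} {step.node}{target}: {step.what}")
```

A flat list is enough here because the example is a linear pipeline apart from the parallel learners; a plan with richer dependencies would more naturally be a graph.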

  15. KDD APPLICATION EXECUTION [Figure: service deployment on nodes NU, NS, NA, NC and NZ.] • The EPMS invokes the factories on Na, Nc and Nz, requesting the creation of a partitioner service on node Na and the creation of two reservation services on Nc and Nz. • On node Nc, computing cycles are reserved (on each computing element) to execute the learner programs, and storage space is reserved to maintain the subsets extracted from DS and the partial classifiers. • On node Nz, storage space is reserved to maintain the partial and global classifiers.

  16. KDD APPLICATION EXECUTION [Figure: service deployment on nodes NU, NS, NA, NC and NZ, now including a Partitioner Service and two Reservation Services.] • The requests made by the EPMS result in the creation of the requested services.
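The factory-based creation of services can be sketched as follows. This is a plain-Python illustration of the factory pattern only; the class names, the `create_service` method and the handle format are assumptions, not an OGSA or Knowledge Grid API.

```python
import itertools
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ServiceInstance:
    """A created service, identified by a hypothetical handle string."""
    handle: str
    kind: str
    node: str


class Factory:
    """Hypothetical factory: each call creates a new service instance."""

    def __init__(self, kind: str, node: str) -> None:
        self.kind = kind
        self.node = node
        self._counter = itertools.count(1)

    def create_service(self) -> ServiceInstance:
        number = next(self._counter)
        handle = f"{self.node}/{self.kind}/{number}"
        return ServiceInstance(handle, self.kind, self.node)


# The EPMS role here is reduced to calling the three factories in turn:
# a partitioner on NA and one reservation service each on NC and NZ.
factories: List[Factory] = [
    Factory("partitioner", "NA"),
    Factory("reservation", "NC"),
    Factory("reservation", "NZ"),
]
created = [factory.create_service() for factory in factories]
```

Keeping creation behind a factory means the caller only needs to know which node offers which kind of service, not how an instance is built.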

  17. KDD APPLICATION EXECUTION [Figure: service deployment on nodes NU, NS, NA, NC and NZ.] • The partitioner service interacts with the database service on the same node to extract the needed subsets from DS: n training sets, a testing set and a validation set.
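A minimal sketch of such a partitioning step is shown below. The split fractions, the random shuffle, the round-robin assignment to training sets and the `partition_dataset` name are all assumptions; the slides do not describe how the partitioner service selects records.

```python
import random
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def partition_dataset(
    records: Iterable[T],
    n_training_sets: int,
    test_fraction: float = 0.1,
    validation_fraction: float = 0.1,
    seed: int = 0,
) -> Tuple[List[List[T]], List[T], List[T]]:
    """Split records into n disjoint training sets, a testing set and a validation set."""
    if n_training_sets < 1:
        raise ValueError("n_training_sets must be at least 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility

    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * validation_fraction)
    testing = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    remaining = shuffled[n_test + n_val:]

    # Round-robin assignment keeps the training sets close to equal in size.
    training_sets = [remaining[i::n_training_sets] for i in range(n_training_sets)]
    return training_sets, testing, validation


training_sets, testing_set, validation_set = partition_dataset(range(100), n_training_sets=4)
```

Every record ends up in exactly one subset, so the learners never train on data that the combiner/tester later uses for testing or validation.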

  18. KDD APPLICATION EXECUTION [Figure: service deployment on nodes NU, NS, NA, NC and NZ.] • The EPMS invokes (i) the DAS service on node Na, requesting the transfer of the training sets to node Nc and of the testing and validation sets to node Nz; (ii) the learner factory on Nc, requesting the creation of n learner service instances to be run on the same node.

  19. KDD APPLICATION EXECUTION [Figure: service deployment on nodes NU, NS, NA, NC and NZ, now including the Learner Services on NC.] • On node Nc, n learner service instances are created. • On each computing element of node Nc, the learner service instances generate the partial classifiers. • As soon as each partial classifier is obtained, a notification message is sent to the EPMS.
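The "run n learners in parallel and notify on each completion" behaviour can be sketched on one machine with Python's standard `concurrent.futures` module. Threads stand in for the computing elements of node Nc, a callback stands in for the notification message to the EPMS, and the toy `learn` function (which just averages its input) is a placeholder for a real learning program; all three are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Sequence


def learn(training_set: Sequence[float]) -> float:
    """Placeholder learner: the 'partial classifier' is the mean of the training set."""
    return sum(training_set) / len(training_set)


def run_learners(
    training_sets: List[Sequence[float]],
    notify: Callable[[int], None],
) -> List[float]:
    """Run one learner per training set and notify as each one finishes."""
    partial: Dict[int, float] = {}
    with ThreadPoolExecutor(max_workers=len(training_sets)) as pool:
        futures = {pool.submit(learn, ts): i for i, ts in enumerate(training_sets)}
        # as_completed yields futures in completion order, not submission order.
        for future in as_completed(futures):
            index = futures[future]
            partial[index] = future.result()
            notify(index)  # stands in for the notification sent to the EPMS
    return [partial[i] for i in range(len(training_sets))]


notifications: List[int] = []
partial_classifiers = run_learners(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    notify=notifications.append,
)
```

Notifications may arrive in any order, so the results are keyed by learner index and reassembled in submission order before being returned.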

  20. KDD APPLICATION EXECUTION [Figure: service deployment on nodes NU, NS, NA, NC and NZ.] • The EPMS invokes (i) the DAS service on node Nc, requesting the transfer of the generated classifiers to node Nz; (ii) the combiner/tester factory on Nz, requesting the creation of a combiner/tester service to be run on the same node.

  21. KDD APPLICATION EXECUTION [Figure: service deployment on nodes NU, NS, NA, NC and NZ, now including the Combiner Service on NZ.] • On node Nz, a combiner/tester service is created to perform the combining and testing processes and generate the global classifier GC.

  22. KDD APPLICATION EXECUTION [Figure: service deployment on nodes NU, NS, NA, NC and NZ.] • The EPMS invokes the DAS service on node Nz, requesting the transfer of the generated global classifier to node Nu.

  23. OPEN ISSUES (FUTURE) • Data privacy and security • KDD process state management • Complex processing patterns (Web Services are too simple to express distributed data mining processes and applications) • KDD Grid Service standards (towards OGSA-KDAI?) • KDD processes as G-Services workflows • Asynchronous services • ……

  24. CONCLUSIONS • The knowledge-building process in a distributed setting involves data and information collection, generation, and distribution, followed by the collective interpretation of processed information into "knowledge." • Next-generation Grids must be able to produce, use, and deploy knowledge as a basic element of advanced applications. • Knowledge-based Grids can offer tools, components and services to support data analysis, inference, and discovery in scientific and business applications. • OGSA-based services for distributed knowledge discovery are a key element for broad support of e-science and e-business.

  25. CREDITS: M. Cannataro, C. Comito • THANKS • www.icar.cnr.it/kgrid
