Parallele und Verteilte Datenbanksysteme

Explore the use of distributed data mining for managing patients with traumatic brain injuries, collecting and analyzing data from multiple hospitals to improve treatment outcomes.

wjonathan
Presentation Transcript


  1. Parallele und Verteilte Datenbanksysteme (Parallel and Distributed Database Systems). Univ.-Prof. Dr. Peter Brezany, Institut für Scientific Computing, Universität Wien. Tel. 4277 39425. Office hours: Tue, 13:00-14:00. Course portal: www.par.univie.ac.at/~brezany/teach/gckfk/300658.html

  2. Motivation (figure labels): Business, Medicine, Scientific experiments, Simulations, Earth observations; Data and data exploration cloud.

  3. The Knowledge Discovery Process (figure labels, in the order given): Knowledge; OLAP Queries; OLAP; Online Analytical Mining; Evaluation and Presentation; Data Mining; Selection and Transformation; Data Warehouse; Cleaning and Integration.

  4. Data Preprocessing Fig. 3.1

  5. EcoGRID Sketch (figure labels): Distributed Data, Distributed Applications, Distributed Data Mining, Reporting, Biodiversity, Waste, Popular Presentation, Statistic, Air, Soil, Flow Analysis, Prediction Models, Emissions, Water, Geo-Statistic, …, Forests, Common Ontology.

  6. Management of TBI patients. Traumatic brain injuries (TBIs) typically result from accidents in which the head strikes an object. The treatment of TBI patients is very resource intensive. The trajectory of TBI patient management: trauma event, first aid, transportation to hospital, acute hospital care, home care. All of these phases involve collecting data into databases, which are currently managed by individual hospitals. Usage of mobile communication devices.

  7. Data Mining Accuracy vs. Data Size (figure, assumed curve). Labels: accuracy (up to 100%), sampled data size, available data size.

  8. The GridMiner Project in Vienna. GridMiner: a knowledge discovery Grid infrastructure (http://www.gridminer.org/). OGSA-based architecture; workflow management; Grid-aware data preprocessing and data mining services; data mediation service; OLAP service; GUI. Current implementation on top of Globus Toolkit 3.2. Applications: exploration of ecological data, management of patients with traumatic brain injuries. Research exhibition available.

  9. Literature: on the course web page.

  10. Distributed Memory Architecture (Shared Nothing). Figure: four CPUs, each with its own local memory, and an interconnection network.

  11. DMM: Shared Disk Architecture. Figure: four CPUs, each with its own local memory, an interconnection network, and a global shared disk subsystem.

  12. Shared Memory Architecture (Shared Everything, SMP). Figure: four CPUs, an interconnection network, and a global shared memory.

  13. Cluster of SMPs. Figure: four 4-CPU SMPs (16 CPUs in total) and an interconnection network.

  14. High-Performance I/O Systems

  15. Note: RAID technology is introduced in a separate set of lecture notes.

  16. The main literature: Principles of Distributed Database Systems.

  17. Distributed Database System (DDBS) Technology - Introduction. DDBS technology is the union of what appear to be two diametrically opposed approaches to data processing: database systems and computer network technologies. Database systems have taken us from a paradigm of data processing in which each application defined and maintained its own data (figure follows) to one in which the data is defined and administered centrally (figure follows). The result is data independence: the application programs are immune to changes in the logical or physical organization of the data, and vice versa. One of the major motivations is the desire to integrate the operational data of an enterprise and to provide centralized, and thus controlled, access to that data.

  18. DDBS - Introduction (cont.). The technology of computer networks promotes a mode of work that goes against all centralization efforts. How can these two contrasting approaches be synthesized to produce a technology that is more powerful and more promising than either one alone? The key is the realization that the most important objective of database technology is integration, not centralization. Neither of these terms necessarily implies the other. It is possible to achieve integration without centralization, and that is exactly what distributed database technology attempts to achieve.

  19. Distributed Database System Technology - Introduction

  20. Central Database on a Network - Example. Figure: the sites Boston, Edmonton, Paris, and San Francisco and a communication network.

  21. Distributed Database System (DDBS) - Definitions. Definition 1: Distributed database. A distributed database is a collection of multiple, logically interrelated databases distributed over a computer network. Definition 2: Distributed database management system (distributed DBMS). It is defined as the software system that permits the management of the distributed database and makes the distribution transparent to the users. A DDBS is not a "collection of files" that can be individually stored at each node of a computer network. To form a DDBS, files should not only be logically related, but there should be structure among the files, and access should be via a common interface. The physical distribution of data is very important: it creates problems that are not encountered when the databases reside in the same computer system.

  22. Promises of DDBSs: 1. Transparent Management of Distributed and Replicated Data. Transparency refers to the separation of the higher-level semantics of a system from lower-level implementation issues; a transparent system "hides" the implementation details from the user. Example (next slide): Consider an engineering firm that has offices in several cities. It is preferable to localize the data so that data about the employees in the Edmonton office are stored in Edmonton, and so forth. The same applies to the project information. In this process we partition each of the relations and store each partition at a different site; this is known as fragmentation. It may also be preferable to duplicate some of this data at other sites for performance and reliability reasons. The result is a distributed database that is fragmented and replicated. Fully transparent access means that users can still pose queries in the same form as to a centralized system, without paying any attention to the fragmentation, location, or replication of data, and let the system worry about resolving these issues.

  23. Distributed Database System Environment - Example. Sites connected by a communication network: Edmonton (Edmonton employees, Paris projects, Edmonton projects); Boston (Boston employees, Paris employees, Boston projects); Paris (Paris employees, Paris projects, Boston employees, Boston projects); San Francisco (San Francisco employees, San Francisco projects).
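The idea of fully transparent access over a fragmented and replicated relation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PLACEMENT` table, the employee rows, and the functions `build_sites` and `query_employees` are hypothetical names introduced here, and the employee fragments are placed as in the engineering-firm example.

```python
# Hypothetical placement of the horizontal fragments of an EMP relation,
# one fragment per office; a fragment listed at two sites is replicated.
PLACEMENT = {
    "Edmonton": ["Edmonton"],
    "Boston": ["Boston", "Paris"],
    "Paris": ["Paris", "Boston"],
    "San Francisco": ["San Francisco"],
}


def build_sites(employees, placement):
    """Store each (name, office) row at every site that holds its fragment."""
    sites = {}
    for name, office in employees:
        for site in placement[office]:
            sites.setdefault(site, {}).setdefault(office, []).append((name, office))
    return sites


def query_employees(sites, placement, predicate, local_site):
    """Answer a query on the whole EMP relation; the caller never names a site."""
    result = []
    for fragment, replicas in placement.items():
        # Read exactly one copy of each fragment, preferring a local replica,
        # so that replicated rows are not returned twice.
        site = local_site if local_site in replicas else replicas[0]
        rows = sites.get(site, {}).get(fragment, [])
        result.extend(row for row in rows if predicate(row))
    return sorted(result)


# Made-up sample data for illustration only.
EMPLOYEES = [("Ann", "Edmonton"), ("Bob", "Boston"),
             ("Claire", "Paris"), ("Dan", "San Francisco")]
SITES = build_sites(EMPLOYEES, PLACEMENT)
```

A user at any site would call `query_employees(SITES, PLACEMENT, lambda row: row[1] == "Boston", "Edmonton")` in the same form as against a centralized table; choosing the fragments and replicas to read is left to the system.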

  24. Promises of DDBSs: 2. Reliability Through Distributed Transactions. Distributed DBMSs are intended to improve reliability, since they have replicated components and thereby eliminate single points of failure. The failure of a single site, or the failure of a communication link that makes one or more sites unreachable, is not sufficient to bring down the entire system. In the case of a distributed database, this means that some of the data may be unreachable, but with proper care users may be permitted to access other parts of the distributed database. The "proper care" comes in the form of support for distributed transactions.
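The availability argument can be made concrete with a small Python sketch. It does not model distributed transactions; it only shows which parts of the database stay reachable after site failures. The `REPLICAS` table and the function `reachable_fragments` are hypothetical names introduced for this illustration.

```python
# Hypothetical replica placement: fragment -> set of sites holding a copy.
REPLICAS = {
    "EMP_Edmonton": {"Edmonton"},
    "EMP_Boston": {"Boston", "Paris"},
    "EMP_Paris": {"Paris", "Boston"},
    "EMP_SanFrancisco": {"San Francisco"},
}


def reachable_fragments(replicas, failed_sites):
    """Return the fragments that still have at least one copy on a live site."""
    failed = set(failed_sites)
    return {fragment for fragment, sites in replicas.items() if sites - failed}
```

Under this placement, losing Paris leaves every fragment reachable through the Boston copies, whereas losing San Francisco makes only the unreplicated `EMP_SanFrancisco` fragment unavailable; the rest of the database can still be used.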

  25. Promises of DDBSs: 3. Improved Performance. A distributed DBMS fragments the conceptual database, enabling data to be stored in close proximity to its points of use. The inherent parallelism of distributed systems may be exploited for inter-query and intra-query parallelism. Inter-query parallelism results from the ability to execute multiple queries at the same time. Intra-query parallelism is achieved by breaking up a single query into a number of subqueries, each of which is executed at a different site and accesses a different part of the distributed database.
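Both kinds of parallelism can be sketched with Python's standard `concurrent.futures` module, using threads to stand in for sites. The salary data in `FRAGMENTS` and the functions `subquery` and `count_above` are hypothetical and exist only for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical salary fragments, one per site (made-up numbers).
FRAGMENTS = {
    "Edmonton": [52000, 61000, 48000],
    "Boston": [75000, 58000],
    "Paris": [66000, 49000, 71000],
}


def subquery(site, threshold):
    """Local part of the query: count salaries above the threshold at one site."""
    return sum(1 for salary in FRAGMENTS[site] if salary > threshold)


def count_above(threshold):
    """Intra-query parallelism: one subquery per site, partial counts merged."""
    with ThreadPoolExecutor(max_workers=len(FRAGMENTS)) as pool:
        partial_counts = pool.map(lambda site: subquery(site, threshold), FRAGMENTS)
        return sum(partial_counts)


def run_queries(thresholds):
    """Inter-query parallelism: several independent queries run at the same time."""
    with ThreadPoolExecutor(max_workers=len(thresholds)) as pool:
        return list(pool.map(count_above, thresholds))
```

In a real distributed DBMS the subqueries would run on different machines against local storage; the threads here only mimic that structure.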

  26. Promises of DDBSs: 4. Easier System Expansion. In a distributed environment, it is much easier to accommodate increasing database sizes. Major system overhauls are seldom necessary; expansion can usually be handled by adding processing and storage power to the network. It may not be possible to obtain a strictly linear increase in "power", since this also depends on the overhead of distribution. It normally costs much less to put together a system of smaller computers with the equivalent power of a single big machine.
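How the gain from adding sites depends on the overhead of distribution can be illustrated with a toy cost model. The model is an assumption made here for illustration, not something taken from the lecture: the work is split evenly across sites, and every additional site adds a fixed coordination cost.

```python
def speedup(n_sites, work=100.0, overhead_per_extra_site=1.0):
    """Speedup over a single site under a toy model (illustrative assumption).

    Elapsed time = evenly divided work + a fixed overhead for each extra site.
    """
    elapsed = work / n_sites + overhead_per_extra_site * (n_sites - 1)
    return work / elapsed
```

With zero overhead the model scales linearly (`speedup(4, overhead_per_extra_site=0.0)` is 4.0); with the default overhead, four sites give a speedup of roughly 3.6 rather than 4.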

  27. Problem Areas: distributed database design; distributed query processing; distributed directory management; distributed concurrency control; distributed deadlock management; heterogeneous databases.

  28. Distributed DBMS Architecture. The architecture of a system defines its structure. This means that the components of the system are identified, the function of each component is specified, and the interrelationships and interactions among these components are defined. In this part we classify DBMS architectures. These are idealized views; many research and commercially available systems may deviate from them. We use a classification (next slides) that characterizes the systems with respect to (1) the autonomy of local systems, (2) their distribution, and (3) their heterogeneity.

  29. Autonomy. Autonomy refers to the distribution of control, not of data. It indicates the degree to which individual DBMSs can operate independently. Requirements of an autonomous system: (1) The local operations of the individual DBMSs are not affected by their participation in a multidatabase system. (2) The manner in which the individual DBMSs process queries and optimize them should not be affected by the execution of global queries that access multiple databases. (3) System consistency or operation should not be compromised when individual DBMSs join or leave the multidatabase confederation.

  30. Distribution. Whereas autonomy refers to the distribution of control, the distribution dimension of the taxonomy deals with data. There are a number of ways in which DBMSs have been distributed. We abstract two alternative classes: client/server distribution and peer-to-peer distribution (or full distribution).

  31. Heterogeneity. Heterogeneity may occur in different forms: hardware, data models, query languages, transaction management protocols.

  32. Architecture model

  33. DBMS architectures • Client/server architecture (not relevant for this course) • Distributed database architecture • Multidatabase architecture

  34. Client/server architecture. Here there is typically one central database server and a larger number of networked workstations that store no relevant data. The user at a workstation sees the full functionality of the DBMS. The system behaves like a centralized database system; the communication is transparent to the user.

  35. Client/server architecture (cont.)

  36. Distributed database system. Here there are several database servers, and particular data may be stored on only one machine or (replicated) on several. It is a virtual database whose components are physically mapped onto a number of different, actually existing DBMSs. In this case transactions can span several DBMSs. It is a collection of data that, because of shared, linking properties, belong to the same system and are distributed over different computers in the network, where each computer has its own database and can carry out tasks locally and autonomously.

  37. Distributed database system (cont.) - simultaneous use of the computing power of several computers - the data access bottleneck of centralized database systems is avoided because the data are distributed (and possibly replicated) - the data are managed by one database system - distribution transparency - basis: 4-level schema architecture

  38. Review: ANSI/SPARC Architecture. Figure levels: users; external views (external schema); conceptual view (conceptual schema); internal view (internal schema). The external view is concerned with how users view the database. An individual user's view represents the portion of the database that will be accessed by that user as well as the relationships that the user would like to see among the data. A view can be shared among a number of users. The conceptual schema is an abstract definition of the database; it is the "real view" of the enterprise being modeled in the database. The requirements of individual applications or the restrictions of the physical storage media are not considered. The internal view deals with the physical definition and organization of data. The location of data on different storage devices and the access mechanisms used to reach and manipulate data are the issues dealt with at this level.

  39. Distributed database system (cont.): 4-level schema architecture. Figure levels: external schema 1 ... external schema N; global conceptual schema; local conceptual schemas (one per site); local internal schemas (one per site).

  40. Functional Schematic of an Integrated Distributed DBMS. The global directory/dictionary (GD/D) permits the required global mappings. Local mappings are performed by a local directory/dictionary (LD/D).

  41. Components of a Distributed DBMS. User processor: • The user interface handler is responsible for interpreting user commands and formatting the result data. • The semantic data controller uses the integrity constraints and authorizations that are defined as part of the global conceptual schema to check whether the user query can be processed. • The global query optimizer and decomposer determines an execution strategy to minimize a cost function, and translates the global queries into local ones using the global and local conceptual schemas as well as the global directory. • The distributed execution monitor coordinates the distributed execution of the user request. Data processor: • The local query optimizer is responsible for choosing the best access path to access any data item. (The term access path refers to the data structures and algorithms that are used to access data. A typical access path is an index on one or more attributes of a relation.) • The local recovery manager is responsible for making sure that the local database remains consistent. • The run-time support processor physically accesses the database according to the physical commands in the schedule generated by the query optimizer.
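The division of labour between the user processor and the data processors can be sketched as a tiny pipeline in Python. Everything here is a simplified assumption for illustration: the directory contents, the authorization table, and all function names are hypothetical, and no cost-based optimization or recovery is modelled.

```python
# Hypothetical global directory: global relation -> (site, local relation) pairs.
GLOBAL_DIRECTORY = {"EMP": [("Edmonton", "EMP_EDM"), ("Boston", "EMP_BOS")]}

# Hypothetical authorizations, standing in for the global conceptual schema.
AUTHORIZED = {"alice": {"EMP"}}

# Hypothetical local data (name, salary) held by the data processors.
LOCAL_DATA = {
    ("Edmonton", "EMP_EDM"): [("Ann", 52000)],
    ("Boston", "EMP_BOS"): [("Bob", 75000), ("Eve", 58000)],
}


def semantic_data_controller(user, relation):
    """Check whether the user is authorized to query the global relation."""
    return relation in AUTHORIZED.get(user, set())


def decompose(relation, min_salary):
    """Translate one global query into one local query per fragment."""
    return [(site, local, min_salary) for site, local in GLOBAL_DIRECTORY[relation]]


def data_processor(site, local_relation, min_salary):
    """Run a local query at one site (a plain scan, no index or optimizer)."""
    rows = LOCAL_DATA[(site, local_relation)]
    return [row for row in rows if row[1] >= min_salary]


def execute(user, relation, min_salary):
    """Distributed execution monitor: run the local queries and merge the results."""
    if not semantic_data_controller(user, relation):
        raise PermissionError(f"{user} may not query {relation}")
    result = []
    for site, local_relation, threshold in decompose(relation, min_salary):
        result.extend(data_processor(site, local_relation, threshold))
    return sorted(result)
```

A call such as `execute("alice", "EMP", 55000)` names only the global relation; the mapping to sites and local relations comes from the directory.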

  42. Multidatabase system - An MDBS is a federation of several database systems. - The conceptual schema represents only that part of the data which the local DBMSs want to share. - Local applications can access each DBS. - Each DBS may contain data that have no relationship to the data of other DBSs.
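The point that the conceptual schema of a multidatabase system covers only the shared part of the data can be sketched as follows. The local systems in `LOCAL_DBS`, their relation names, and both functions are hypothetical and serve only as an illustration.

```python
# Hypothetical local database systems: all local relations, plus the subset
# that each system is willing to share with the multidatabase layer.
LOCAL_DBS = {
    "hospital_a": {"relations": {"patients", "billing"}, "shared": {"patients"}},
    "hospital_b": {"relations": {"patients", "staff"}, "shared": {"patients"}},
    "lab": {"relations": {"samples"}, "shared": set()},
}


def global_conceptual_schema(local_dbs):
    """Collect only the relations that each local DBS has agreed to share."""
    return sorted((name, relation)
                  for name, db in local_dbs.items()
                  for relation in db["shared"])


def visible_to_local_application(local_dbs, name):
    """A local application still sees every relation of its own DBS."""
    return sorted(local_dbs[name]["relations"])
```

Here `lab` takes part in the federation without exporting anything, and `billing` and `staff` stay purely local; only the two `patients` relations appear at the global level.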

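  43. Multidatabase system: model with a global conceptual schema. Figure labels: GES (global external schemas); LES (local external schemas); GKS (global conceptual schema); LKS 1 ... LKS n (local conceptual schemas); LIS 1 ... LIS n (local internal schemas).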

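  44. Multidatabase system (cont.): model without a global conceptual schema. Figure labels: ES 1, ES 2, ... ES n (external schemas); multidatabase layer; local system layer; LKS 1, LKS 2, LKS 3 (local conceptual schemas); LIS 1, LIS 2, LIS 3 (local internal schemas).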

  45. Components of an MDBS

  46. Directory Management Strategies - Alternatives