
Chapter 1.3: Data Models and DBMS Architecture

Title: Anatomy of a Database System. Authors: J. Hellerstein, M. Stonebraker. Pages: 43-95.



  1. Chapter 1.3: Data Models and DBMS Architecture • Title: Anatomy of a Database System • Authors: J. Hellerstein, M. Stonebraker • Pages: 43-95

  2. Anatomy of a Database System • Problem • Problem Statement • Why is this problem important? • Why is this problem hard? • Approaches • Approach description, key concepts • Contributions (novelty, improved) • Assumptions

  3. Problem Statement – DBMS Architecture • Given • A data model • Platform, i.e. operating system, computer hardware architecture • Find • A DBMS architecture • A set of building-block components • Interactions among building blocks • Objectives • Efficiency, Scalability • Extensibility • Constraints • Relational Data Model

  4. Why is this problem important? • Why review Relational DBMS architectural innovations? • Backbone of infrastructure applications • Banking, airline reservation, medical records, CRM, SCM, … • Well-understood point of reference for • New extensions and future revolution • Architecture allows • Analysis of properties • Availability, fault-tolerance, reliability • Mapping of multiple views • User requirements to components - validation and acceptance tests • Software developers, maintainers, … • Software operational support group

  5. Why is this problem hard? • Complexity • Mid-1970s – Efficient implementation of a Relational DBMS • Declarative Query Language • Logical and physical independence • Changes • Platforms evolve • Computer Hardware, Languages, Operating Systems • Storage: Tapes → Disks (1960s) → RAID (1990s) → SAN … • CPUs: Mainframe → Mini → Desktops → Multi-core CPUs (2000s) • … • Integrate many views • Enterprise – performance level, transaction reliability, … • Data Processing Needs – data warehouses, reports, OLTP, Web, … • …

  6. Contributions, Validation Methodology • Contributions • A simple yet relatively comprehensive RDBMS architecture • Decomposition into 4 components • Identification of dependencies • Validation • Ability to explain academic and commercial RDBMSs • Expert opinion: the authors have architected multiple DBMSs

  7. Proposed Approach • Four Components (Figure 1, p. 44) • A Process Manager • Query Processing Engine • Transactional Storage Subsystem • Shared Utilities, e.g. Disk space management • Interactions among components • Not explicit in Figure 1 • Implicit: • Top-left to lower-right flow

  8. Component 1 – Process Manager • Responsibilities - Organization of processes • Platform: Uni-processor, High-performance OS threads • Two Options • Process per user (connection) • Issues - scalability • Server Process (+ I/O Process per disk) • Dispatcher thread, log manager thread • Pool of worker threads • Shared data (e.g. log, I/O buffer) in common heap space • Issues – asynchronous I/O, protection across threads, … • Client – Server communication • network socket • Q? What is new in this paper relative to Parallel Database paper by DeWitt et al.?
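
The server-process option on this slide can be sketched as a dispatcher feeding a fixed pool of worker threads that share one heap. This is a minimal illustration, not the paper's design: `run_server`, the request strings, and the `upper()` call standing in for query execution are all hypothetical.

```python
import queue
import threading


def run_server(requests, num_workers=4):
    """Dispatch each request to a fixed pool of worker threads.

    Hypothetical sketch: a real DBMS would parse, plan, and execute the
    request; here payload.upper() stands in for that work.
    """
    work = queue.Queue()             # dispatcher -> workers
    results = {}                     # shared data in the common heap
    results_lock = threading.Lock()  # protection across threads

    def worker():
        while True:
            item = work.get()
            if item is None:         # sentinel: no more work
                break
            conn_id, payload = item
            answer = payload.upper()
            with results_lock:
                results[conn_id] = answer

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in workers:
        t.start()

    # Dispatcher role: hand each incoming connection's request to the pool.
    for conn_id, payload in enumerate(requests):
        work.put((conn_id, payload))
    for _ in workers:
        work.put(None)               # one sentinel per worker
    for t in workers:
        t.join()
    return results
```

The pool size is fixed regardless of the number of connections, which is the scalability argument against one process per user; the lock around `results` hints at the cross-thread protection issue the slide lists.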

  9. Component 1 – Issues • Mapping DBMS threads to OS Processes • Absence of OS threads – page 50 • Commercial examples – last para, sec. 2.2.1, page 51 • Parallelism (Figures 5-7, pp. 52-54) • Shared memory – previous architectures port easily • Shared nothing • Query processing parallelizes w/ horizontal data partitioning • Two-phase commit needs communication • Partial failure • Shared disk • Distributed lock manager, cache coherency protocol, … • Admission Control • Avoid thrashing (working set > memory buffers) • Control number of connections, number of queries
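
The admission-control bullet can be illustrated with a small sketch that admits a query only while the estimated working sets fit in the buffer budget, and queues it otherwise. The class name, the page-count estimates, and the FIFO policy are assumptions made for illustration; the slide does not specify a policy.

```python
from collections import deque


class AdmissionController:
    """Hypothetical admission controller keyed on estimated buffer pages."""

    def __init__(self, buffer_pages):
        self.buffer_pages = buffer_pages  # total memory budget, in pages
        self.in_use = 0                   # pages promised to running queries
        self.waiting = deque()            # queued (query_id, est_pages)

    def submit(self, query_id, est_pages):
        """Return True if the query is admitted now, False if it is queued."""
        if self.in_use + est_pages <= self.buffer_pages:
            self.in_use += est_pages
            return True
        self.waiting.append((query_id, est_pages))
        return False

    def finish(self, est_pages):
        """Release a finished query's pages and admit waiters in FIFO order.

        Returns the ids of the queries admitted as a result.
        """
        self.in_use -= est_pages
        admitted = []
        while self.waiting and self.in_use + self.waiting[0][1] <= self.buffer_pages:
            query_id, pages = self.waiting.popleft()
            self.in_use += pages
            admitted.append(query_id)
        return admitted
```

Deferring a query instead of running it keeps the combined working set below the buffer size, which is the thrashing condition named on the slide.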

  10. Component 2 – Query Processor • Responsibility: • SQL query → execution plan (Fig. 8, p. 64) • Subcomponents • Parsing and Authorization • Catalogs • Query rewrite – views, constant expressions, semantic optimization, sub-query flattening • Optimizer – plan space, selectivity estimation, search, parallelism, extensibility, auto-tuning, … • Executor – iterator model (Figure 9, p. 68) • Q? What is new in the optimizer since Selinger?
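
The executor's iterator model can be sketched as operators that each expose `init()` and `get_next()` and pull rows from their child one at a time. The operator classes below (`Scan`, `Filter`, `Project`) and the `run` helper are hypothetical names chosen for illustration, with `None` used as the end-of-stream marker.

```python
class Scan:
    """Leaf operator: returns the rows of an in-memory table one by one."""

    def __init__(self, rows):
        self.rows = rows
        self.pos = 0

    def init(self):
        self.pos = 0

    def get_next(self):
        if self.pos >= len(self.rows):
            return None              # end of stream
        row = self.rows[self.pos]
        self.pos += 1
        return row


class Filter:
    """Passes through only the child rows that satisfy the predicate."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def init(self):
        self.child.init()

    def get_next(self):
        while True:
            row = self.child.get_next()
            if row is None or self.predicate(row):
                return row


class Project:
    """Keeps only the named columns of each child row."""

    def __init__(self, child, columns):
        self.child = child
        self.columns = columns

    def init(self):
        self.child.init()

    def get_next(self):
        row = self.child.get_next()
        if row is None:
            return None
        return {c: row[c] for c in self.columns}


def run(plan):
    """Drive a plan from the root by pulling rows until it is exhausted."""
    plan.init()
    out = []
    row = plan.get_next()
    while row is not None:
        out.append(row)
        row = plan.get_next()
    return out
```

Because every operator has the same interface, any operator can be stacked on any other, and no intermediate result is materialized unless an operator chooses to buffer.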

  11. Component 2 – Query Processor Issues • Data Modification Statements • Plans are more complex • Ex. Halloween problem (Fig. 10, p. 71) • Access Methods • Unordered files, B+-tree, R-tree and bit-map indexes • API methods – init(), get_next(), … • Search by logical conditions (sarg) or record-id • Interacts with concurrency and recovery sub-components
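
The Halloween problem can be reproduced in a few lines: an update that scans an index on the very column it modifies can move a row ahead of the scan and then see it again. The sketch below is a toy simulation under stated assumptions (a sorted list stands in for the index, raises are fixed integer amounts, and the function names are hypothetical); it contrasts a pipelined update with one that first collects the qualifying record ids.

```python
import bisect


def naive_raise(salaries, limit, raise_by):
    """Pipelined plan: update rows while scanning the salary index.

    `salaries` maps record id -> salary and is modified in place.
    Returns the number of updates performed.
    """
    index = sorted((s, rid) for rid, s in salaries.items())
    updates = 0
    cursor = 0
    while cursor < len(index) and index[cursor][0] < limit:
        salary, rid = index[cursor]
        new_salary = salary + raise_by
        del index[cursor]                        # remove the old index entry
        bisect.insort(index, (new_salary, rid))  # re-insert under the new key
        salaries[rid] = new_salary
        updates += 1
        # The cursor stays put: the deletion shifted the next entry into
        # this slot, and that entry may be a row we already updated.
    return updates


def two_phase_raise(salaries, limit, raise_by):
    """Safer plan: materialize qualifying record ids, then update them."""
    index = sorted((s, rid) for rid, s in salaries.items())
    targets = [rid for s, rid in index if s < limit]
    for rid in targets:
        salaries[rid] += raise_by
    return len(targets)
```

With salaries `{1: 10, 2: 15}`, a limit of 20, and a raise of 4, the pipelined version performs five updates and leaves both rows above the limit, while the two-phase version updates each row exactly once.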

  12. Component 3 – Transactional Storage Manager • Responsibilities – ACID properties • Subcomponents • Lock Manager • Serializability, 2PL, Isolation levels (p. 76) • Log Manager • WAL – 3 rules (p. 78), performance tuning • Buffer pool • Access methods • Latches in B+trees (p. 80) – conservative, latch-coupling, right-link • Predicate locks – next-key locking
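
The write-ahead logging discipline can be sketched with log sequence numbers (LSNs). The sketch below reflects the general WAL protocol as commonly described (log before page, log flushed in order, commit record flushed before commit returns) rather than quoting the paper's wording; all class and function names are hypothetical, and a dict stands in for the disk.

```python
class LogManager:
    """Toy log: records are kept in LSN order; flushing covers a prefix."""

    def __init__(self):
        self.records = []
        self.flushed_lsn = 0  # highest LSN assumed to be on stable storage

    def append(self, record):
        self.records.append(record)
        return len(self.records)  # LSN = 1-based position in the log

    def flush(self, up_to_lsn):
        # Flushing a prefix means records reach stable storage in order.
        self.flushed_lsn = max(self.flushed_lsn, up_to_lsn)


class Page:
    def __init__(self):
        self.data = {}
        self.page_lsn = 0  # LSN of the last update applied to this page


def update(log, page, key, value):
    """Log the change (old and new value) before applying it to the page."""
    lsn = log.append(("update", key, page.data.get(key), value))
    page.data[key] = value
    page.page_lsn = lsn
    return lsn


def write_page_to_disk(log, page, disk):
    """Force the log up to the page's LSN before the page itself is written."""
    if page.page_lsn > log.flushed_lsn:
        log.flush(page.page_lsn)
    disk.update(page.data)


def commit(log):
    """A commit returns only after its commit record has been flushed."""
    lsn = log.append(("commit",))
    log.flush(lsn)
    return lsn
```

The check in `write_page_to_disk` is where the buffer pool and the log manager interact: a dirty page cannot leave memory ahead of the log records that describe it.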

  13. Component 3 – Transactional Storage Manager • Interdependencies among subcomponents • Lock Manager, Log Manager • WAL assumes strict 2PL (p. 82) • Q? What would happen without strict 2PL? • Concurrency control, Access Methods • Methods are unique to index types
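
As a reference point for the strict 2PL question, here is a minimal lock-table sketch with shared (S) and exclusive (X) modes in which a transaction's locks are released only when it ends. Holding every lock to the end is one simple way to obtain strictness and is an assumption of this sketch, as are the class and method names; blocking, queuing, and deadlock handling are omitted.

```python
class LockManager:
    """Toy lock table: item -> {transaction: mode}, modes are "S" or "X"."""

    def __init__(self):
        self.holders = {}

    def acquire(self, txn, item, mode):
        """Try to lock `item`; return True on success, False on conflict."""
        held = self.holders.setdefault(item, {})
        others = [m for t, m in held.items() if t != txn]
        if mode == "S":
            compatible = all(m == "S" for m in others)
        else:
            compatible = not others   # X conflicts with any other holder
        if not compatible:
            return False
        if held.get(txn) != "X":      # never downgrade an X lock
            held[txn] = mode
        return True

    def release_all(self, txn):
        """Called at commit or abort: the only point where locks are dropped."""
        for held in self.holders.values():
            held.pop(txn, None)
```

Because no other transaction can read or overwrite an item while its writer is still active, undoing that writer from the log on abort cannot clobber someone else's later update, which is the dependency between the log manager and the lock manager that the slide points to.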

  14. Component 4 – Shared Utilities • Sub-components • Memory allocator (p. 84) • Disk management subsystem • Map tables to devices or files • New issues with RAIDs (pp. 86-87) • Replication services • Physical, trigger-based, log-based • Batch utilities • Optimizer statistics gathering, backup/export, physical reorganization and index construction

  15. Summary • Paper’s focus • DBMS Architectures – components and dependencies • Insights - Four Components (Figure 1, p. 44) • A Process Manager • Query Processing Engine • Transactional Storage Subsystem • Shared Utilities, e.g. Disk space management • Interactions among components • Not explicit in Figure 1 • Q? List a few discussed in the paper!

  16. Assumptions, Rewrite today • Assumptions • Focus on Relational DBMS • Centralized DBMS (Recall T2.6 on R*) • Four component architecture reminds one of Ingres! • Lessons translate over to new domains • Rewrite today • Cover a post-relational DBMS, e.g. Stream or XML • Illustrate how lessons translate over web-services, e-mail repositories, network monitors, etc.
