320 likes | 394 Views
R*: An Overview of the Architecture. R. Williams, et al IBM Almaden Research Center. Outline. Environment and Data Definitions Object Naming Distributed Catalogs Transaction Management and Commit Protoctols Query Preparation Query Execution SQL Additions and Changes.
E N D
R*: An Overview of the Architecture R. Williams, et al IBM Almaden Research Center
Outline • Environment and Data Definitions • Object Naming • Distributed Catalogs • Transaction Management and Commit Protoctols • Query Preparation • Query Execution • SQL Additions and Changes
Environment and Data Definitions • CICS as the underlying communication model • Data distribuion: • Dispersed • Replicated • Partitioned • Horizontal • vertical • Snapshot
Object Naming • System Wide Names (SWN): • USER @ USER_SITE.OBJECT_NAME @ BIRTH_SITE
Distributed Catalogs • Local site maintains objects in its database • Catalog entry may be cached • Entries are versioned
Transaction Management and Commit Protocol • Transaction number: • SITE.SEQ_NUM (or SITE.TIME) • Two phase commit (2PC)
Query Preparation • Name resolution • Authorization check • Distributed compilation • Global plan generation/optimization • Local access path selection • Local optimization • Local view materialization
Cost Model • 3 weighted components: • I/O • CPU • Message • # of messages sent • # of bytes sent
Query Execution • Synchronous vs asynchronous execution • Distributed concurrency control • Deadlock detection and resolution • Crash recovery
SQL Additions and Changes • DEFINE SYNONYM • DISTRIBUTE TABLE • HORIZONTALLY • VERTICALLY • REPLICATED • DEFINE SNAPSHOT • REFRESH SNAPSHOT • MIGRATE TABLE
R* Optimizer Validation and Performance Evaluation for Distributed Queries Lothar F. Mackert Guy M. Lohman IBM Almaden Research Center
Outline • Distributed Compilation/Optimization • Instrumentation • Experiments and Results
Distributed Compilation/Optimization • Issues: • Join site • Transfer methods: • ship whole • fetch matches • Cost model
Weights Estimation • CPU: inverse of MIPS • I/O: avg seek, latency, transfer time • MSG: # of instruction per msg • BYTE: effective transmission speed of network
Instrumentation • Distributed EXPLAIN • Distributed COLLECT COUNTERS • Force optimizier
Experiment I • Transfer method • Merge-scan join of 2 tables: • 500 tuples in each table • Project both table – 50% • 100 different values for join attribute • Join result: 2477 tuples
Experiment II • Distributed vs local join • Join of 2 tables: • 1000 tuples in each table • Project both table – 50% • 3000 different values for join attribute
Experiment III • Relative importance of cost components
Experiment IV • Optimizer evaluation • Accurate estimates of # of msgs and bytes sent (<2% difference) • Better estimates when tables are more distributed
Experiment V • Alternative distributed join methods: • Dynamically created indexes • Semijoins • Bloomjoins • 2 tables: • 1000 tuples for outer • Varies inner from 100 to 6000 tuples
Other Experiments • Clustered index: • Bloomjoins < Semijoins < R* • 50% Projection: • Site 1: Bloomjoins < Semijoins < R* • Site 2: Bloomjoins < R* << Semijoins • Wider join column: • Bloomjoins < R* << Semijoins