430 likes | 538 Views
Tutorial of Course Project: Distributed Query Engine. Jun Wang( 王军 ) East Main Building 9-216 18901291504 wjun09@mails.tsinghua.edu.cn. Outline. Requirements Benchmark Discussion of Design & Implementation Demo Assignment Q&A. Outline. Requirements Benchmark
E N D
Tutorial of Course Project:Distributed Query Engine Jun Wang(王军) East Main Building 9-216 18901291504 wjun09@mails.tsinghua.edu.cn
Outline • Requirements • Benchmark • Discussion of Design & Implementation • Demo • Assignment • Q&A DDB
Outline • Requirements • Benchmark • Discussion of Design & Implementation • Demo • Assignment • Q&A DDB
Database Management • Compulsory Commands • SELECT • Fragmentation • Horizontal Fragmentation • Vertical Fragmentation DDB
Architecture • P2P Architecture DDB
Query Processing • SELECT statement • One table & multi-tables (JOIN) • Types of operator in the predicate: >,<,= • Command Parsing • Query Processing • General query tree • Query tree optimization and reduction • Network traffic optimization DDB
User Interface • The user should be able to use the interface to interact with your Distributed Query Engine • Any type of interface • Command Line Interface • Application-based Interface • Web-based Interface • Note: DO NOT focus on the interface design. The interface meets the requirements if: • Let users input the commands • Display the results and additional evaluation metrics DDB
System Outputs • The size of query result set • The optimized query tree • The time cost of query • The communication cost of query DDB
Documentation and Report • Mid-term presentation • Design of the distributed database query engine • Project work plan • Final report • Architecture • Query optimization method • Implementation of communication protocols • System operation specification • Instruction of installation, configuration, and operation of the query engine DDB
System Evaluation • Demonstration Time • 16th Week • SystemTest Environment • Operating system: Windows • Local DBMS: MySQL DDB
Outline • Requirements • Benchmark • Discussion of Design & Implementation • Demo • Assignment • Q&A DDB
Dataset • We simulate a scenario of using distributed database systems. • In general, the followings are provided: • The schema of a database (global tables) • The fragmentation schemes • The allocation DDB
Dataset DDB
Fragmentation • Horizontal Fragmentation • Vertical Fragmentation DDB
Allocation DDB
Overview of Query Processing • Decomposition and Localization • Rewriting: a query an Algebra tree • Reduction • Optimization • Optimize the cost of data transfer • Execution • Intermediate table storage and access • The TOTAL response time after the user issues a query DDB
Decomposition and Localization • Evaluation Points: • The elimination of useless fragmentations and joins • The global optimization of algebra tree • Example: DDB
Optimization • Evaluation metric: The amounts (Bytes) of data transfer • You should provide the following information: • The execution plan, where all operations as well as data transfers should be listed in sequence. • The amounts of each data transfer and the sum of amounts of all transfers. Note that the amounts of data transfer is measured by data BYTES before compression (you can compress the transferred data if it is necessary). DDB
Execution • Evaluation metric: total response time • Total response time is the sum of • Time of input receiving • Time of query processing (decomposition, localization and optimization) • Time of result display DDB
Outline • Requirements • Benchmark • Discussion of Design & Implementation • Demo • Assignment • Q&A DDB
Communication Protocols • Access Level • Client-Server Protocols • Server-Server Protocols • How to Design Communication Protocols • Sync vs. Async • Design of commands and responses • How to implement Communication Protocols • Strong vs. Economy • Techniques DDB
Database Management • Global vs. Local • Global Management • Local Management • GDD • Global Information of DDB • Storage Issues • Local DBMS Recommendation • MySQL DDB
Query Processing • Master site • Optimize the query • Formulate execution plan • Broadcast the plan • All sites • Execute commands from Master site • Return results B commands A Client C D • The Crucial Points • Global Optimization • Global Execution Formulation DDB
Other Issues • SQL Statement Parser • Multi-Thread Mechanism • Query Tree Layout and Visualization DDB
Outline • Requirements • Benchmark • Discussion of Design & Implementation • Demo • Assignment • Q&A DDB
Demo For References Only • Authors: • Shoubin Kong 孔守斌 • Jun Wang 王 军 • FangQiang Yu 余芳强 DDB
Implementation Details • Programming Language: Java • Local DBMS: MySQL • Protocol: RMI DDB
An Overview of the System Client End Client User System Communication Protocols Server End DDBMS DDB
Deployment • Client: 127.0.0.1 • Site server 1: 127.0.0.1:40001 • Site server 2: 127.0.0.1:40002 • Site server 3: 127.0.0.1:40003 • Site server 4: 127.0.0.1:40004 DDB
Database Initialization • Use your self-defined commands to initialize the database: • Define the 4 sites over 4 servers • Create the database • Create the tables • Fragment the tables • Allocation each fragmentation to sites 2014/12/1 DDB 31
Commands Define site Create table Fragment Allocate Import Insert / Delete Select 2014/12/1 DDB 32
Summaries • Requirement Driven • Perfect vs. Good Enough • Comparative Advantage • A Central Management Scheme to a Distributed Project DDB
Outline • Requirements • Benchmark • Discussion of Design & Implementation • Demo • Assignment • Q&A DDB
Assignment : Fragmentation • Q1: Select SNO from PARTA, SUPPLY Where PARTS.PNO = SUPPLY.PNO and PARTS.PRICE<6000 • Q2: Select SNAME, PNO from SUPPLIER, SUPPLY Where SUPPLIER.SNO = SUPPLY.SNO and SUPPLIER.COUNTRY = “USA” • Q3: Select SNO, SNAME, COUNT(*) FROM SUPPLIER, SUPPLY Where SUPPLIER.SNO = SUPPLY.SNO group by SUPPLIER.SNO DDB
Assignment : Fragmentation • The Set of Complete and Minimal Simple Predicates {PRICE < 6000, PRICE ≥ 6000, COUNTRY = “USA”, COUNTRY ≠ “USA” } DDB
Assignment : Fragmentation • PART – Horizontal Fragmentation • PARTS1 = σprice<6000PARTS • PARTS2 = σprice≥6000PARTS DDB
Assignment : Fragmentation • SUPPLIER – Horizontal Fragmentation • SUPPLIER1 = σcountry=“USA”SUPPLIER • SUPPLIER2 = σ country≠ “USA” SUPPLIER DDB
Assignment : Fragmentation • SUPPLY – Derived Fragmentation • SUPPLY 1 = (SUPPLY SUPPLIER1) PARTS1 • SUPPLY 2 = (SUPPLY SUPPLIER1) PARTS2 • SUPPLY 3 = (SUPPLY SUPPLIER2) PARTS1 • SUPPLY 4 = (SUPPLY SUPPLIER2) PARTS2 DDB
Assignment : Allocation • a) Solution1 DDB
Assignment : Allocation • a) Solution2 DDB
Assignment : Allocation • b) Solution DDB
Q & AThank You! CLUE