360 likes | 430 Views
This report by Tony Young explores the optimization, global cost modeling, experiments, and results of benchmarking database management systems (DBMS) for communication cost analysis. It covers factors affecting communication cost, remote parameters, and global schema approaches. The findings suggest the need for comprehensive cost models and further research in this area.
E N D
Benchmarking DBMS’s for Communication Cost Analysis A Work Term Report Presentation Tony Young M.Math Candidate May 27th, 2005
Introduction • What is a federated system? • Travelocity • Remote searches of airline databases • Performs bookings, adds payment details, etc. • Google Scholar • Remote searches of ACM, IEEE, etc. databases • Presents consolidated view of papers matching common search criteria
Outline • Introduction • Organization • Optimization • Global Cost Modeling • Experiments • Experimental Procedure • Results • Conclusion • Future Work
Organization • Multidatabase Language Approach • Pass-through Querying • Global Schema Approach
Organization • Global schema approach • Burden of integration is on global DBA • Logical global schema • Functional compensation • Possibly high maintenance
Organization • Global Schema Approach Physical Org. Logical Org.
Optimization • Optimization challenges for the FDBS • Remote site autonomy • Remote parameters • Translation • Heterogeneous capabilities • Additional costs • From the perspective of the remote source, the FDBS is just another application requesting data!
Optimization • Omni module in iAnywhere ASA • Supports GS approach and pass-through querying • Performance of global queries is not as good as local queries
Global Cost Modeling • Many factors must be taken into account • Optimization Cost (OPT) • Communication Cost (COMM) • Execution Cost (EXEC) • Sub-query/Method Call Costs (SM) • Reformatting Costs (RF) • Working Cost Model
Global Cost Modeling • Interest for this project is communication cost • LS = Link Speed • S = Source/DBMS • DS = Data Size • DT = Data Type • PF = Prefetch Status • PS = Packet Size • R = Processor Speed
Experiments • Goal • Determine if communication cost can be modeled using simple network applications • Determine what factors affect communication cost • Two sets of experiments • Pure network benchmarking • DBMS benchmarking • Varied each factor mentioned previously, one at a time
Experimental Procedure • Hot cache • 30 trials • Experimental error below 5% • Parameters varied during both sets of experiments • Semantics of prefetching for network benchmarking
Experimental Procedure • Applications • DBCreate • NetBench • DBBench • ResultParse
Experimental Procedure • Recall the working cost model • Used two types of queries • SELECT * ROW • SELECT MAX(COLUMN) MAX • Ensure no indexes were created • Determining communication cost
Experimental Procedure • Recording query execution time
Experimental Procedure • Many ways to calculate • Similar overhead in both types of queries • Assumptions • Hot cache • Transfer of max() value negligible • Loop evaluation is negligible
Results • Results Table • DBMS (S)
Results • Link Speed (LS)
Results • Link Speed (LS)
Results • Data Size (DS)
Results • Data Type (DT)
Results • Prefetch Status (PF)
Results • Packet Size (PS)
Results • Server CPU Speed (CPU)
Results • Other notes • Dominant Factors • Consistency • Efficiency of Link Usage
Conclusion • Many factors need to be included in cost models • Dominant Factors • Affecting Factors • Communication cost is not a pure networking problem
Conclusion • Each DBMS is different in added overhead • Systems are consistent in overhead • Efficiency of link use could improve • Ease of control of the factors • Easily controllable • Not easily controllable • Much work still to be done!
Future Work • Collection of additional data • Generation and testing of a communication cost model • Gathering and analysis of other global cost model parameters
Acknowledgements • iAnywhere for their support • Glenn and Ivan • Support and countless questions • Mike, Anil, Ani, Dan, Matthew • Help and guidance • Mark, Scott and Dave • Hardware loans • Karim, Graham and Ian • Software help • Frank • Arranging the work term and help with the report and talk
Want More? • Check out the work term report at http://www.tonyyoung.ca/wtr.pdf
Optimization • Semijoin algorithm • Site selection • Remote reduction • Global reduction • Assembly • Minimizes communication costs • Exploits heterogeneous capabilities
Optimization • Replicate algorithm • Site selection • Data transfer • Query execution • Assembly • Minimizes query response time • Exploits varying hardware configurations
Optimization • Difference between semijoin and replicate • Assumptions made • Execution location
Optimization • Garlic • Fire access STAR’s • Fire join STAR’s • Fire FinishRoot STAR • Hybrid of semijoin and replicate algorithms • Large amount of overhead
Motivation • Proliferation of heterogeneous DBMS’s • Data sharing within organizations • Differing rates of technology adoption • Mergers and acquisitions • Geographic separation of teams
Want More? • Check out the work term report at http://www.tonyyoung.ca/wtr.pdf