310 likes | 449 Views
Scalable Distributed Database System: SD-SQL Server. Soror SAHRI Ceria, Paris-Dauphine University Soror.Sahri@Dauphine.fr Journées Académiques Microsoft Paris, April 2006. Overview. Objective SD-SQL Server Architecture Application Interface Implementation Performance Analysis
E N D
Scalable Distributed Database System: SD-SQL Server Soror SAHRI Ceria, Paris-Dauphine University Soror.Sahri@Dauphine.fr Journées Académiques Microsoft Paris, April 2006
Overview • Objective • SD-SQL Server Architecture • Application Interface • Implementation • Performance Analysis • Conclusion
Objective • Most of DBSs have distributed/parallel versions • SQL Server, Oracle, DB2… • DBSs do not provide dynamic partitioning of relational tables • All require manual (distributed) repartitioning when tables scale-up
Objective • http://ceria.dauphine.fr/CERIA-publications.html • Research Report, December 2005 • [Oracle Database 10g]
Objective • Scalable DistributedPartitioning of Relational Tables • Scalable Distributed Database System (SD-DBS) • Litwin, Rich & Schwartz • “Architecture for a scalable Distributed DBSs” • 2nd Intl. Workshop on Cooperative Internet Computing, 2002
SD-SQL Server • SD-SQL Server is the first prototype SD-DBS • Generalizes SQL Server • Shared Nothing Architecture • Embeds SQL Server 2000 (at present) • Up to 250 nodes (at present)
User/Application User/Application sd_insert sd_create_table SD-SQL ServerManagers SD-SQLserver SD-SQLpeer SD-SQLclient SD-SQLserver LinkedSQLServers T Split _D1_T _D1_T _D1_T NDBs C P S S D2 Di Di+1 D1 SD-SQL Server Architecture D1_T
…… Node1 Node2 Node3 Nodei DB1 DB2 DB1 DB2 DB1 DB2 SDB DB1SDB MDB Nodes, NDBs and SDBs
NDB Types • An SD-SQL Server NDB can be: • Client NDB • Carries only images • Application interfaces • Server NDB • Carries only the segments • Primary NDB • First created for an SDB • Peer NDB • Both functions
Scalable Tables • A scalable (distributed) table is a collectionof segments • Segments are SQL tables • A scalable table has, initially, only one primary segment • At some server or peer NDB • The number of segments in a scalable table is variable • Overflowing segments split
Splitting • A split occurs when an insert overflows the segmentcapacity • Splits produce other segments for a scalable table. • Each is located at a different NDB • Within the SDB • If there is not enough NDBs, splits dynamically append new ones
N1 N2 N3 Ni N3 sd_create_node_database N4 NDBDB1 NDBDB1 NDBDB1 NDBDB1 NDBDB1 NDBDB1 SDB DB1 SDB DB1 SDB DB1 sd_insert sd_insert sd_insert ScalableTable T Splitting sd_create_node sd_create_node_database …….
p=INT(b/2) C( S)= { c: c < h = c (b+1-p)} C( S1)={c: c > = c (b+1-p)} b+1-p p S S S1 Splitting Check Constraint? b+1 b SELECT TOP Pi * INTO Ni.Si FROM S ORDER BY C ASC SELECT TOP Pi * WITH TIES INTO Ni.S1 FROM S ORDER BY C ASC
Images • Images hide the scalable table segments • An image is a distributed updateable partitioned view of a scalable table • Union-all view with check constraints • An image presents the scalable table partitioning • It do not address any new segments resulted from a split
Images • Primary image • Resides at the creation node • Has the name of the scalable table • Secondary images • Reside at other client or peer NDBs of the SDB • Have a specific name, other than that of the table • To avoid name conflict • Scalable Views • Carry images at several levels
DB1 SDB N1.DB1 N2.DB1 N3.DB1 Primary Image Images T Scalable Table CREATE VIEW TAS SELECT * FROM N1.DB1._Ni_T CREATE VIEW TAS SELECT * FROM N1.DB1._Ni_T UNION ALL SELECT * FROM N2.DB1._Ni_T UNION ALL SELECT * FROM N3.DB1._Ni_T
Application Interface • Applications manipulate SD-SQL Server through its command interface. • SD-SQL Server commandsstart with ‘sd_’ • To distinguish from SQL Server commands for static tables. INSERT sd_insert CREATE TABLEsd_create_table
Application Interface:The SkyServer Benchmark • Use of the SkyServer DB as benchmark • http://research.microsoft.com/~gray/SDSS • Use of thePhotoObj table as a scalable table. • PhotoObjhas 158,426 tuples (about 260 MB)
Application Interface: Node Management • Node Creation • sd_create_node ‘Dell1’/* Serverby default */ • sd_create_node ‘Ceria’, ‘client’ • Node Alteration • sd_alter_node ‘Ceria’, ‘ADD server’ /* Becomes peer*/ • Node Removal • sd_drop_node ‘Ceria’
Application Interface: NDBs &SDBs Management • SDB Creation • sd_create_scalable_database ‘SkyServer’, ‘Dell1’, ‘Server’,2 /* Creates the primary SkyServer NDB as well at Dell1*/ • NDB Creation • sd_create_node_database‘SkyServer’, ‘Ceria’, ‘Client’ • SDB Removal • sd_drop_scalable_database‘SkyServer’
Application Interface: Scalable Tables Management • Scalable Table Creation • sd_create_table ‘PhotoObj (objid BIGINT PRIMARY KEY…)’, 10000 • Scalable Table Alteration • sd_alter_table‘PhotoObj ADD t INT’, 1000 • sd_create_index‘run_index ON Photoobj (run)’ • sd_drop_index‘PhotoObj.run_index’ • Scalable Table Removal • sd_drop_table‘PhotoObj’
Application Interface: Scalable Tables Management • Insert • sd_insert‘INTO PhotoObj SELECT * FROM Ceria5.Skyserver-S.PhotoObj’ • Search • sd_select‘* FROM PhotoObj’ • sd_select‘TOP 5000 * INTO PhotoObj1 FROM PhotoObj’, 500
Application Interface • See “Architecture and Interface of Scalable Distributed Database System SD-SQL Server”,IASTED-DBA 2006
Implementation • Commands are distributed stored procedures • Processed as distributed transactions at Repeatable Read isolation level • See • “ Prototyping a Scalable Distributed Database System SD-SQL Server”, WDAS 2006
Performance Analysis (1) (Q1)sd_select‘COUNT (*) FROM PhotoObj’ Query (Q1) execution time
Performance Analysis (2) (Q2)sd_select‘TOP 10000 x.objid FROM photoobj x, photoobj y WHERE x.obj=y.obj AND x.objid>y.objid’ Query (Q2) with image checking only (IC) and with image adjustment (IA)
Performance Analysis (3) • sd_select‘COUNT (*) FROM PhotoObj’ Comparison between SD-SQL Server and SQL Server
Conclusion • Scalable distributed partitioning of relational tables is now reality with SD-SQL Server • No manual repartitioning • Unlike in any other DBS we know about • Performance analysis proves • Efficiency of our design • Immediate utility of SD-SQL Server
Future Works • Management of replication, data failures.. • Non-availability of a server…. • Management of the data merging • Application on other DBMSs • Oracle, DB2….
Work partly supported by • CEE Project EGov • MS Research • CEE Project ICONS • IBM Almaden Res. Cntr., CA • HP Laboratories, CA
Thank YouQuestions? Soror SAHRI Ceria, Paris-Dauphine University Soror.Sahri@Dauphine.fr Journées Académiques Microsoft Paris, April 2006