Prototype Tests for a Distributed File Catalogue
Andreas Joachim Peters, Pierre Tissot, Slawomir Biegluk, Vagner Morais
Summary
• Distributed File Catalogue - Architecture
• MySQL Replication
• Interface Functionalities
• Implementation Structure
• Global & Local Databases
• Database Structure
• Commands of the Global Mode
• Testbed
• ToDo list - Performance tests
• Summary
Architecture
[Figure: Central Services and Sites A, B and C, with Schedulers, a CE, File Catalogue (FC) and global-command (GC) databases in Master/Slave roles. Labels: GLOBAL MODE, LOCAL MODE, Assign JOB, Global Com., Parallel queries, command, insert, submit.]
AliEn2 Catalogue
[Figure: Central Services and Sites A, B and C with FC and SE DB databases. Labels: LFN->GUID, GUID->PFN, insert.]
MySQL Replication
• The master records all write queries in its binary log
• A slave reads the binary log from the master and runs the queries locally
• A master can have many slaves
• A slave can have only one master
• A server can be both a master and a slave
• Masters and slaves can selectively filter queries at the:
  • database level
  • table level
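The replication mechanism described above can be modelled in a few lines. The following is a minimal Python sketch, not MySQL's implementation: the hypothetical `Master` appends each write statement to an in-memory list standing in for the binary log, and each hypothetical `Slave` remembers its read position, pulls new entries from its single master, and replays only the statements that pass its database or table filter. Real MySQL filter rules are more subtle (for example, under statement-based logging `replicate-do-db` looks at the default database, not the one named in the statement).

```python
from dataclasses import dataclass, field


@dataclass
class LogEntry:
    db: str      # database the statement writes to
    table: str   # table the statement writes to
    query: str   # the write statement itself


@dataclass
class Master:
    # Append-only list standing in for the binary log.
    binlog: list = field(default_factory=list)

    def write(self, db: str, table: str, query: str) -> None:
        """Execute a write (not modelled here) and record it in the binary log."""
        self.binlog.append(LogEntry(db, table, query))


@dataclass
class Slave:
    do_dbs: set = field(default_factory=set)     # empty set: no database filter
    do_tables: set = field(default_factory=set)  # "db.table" names; empty: no filter
    position: int = 0                            # index of the next log entry to read
    applied: list = field(default_factory=list)  # statements replayed locally

    def accepts(self, entry: LogEntry) -> bool:
        """Apply the database-level and table-level filters."""
        if self.do_dbs and entry.db not in self.do_dbs:
            return False
        if self.do_tables and f"{entry.db}.{entry.table}" not in self.do_tables:
            return False
        return True

    def pull(self, master: Master) -> None:
        """Read the log from the last position and replay accepted statements."""
        for entry in master.binlog[self.position:]:
            if self.accepts(entry):
                self.applied.append(entry.query)
        self.position = len(master.binlog)
```

For example, a slave created with `Slave(do_dbs={"dcat"})` replays only the statements logged for the `dcat` database, and a second `pull` picks up only what was logged since the first.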
MySQL Replication
[Figure: one Master and three Slaves. Labels: Read access, Write access, Update forwarded.]
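The figure suggests the usual read/write split: reads are served by slaves, while updates must reach the master. The slides do not say which component forwards updates, so the Python sketch below assumes a hypothetical client-side router that sends statements beginning with a write verb to the master and spreads everything else over the slaves round-robin. Host names are placeholders.

```python
from __future__ import annotations

import itertools


class ReplicatedCatalogueRouter:
    """Hypothetical client-side router: writes to the master, reads to slaves."""

    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP"}

    def __init__(self, master: str, slaves: list[str]) -> None:
        self.master = master
        # Cycle through the slaves so reads are spread evenly (round-robin).
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql: str) -> str:
        """Return the host that should execute `sql`."""
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb in self.WRITE_VERBS or self._slaves is None:
            return self.master        # updates are forwarded to the master
        return next(self._slaves)     # read access is served by a slave
```

Sending reads to slaves is what gives the load-balancing benefit; the price is that a read issued right after a write may briefly see the slave's older state.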
MySQL Replication
• Replication settings:
  • Master
    • server-id = <id>
    • log-bin = <filename>
  • Slave
    • server-id = <id>
    • master-host = <hostname>
    • master-user = <username>
    • master-password = <password>
    • replicate-do-db = <db_name>
    • replicate-do-table = <db_name.tb_name>
• Reasons to use replication:
  • Load balancing
  • Put data closer to users
  • Make backups easier
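A minimal Python sketch that renders these settings as `[mysqld]` option-file sections. The function `make_cnf` and the example host and database names are hypothetical. Following the MySQL documentation, the `replicate-do-*` filters are placed on the slave, which applies them when replaying the master's log. The `master-host`/`master-user`/`master-password` options match servers of that era but were removed in MySQL 5.5, where `CHANGE MASTER TO` is used instead.

```python
from __future__ import annotations


def make_cnf(server_id: int,
             log_bin: str | None = None,
             master: dict | None = None,
             do_dbs: tuple[str, ...] = (),
             do_tables: tuple[str, ...] = ()) -> str:
    """Render a [mysqld] section for one replication node (illustrative only)."""
    lines = ["[mysqld]", f"server-id = {server_id}"]  # must be unique per server
    if log_bin:
        # Master side: enable the binary log that slaves read from.
        lines.append(f"log-bin = {log_bin}")
    if master:
        # Slave side (pre-5.5 style): where to connect and as which user.
        for key in ("host", "user", "password"):
            lines.append(f"master-{key} = {master[key]}")
    # Slave-side filters: replay only statements for these databases/tables.
    lines += [f"replicate-do-db = {db}" for db in do_dbs]
    lines += [f"replicate-do-table = {t}" for t in do_tables]
    return "\n".join(lines) + "\n"
```

A master would use `make_cnf(1, log_bin="mysql-bin")`; a slave replicating only the catalogue database would pass `master={...}` and `do_dbs=("dcat",)`.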
MySQL Replication: setup over SSL
Master:
1. Start daemon
2. Create replica user
3. Reset master
Slave:
4. Start daemon
5. Change master
6. Start slave
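The SQL side of this setup sequence (create the replica user, reset the master, change master, start slave) could look like the Python sketch below, which only builds the statements and does not connect to a server. Starting the daemons happens outside SQL and is not shown. The function name, account and host names are hypothetical. `CREATE USER ... REQUIRE SSL` is MySQL 5.7 and later syntax (older servers put `REQUIRE SSL` on the `GRANT`), recent 8.0 releases also accept the newer `CHANGE REPLICATION SOURCE TO`/`START REPLICA` spellings, and certificate options such as `MASTER_SSL_CA` are omitted.

```python
def replication_setup_sql(repl_user, repl_password, slave_host, master_host):
    """Return the SQL to run on each side, in setup order (illustrative only).

    Values are interpolated without escaping, so they must come from trusted input.
    """
    master_sql = [
        # Create replica user: an account that may only connect over SSL.
        f"CREATE USER '{repl_user}'@'{slave_host}' "
        f"IDENTIFIED BY '{repl_password}' REQUIRE SSL",
        f"GRANT REPLICATION SLAVE ON *.* TO '{repl_user}'@'{slave_host}'",
        # Reset master: discard old binary logs so the slave starts from a clean log.
        "RESET MASTER",
    ]
    slave_sql = [
        # Change master: point the slave at the master and require SSL on the link.
        f"CHANGE MASTER TO MASTER_HOST='{master_host}', MASTER_USER='{repl_user}', "
        f"MASTER_PASSWORD='{repl_password}', MASTER_SSL=1",
        # Start slave: launch the replication I/O and SQL threads.
        "START SLAVE",
    ]
    return {"master": master_sql, "slave": slave_sql}
```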
Interface Functionalities
• File operations (mkdir, register, ls, rm, rmdir, chgrp, chown, chmod, find)
• Access Control List (mkacl, rmacl)
• Metaview extension (mkmeta, mkmetatag, setmetatag, rmmeta, rmmetatag)
• Global functionality (global, lsglobal, synchglobal, globaloutput)
Implementation Structure
[Figure: module diagram with mtpoolserver, UI.pm, dCat.pm, SEAccess.pm, DistributedDB.pm, Database.pm and GlobalMode.pm. Label: IPC.]
Global & Local Databases
[Figure: Master and Slave with gC and dCat databases.]
Database Structure
dCat tables:
• {SENAME}__M0
• {SENAME}__ACL
• {SENAME}__META
• {SENAME}__{MVName}__META
• {SENAME}__{MVName}__VIEW
[Figure: dCat databases on the Slave and on the Master.]
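All dCat tables carry the storage-element name as a prefix. A plausible reason, stated here as an assumption, is that the tables of every site can then coexist in one replicated database without name clashes. The hypothetical Python helpers below derive a site's table names and recover the owning SE from a table name, assuming SE names never contain a double underscore.

```python
from __future__ import annotations


def dcat_tables(se_name: str, metaviews: list[str]) -> list[str]:
    """Names of the dCat tables owned by one SE (hypothetical helper)."""
    tables = [f"{se_name}__M0", f"{se_name}__ACL", f"{se_name}__META"]
    for mv in metaviews:
        # Each metaview adds its own tag table and its LFN-to-GUID view table.
        tables += [f"{se_name}__{mv}__META", f"{se_name}__{mv}__VIEW"]
    return tables


def owning_se(table: str) -> str:
    """Recover the SE prefix, assuming SE names contain no '__'."""
    return table.split("__", 1)[0]
```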
Database Structure
Table: {SENAME}__M0
• ctime: timestamp
• owner: varchar
• aclId: int
• path: varchar
• lfn: varchar
• pfn: varchar
• size: int
• gowner: varchar
• guid: varchar(36)
• type: varchar(1)
• perm: varchar(3)
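To make the `{SENAME}__M0` layout concrete, here is a Python sketch using the standard-library `sqlite3` module as a stand-in for MySQL. The slides give no `varchar` lengths, so the ones below are assumptions, as are the meanings given to `path` (parent directory), `lfn` (entry name) and `type` (`'f'` for a file). The helpers `register` and `ls` are hypothetical illustrations of the commands with those names, not the FC's code; the SE name is interpolated into SQL and must be trusted.

```python
import sqlite3
import uuid

# Column list from the slide; the VARCHAR lengths are assumptions.
M0_SCHEMA = """
CREATE TABLE {se}__M0 (
    ctime  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    owner  VARCHAR(64),
    aclId  INT,
    path   VARCHAR(255),
    lfn    VARCHAR(255),
    pfn    VARCHAR(255),
    size   INT,
    gowner VARCHAR(64),
    guid   VARCHAR(36),
    type   VARCHAR(1),
    perm   VARCHAR(3)
)"""


def register(db, se, path, lfn, pfn, size, owner, gowner, perm="644"):
    """Insert one file entry and return its newly generated GUID."""
    guid = str(uuid.uuid4())  # canonical UUID text is 36 characters
    db.execute(
        f"INSERT INTO {se}__M0 (owner, aclId, path, lfn, pfn, size, gowner, guid, type, perm) "
        "VALUES (?, NULL, ?, ?, ?, ?, ?, ?, 'f', ?)",
        (owner, path, lfn, pfn, size, gowner, guid, perm),
    )
    return guid


def ls(db, se, path):
    """List the entry names registered under one directory."""
    rows = db.execute(f"SELECT lfn FROM {se}__M0 WHERE path = ? ORDER BY lfn", (path,))
    return [row[0] for row in rows]
```

Usage: `db = sqlite3.connect(":memory:")`, then `db.execute(M0_SCHEMA.format(se="CERN"))` before calling `register` and `ls`.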
Database Structure
Table: {SENAME}__ACL
• aclID: int
• owner: char
• ctime: timestamp
• gowner: char
• cowner: char
• perm: char(1)
Database Structure
Table: {SENAME}__META
• name: varchar
• owner: varchar
• ctime: timestamp
• treedir: varchar
• subqry: varchar
• aclId: int
• gowner: varchar
• perm: varchar(3)
Database Structure
Table: {SENAME}__{MVName}__META
• guid: varchar(36)
• <metatag>: <type>
• ...
Table: {SENAME}__{MVName}__VIEW
• lfn: varchar
• guid: varchar(36)
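The metaview tables suggest a layout in which each metadata tag is a column of `{SENAME}__{MVName}__META`, keyed by GUID, while `{SENAME}__{MVName}__VIEW` maps LFNs to GUIDs. Under that assumption, the Python/`sqlite3` sketch below maps `mkmetatag` to `ALTER TABLE ... ADD COLUMN` (one reading of the schema evolution mentioned in the summary) and answers a tag query with a join on `guid`. The function names only mirror the command names; the SE, metaview and tag identifiers are interpolated into SQL and must come from trusted input.

```python
import sqlite3


def mkmeta(db, se, mv):
    """Create the tag table and the LFN-to-GUID view table of one metaview."""
    db.execute(f"CREATE TABLE {se}__{mv}__META (guid VARCHAR(36) PRIMARY KEY)")
    db.execute(f"CREATE TABLE {se}__{mv}__VIEW (lfn VARCHAR(255), guid VARCHAR(36))")


def mkmetatag(db, se, mv, tag, sqltype):
    """Schema evolution: a new tag becomes a new column of the META table."""
    db.execute(f"ALTER TABLE {se}__{mv}__META ADD COLUMN {tag} {sqltype}")


def setmetatag(db, se, mv, lfn, guid, tag, value):
    """Attach tag=value to a file, creating its META and VIEW rows if needed."""
    # INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.
    db.execute(f"INSERT OR IGNORE INTO {se}__{mv}__META (guid) VALUES (?)", (guid,))
    db.execute(f"UPDATE {se}__{mv}__META SET {tag} = ? WHERE guid = ?", (value, guid))
    db.execute(
        f"INSERT INTO {se}__{mv}__VIEW (lfn, guid) SELECT ?, ? "
        f"WHERE NOT EXISTS (SELECT 1 FROM {se}__{mv}__VIEW WHERE guid = ?)",
        (lfn, guid, guid),
    )


def find_by_tag(db, se, mv, tag, value):
    """Return the LFNs whose tag equals value, via a META/VIEW join on guid."""
    sql = (f"SELECT v.lfn FROM {se}__{mv}__VIEW AS v "
           f"JOIN {se}__{mv}__META AS m ON v.guid = m.guid "
           f"WHERE m.{tag} = ? ORDER BY v.lfn")
    return [row[0] for row in db.execute(sql, (value,))]
```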
Database Structure
Global commands tables:
• {SENAME}__global_output
• sesinfo
[Figure: gC databases on the Master and on the Slave.]
Database Structure
Table: global_commands
• commandID: int
• owner: varchar
• gowner: varchar
• timestamp: timestamp
• command: varchar
• parameters: varchar
Database Structure
Table: {SENAME}__global_output
• seID: int
• commandID: int
• timestamp: timestamp
• outval: int
• outmsg: varchar
Database Structure
Table: sesinfo
• seID: int
• name: varchar
• distname: varchar
• address: varchar
• rdbms: varchar
• lastcommandid: int
Commands of the Global Mode
• global: register a command to be executed globally
• lsglobal: list the registered global commands
• synchglobal: synchronize a site with the global command list
• globaloutput: print the output of a globally executed command
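One way the four global-mode commands could operate on the global-command tables is sketched below in Python with `sqlite3`, using a single in-memory database to stand in for the replicated gC database. Assumptions: `global` inserts a row into `global_commands`; `synchglobal` runs, at one site, every command whose `commandID` exceeds that site's `lastcommandid` in `sesinfo`, records the result in `{SENAME}__global_output` and advances `lastcommandid`; `globaloutput` collects one command's results from all sites. The `run` callback that actually executes a command locally, and all function names, are hypothetical.

```python
import sqlite3


def create_gc(db, sites):
    """Create the global-command tables; one output table per SE."""
    db.executescript("""
        CREATE TABLE global_commands (
            commandID  INTEGER PRIMARY KEY AUTOINCREMENT,
            owner      VARCHAR(64),
            gowner     VARCHAR(64),
            timestamp  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            command    VARCHAR(64),
            parameters VARCHAR(255));
        CREATE TABLE sesinfo (
            seID          INTEGER PRIMARY KEY,
            name          VARCHAR(64),
            distname      VARCHAR(255),
            address       VARCHAR(255),
            rdbms         VARCHAR(32),
            lastcommandid INT DEFAULT 0);
    """)
    for se in sites:
        db.execute("INSERT INTO sesinfo (name) VALUES (?)", (se,))
        db.execute(f"CREATE TABLE {se}__global_output (seID INT, commandID INT, "
                   "timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP, "
                   "outval INT, outmsg VARCHAR(255))")


def register_global(db, owner, gowner, command, parameters):
    """'global': append a command to the shared list and return its ID."""
    cur = db.execute("INSERT INTO global_commands (owner, gowner, command, parameters) "
                     "VALUES (?, ?, ?, ?)", (owner, gowner, command, parameters))
    return cur.lastrowid


def synch_global(db, se, run):
    """'synchglobal': execute every command this site has not yet seen."""
    se_id, last = db.execute("SELECT seID, lastcommandid FROM sesinfo WHERE name = ?",
                             (se,)).fetchone()
    pending = db.execute("SELECT commandID, command, parameters FROM global_commands "
                         "WHERE commandID > ? ORDER BY commandID", (last,)).fetchall()
    for cid, cmd, params in pending:
        outval, outmsg = run(cmd, params)  # local execution, supplied by the caller
        db.execute(f"INSERT INTO {se}__global_output (seID, commandID, outval, outmsg) "
                   "VALUES (?, ?, ?, ?)", (se_id, cid, outval, outmsg))
        db.execute("UPDATE sesinfo SET lastcommandid = ? WHERE seID = ?", (cid, se_id))
    return len(pending)


def global_output(db, command_id):
    """'globaloutput': gather one command's (outval, outmsg) from every site."""
    out = {}
    for (se,) in db.execute("SELECT name FROM sesinfo ORDER BY seID").fetchall():
        row = db.execute(f"SELECT outval, outmsg FROM {se}__global_output "
                         "WHERE commandID = ?", (command_id,)).fetchone()
        if row is not None:
            out[se] = row
    return out
```

Because each site only compares `commandID` against its own `lastcommandid`, a site that was offline simply catches up on its next `synchglobal`.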
Testbed
[Figure: testbed hosts pcaliense01.cern.ch, login2.tlc2.uh.edu, pcepalice34.cern.ch and pcepalice66.cern.ch with dCat and gC databases in Master and Slave roles.]
ToDo list - Performance tests
• A Perl client program was written to test each type of operation (insert, query, delete, etc.)
• Each test is performed several times and the average is taken
• All entries are removed before the next test run
• Tests are performed on both local and central databases
• Some tests:
  • Mean add time with increasing catalogue size
  • Add rate for an increasing number of clients
  • Query rate for an increasing number of clients
  • Delete rate for an increasing number of clients
  • Update rate for an increasing number of clients
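The test procedure (several runs, an empty catalogue at the start of each run, results averaged) can be sketched as a small timing harness. The original client is written in Perl; this Python version, with an in-memory `sqlite3` table standing in for the catalogue, is only an illustration. The names `mean_op_time`, `fresh_catalogue` and `insert_entry` are hypothetical, and the multi-client rate tests would additionally need several concurrent client processes, which are not shown.

```python
import sqlite3
import statistics
import time


def mean_op_time(setup, op, n_ops, repeats):
    """Average seconds per operation over `repeats` independent runs."""
    per_run = []
    for _ in range(repeats):
        ctx = setup()  # fresh, empty catalogue: nothing left from the previous run
        start = time.perf_counter()
        for i in range(n_ops):
            op(ctx, i)
        per_run.append((time.perf_counter() - start) / n_ops)
    return statistics.mean(per_run)


def fresh_catalogue():
    """In-memory stand-in for an empty catalogue table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE catalogue (lfn TEXT PRIMARY KEY, guid TEXT)")
    return db


def insert_entry(db, i):
    """One 'insert' operation: register a synthetic entry."""
    db.execute("INSERT INTO catalogue VALUES (?, ?)", (f"/test/file{i}", str(i)))
```

Pre-loading the table in `fresh_catalogue` with a given number of entries before timing would approximate the mean-add-time-versus-catalogue-size test.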
Summary
• We have implemented a distributed File Catalogue based purely on replication technology, with metadata (schema evolution) and ACL support
• The FC allows an independent “local” operation mode at every site and a “global” operation mode, e.g. for job scheduling
• Replication provides a real-time backup of the complete catalogue
• We have developed scripts for fast setup of secure replication over SSL
• First performance figures look very promising (local inserts/listings ~3-4 ms, replication more or less instantaneous)
• To be done: direct comparison with other file catalogues:
  • LFC (LCG File Catalogue)
  • AliEn2 centralized catalogue
  • FireMan (?)
• A distributed catalogue could offer large improvements in performance, scalability and site autonomy