On Replication Yin Chen July 2006
Overview
• What is it? Why do we need it? What types are there?
• Investigation of existing technologies
  • IBM SQL replication
  • Sybase replication
  • Oracle replication
  • MySQL replication
  • Globus DRS
  • EGEE RMS
  • SRB
• Our project
  • Goals
  • Solutions
  • Features
What is replication?
• Copying of data and synchronization of updates
• It is not caching
  • Caching is a client-side phenomenon
  • Caching only improves response time
• It is not a backup (a backup is not automatically overwritten when the original data is modified)
• It is not a replicated system
  • A replicated system deals with when and where to copy
  • Optimization (how many replicas are needed …)
  • Growing or shrinking the replication tree
Why do we need it?
• Data consolidation (central audit and analysis)
• Data distribution (for branch offices)
• Performance
  • Access efficiency (moving data near applications)
  • Load balancing (distributing access load)
• Security (data protection)
• Availability (off-line access)
• Reliability (disaster recovery, avoiding a single point of failure)
• Data Grid (to improve availability, response time, fault tolerance)
• Digital Library (copying digital documents, indexes …)
Replication types
• Synchronous replication
  • What it is: updating two storages at the same time; rolling back if one fails
  • Benefits: high availability, automatic fail-over, minimal data loss
  • Usage: disaster recovery
  • Drawbacks: network efficiency, scalability, cost, less flexibility
• Asynchronous replication
  • What it is: changes are captured on the primary storage and propagated immediately or at timed intervals
  • Benefits: low cost, scalability, flexibility
  • Usage: load balancing, off-line access, access efficiency
  • Drawbacks: possible data loss, network bandwidth
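
The difference between the two types can be sketched in a few lines of Python. This is a minimal illustration, not taken from any of the products discussed here: the `Store`, `sync_write`, and `AsyncReplicator` names are hypothetical, and an in-memory dictionary stands in for real storage.

```python
from collections import deque


class Store:
    """A toy key-value storage node; `fail` simulates an unavailable node."""

    def __init__(self, fail=False):
        self.data = {}
        self.fail = fail

    def write(self, key, value):
        if self.fail:
            raise IOError("storage unavailable")
        self.data[key] = value


def sync_write(primary, replica, key, value):
    """Synchronous: update both stores; roll back the primary if the replica fails."""
    had_key = key in primary.data
    old = primary.data.get(key)
    primary.write(key, value)
    try:
        replica.write(key, value)
    except IOError:
        # Roll back so the two stores never diverge.
        if had_key:
            primary.data[key] = old
        else:
            del primary.data[key]
        return False
    return True


class AsyncReplicator:
    """Asynchronous: capture changes on the primary, propagate them later."""

    def __init__(self, primary, replica):
        self.primary = primary
        self.replica = replica
        self.pending = deque()  # captured but not yet propagated changes

    def write(self, key, value):
        self.primary.write(key, value)
        self.pending.append((key, value))  # returns without waiting for the replica

    def propagate(self):
        """Apply captured changes to the replica in order; return how many were applied."""
        applied = 0
        while self.pending:
            key, value = self.pending[0]
            self.replica.write(key, value)
            self.pending.popleft()
            applied += 1
        return applied
```

In this sketch, anything still sitting in `pending` when the primary is lost is the "possible data loss" drawback of the asynchronous type, while `sync_write` pays for its safety by waiting on the replica for every write.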
Existing technologies: IBM Replication
• WebSphere Information Integrator V8.2
• Supports multi-vendor databases
• Admin: creates the replication criteria, stored in control tables
• Capture: uses logs or triggers to capture changes into a temporary table
• Apply: applies the accumulated transactions to the target DB on a schedule
• Alert Monitor: monitors replication and notifies users
• Supports after-image copy and before-image copy (can roll back)
• Allows copying subsets, simple views, and complex joins and unions
• Asynchronous replication; allows specifying a schedule
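
The Capture and Apply steps can be illustrated with a small trigger-based sketch. It uses Python's built-in `sqlite3` module as a stand-in and is not DB2 or WebSphere Information Integrator code; the table names (`source`, `target`, `changes`) and the `apply_changes` function are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT);
-- Staging table filled by the capture triggers below.
CREATE TABLE changes (seq INTEGER PRIMARY KEY AUTOINCREMENT,
                      op TEXT, id INTEGER, name TEXT);
CREATE TRIGGER cap_ins AFTER INSERT ON source BEGIN
    INSERT INTO changes (op, id, name) VALUES ('I', NEW.id, NEW.name);
END;
CREATE TRIGGER cap_upd AFTER UPDATE ON source BEGIN
    INSERT INTO changes (op, id, name) VALUES ('U', NEW.id, NEW.name);
END;
CREATE TRIGGER cap_del AFTER DELETE ON source BEGIN
    INSERT INTO changes (op, id, name) VALUES ('D', OLD.id, NULL);
END;
""")


def apply_changes(conn):
    """Apply the accumulated changes to the target in capture order, then clear them."""
    rows = conn.execute(
        "SELECT seq, op, id, name FROM changes ORDER BY seq"
    ).fetchall()
    for _seq, op, row_id, name in rows:
        if op == "D":
            conn.execute("DELETE FROM target WHERE id = ?", (row_id,))
        else:
            # Inserts and updates both carry the after-image of the row.
            conn.execute(
                "INSERT OR REPLACE INTO target (id, name) VALUES (?, ?)",
                (row_id, name),
            )
    conn.execute("DELETE FROM changes")
    conn.commit()
    return len(rows)
```

Calling `apply_changes(conn)` from a timer would correspond to the scheduled Apply step; until it runs, the target lags behind the source, which is what makes this an asynchronous scheme.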
Sybase Replication
• Pioneer, since 1993
• “Publish-and-subscribe” approach
• Replication Agent: runs on each publisher and detects changes based on logs
• Replication Server: applies changes to target DBs (uses pre-configured intelligent routes)
• Replication Server Manager: GUI-based; manages and monitors the P2P environment
• Stable Queues: temporary storage of data, ensuring that no data is lost
• Advanced in providing high performance
Oracle Replication
• Multimaster Replication
  • P2P structure
  • Changes are pushed to every other site (synchronously or asynchronously)
  • Conflicts may happen (update conflict, uniqueness conflict, delete conflict)
• Materialized View Replication
  • One master site manages several non-master sites (each keeps a complete or partial copy)
  • Updatable
  • Refresh (fast refresh, complete refresh, force refresh)
• Hybrid Replication
[Figures: Multimaster Replication; Materialized View Replication]
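
The three conflict types can be made concrete with a simplified sketch. This is not Oracle's API: `detect_conflict`, the shape of the `change` dictionary, and the `email` unique column are hypothetical, and the rules are a reduced version of the usual definitions (an update conflict when the row's old values no longer match at the receiving site, a delete conflict when the row to update or delete no longer exists, a uniqueness conflict when the new row would violate a unique constraint).

```python
def detect_conflict(change, site_rows, unique_col="email"):
    """Return the conflict a replicated change would hit at a receiving site, or None.

    change: dict with "op" ("insert" | "update" | "delete"), "key", and, where
            relevant, "old" (row before the change) and "new" (row after it).
    site_rows: dict mapping primary key -> row dict at the receiving site.
    """
    op, key = change["op"], change["key"]
    current = site_rows.get(key)

    if op in ("update", "delete"):
        if current is None:
            # The row to update or delete is already gone at this site.
            return "delete conflict"
        if op == "update" and current != change["old"]:
            # The row was changed here since the sender read its old values.
            return "update conflict"
        if op == "delete":
            return None

    # Inserts and updates: check the primary key and the unique column.
    if op == "insert" and current is not None:
        return "uniqueness conflict"
    value = change["new"].get(unique_col)
    for other_key, row in site_rows.items():
        if other_key != key and value is not None and row.get(unique_col) == value:
            return "uniqueness conflict"
    return None
```

A real multimaster setup would pair each detected conflict with a resolution rule (for example, latest timestamp wins); the sketch stops at detection.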
MySQL Replication
• Basic replication services, using a lightweight master-slave model
• The master writes updates to logs; the slave reads and executes the queries from the master’s logs
• The slave checks results on both sites; replication stops if a query succeeds on only one site
• This simple structure can be combined arbitrarily to build complex architectures
• In a slow network, it is difficult for a slave to catch up with the master; this was improved in 4.0 by adding relay logs
• The master has to be locked or restarted for the initial snapshot copy
[Figure: example topologies] 1. simple master/slave 2. one slave, two masters 3. dual masters 4. dual master with slaves 5. master ring 6. master ring with slaves
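
The log-shipping model described above can be sketched as follows. This is an illustration only, using `sqlite3` in-memory databases in place of MySQL servers; the `Master` and `Slave` classes are hypothetical, and recording a plain success flag next to each statement is a simplification of what a real master log holds.

```python
import sqlite3


class Master:
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.binlog = []  # list of (statement, succeeded_on_master)

    def execute(self, sql):
        try:
            self.db.execute(sql)
            ok = True
        except sqlite3.Error:
            ok = False
        self.binlog.append((sql, ok))
        return ok


class Slave:
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.relay_log = []
        self.fetched = 0   # master log entries already copied
        self.applied = 0   # relay log entries already executed
        self.running = True

    def fetch(self, master):
        """I/O step: copy new entries from the master's log into the relay log."""
        new = master.binlog[self.fetched:]
        self.relay_log.extend(new)
        self.fetched += len(new)

    def apply(self):
        """SQL step: replay relay log entries; stop if an outcome differs from the master's."""
        while self.running and self.applied < len(self.relay_log):
            sql, ok_on_master = self.relay_log[self.applied]
            try:
                self.db.execute(sql)
                ok_here = True
            except sqlite3.Error:
                ok_here = False
            if ok_here != ok_on_master:
                self.running = False  # results diverge: halt replication
                break
            self.applied += 1
```

Separating `fetch` from `apply` mirrors the role of the relay log: the slave can keep copying the master's log over a slow link even while it is still busy replaying earlier statements.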
Globus DRS
• A client creates a request file (requested file name and target location) and sends it to the DRS (Data Replication Service)
• The Replicator checks the user’s credential and queries the RLI (Replica Location Index) to find the LRCs (Local Replica Catalogs) that contain mappings for the requested file
• It also queries each remote LRC to get the physical file names, and selects the best one
• It then starts RFT (Reliable File Transfer) to transfer the files
• Finally, it registers the new replica in its LRC; the LRC then updates the RLI to make the replica visible
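
The sequence of steps above can be written down as a short Python sketch. It does not use any Globus API: plain dictionaries stand in for the RLI and the LRCs, and `replicate`, `transfer`, `cost`, and `check_credential` are hypothetical names introduced for illustration.

```python
def replicate(requests, rli, lrcs, local_site, transfer, cost, check_credential):
    """Replicate each requested file to its target location and register the new replica.

    requests: list of (logical_name, target_location) pairs (the request file).
    rli: dict logical_name -> set of site names whose LRC has a mapping.
    lrcs: dict site name -> {logical_name: [physical_name, ...]}.
    transfer: callable(source, target) standing in for the file transfer service.
    cost: callable(physical_name) -> number, used to pick the "best" source.
    check_credential: callable() -> bool, standing in for the credential check.
    """
    if not check_credential():
        raise PermissionError("credential rejected")
    for logical_name, target in requests:
        # 1. Ask the index which catalogs hold mappings for this file.
        sites = rli.get(logical_name, set())
        # 2. Ask each of those catalogs for the physical file names.
        candidates = [
            physical
            for site in sorted(sites)
            for physical in lrcs[site].get(logical_name, [])
        ]
        if not candidates:
            raise LookupError(f"no replica found for {logical_name}")
        # 3. Select a source and transfer the file.
        source = min(candidates, key=cost)
        transfer(source, target)
        # 4. Register the new replica locally, then make it visible in the index.
        lrcs[local_site].setdefault(logical_name, []).append(target)
        rli.setdefault(logical_name, set()).add(local_site)
```

What counts as the "best" source is left open here on purpose; the `cost` callable could rank candidates by network distance, load, or any other measure.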
EGEE RMS
• Designed for replicating large, read-only files among heterogeneous resources
• Implements file catalogues
  • Replica Location Service: maps a replica’s Grid Unique ID to its physical location
  • Local Replica Catalogue: provides information about replicas for a single VO
  • Replica Metadata Catalogue: maps a file’s logical name to its Grid Unique ID
  • LCG File Catalogue: used to address performance issues
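
The two-level mapping behind these catalogues (logical name to Grid Unique ID, Grid Unique ID to physical locations) can be sketched as a pair of dictionaries. The `FileCatalogue` class and its methods are hypothetical and do not correspond to the EGEE interfaces; random UUIDs are assumed as Grid Unique IDs.

```python
import uuid


class FileCatalogue:
    """Toy catalogue: logical file name -> Grid Unique ID -> physical locations."""

    def __init__(self):
        self.lfn_to_guid = {}   # role of the Replica Metadata Catalogue
        self.guid_to_pfns = {}  # role of the Replica Location Service

    def register(self, lfn, pfn):
        """Record a physical copy of a logical file and return the file's ID."""
        if lfn not in self.lfn_to_guid:
            self.lfn_to_guid[lfn] = str(uuid.uuid4())
        guid = self.lfn_to_guid[lfn]
        self.guid_to_pfns.setdefault(guid, []).append(pfn)
        return guid

    def locate(self, lfn):
        """Return all known physical locations of a logical file."""
        guid = self.lfn_to_guid[lfn]
        return list(self.guid_to_pfns[guid])
```

Keeping the ID between the two mappings means a file can be renamed logically, or gain and lose replicas, without the other mapping having to change.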
SRB
• Enables file searching by attributes
• MCAT: a database system storing metadata
• One or more master daemon processes, with SRB Agents running on them
• The dispatcher monitors incoming requests and passes them to the HLRH (High Level Request Handler, which can retrieve metadata from a local or remote MCAT) or the LLRH (Low Level Request Handler, which can retrieve data from storage)
• Supports synchronous and asynchronous replication, and MCAT replication
[Figure: SRB architecture. Labels: Application; Dispatcher (monitors input port and dispatches requests to handlers); High Level Request Handler; MCAT; Remote SRB; Low Level Request Handler; file system drivers (Unitree, HPSS, UNIX); DBMS drivers (DB2, Oracle, ObjectStore, Illustra)]
Our Goals
• Combine DB2 SQL Replication with OGSA-DAI technologies
• Grid-enable DB2 Replication to provide a grid service interface for managing replication
• Support more scalable, secure, high-performance data access
• Extend OGSA-DAI to provide more powerful capabilities
• Explore metadata technologies
System architecture
[Figure: system architecture. Components: Relational Database Replication Mechanism; Metadata Catalogue; Data Resource; Data Replica; Replication Control Service; GridFTP Transfer]
Workflows
[Figure: workflows. Labels: Request; Relational Database Replication Mechanism; Metadata Catalogue; Replication Control Service; Metadata Search Engine; Selector; Metadata Register; Data Resource; Replication Target; Starter; Initiator; GridFTP Transfer]
Features
• Keeping the features of relational database replication
• Adding Grid features
• Using the Grid service discovery mechanism
• Supporting more replication scenarios
Summary
• Introduction to replication
• Introduction to existing technologies
  • Relational database replication is advanced in flexibility, offering solutions for frequent updates, update-everywhere scenarios, data conflicts …
  • Grid file replication is good at scalable, secure, and efficient file transfer
• We studied both models and combined the two structures to gain the benefits of both