
Enterprise Replication WAIUG Forum 2002

Enterprise Replication, WAIUG Forum 2002. Agenda: what Enterprise Replication is; how Enterprise Replication differs from HDR; internal overview of ER; recent improvements in Enterprise Replication (9.3/9.4); troubleshooting Enterprise Replication. IDS Enterprise Replication (ER).

kieran-witt

Presentation Transcript


  1. Enterprise Replication, WAIUG Forum 2002

  2. Agenda • What is Enterprise Replication • How is Enterprise Replication different from HDR • Internal Overview of ER • Recent Improvements in Enterprise Replication (9.3/9.4) • Troubleshooting Enterprise Replication

  3. IDS Enterprise Replication (ER) • Log based, Transaction oriented replication • Asynchronous, Homogeneous (IDS 7.22+ only) • Primary/Target + Update anywhere • Consolidation, Dissemination, Workload partitioning • Tightly coupled with the server • Web and command line administration

  4. ER History
     • Initial Release: 7.22 in 12/1996
     • Version I - 7.22 - 7.30 releases
       • 7.30 Grouper Compression improvements
     • Version II (7.31 & 9.2x)
       • Queue and NIF redesign, Hierarchical Routing
     • Version III (9.3)
       • UDT support, Smart Blob Support, Dynamic DataSync parallelism, Replicate sets, Smart blob queuing, In-place alter to add/drop CRCOLS, Serial Col Primary Key Support, …
     • Version III+ (9.4)
       • ER/HDR support, Large transaction support, Quick Queue Recovery, Complex Type support, Performance enhancements

  5. How do HDR and ER differ?
     • HDR: Provides single primary and single secondary. ER: Allows configurable source(s)/target(s).
     • HDR: Primary and secondary must run the same executables and have similar disk layout. ER: Source/target do not have to be the same.
     • HDR: Secondary restricted to report processing. ER: Allows full usage of both source/target.
     • HDR: Simple to set up and administer. ER: Setup and administration more complex.
     • HDR: Primary and secondary are mirror images. ER: Source and target can be totally different.
     • HDR: Does not support blobspace blobs. ER: Supports blobspace blobs.
     • HDR: Replication can be synchronous. ER: Replication is asynchronous.
     • HDR: Primary purpose is for high availability. ER: Primary purpose is for data distribution.

  6. ER: how it works
     [Figure: ER data flow between a Source and a Target server, with the Global Catalog (syscdr) shown on both.]
     • Source: Database, Logical Log, Snoopy, Grouper, Send Queue, Spool, NIF
     • Target: NIF, Receive Queue, Data Sync, Ack Queue, Database
     • Grouper: Regroups transaction and performs evaluation
     • NIF: Transmits Txn to targets
     • Data Sync: Target apply threads

  7. ER and onstats
     [Figure: the same Source/Target component diagram, annotated with the onstat command for each component.]
     • Global Catalog (syscdr): onstat -g cat
     • Snoopy: onstat -g ddr
     • Grouper: onstat -g grp
     • Send Queue, Receive Queue, Ack Queue, Spool: onstat -g rqm
     • NIF: onstat -g nif
     • Receive: onstat -g rcv
     • Data Sync: onstat -g dss

  8. Regrouping Transactions
     • Global Transaction List (onstat -g grp L)
       • Ordered by begin work
       • Used to control how replay position is advanced
       • Transactions remain on the global list until they are ACKed from the target(s) or placed in stable queue
     • Serial List (onstat -g grp S)
       • Ordered by commit point
       • Used to control order that replicated transactions are shipped to the target
       • Transactions remain on the serial list until they are placed into the queue
     [Figure: ddr_snoopy feeds CDRGfan, which fills the Open Transaction Array (transactions 10, 11, 12, each a TX header followed by Update/Insert/Delete records); CDRGeval processes the serial list after "Commit 11".]
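The two lists on this slide can be pictured with a small toy model. This is only a sketch under assumptions: the class and method names are hypothetical, and treating the replay position as the begin-work log position of the oldest transaction still on the global list is an interpretation of "used to control how replay position is advanced", not something the slide states.

```python
from collections import OrderedDict


class GrouperLists:
    """Toy model of the grouper's two lists (all names are illustrative)."""

    def __init__(self):
        # tx_id -> begin-work log position, kept in begin-work order
        self.global_list = OrderedDict()
        # tx_ids in commit order, waiting to be placed into the send queue
        self.serial_list = []

    def begin(self, tx_id, log_pos):
        self.global_list[tx_id] = log_pos

    def commit(self, tx_id):
        self.serial_list.append(tx_id)

    def queue_next(self):
        """Ship the oldest committed transaction; it leaves the serial list."""
        return self.serial_list.pop(0)

    def release(self, tx_id):
        """Drop a transaction once it is ACKed or safely in the stable queue."""
        del self.global_list[tx_id]

    def replay_position(self):
        """Assumed rule: replay restarts at the oldest transaction still listed."""
        return next(iter(self.global_list.values()), None)


lists = GrouperLists()
for tx_id, pos in [(10, 100), (11, 110), (12, 120)]:
    lists.begin(tx_id, pos)
lists.commit(11)           # 11 commits first, so it ships first
shipped = lists.queue_next()
lists.release(shipped)     # ACKed: 11 leaves the global list
position = lists.replay_position()  # still held back by open transaction 10
```

The point of keeping both orderings is that shipping follows commit order while recovery must restart no later than the oldest unfinished transaction.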

  9. What is Conflict Resolution?
     • Required for update anywhere
     [Figure: Servers A, B, and C exchange Update-1 and Update-2. Update-1 is still in a queue when Update-2 reaches Server C, which is marked with a "?".]

  10. Conflict Resolution • Method to determine if the current version or a just received version of the row should ‘win’ • Ignore • Row must be applied as is • Timestamp • Most recent update wins • Stored Procedure • User written stored procedure is invoked • Upserts • Requires CRCOLS (shadow columns) • CDRTIME, CDRSERVER
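The timestamp rule can be sketched in a few lines. This is a minimal illustration, not the server's implementation: the `Row` type and `timestamp_winner` function are hypothetical, `cdrtime` and `cdrserver` stand in for the CDRTIME and CDRSERVER shadow columns, and keeping the current row when the two timestamps are equal is an assumption made only to keep the example deterministic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    key: int
    value: str
    cdrtime: int    # stands in for the CDRTIME shadow column
    cdrserver: int  # stands in for the CDRSERVER shadow column


def timestamp_winner(current: Optional[Row], incoming: Row) -> Row:
    """Return the row version that should be kept under the timestamp rule."""
    if current is None:
        # No local version exists: apply the incoming row (upsert behaviour).
        return incoming
    # Most recent update wins; a tie keeps the current row (assumption).
    return incoming if incoming.cdrtime > current.cdrtime else current


local = Row(key=1, value="local", cdrtime=100, cdrserver=1)
late = Row(key=1, value="late arrival", cdrtime=90, cdrserver=2)
newer = Row(key=1, value="newer", cdrtime=110, cdrserver=3)
kept_after_late = timestamp_winner(local, late)
kept_after_newer = timestamp_winner(local, newer)
```

An update that arrives late, after a newer one has already been applied, is discarded rather than overwriting the newer data.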

  11. How do deletes affect conflict resolution?
     • Question: The row has already been deleted, so how do I prevent the row from being reapplied?
     • Place the deleted row into the shadow delete table.
     • When the row arrives at the target server, check in the delete table to see if the row has been deleted.
     • Rows are pruned from delete tables once they are no longer needed.
     [Figure: Servers A, B, and C exchange Update-1 and a Delete. Update-1 is still in a queue when the Delete reaches Server C, which has a Delete Table.]
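The delete-table check can be sketched as follows. The function names, the dictionary-based tables, and the timestamp comparison are all assumptions for illustration; the slide only says that the target looks in the delete table before applying a row and that entries are pruned once they are no longer needed.

```python
def apply_delete(table, delete_table, key, cdrtime):
    """Remove the row and remember the delete in the shadow delete table."""
    table.pop(key, None)
    delete_table[key] = cdrtime


def apply_update(table, delete_table, key, value, cdrtime):
    """Apply a replicated update unless a later delete is already recorded."""
    deleted_at = delete_table.get(key)
    if deleted_at is not None and deleted_at >= cdrtime:
        return False  # stale update: the row must not be reapplied
    table[key] = (value, cdrtime)
    return True


def prune(delete_table, oldest_needed):
    """Drop delete records older than anything that could still arrive."""
    for key in [k for k, t in delete_table.items() if t < oldest_needed]:
        del delete_table[key]


table = {1: ("original", 5)}
delete_table = {}
apply_delete(table, delete_table, key=1, cdrtime=20)
applied = apply_update(table, delete_table, key=1, value="Update-1", cdrtime=10)
```

Here Update-1 was queued before the delete but arrives after it, so the delete record is what keeps the row from reappearing.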

  12. So what’s new?

  13. Improvements in 9.3 ER
     • Improved performance
       • Increased parallelism
       • Spooling enhancements
       • Eliminates most of the reasons for DDRBLOCK state
     • Support of user defined types (extended opaque data types)
       • Replication enabled for Spatial Datablade 8.11
     • Other
       • Serial Primary Key support in update anywhere
       • In-place alter of CRCOLS

  14. A Fundamental Problem With transactional replication, how do I keep the target up with the source when the source has 24 processors with thousands of users doing 500+ multi-row transactions per second and still support referential integrity on the target? Oh yes – this is one of four servers replicating update anywhere around the globe.

  15. Goal: Keep the replication cost down
     • 9.3 ER target apply is roughly 3 times faster than 9.21 and considerably faster than the original transactions.
     [Figure: chart comparing "Transaction Apply on Source" with "ER Apply on Target" for 9.21 and 9.30; other labels: ResumeServer, SuspendServer.]

  16. How we did that • DataSync threads • Apply in parallel, but commit in order • Knowledgeable of referential integrity rules • Is able to serialize operations on a single row • Allows parallelism within a replicate • Apply always uses buffered logging • ACK is coordinated with a log flush • Allows parallelism to dynamically change based on characteristics of user work • Requires no configuration
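"Apply in parallel, but commit in order" can be illustrated with a thread pool. This is a sketch only: `apply_txn`, `replay`, and the dictionary layout of a transaction are hypothetical, the sleep stands in for real row operations, and the referential-integrity and single-row serialization logic mentioned on the slide is not modelled.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def apply_txn(txn):
    """Stand-in for applying a transaction's rows on the target."""
    time.sleep(txn["work"])  # simulated apply cost
    return txn["id"]


def replay(txns, workers=4):
    """Apply transactions concurrently, then commit them in source order."""
    commit_order = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(apply_txn, t) for t in txns]
        # Walking the futures in submission order forces each commit to wait
        # for every earlier transaction, however quickly later ones finish.
        for future in futures:
            commit_order.append(future.result())
    return commit_order


txns = [
    {"id": 1, "work": 0.03},  # slowest transaction committed first on the source
    {"id": 2, "work": 0.02},
    {"id": 3, "work": 0.01},
]
order = replay(txns)
```

Even though transaction 3 finishes its work first, it commits last, which preserves the source commit order on the target.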

  17. What’s coming down the road? (9.4)

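  18. Failure Points
     • Need to replicate shark sightings table
     [Figure: replication topology with nodes labeled N, R, and L.]

  19. Failure Points
     • Can no longer replicate shark sightings
     • What happens if Dallas fails?
     [Figure: the same topology of nodes labeled N, R, and L.]

  20. Failure Points
     • Now what happens if Dallas fails?
     [Figure: the same topology, with one R node replaced by an HDR pair.]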

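  21. Why can’t I use ER with HDR now?
     • Only one of the target servers is aware of the updated row. Since the HDR secondary is not aware of the row, we have data inconsistency.
     [Figure: HDR primary (P) and secondary (S) linked by standard HDR; the ER send queue exists only on the primary.]

  22. What we do to coordinate ER with HDR
     [Figure: HDR primary (P) and secondary (S) linked by standard HDR; both have an ER send queue, and the ACK is coordinated between them.]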

  23. ER Event Coordination
     • Coordinated Events
       • Replication of transaction
       • ACK transmission
       • Spooling of send queue
       • Replay position advancement
       • DRINTERVAL
     • cdrHDRMonitor thread acts as coordinator

  24. SQL host changes for ER/HDR
     Label     Type      Server   Service   Options
     srv1      group     -        -         i=1
     srv1pri   ontlitcp  dallas   port1     g=srv1
     srv1sec   ontlitcp  memphis  port1     g=srv1
     srv1shm   onipcshm  dallas   srv1shm1
     srv2      group     -        -         i=2
     srv2tcp1  ontlitcp  newyork  cdr2      g=srv2
     srv2shm   onipcshm  newyork  srv2shm
     • srv1pri and srv1sec are the HDR pair.

  25. Quick Queue Recovery
     • Problem: In the past ER took too long to recover the queue. This meant that it took quite a while to get users back on the system.
     • Solution: Quick Queue Recovery
       • Separate table containing summary of each transaction in stable storage.
       • Allow users to connect before ER is fully recovered.
       • PreDDR thread monitors for log wrap until queue is recovered
       • When queue is recovered, PreDDR thread stops and Snoopy begins

  26. Large Transactions
     • Problem: A replicated transaction must be entirely in memory to be processed.
     • Solution: Support replication of transactions up to 4 TB in size
       • Grouper Paging
       • Temporary sblob located in SBSPACETEMP
       • Process spooled transactions directly from the spool

  27. Other Stuff
     • Collection Support
       • Lists, sets, and multisets
     • Support of multiple smartblob stable queues
       • Some using logging and some not
     • Dynamic Log support for DDRBLOCK

  28. When troubles come your way

  29. SQL Host File Issues
     • Network entry needs to immediately follow the group entry!
     • ER information in the sqlhosts file must be common on all replicating servers
     First listing (no group entries):
     Label     Type      Server   Service   Options
     srv1tcp1  ontlitcp  dallas   port1
     srv1tcp2  ontlitcp  dallas   port2
     srv1shm   onipcshm  dallas   srv1shm1
     srv2tcp1  ontlitcp  houston  cdr2
     srv2shm   onipcshm  houston  srv2shm
     Second listing (with group entries):
     srv1_g    group     -        -         i=1
     srv1tcp1  ontlitcp  dallas   port1     g=srv1_g
     srv1tcp2  ontlitcp  dallas   port2
     srv1shm   onipcshm  dallas   srv1shm1
     srv2_g    group     -        -         i=2
     srv2tcp1  ontlitcp  houston  cdr2      g=srv2_g
     srv2shm   onipcshm  houston  srv2shm
     • Slide callout: "Can Cause Errors!!!" next to an additional g=srv2_g option (its position in the listing is not recoverable from the transcript).
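The first rule on this slide, that a network entry must immediately follow its group entry, lends itself to a quick check. The sketch below is a hypothetical helper, not an Informix tool: `check_group_entries` and the assumption that options are comma-separated in the fifth field are illustrative, and the sample text reuses the entries shown on the slide.

```python
def check_group_entries(sqlhosts_text):
    """Return the group names that are not directly followed by a member entry."""
    entries = [
        line.split()
        for line in sqlhosts_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    problems = []
    for i, fields in enumerate(entries):
        if len(fields) < 2 or fields[1] != "group":
            continue
        following = entries[i + 1] if i + 1 < len(entries) else None
        wanted = "g=" + fields[0]
        # Options may be comma-separated in one field (assumption).
        options = ",".join(following[4:]).split(",") if following else []
        is_group = following is not None and len(following) > 1 and following[1] == "group"
        if following is None or is_group or wanted not in options:
            problems.append(fields[0])
    return problems


GOOD = """
srv1_g    group     -        -         i=1
srv1tcp1  ontlitcp  dallas   port1     g=srv1_g
srv1shm   onipcshm  dallas   srv1shm1
srv2_g    group     -        -         i=2
srv2tcp1  ontlitcp  houston  cdr2      g=srv2_g
"""

BAD = """
srv1_g    group     -        -         i=1
srv1shm   onipcshm  dallas   srv1shm1
srv1tcp1  ontlitcp  dallas   port1     g=srv1_g
"""

good_problems = check_group_entries(GOOD)
bad_problems = check_group_entries(BAD)
```

In the second sample the shared-memory entry sits between the group and its network entry, so the group is reported.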

  30. CDR GC errors in the message log file
     Message log:
     05:35:42 CDR GC peer processing failed: message 1, error 40, CDR server 2
     05:35:43 CDR GC peer processing failed: message 3, error 31, CDR server 2
     Looking up the message numbers with cdr findmsg (labeled "New in 9.4" on the slide):
     cdr findmsg 1
     1 define replicate
     cdr findmsg 3
     3 start replicate
     Looking up the error numbers:
     cdr finderr 40
     40 unsupported SQL syntax (join, etc..)
     cdr finderr 31
     31 undefined replicate
     cdr error
     SERVER:SEQNO  REVIEW  TIME                 ERROR
     site2:6       N       2001-04-26 03:54:46  40
     GC operation define replicate 'rep1' failed: unsupported SQL select clause syntax

  31. Are any servers suspended or dropped?
     onstat -g nif
     Id Name State Sent Received
     9 site4 RUN 6 15440 6031
     cdr list server
     SERVER ID STATE STATUS CONNECTION CHANGED
     site1 6 Active Local 0
     site2 7 Suspend Dropped 0 Jun 11 14:38:40
     site3 8 Suspend Dropped 0 Jun 11 14:38:37
     site4 9 Active Connected 0 Jun 11 14:36:50

  32. Any Replicates suspended?
     cdr list rep
     REPLICATE:   rep1
     STATE:       Active
     CONFLICT:    Ignore
     FREQUENCY:   immediate
     QUEUE SIZE:  0
     PARTICIPANT: test:informix.tab1
     OPTIONS:     row,ris,ats,fullrow
     REPLICATE:   rep2
     STATE:       Suspend
     CONFLICT:    Timestamp
     FREQUENCY:   immediate
     QUEUE SIZE:  0
     PARTICIPANT: test:informix.tab2
     OPTIONS:     row,ris,ats,fullrow

  33. What is snoopy doing?
     • If not advancing, could mean stable queue is full or remote server is down.
     onstat -g ddr
     DDR -- Running --
     # Event   Snoopy  Snoopy    Replay  Replay    Current  Current
     Buffers   ID      Position  ID      Position  ID       Position
     528       132     393018    130     36f018    132      394000
     Log Pages Snooped:
     From    From    Tossed
     Cache   Disk    (LBC full)
     3774    1142    88

  34. What is in the queues? (onstat -g rqm)
     RQM Statistics for Queue (0xc379018) trg_send
     Transaction Spool Name: trg_send_stxn
     Insert Stamp: 8007/0
     Flags: SEND_Q, SPOOLED, PROGRESS_TABLE, NEED_ACK
     Txns in queue: 8003
     Log Events in queue: 0
     Txns in memory: 4195
     Txns in spool only: 3808
     Txns spooled: 5142
     Unspooled bytes: 266080
     Size of Data in queue: 1116086 Bytes
     Real memory in use: 505056 Bytes
     Pending Txn Buffers: 0
     Pending Txn Data: 0 Bytes
     Max Real memory data used: 520228 (512000) Bytes
     Max Real memory hdrs used 995768 (512000) Bytes
     Total data queued: 1116316 Bytes
     Total Txns queued: 8007
     Total Txns spooled: 5142
     Total Txns restored: 4665
     Total Txns recovered: 0
     • Slide callouts: "real time statistics" and "historical statistics" label the two groups of counters.

  35. Is the First Txn changing?
     • Key format: Server ID / Unique LogID / LogPos / Sequence
       • LogPos: Page Number in log (5 nibbles) followed by Offset In Page (3 nibbles)
       • Sequence: Increments by one for each buffer within the transaction.
     • If TRG send transaction header is multiple of 0x100000, then this is a split transaction. It is either a time-based or suspended replicate.
     • NeedAck: Need Acks from these servers.
     First Txn (0xef7d308) Key: 10/523/0x003530c0/0x00100000
     First Txn (0xef7d308) Key: 10/523/0x003530c0/0x00000000
     Txn Stamp: 3811/0, Reference Count: 0.
     Txn Flags: Spooled, Restored
     Txn Commit Time: (1023823350) 2002/06/11 14:22:30
     Txn Size in Queue: 100
     First Buf's (0xf116600) Queue Flags: Resident
     First Buf's Buffer Flags: TRG, Stream
     NeedAck: Waiting for Acks from <[000c]>
     No open handles on txn.
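The key layout described on this slide can be unpacked with a few bit operations. This is a sketch: `decode_txn_key` is a hypothetical helper, and the server ID and log ID are left as strings because the slide does not say which base they are printed in.

```python
def decode_txn_key(key):
    """Split an RQM key 'server/logid/logpos/sequence' into its parts."""
    server_id, log_id, log_pos, sequence = key.split("/")
    pos = int(log_pos, 16)           # int() accepts the 0x prefix in base 16
    return {
        "server_id": server_id,      # left as text (number base not stated)
        "log_id": log_id,            # left as text (number base not stated)
        "page": pos >> 12,           # upper 5 nibbles: page number in the log
        "offset": pos & 0xFFF,       # lower 3 nibbles: offset within the page
        "sequence": int(sequence, 16),
    }


decoded = decode_txn_key("10/523/0x003530c0/0x00100000")
summary = "page 0x{page:05x}, offset 0x{offset:03x}".format(**decoded)
```

Comparing the decoded page and offset across successive onstat -g rqm runs shows whether the first transaction in the queue is moving.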

  36. Problem: Who needs to ACK?
     Txn (0xabbef28) Key: 1/6/0x001c9120/0x00100000
     Txn Stamp: 2/0, Reference Count: 0.
     Txn Flags: Notify
     Txn Commit Time: (991335235) 2001/05/31 11:53:55
     Txn Size in Queue: 84
     First Buf's (0xabbefc8) Queue Flags: Resident
     First Buf's Buffer Flags: TRG, Stream
     NeedAck: Waiting for Acks from <[0004]>
     No open handles on txn.
     • The value in <[...]> holds the server bits in the waiting ACK bit map. Match it against the "Or:" value of each server:
     $ onstat -g cat
     SERVERS
     -------------------
     Id: 02, Nm: serv2, Or: 0x0004, off: 0, idle: 0, state Suspended
     root Id: 00, forward Id: 02, ishub: FALSE, isleaf: FALSE
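Matching the NeedAck bit map against the "Or:" values from onstat -g cat is a bitwise AND per server. The helper below is a hypothetical sketch; serv2 and its 0x0004 bit come from the slide, while serv1 and serv3 and their bits are invented purely to show a multi-server mask.

```python
def servers_to_ack(need_ack_hex, server_bits):
    """Return the servers whose 'Or:' bit is set in the NeedAck bit map."""
    mask = int(need_ack_hex, 16)
    return sorted(name for name, bit in server_bits.items() if mask & bit)


# 'Or:' values as reported by onstat -g cat.
server_bits = {
    "serv1": 0x0002,  # hypothetical server, for illustration only
    "serv2": 0x0004,  # from the slide
    "serv3": 0x0008,  # hypothetical server, for illustration only
}

waiting_on = servers_to_ack("0004", server_bits)
waiting_on_two = servers_to_ack("000c", server_bits)
```

With the slide's values, the transaction is waiting only on serv2, which the catalog shows as Suspended.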

  37. What about the send progress tables?
     Progress Table:
     Progress Table is Stable
     On-disk table name............: spttrg_send
     Flush interval (time).........: 30
     Time of last flush............: 988215679
     Flush interval (serial number): 1000
     Serial number of last flush...: 4
     Current serial number.........: 4
     Server  Group    Bytes Queued  Acked           Sent
     9       0x60001  108054        6/4/6000b8/0  -  6/5/4700b8/0
     8       0x60001  108054        6/4/6000b8/0  -  6/5/4700b8/0
     7       0x60001  108054        6/4/6000b8/0  -  6/5/4700b8/0
     • Acked: rqm key received (acked)
     • Sent: rqm key sent. Really highest queued to be sent!!!

  38. Receive Progress Table
     Progress Table:
     Progress Table is Stable
     On-disk table name............: spttrg_receive
     Not keeping dirty list.
     Server  Group    Bytes Queued  Acked           Sent
     1       0x10001  156           1/4/1ea0b0/0  -  1/4/1ee1cc/2
     Traverse handle (0xb61b1e8) for thread CDRNr1 at Head_of_Q, Flags: None
     Traverse handle (0xb5fc1e8) for thread CDRD_1 at txn (0xb2cd020): 1/4/0x001ee1cc/0x00000000 Flags: In_Transaction
     Traverse handle (0xb6091e8) for thread CDRD_1 at Head_of_Q, Flags: None
     • Slide callouts: "Highest Transaction that has been ACKed", "Highest Received Transaction (not yet processed)", "Currently processing transaction".

  39. How quickly are txns replicating?
     • These times are on different machines!
     onstat -g rcv full
     Statistics by Source
     Server 6
     Repl    Txn   Ins   Del   Upd  Last Target Apply     Last Source Commit
     393217  4002  4000  2001  0    2002/06/11 14:39:19   2002/06/11 14:38:32
     393218  2000  2000  0     0    2002/06/11 14:38:37   2002/06/11 14:22:32
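The lag for each replicate is the difference between the two timestamps in this output. The sketch below uses a hypothetical helper, `apply_lag_seconds`, with the values from the slide; because the two times come from clocks on different machines, the result is only as good as the clock synchronization between them.

```python
from datetime import datetime

TIME_FORMAT = "%Y/%m/%d %H:%M:%S"


def apply_lag_seconds(last_target_apply, last_source_commit):
    """Seconds between the source commit and the apply on the target."""
    applied = datetime.strptime(last_target_apply, TIME_FORMAT)
    committed = datetime.strptime(last_source_commit, TIME_FORMAT)
    return int((applied - committed).total_seconds())


lag_393217 = apply_lag_seconds("2002/06/11 14:39:19", "2002/06/11 14:38:32")
lag_393218 = apply_lag_seconds("2002/06/11 14:38:37", "2002/06/11 14:22:32")
```

For the slide's data this gives 47 seconds for replicate 393217 and 965 seconds (about 16 minutes) for replicate 393218.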

  40. In Closing - Questions???
