SG-Replicate and sequence numbers

I have 2 Sync Gateways in 2 different data centres, SG1 and SG2. Ideally I want an active/active sync setup where any client can connect to either SG at any time.

If I use SG-Replicate to keep these 2 in sync, the sequence numbers differ between SG1 and SG2
(i.e. on SG1, seq 100 = record A;
on SG2, seq 100 = record B).

Here is the use case:

  1. Client1 connects to SG1 @ last_sequence = 100
  2. SG1 becomes unavailable so Client1 connects to SG2 @ last_sequence = 100.
  3. The sequence numbers on the 2 SGs are not in sync with each other, so Client1 does not get the right records back (see the sketch below).
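
To make it concrete, here is roughly how the mismatch shows up (a minimal sketch against the Sync Gateway REST API; the URLs, database name and credentials are placeholders):

```typescript
// Minimal sketch (Node 18+, so global fetch is available). The URLs,
// database name and credentials below are placeholders.
const LAST_SEQ = 100;

async function changesSince(base: string, since: number) {
  const res = await fetch(`${base}/mydb/_changes?since=${since}`, {
    headers: {
      Authorization: "Basic " + Buffer.from("user:pass").toString("base64"),
    },
  });
  return res.json();
}

async function main() {
  const fromSg1 = await changesSince("https://sg1.example.com:4984", LAST_SEQ);
  const fromSg2 = await changesSince("https://sg2.example.com:4984", LAST_SEQ);
  // The two result sets will generally differ, because sequence 100 was
  // allocated to different documents on each cluster.
  console.log(fromSg1.results, fromSg2.results);
}

main();
```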

I cannot find documentation explaining how to use SG-Replicate to have 2 Sync Gateways which replicate with each other and can be accessed by any of the clients interchangeably. Can someone please explain this to me?

SG-Replicate only replicates the document bodies. It does not replicate the metadata / _sync documents (i.e. sequence numbers). So when a document is pushed from source to target, the target “stamps” the document with a new sequence number of its own.
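
As a toy illustration of that stamping (this is not SG-Replicate code, just a sketch of the idea that each gateway keeps its own sequence counter):

```typescript
// Toy model: each gateway allocates sequence numbers from its own counter,
// so the same document ends up with different sequences on each cluster.
class Gateway {
  private nextSeq = 1;
  readonly changes: { seq: number; id: string }[] = [];

  write(id: string): number {
    const seq = this.nextSeq++;            // sequences are local to this gateway
    this.changes.push({ seq, id });
    return seq;
  }
}

const sg1 = new Gateway();
const sg2 = new Gateway();

sg2.write("recordB");                      // a local write on SG2 uses seq 1 there
const seqOnSg1 = sg1.write("recordA");     // recordA gets seq 1 on SG1...
const seqOnSg2 = sg2.write("recordA");     // ...but seq 2 when pushed over to SG2

console.log(seqOnSg1, seqOnSg2);           // 1 2
```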

The fact that SG-Replicate does not replicate sequence numbers is a critical part of why replication works in active-active mode with SG-Replicate. If you replicated sequence numbers from source to target clusters, bad things can and will happen: you would be stomping on previously allocated sequence numbers on the target clusters.

The remote checkpoint maintained by a Couchbase Lite client is specific to the target Sync Gateway cluster. The checkpoint is a pair of local and remote sequence numbers that, at a high level, specify what was last replicated from the Sync Gateway to the client. In other words, when you switch a client from one SGW cluster to another, the client will do a checkpoint handshake with the target Sync Gateway and will try to pull the missing changes. Since the two Sync Gateway clusters are in sync, the same document pulled from either cluster should not result in a conflict; I believe the client will reject a duplicate document if it already exists. There will be no data loss, but there will be some transfer cost associated with the additional data exchange.
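
A rough sketch of that handshake logic, where the names are hypothetical helpers rather than Couchbase Lite APIs; they only mirror the description above:

```typescript
interface Checkpoint {
  lastRemoteSeq: string | number;
}

// The client compares the checkpoint it stored locally with the one stored
// on the target gateway (in practice a _local/<checkpointId> document there).
function startingSequence(
  localCkpt: Checkpoint | null,
  remoteCkpt: Checkpoint | null,
): string | number {
  if (localCkpt && remoteCkpt && localCkpt.lastRemoteSeq === remoteCkpt.lastRemoteSeq) {
    // Same target gateway as last time: resume the pull where it left off.
    return localCkpt.lastRemoteSeq;
  }
  // Unknown or mismatched checkpoint (e.g. after switching from SG1 to SG2):
  // restart the pull from the beginning; revisions the client already has
  // are rejected as duplicates, so nothing is lost, only bandwidth is spent.
  return 0;
}

console.log(startingSequence({ lastRemoteSeq: 100 }, { lastRemoteSeq: 100 })); // 100
console.log(startingSequence({ lastRemoteSeq: 100 }, null));                   // 0
```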

BTW, what are the criteria for switching a client between Sync Gateways? Is it based on geo location? Asking because I am curious how often you expect the client to switch between clusters. In typical real-world deployments, switching clusters is not a regular occurrence (unless one goes down, the client is travelling, etc.), so the redundant data exchange is not much of an issue.

That is super helpful. Is this behaviour documented anywhere? I could find nothing that describes what happens when a client switches from one SG to another.
In my case, the users will only switch SGs if one of them is inaccessible, so it shouldn’t be a regular occurrence.

I’m not understanding the handshake part: is this handled by the Couchbase Lite library or by the SG? The reason I ask is that we are using PouchDB to connect to the SyncGW and I want to understand whether this will still work correctly…
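
For context, this is roughly how we connect today, with a naive failover between the two gateways (just a sketch; the URLs and database name are placeholders):

```typescript
import PouchDB from "pouchdb";

const local = new PouchDB("mydb");
const gateways = [
  "https://sg1.example.com:4984/mydb",
  "https://sg2.example.com:4984/mydb",
];

// Sync against one gateway and fall back to the other if the connection fails.
function syncTo(index: number) {
  const handler = local.sync(gateways[index], { live: true, retry: false });
  handler.on("error", () => {
    handler.cancel();
    syncTo((index + 1) % gateways.length);   // switch to the other gateway
  });
}

syncTo(0);
```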

Ah, you are referring to a PouchDB client; it would have helped to clarify that upfront. I am not familiar with how PouchDB manages replications across target endpoints, so you should look up the PouchDB documentation. As far as Sync Gateway compatibility with PouchDB goes, check out this post.

Aside: What I described is for Couchbase Lite replicating with Sync Gateway. A high-level overview of the WebSocket replication protocol is documented here. Note that the 2.0 version of the protocol is different from the HTTP-based version in CBM 1.x (which was compatible with the CouchDB protocol used by PouchDB).