Replica vs PersistTo / ReplicateTo

Hi,

On the Couchbase server, we can enable replicas when configuring the bucket.

On the other hand, we also have the option to add PersistTo / ReplicateTo flags to bucket operations in the SDK. However, this introduces overhead (latency) under heavy update traffic.

My understanding is that the PersistTo / ReplicateTo flags ensure the data has been successfully propagated to another node. If we do not add these flags and rely only on the Replicas option on the Couchbase server, would we still get consistent data across multiple nodes? If the current server goes down, might we lose the latest updates, since they are kept in RAM and not yet synchronized with the other nodes?

Activating replication on the server side instructs the server that each mutation should be replicated. The server does that in parallel with persistence to disk, as soon as the data has been received in RAM (it is pushed to both the write-to-disk queue and the replication queue).

The ReplicateTo and PersistTo parameters are durability constraints that change the acknowledgement behavior of the server, not the persisting behavior:

  • by default, without such constraints, the server reports the operation as done as soon as it has been written to RAM
  • with a ReplicateTo constraint, it waits for the specified number of replicas to acknowledge they’ve received the update in RAM as well before answering “done”
  • with a PersistTo constraint, it additionally waits for the data to be stored on disk (either only on the MASTER or on replica servers as well).
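
To make the three cases above concrete, here is a minimal toy model in Python. It is not Couchbase SDK code: the function `ack_waits_for`, its parameters and the node names are all hypothetical, and it simplifies PersistTo to "active node first, then replicas". It only lists which steps the acknowledgement waits for. In every case the server still replicates and persists in the background.

```python
def ack_waits_for(replicas, replicate_to=0, persist_to=0):
    """Toy model: list the steps that must finish before the write is acknowledged.

    replicas:     names of the replica nodes for the key (hypothetical).
    replicate_to: how many replicas must hold the update in RAM.
    persist_to:   how many nodes must hold the update on disk
                  (assumption: the active node counts first, then replicas).
    """
    if replicate_to > len(replicas):
        raise ValueError("replicate_to exceeds the number of replicas")
    if persist_to > len(replicas) + 1:
        raise ValueError("persist_to exceeds the number of nodes holding the key")

    # Default behaviour: the write only has to reach RAM on the active node.
    steps = ["active:ram"]

    # ReplicateTo: wait until N replicas have the update in RAM.
    steps += [f"{name}:ram" for name in replicas[:replicate_to]]

    # PersistTo: additionally wait until N nodes have written it to disk.
    steps += [f"{name}:disk" for name in (["active"] + list(replicas))[:persist_to]]

    return steps


print(ack_waits_for(["r1", "r2"]))                  # no durability constraint
print(ack_waits_for(["r1", "r2"], replicate_to=1))  # wait for one replica
print(ack_waits_for(["r1", "r2"], persist_to=2))    # wait for two disk writes
```

The longer the returned list, the more round trips and disk writes the client waits for, which is where the extra latency comes from.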

Adding durability constraints improves the fault-tolerance of your application (your data has been replicated / persisted in case a node crashes), but you pay a performance hit.

So yes, you’d get eventually consistent data across multiple nodes, unless there is a crash. If the master node for a specific key crashes, that data is lost if it has been neither replicated nor persisted. Note that in Couchbase, each node is both a master node and a replica node: the data is hashed and distributed evenly across all nodes.
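
The point that each node is both a master and a replica can be illustrated with a small sketch. This is a simplified model, not Couchbase's actual partition map: the partition count, the node names and the helpers `partition_for` and `nodes_for` are assumptions made for illustration, and replicas are placed on the next nodes in the list.

```python
import zlib

NUM_PARTITIONS = 1024  # partition count chosen for illustration (assumption)


def partition_for(key):
    """Hash a key to a partition number (simplified CRC32-based mapping)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def nodes_for(key, nodes, num_replicas=1):
    """Return (active_node, replica_nodes) for a key in this toy layout."""
    if num_replicas >= len(nodes):
        raise ValueError("need more nodes than replicas")
    start = partition_for(key) % len(nodes)
    active = nodes[start]
    # Place replicas on the following nodes so they never share the active node.
    replicas = [nodes[(start + i) % len(nodes)] for i in range(1, num_replicas + 1)]
    return active, replicas


nodes = ["node-a", "node-b", "node-c"]  # hypothetical cluster
roles = {name: {"active": 0, "replica": 0} for name in nodes}

for i in range(3000):
    active, replicas = nodes_for(f"user::{i}", nodes)
    roles[active]["active"] += 1
    for name in replicas:
        roles[name]["replica"] += 1

# Every node ends up active for some keys and a replica for others.
print(roles)
```

In this model, losing one node affects only the keys for which that node was active, and only those whose latest update had not yet reached a replica or the disk.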

Could I think of PersistTo or ReplicateTo as an observer option that observes whether the data has been completely persisted or replicated?

@hubo3085632 Yes, you can think of it like that.

Hello Simon,

Please let me know if my questions are not clear to you. I will rephrase them if required.

You mentioned that "Master node crashing for a specific key will lose the data if it hasn’t been replicated nor persisted."

Consider the following scenario.

Suppose that, instead of the master node, the replica node crashes before the replica is applied.
My understanding is that, since the master node did not crash, the data will still be persisted to disk.

What will happen to the replica data in the following cases:

(i) If the crashed replica node is brought back before a failover is triggered, will the data get replicated? Until then, does the replica data wait in the replication queue of the master node? If so, is there a limit on how long the data can be kept in the replication queue?

(ii) Suppose that, before the replica node crashed, the replica data had already been transferred out of the master's replication queue, and the node crashed just before the data was applied to it. In this scenario, there is no data left in the replication queue to apply to the replica node when it comes back. When the replica node is back up and running (before failover), it will be part of the cluster again. But how will we know that the replica of that specific data is not present in the cluster? Will Couchbase discover automatically that the replica for the master data is missing?

Thanks and Regards
Pgc