How to handle node failure in the cluster

I need some help with handling the failure of one node in a Couchbase cluster. My application uses .NET SDK 3.2.6, and the cluster has at least three nodes. If I bring down one node, I immediately see errors about fetching the cluster map from the dead node, which is expected. However, it seems that I cannot insert or retrieve data until the failover is done on the server side. Is this true? Is there a way for the client to remove the dead node from the cluster map before the failover completes on the server side? And when does the replica data become active in the cluster: immediately, or only after the failover completes? My server is running version 6.6.5.

Best,
– Dong

@DongHsu

You can configure an automatic failover timeout on the server side. Once a node has been unresponsive for that long, the cluster performs the failover you mentioned automatically. I believe 30 seconds is the recommended value, but in some cases it can go as low as 5 seconds.

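The replica data becomes active as soon as the failover is completed. At that point the client SDK detects the failover and begins routing requests for documents that previously resided on the failed node to the new active node. Until then, only operations on documents whose active copy lived on the failed node are affected; documents on the remaining nodes stay available.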

If certain parts of your application require a higher degree of robustness, you can use GetAnyReplicaAsync to request the document from any copy. This sends a request to the active (primary) node and every replica node in parallel and returns the first response. Use it with care, however: a replica can return an outdated version if the document has just been mutated, and the additional requests add overhead on both the client and the servers.
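
As a rough illustration of that trade-off, here is a minimal sketch that tries the normal read first and falls back to GetAnyReplicaAsync only when the active read fails. The helper name GetWithReplicaFallbackAsync, the 2-second timeout, and the choice to catch CouchbaseException are my own assumptions rather than anything prescribed by the SDK. The exceptions you actually see while a node is down may differ, so adjust the catch clauses to what your application observes.

```csharp
using System;
using System.Threading.Tasks;
using Couchbase;
using Couchbase.Core.Exceptions.KeyValue;
using Couchbase.KeyValue;

public static class ReplicaReadSketch
{
    // Hypothetical helper (not part of the SDK): read from the active copy,
    // and fall back to any available copy if that read fails.
    public static async Task<T> GetWithReplicaFallbackAsync<T>(
        ICouchbaseCollection collection, string id)
    {
        try
        {
            // Normal path. The short timeout is an assumed value so that a
            // request aimed at a dead node gives up quickly.
            var result = await collection.GetAsync(
                id, new GetOptions().Timeout(TimeSpan.FromSeconds(2)));
            return result.ContentAs<T>();
        }
        catch (DocumentNotFoundException)
        {
            // The active node answered and the document does not exist,
            // so asking the replicas would not help.
            throw;
        }
        catch (CouchbaseException)
        {
            // Assumed failure mode: timeout or node unavailable.
            // Ask the active node and all replicas, take the first response.
            var replica = await collection.GetAnyReplicaAsync(id);
            if (!replica.IsActive)
            {
                // The value came from a replica and may be slightly stale.
                Console.WriteLine($"Served '{id}' from a replica copy.");
            }
            return replica.ContentAs<T>();
        }
    }
}
```

Trying the active copy first keeps the extra load off the cluster during normal operation. This only helps reads: writes to documents on the failed node still have to wait for the failover.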
