Hard Failover, Transparent for Client

Hi,
Recently we tried Couchbase Server 4.1 with the Java SDK 2.2.2 to test hard failover, but it’s NOT transparent for the client. Is there any example code/guide for handling such a situation, like other NoSQL SDKs provide?

Another question: is a hard failover of one or two nodes supported in a cluster with 6 data nodes and 2 replicas?

Hey @Kelvin_Ni,

Can you provide more information on what you mean by “… it’s NOT transparent for client”?

Best,

Thank you. When a failover of one or two nodes occurred, the client received DurabilityException/RequestCancelledException, and the client MUST do something to handle this to avoid data loss. The cluster is still available for the client to do queries, insertions, etc., so in my opinion it’s not TRANSPARENT for the client. Is there any example code/guide to automatically handle such exceptions and keep the application working smoothly without interruption, unless the whole cluster is broken?
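For context, a minimal sketch of the kind of write that can raise these exceptions during a failover (assuming the Java SDK 2.2.x API; the document key, content, and ReplicateTo level are only illustrative):

import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.ReplicateTo;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.document.json.JsonObject;

public class FailoverWrite {
    // During a hard failover this call can throw DurabilityException (the
    // replica requirement could not be met) or RequestCancelledException
    // (the target node went away while the request was in flight).
    public static void writeOrder(Bucket bucket) {
        JsonDocument doc = JsonDocument.create("order::1001",
                JsonObject.create().put("status", "new"));
        bucket.upsert(doc, ReplicateTo.ONE);
    }
}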

How about question 2?

Hey @Kelvin_Ni,

Are you using your connection string as follows?

CouchbaseCluster.create(List<String> nodes)

If you’re only specifying a single node to connect to and that node is the one being failed over, the errors you’re receiving are to be expected.
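If not, a minimal sketch of passing several bootstrap nodes (hostnames and bucket name are placeholders), so the client can still bootstrap and receive cluster map updates if one of them is down:

import java.util.Arrays;
import java.util.List;

import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.CouchbaseCluster;

public class MultiNodeBootstrap {
    public static void main(String[] args) {
        // List several nodes so bootstrap does not depend on a single host.
        List<String> nodes = Arrays.asList("node1.example.com",
                                           "node2.example.com",
                                           "node3.example.com");
        Cluster cluster = CouchbaseCluster.create(nodes);
        Bucket bucket = cluster.openBucket("default");

        // ... perform operations on the bucket ...

        cluster.disconnect();
    }
}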

Also, when you say hard failover, are you using automatic failover? In that case only one node can be automatically failed over, to prevent cascading failures:

http://docs.couchbase.com/admin/admin/Tasks/tasks-nodeFailover.html

This prevents your cluster from crumbling under the stress of a lost node.

Best,

Yes, only one node is provided. We will try the full cluster node list as suggested here and test again later.

Yes, automatic-failover is enabled.

  1. Network connectivity broken, OS reboot, machine power off, etc.: these should all be treated as hard failover, right?
  • In such a situation, even though only one functioning node is down, data loss is inevitable, right?

  • Even if two or at most three nodes fail at the same time without stress (vBucket replication 3, 7 cluster nodes), data loss is also inevitable, right?

Unless we disable automatic failover and manually remove the functioning node, or use graceful failover (assuming no stress here); in that case, with one node down, it’s safe, as the documentation describes:

For example:
In a 7-node cluster, if a bucket is configured for one replica, only one node can be gracefully failed over.
In a 7-node cluster, if a bucket is configured for two replicas, two nodes can be gracefully failed over.
In a 7-node cluster, if a bucket is configured for three replicas, three nodes can be gracefully failed over.

Any update here? I need your confirmation. :slight_smile:

To answer your questions:

  1. Yes, all of those would be a hard failover.
  2. Data loss may not always happen in a hard failover situation; your data may already have been replicated to another node.
  3. I’d like to think three nodes failing at the same time is unlikely. If you lose a node, hopefully your cluster has enough capacity to handle the load of the missing node. But yes, if replication hadn’t happened yet, you risk losing data from the downed nodes.

Replication exists to make data loss very unlikely. Should a node suffer a hard failover, your loss would be minimal because replication has been happening continuously over the whole time the node was online.
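On the exception-handling question, here is a rough sketch of the retry pattern the SDK’s async (RxJava) API supports; the class name, the set of retried exceptions, and the delay/attempt limits are only illustrative and should be tuned for your workload. Note that DurabilityException usually means the write reached the active node but the replica/persistence requirement could not be met, so you may want to handle it separately (for example, by re-reading the document) rather than blindly retrying:

import java.util.concurrent.TimeUnit;

import com.couchbase.client.core.BackpressureException;
import com.couchbase.client.core.RequestCancelledException;
import com.couchbase.client.core.time.Delay;
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.error.TemporaryFailureException;
import com.couchbase.client.java.util.retry.RetryBuilder;

public class UpsertWithRetry {

    // Retry transient failures that can happen while a failover is in
    // progress, with exponential backoff, giving up after a few attempts.
    public static JsonDocument upsertWithRetry(Bucket bucket, JsonDocument doc) {
        return bucket.async()
                .upsert(doc)
                .retryWhen(RetryBuilder
                        .anyOf(RequestCancelledException.class,
                               TemporaryFailureException.class,
                               BackpressureException.class)
                        .delay(Delay.exponential(TimeUnit.MILLISECONDS, 2000, 50))
                        .max(5)
                        .build())
                .toBlocking()
                .single();
    }
}

With something like this in place, a write issued during the short window of a failover is retried against the new topology instead of surfacing RequestCancelledException straight to your application code.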

Thank you for your great answer,