On node failure, how long are inserts/upserts blocked?

chuck.connell · September 15, 2016, 2:12pm

According to the documentation, failure of one node (in a 3-node cluster) does not bring down the cluster or block any read operations. BUT the documentation also states that some write operations are blocked, because each data item has one master location in the cluster, so there must be a failover and rebalance before writes to those items can proceed. (See quote below.)

My concern is how long this will take. Couchbase is not really “highly available” if some writes are blocked for hours until the DBA gets an alert about the node failure and completes a rebalance process. Some applications do more writes than reads, so this is a big problem. Questions…

Do I misunderstand the situation here?
How long does it typically take for a 3-node cluster with a failed node to be fully back online (reads and writes)?
Is the problem improved by using 4-node clusters, so that a node failure still leaves 3 nodes?

We are doing mostly plain key/value reads and writes, at least for now.

“If a single node fails, the data on a node that failed will not accept writes until the node is failed over, although reads can be serviced from replicas if desired.” from http://developer.couchbase.com/documentation/server/4.5/concepts/data-management.html
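To make the quoted behavior concrete, here is a minimal sketch of why only some writes are affected. It assumes 1024 vBuckets, a plain CRC32-modulo key hash, and a round-robin assignment of active vBuckets to nodes. All three are simplifications for illustration and not the exact Couchbase hashing or cluster map. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class VBucketSketch {
    // Assumption: 1024 vBuckets per bucket.
    public static final int NUM_VBUCKETS = 1024;

    // Simplified key-to-vBucket hash (assumption: the real SDK hash differs in detail).
    public static int vbucketOf(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % NUM_VBUCKETS);
    }

    // Simplified layout: active copies are spread round-robin over the nodes.
    public static int activeNode(int vbucket, int numNodes) {
        return vbucket % numNodes;
    }

    // A write is blocked while the node holding the key's active vBucket is down
    // and has not yet been failed over.
    public static boolean writeBlocked(String key, int numNodes, int failedNode) {
        return activeNode(vbucketOf(key), numNodes) == failedNode;
    }

    // Counts how many vBuckets have their active copy on the failed node.
    public static int blockedVBuckets(int numNodes, int failedNode) {
        int blocked = 0;
        for (int v = 0; v < NUM_VBUCKETS; v++) {
            if (activeNode(v, numNodes) == failedNode) {
                blocked++;
            }
        }
        return blocked;
    }

    public static void main(String[] args) {
        System.out.println("3 nodes: " + blockedVBuckets(3, 0) + " of " + NUM_VBUCKETS);
        System.out.println("4 nodes: " + blockedVBuckets(4, 0) + " of " + NUM_VBUCKETS);
        System.out.println("user::1001 blocked? " + writeBlocked("user::1001", 3, 0));
    }
}
```

Under these assumptions, roughly a third of the keys are unwritable until failover in a 3-node cluster and roughly a quarter in a 4-node cluster. Writes to keys whose active vBucket lives on a healthy node are unaffected.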

geraldss · September 15, 2016, 5:57pm

30 seconds currently, and slated to improve. Not a N1QL issue, but an issue of the underlying key-value store.

chuck.connell · September 15, 2016, 6:24pm

So the failover/rebalance does not need to wait until a DBA gets paged in the middle of the night, wakes up, logs on to the admin console, and performs some manual steps?

geraldss · September 16, 2016, 3:53pm

Failover is different from rebalance. Rebalance can be planned.

chuck.connell · September 16, 2016, 4:03pm

I may have the terminology wrong. My concern is how long a document is unavailable for write() operations and whether manual (human) intervention is required before that document can be written again.

In answer to the first question, you said 30 seconds maximum. (Thank you!) So I assume this is an automatic process, without a person’s intervention?

And how is a Java program notified that a write has failed because the primary vBucket’s node is down? What error does Bucket.upsert() return, or does upsert() just wait until the write can succeed?

Thank you, Chuck

geraldss · September 16, 2016, 5:08pm

Yes, failover is automatic. I will let others weigh in on Java. @simonbasle?

WillGardella · September 18, 2016, 9:20pm

Hi Chuck,

As Gerald said, automatic failover is one option that’s available for handling an unhealthy node. We recently revised the section on failover in the documentation to make it clearer and more informative. You can find simple explanations of the terminology and what your options are here: http://developer.couchbase.com/documentation/server/4.5/clustersetup/failover.html

Best,
-Will

chuck.connell · September 19, 2016, 2:38pm

Thanks Will. That documentation definitely helps.

From the Java SDK, what will the calling program see when a document’s primary vBucket is on a down node?

ingenthr · September 20, 2016, 12:27am

With respect to a node that fails, the SDK will continue to try to service your request until the request times out. If the connection is re-established (the node comes back online, for instance, or is replaced with a new node with the vbucket in active state), then the SDK will send the operation and it’ll succeed.

Generally speaking, the programming model is the SDK presenting a cluster and abstracting your application away from the details of node failures and recovery. The abstraction cannot hide down nodes though, so these requests end with a timeout.

@WillGardella pointed you to the docs, but one thing you’ll note is that if there is a failover triggered, particularly a hard failover, the SDK will see the topology change and operations will go to the new location where the vbucket is “active”.
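One way an application might cope with the timeout behavior described above is to retry the write until failover has moved the vBucket to a healthy node. The sketch below is an illustration, not the SDK's own mechanism. It uses a plain `Callable` as a stand-in for the real call, and it assumes that a blocking call such as `Bucket.upsert()` in the 2.x Java SDK surfaces a timeout as a `RuntimeException` whose cause is `java.util.concurrent.TimeoutException`. Please verify that against the SDK version you use. `UpsertRetry`, `retryOnTimeout`, and `isTimeout` are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

public class UpsertRetry {

    // Runs op and retries it only when the failure is a timeout, up to maxAttempts tries.
    // The pause between tries grows linearly with the attempt number.
    public static <T> T retryOnTimeout(Callable<T> op, int maxAttempts, long backoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                // Give up on non-timeout errors or when the attempts are used up.
                if (!isTimeout(e) || attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(backoffMillis * attempt);
            }
        }
    }

    // Walks the cause chain looking for a TimeoutException
    // (assumption: the SDK wraps it in a RuntimeException).
    public static boolean isTimeout(Throwable e) {
        for (Throwable t = e; t != null; t = t.getCause()) {
            if (t instanceof TimeoutException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Stand-in for something like bucket.upsert(doc): times out twice, then succeeds,
        // as if failover completed between the second and third attempt.
        String result = retryOnTimeout(() -> {
            if (++calls[0] < 3) {
                throw new RuntimeException(new TimeoutException("node down"));
            }
            return "stored";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

With a real bucket, the lambda would wrap the upsert call. Retrying a plain upsert is generally safe because it overwrites the document with the same value, but writes that depend on CAS or counters need more care. If automatic failover takes about 30 seconds, the total retry window (attempts multiplied by timeout plus backoff) should be sized to exceed that.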

chuck.connell · September 20, 2016, 2:03pm

Thanks very much. This is enough for us to proceed now.