Couchbase 6.0 Node down

Hi,

We are doing some tests with multi-node configurations and we are not able to understand the implemented logic.

In a 2-node configuration with 1 bucket and 1 replica, we can see that each node holds the total number of items/documents (active + replica). Interestingly, the split is not always exactly 50/50, but it is fair.

In this configuration, if we kill one node (systemctl stop couchbase-server.service), no operation can be performed on the node that is still alive; everything returns an error:

{
"code": 12008,
"msg": "Error performing bulk get operation - cause: {1 errors, starting with dial tcp 10.17.11.202:11210: getsockopt: connection refused}"
}

Why is the query (run from the GUI) trying to connect to the dead node?

For the 3-node configuration we have a similar issue: if we kill one node, the cluster doesn't respond successfully to any operation (query, insert, etc.), with the same error as above, until the dead node is failed over.
Is this really the expected behaviour? We know we can set auto-failover as low as 5 seconds for a cluster with more than 2 nodes (with 2 nodes, no failover seems to be performed at all). But still, 5 seconds of a non-responsive DB is a problem for us.

For a 2-node cluster, it seems only manual intervention (a manual failover of the dead node) lets us use the DB again. Oddly, when we try to do that, a prompt warns us we are going to lose data! Why? The other node should hold all the vBuckets (active + replica).

Are we missing something?

Thanks.

Doing some other tests, we see that the claim that replicas can be used for READs when one node is down does not hold.

In a bucket with 10 documents on a 3-node cluster with 1 replica, if one node goes down, some of the documents are not retrievable by key. In our test, 2 of the 10 returned "Internal Server Error" in the GUI Documents menu.
Queries always fail in this scenario.

Only after failover is executed are all documents accessible. Why not allow reads from the replica? I can understand preventing writes…
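For context, the behaviour we expected is roughly the client-side "try the active copy, fall back to the replica" pattern that the Couchbase SDKs expose as explicit replica-read calls. A minimal simulation of that pattern (plain Python, not the Couchbase SDK; the node functions are stand-ins for network calls):

```python
# Illustrative simulation of a replica-read fallback. "active" and
# "replica" stand in for KV requests to two different nodes.

class NodeDown(Exception):
    """Raised when the node holding a copy is unreachable."""

def get_with_replica_fallback(key, active, replica):
    """Try the active copy first; on failure, read the replica copy."""
    try:
        return active(key)
    except NodeDown:
        return replica(key)

# The active node for this key's vBucket is down...
def active(key):
    raise NodeDown("connection refused")

# ...but the replica node still holds the document.
def replica(key):
    return {"id": key, "value": 42}

print(get_with_replica_fallback("doc-1", active, replica))
# -> {'id': 'doc-1', 'value': 42}
```

Our expectation was that the GUI/query path would do something like this automatically; instead every read of an affected document fails until failover.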

Waiting for the failover is a real problem for high availability. Setting it to the minimum failover time will cause nodes to be dropped all the time, while using a high value makes the cluster almost completely unavailable in the meantime.

Writes are only available for the vBuckets active on the surviving nodes; writing a key whose vBucket lives on the downed node returns an error. We understand the hashing to vBuckets, but if a node goes down, writing to another node should be possible. After the node comes back or is failed over, the vBuckets could be rebalanced with the new inserts…
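To be clear about what we mean by "the hashing": our understanding is that the key-to-vBucket mapping is a fixed CRC32 scheme, so a key can never be redirected to another vBucket (and thus another node) just because its active node is down. A sketch of that mapping (the exact bit manipulation is our assumption, based on how the client libraries are commonly described, and is not verified against 6.0):

```python
import zlib

NUM_VBUCKETS = 1024  # Couchbase default number of vBuckets

def vbucket_id(key: str) -> int:
    # CRC32 of the key, upper 16 bits, masked down to the vBucket
    # count (assumed scheme; not verified against Couchbase 6.0).
    crc = zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF
    return (crc >> 16) & (NUM_VBUCKETS - 1)

# The mapping is deterministic: the same key always hashes to the
# same vBucket, whose active copy lives on exactly one node.
print(vbucket_id("doc-1"), vbucket_id("doc-2"))
```

Since the mapping is fixed, rerouting a write would mean temporarily promoting the replica vBucket (i.e. failover), which is exactly the step we would like the cluster to not block on.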

Summarising: it seems that when a node goes down, the cluster is mostly unusable.

Or are we missing something?