Fail over failing


#1

Hi all,

We run Couchbase Server 4.5.1-2844 Community Edition (build-2844) with java-client-2.4.2.jar. At the moment we have a cluster of 2 servers on AWS.

One of the servers suddenly became unstable, and so far we have not been able to work out what exactly happened to the instance itself. Nevertheless, the CB clients on all other instances failed with the same exception:

java.lang.RuntimeException: java.util.concurrent.TimeoutException
at com.couchbase.client.java.util.Blocking.blockForSingle(Blocking.java:73)
at com.couchbase.client.java.CouchbaseBucket.get(CouchbaseBucket.java:119)
at com.couchbase.client.java.CouchbaseBucket.get(CouchbaseBucket.java:114)

Here is the log message in the CB management console:
“Could not auto-failover node (‘ns_1@xxx.xx.x.xxx’). Number of nodes running data service is 2. You need at least 3 nodes.”

Does this mean that the minimum cluster configuration for the fail-over mechanism to work properly on both the server AND the client side is:

  • 3 servers
  • 1 replica

Kind regards,
Alex


#2

The number of replicas doesn’t matter, but for auto-failover to be enabled you must have three nodes in the cluster. Have a look at the documentation, which discusses this point.
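
As a toy model of that rule, here is a minimal sketch based only on the log message quoted above, not on Couchbase's own code. The class and method names are made up for illustration.

```java
public class AutoFailoverRule {
    // Minimum node count, as quoted in the management console message.
    static final int MIN_NODES_FOR_AUTO_FAILOVER = 3;

    // Returns true when the cluster is large enough for auto-failover to act.
    // Replica count is deliberately not a parameter: per the reply above,
    // the rule depends on the number of nodes only.
    static boolean canAutoFailover(int dataServiceNodes) {
        return dataServiceNodes >= MIN_NODES_FOR_AUTO_FAILOVER;
    }

    public static void main(String[] args) {
        for (int nodes = 1; nodes <= 4; nodes++) {
            System.out.println(nodes + " node(s): auto-failover possible = "
                    + canAutoFailover(nodes));
        }
    }
}
```

With the 2-node cluster from the question the check is false, which matches the "Could not auto-failover node" message.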

All of the clients will fail for the data items on the down node until you fail that node over manually. The clients will continue to work for the other data items, but with 50% of your data unavailable, the application may well seem completely down, because it keeps waiting out a 2.5s/75s timeout (the Java client's default key/value and view/query timeouts) trying to reach that one node.
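
To put rough numbers on why it can look completely down, here is a back-of-the-envelope sketch. It assumes the data is spread evenly across the nodes and that each request reads a few unrelated keys; both are simplifying assumptions, and the class and method names are hypothetical.

```java
public class OutageImpact {
    // Share of keys whose active copy sits on an unreachable node,
    // assuming keys are distributed evenly across all nodes.
    static double unavailableShare(int totalNodes, int downNodes) {
        return (double) downNodes / totalNodes;
    }

    // Probability that a request reading `keysPerRequest` independent keys
    // touches at least one unavailable key and so waits for a timeout.
    static double stallProbability(double unavailableShare, int keysPerRequest) {
        return 1.0 - Math.pow(1.0 - unavailableShare, keysPerRequest);
    }

    public static void main(String[] args) {
        double share = unavailableShare(2, 1); // 2-node cluster, 1 node down
        for (int keys : new int[] {1, 3, 10}) {
            System.out.println("keys per request = " + keys
                    + ", stall probability = " + stallProbability(share, keys));
        }
    }
}
```

Under these assumptions a request that reads a single key stalls half the time, one that reads three keys stalls 87.5% of the time, and one that reads ten keys stalls almost always, so from the outside the whole application appears to hang even though half of the data is still reachable.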