Simple solution with two nodes - fails entirely if one node is not available

I have a fairly small solution running on two cluster nodes. If one node fails (or is taken down for service) then the entire application fails!

Should it not be possible to continue running? There is one node up and running fine - and it can just rebalance once the other node is back up and running. Should I do anything in particular in my Java code for this to work? Right now I tell the SDK that there are two nodes running in the cluster.

The same also applies to mobile users connecting via Sync Gateway.

Thanks in advance for any insights :slight_smile:

I’m using Couchbase Community Edition for this solution (v. 6.0)

What actually happens when “the entire application fails”? Are you seeing error messages, exceptions, etc?

Well, any access to the database is failing - with something like this:

Caused by: java.lang.RuntimeException: java.util.concurrent.TimeoutException
	at com.couchbase.client.java.util.Blocking.blockForSingle(Blocking.java:77)
	at com.couchbase.client.java.CouchbaseBucket.get(CouchbaseBucket.java:131)
	at com.couchbase.client.java.CouchbaseBucket.get(CouchbaseBucket.java:126)
	at dk.dtu.aqua.catchlog.dao.CouchbaseUserDAO.loadUser(CouchbaseUserDAO.java:497)

Do you have automatic failover turned on? If so, when the 2nd node is taken down, the replicas on the 1st node should be promoted and the app should continue working. There may be a short period while this happens, and as with any data access, you should prepare for and handle exceptions. If automatic failover is NOT turned on, that could explain why you are seeing errors like this.
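To illustrate the "prepare for and handle exceptions" part, here is a minimal sketch of a retry wrapper. It assumes the failure surfaces the way the stack trace above shows: a `RuntimeException` whose cause is a `java.util.concurrent.TimeoutException`. The class name `RetryOnTimeout`, the method `withRetry`, and the attempt and delay values are hypothetical, not part of the Couchbase SDK, and the lambda in `main` only simulates a bucket read that times out twice.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper, not part of the Couchbase SDK.
final class RetryOnTimeout {

    // Runs the operation and retries, with a linearly growing pause, when it
    // fails with a RuntimeException caused by a TimeoutException.
    // Any other exception is rethrown immediately.
    static <T> T withRetry(Supplier<T> operation, int maxAttempts, long baseDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                if (!(e.getCause() instanceof TimeoutException)) {
                    throw e; // not a timeout, so retrying is unlikely to help
                }
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(baseDelayMillis * attempt);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw e;
                    }
                }
            }
        }
        throw last; // every attempt timed out
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Stand-in for something like bucket.get(id): times out twice, then succeeds.
        String doc = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new RuntimeException(new TimeoutException());
            }
            return "user-document";
        }, 5, 10);
        System.out.println(doc + " after " + calls[0] + " attempts");
    }
}
```

In a real application the lambda would wrap the actual SDK call, and the total retry window would need to be long enough to cover the period before the replicas are promoted.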

I’m going to move your question into the Java forum, just in case they have more insight for you there.

A very good question!

I thought I had automatic failover turned on. But can you quickly point me in the direction of the necessary steps? Then I’ll verify my setup :+1:

It is under General Settings on the UI (see docs https://docs.couchbase.com/server/6.5/manage/manage-settings/general-settings.html) under “Node Availability”. I’ve put a red box around it in this screenshot:

You can also control it through the REST API or the CLI.
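As a rough sketch of the REST route from Java (11 or later), the snippet below builds a POST to the `/settings/autoFailover` endpoint on the admin port 8091 with the `enabled` and `timeout` form parameters. The host name, the credentials, the 120 second timeout, and the class name `AutoFailoverRequest` are placeholders I made up for illustration; please check the REST reference for your server version before relying on it.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration only.
final class AutoFailoverRequest {

    // Builds (but does not send) a request that sets the auto-failover flag and timeout.
    static HttpRequest build(String host, String user, String password,
                             boolean enabled, int timeoutSeconds) {
        String credentials = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        String form = "enabled=" + enabled + "&timeout=" + timeoutSeconds;
        return HttpRequest.newBuilder()
                .uri(URI.create("http://" + host + ":8091/settings/autoFailover"))
                .header("Authorization", "Basic " + credentials)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(form))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder host and credentials: replace with your own cluster details.
        HttpRequest request = build("localhost", "Administrator", "password", true, 120);
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP status: " + response.statusCode());
    }
}
```

A GET on the same path should return the current settings, which is a quick way to verify whether automatic failover is really enabled.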

Ok. I’m on version 6.0 (CE) so my node availability looks like this:
image
But I guess that should do the same?
Perhaps I should adjust the timeout?