I may need to have a better understanding of the Couchbase failover behavior, but here’s my situation:
We have 3 identical Couchbase server nodes. Before the upgrade to 5.1.1, we used our own load-balancing solution to handle node failures by simply not using the failed node until it came back online. With 5.1.1, our solution no longer works because the new Java client does not work properly when a load-balancer address is used in the connection URI list, so we now list each node’s IP instead. Everything works fine except the failover mechanism. When one of the nodes is killed, or simply crashes, the Java client keeps sending requests to the dead node. On the Couchbase web interface, we can see a message under the failing node stating that this server is not taking requests and that it can be failed over. The Failover button is displayed and can be pressed to fail the node over. But this is a manual operation, and having an administrator in front of a screen 24/7 to hit the Failover button in case something goes wrong is not really a viable production solution. We handle thousands of requests per second, and 80% of them need to save or retrieve data from Couchbase. Since the requests keep hitting the dead node, our services become almost unusable.
I read the documentation and tried enabling the Auto-Failover option with the minimum delay of 30 seconds, but it doesn’t do anything. The documentation says that Auto-Failover is triggered only with a minimum of 3 nodes. I don’t know whether that means 3 nodes must be alive (not my case here, since I have 1 dead and 2 remaining nodes) or that the cluster must contain at least 3 nodes.
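For reference, the Auto-Failover setting described above can also be applied through the cluster's REST API instead of the web interface. The following is a minimal sketch that only builds the request, assuming the admin port is 8091 and the endpoint is `/settings/autoFailover` with `enabled` and `timeout` form parameters. The class name, host, and credentials are placeholders, not values from this setup.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds (but does not send) the request that enables auto-failover.
final class AutoFailoverSettings {

    // Form-encoded body: whether auto-failover is on, and the delay in seconds.
    static String formBody(boolean enabled, int timeoutSeconds) {
        return "enabled=" + enabled + "&timeout=" + timeoutSeconds;
    }

    // Assumes the cluster admin REST API listens on port 8091 of the given host.
    static HttpRequest buildRequest(String host, String user, String password, int timeoutSeconds) {
        String credentials = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create("http://" + host + ":8091/settings/autoFailover"))
                .header("Authorization", "Basic " + credentials)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(true, timeoutSeconds)))
                .build();
    }
}
```

Sending it would be a matter of passing the request to `java.net.http.HttpClient#send`. Issuing a GET against the same endpoint should return the settings currently in effect, which is one way to confirm the option was actually saved.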
There must be a real production-grade way to set up a failover mechanism that simply removes the unresponsive node from the pool and stops sending requests to it until it comes back.
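To make the desired behavior concrete, here is a minimal sketch of the "skip the dead node until it comes back" logic, similar in spirit to what the old load-balancing solution did. `NodePool` and all of its methods are hypothetical names and are not part of the Couchbase Java SDK. The SDK routes each key to the node that owns it, so this only illustrates the behavior being asked for and is not a drop-in fix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;

// Hypothetical helper: round-robin over nodes, skipping failed ones until a cooldown elapses.
final class NodePool {
    private final List<String> nodes;
    private final long cooldownMillis;
    private final LongSupplier clock; // injectable time source, e.g. System::currentTimeMillis
    private final Map<String, Long> failedAt = new ConcurrentHashMap<>();
    private final AtomicInteger cursor = new AtomicInteger();

    NodePool(List<String> nodes, long cooldownMillis, LongSupplier clock) {
        this.nodes = new ArrayList<>(nodes);
        this.cooldownMillis = cooldownMillis;
        this.clock = clock;
    }

    // Called by the application when a request to this node times out or is refused.
    void markFailed(String node) {
        failedAt.put(node, clock.getAsLong());
    }

    // Called when a request to this node succeeds again.
    void markHealthy(String node) {
        failedAt.remove(node);
    }

    // A node is usable if it never failed, or if the cooldown has passed and it may be retried.
    boolean isAvailable(String node) {
        Long since = failedAt.get(node);
        return since == null || clock.getAsLong() - since >= cooldownMillis;
    }

    // Returns the next usable node in round-robin order.
    String next() {
        for (int i = 0; i < nodes.size(); i++) {
            int index = Math.floorMod(cursor.getAndIncrement(), nodes.size());
            String candidate = nodes.get(index);
            if (isAvailable(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("no node available");
    }
}
```

In this sketch a failed node is retried once the cooldown expires. If it still fails, the caller marks it failed again and the cooldown restarts, so a dead node receives at most an occasional probe instead of a constant stream of requests.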
What are my options?
EDIT: Tried with 4 nodes and the result is the same. The failover mechanism does nothing and requests keep getting sent to the dead node.