Node failure blocks Java client

simonbasle · April 3, 2015, 10:01am

Failover can be made quicker than the 30-second minimum, but that means scripting it in an external monitoring system that you trust to detect quickly, and with a low false-positive rate, that a node is down.
This is not something the client can handle…

For getFromReplica, you can and should use ReplicaMode.ALL. This way, if a replica has been promoted via failover but the cluster hasn't been rebalanced yet, you won't explicitly target a replica that temporarily doesn't exist anymore.
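
To make the read path concrete, here is a minimal sketch of "try the active copy, otherwise ask every replica". It deliberately does not depend on the SDK: the two suppliers are hypothetical stand-ins for your own calls, and I'm assuming that in the 2.x Java SDK the second one would wrap something like bucket.getFromReplica(id, ReplicaMode.ALL). The class and method names are made up for illustration.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

class ReplicaReadSketch {

    // Returns the active copy when the active read succeeds (empty list if
    // the document is absent). If the active read throws, returns whatever
    // the "all replicas" read yields, which may be stale or empty.
    static <T> List<T> readWithReplicaFallback(Supplier<T> activeRead,
                                               Supplier<List<T>> allReplicasRead) {
        try {
            T doc = activeRead.get();
            return doc == null
                    ? Collections.<T>emptyList()
                    : Collections.singletonList(doc);
        } catch (RuntimeException e) {
            // Active node unreachable or timed out: fall through to replicas.
            return allReplicasRead.get();
        }
    }

    public static void main(String[] args) {
        // Simulate a failed active node and one replica that still answers.
        List<String> result = ReplicaReadSketch.<String>readWithReplicaFallback(
                () -> { throw new IllegalStateException("active node down"); },
                () -> Collections.singletonList("replica copy"));
        System.out.println(result); // prints [replica copy]
    }
}
```

Because the fallback asks all replicas instead of a specific one, it does not care which replica slot disappeared when one was promoted. Keep in mind that replica reads can return older data than the active copy.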

For writing, when a node has been failed over you won't have your required number of replicas available until you rebalance. If you try to write with ReplicateTo.TWO during this period, it will fail because there aren't enough replicas. You could isolate this part and fall back to ReplicateTo.ONE during the window between failover and rebalance, maybe? @daschl any other ideas?
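
A rough sketch of that fallback idea, written without the SDK so it runs on its own. The ReplicateTo enum here is a local stand-in for the SDK's, NotEnoughReplicasException is a hypothetical placeholder for whatever exception your SDK version actually raises when the durability requirement can't be met, and the Consumer stands in for your real write call.

```java
import java.util.function.Consumer;

class DurabilityFallbackSketch {

    // Local stand-in for the SDK's ReplicateTo durability levels.
    enum ReplicateTo { NONE, ONE, TWO }

    // Hypothetical exception; substitute the one your SDK version throws.
    static class NotEnoughReplicasException extends RuntimeException {
        NotEnoughReplicasException(String message) { super(message); }
    }

    // Tries the write at the preferred level. If that fails for lack of
    // replicas, retries once at the fallback level. Returns the level that
    // succeeded; a failure at the fallback level propagates to the caller.
    static ReplicateTo writeWithFallback(Consumer<ReplicateTo> write,
                                         ReplicateTo preferred,
                                         ReplicateTo fallback) {
        try {
            write.accept(preferred);
            return preferred;
        } catch (NotEnoughReplicasException e) {
            write.accept(fallback);
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Simulated cluster: one replica left after failover, not rebalanced.
        int availableReplicas = 1;
        Consumer<ReplicateTo> write = level -> {
            if (level.ordinal() > availableReplicas) {
                throw new NotEnoughReplicasException("need " + level);
            }
        };
        System.out.println(
                writeWithFallback(write, ReplicateTo.TWO, ReplicateTo.ONE)); // prints ONE
    }
}
```

Returning the level that actually succeeded lets the caller log or count degraded writes and notice when the rebalance has finished. Since the first attempt may already have reached the active node before the durability check failed, the retried write should be safe to repeat (an upsert rather than an insert, for example).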