Newbie questions - failover, java sdk remove/persistence

ok first off, sorry for the late answer :alarm_clock:

On to your first three questions:

  1. If the node is shut down but not failed over, the replicas are not promoted, so no node can serve the subset of data (call it A) that your node was dealing with.

  2. Failing over a node means the replica takes over and starts managing the subset of data “A”. However, the cluster is now in an unbalanced state: some subsets of data are still replicated, while A is not. Reintroducing a healthy node, or just downsizing the cluster, and then doing a rebalance will bring the cluster back to a balanced state where every node holds the same share of the data with the same replication factor.

  3. Auto-failover is limited, on purpose. It only handles the first failure that happens; any subsequent failure before an operator has rebalanced the cluster will need manual intervention. I’m not entirely sure about the 3-node requirement for auto-failover, but it makes sense that a 3-node cluster is a good minimum: 1 node can go down and the data can still be replicated once.
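To make points 1 and 2 a bit more concrete, here is a toy model in Java. It is only a sketch and not the Couchbase API: `ToyCluster`, its methods, and the node names are all made up for illustration, and it assumes one active copy and at most one replica per subset of data.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only (hypothetical, not part of any Couchbase library).
// Each subset of data has one active node and at most one replica node.
class ToyCluster {
    private final Map<String, String> active = new HashMap<>();
    private final Map<String, String> replica = new HashMap<>();
    private final Set<String> down = new HashSet<>();

    void assign(String subset, String activeNode, String replicaNode) {
        active.put(subset, activeNode);
        replica.put(subset, replicaNode);
    }

    // The node stops responding, but nobody has failed it over yet.
    void shutDown(String node) {
        down.add(node);
    }

    // Failover: promote the replica of every subset the node was active for,
    // and forget any replica copies that lived on the failed node.
    void failOver(String node) {
        for (Map.Entry<String, String> e : active.entrySet()) {
            if (node.equals(e.getValue())) {
                // The promoted copy is no longer a replica, so the subset
                // is left without one until a rebalance happens.
                e.setValue(replica.remove(e.getKey()));
            }
        }
        replica.values().removeIf(node::equals);
    }

    boolean canServe(String subset) {
        String node = active.get(subset);
        return node != null && !down.contains(node);
    }

    boolean isReplicated(String subset) {
        String node = replica.get(subset);
        return node != null && !down.contains(node);
    }
}
```

With three nodes where `n1` is active for A, shutting `n1` down leaves `canServe("A")` false until `failOver("n1")` is called. After that A is served again but `isReplicated("A")` is false, which is the unbalanced state a rebalance is meant to fix.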

About the Java SDK, your usage should be correct: you’re instructing it to wait for the “main” (active) node to acknowledge having written to disk, and also for one of the replicas to have received the data.

However, it looks like there was a slight delay in replication and the operation timed out. Have you since tried increasing the timeout on the operation?
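If you want to experiment with longer timeouts, something along these lines could help. This is a plain-Java sketch, not SDK code: `TimedOp` and `TimeoutRetry` are hypothetical helpers, and I'm assuming the timeout surfaces as a `java.util.concurrent.TimeoutException`, either directly or wrapped as the cause of another exception. The helper doubles the timeout after each timed-out attempt.

```java
import java.util.concurrent.TimeoutException;

// Hypothetical callback: run one attempt with the given timeout in milliseconds.
interface TimedOp<T> {
    T run(long timeoutMillis) throws Exception;
}

// Hypothetical helper, not part of any SDK.
class TimeoutRetry {
    // Retries op, doubling the timeout after each timed-out attempt.
    // Any other failure, or a timeout on the last attempt, is rethrown.
    static <T> T withGrowingTimeout(TimedOp<T> op, long initialMillis, int maxAttempts)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long timeout = initialMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.run(timeout);
            } catch (Exception e) {
                if (!isTimeout(e) || attempt == maxAttempts) {
                    throw e;
                }
                timeout *= 2;
            }
        }
    }

    // The timeout may be the exception itself or sit somewhere in its cause chain.
    static boolean isTimeout(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof TimeoutException) {
                return true;
            }
        }
        return false;
    }
}
```

You would put your remove call with its persistence and replication options inside the `TimedOp` and pass the given timeout through to it. One thing to watch: a timed-out remove may still have gone through on the server, so a retry can fail because the document is already gone.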
