What is the difference between Fail Over and Remove?

Practically, I would like to know the difference between Fail Over and Remove. If one server is acting up, I would probably “remove” it from the cluster, then reboot the server once the problem is resolved and add it back into the cluster. Doesn’t “Fail Over” do the same thing? I would think that it also removes the server from the cluster, and, as with Remove, you fix the problem and bring the server back into the cluster. What should I consider when choosing between “Fail Over” and “Remove” in the Web Console?

There is a difference. When you remove a node, the cluster manager moves that node's data partitions off it one at a time and places them on the other nodes in the cluster in a way that minimizes the risk of data loss and keeps the data spread evenly across the cluster. While this move is in progress, partitions that have not yet been moved are still read from the node being removed. This means your cluster has more resources during the move, since you are still using the node you're trying to remove.
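To make the "one partition at a time" behaviour concrete, here is a toy Python model. It is only a sketch under stated assumptions: the layout format (each partition maps to `[active_node, replica_node]`), the node names and the least-loaded placement rule are all hypothetical and are not the cluster manager's real algorithm. It shows two things from the paragraph above: every partition keeps both copies throughout, and the leaving node keeps holding (and serving) the partitions that have not moved yet.

```python
from collections import Counter

# Toy model only (hypothetical, not the cluster manager's real algorithm):
# each partition maps to [active_node, replica_node].

def node_load(layout):
    """Count how many partition copies (active + replica) each node holds."""
    return Counter(node for copies in layout.values() for node in copies)

def graceful_remove(layout, leaving, nodes):
    """Move copies off `leaving` one partition at a time.

    Returns the final layout plus, after each move, how many partitions the
    leaving node still holds (and therefore still serves)."""
    layout = {p: list(copies) for p, copies in layout.items()}
    remaining = [n for n in nodes if n != leaving]
    progress = []
    for p in sorted(layout):
        copies = layout[p]
        if leaving not in copies:
            continue
        counts = node_load(layout)
        # Least-loaded remaining node that does not already hold a copy of p,
        # so the active and replica never end up on the same node.
        target = min((n for n in remaining if n not in copies),
                     key=lambda n: (counts[n], n))
        copies[copies.index(leaving)] = target
        progress.append(sum(leaving in c for c in layout.values()))
    return layout, progress

nodes = ["A", "B", "C", "D"]
# 8 partitions spread evenly: active on one node, replica on the next.
layout = {p: [nodes[p % 4], nodes[(p + 1) % 4]] for p in range(8)}

final, progress = graceful_remove(layout, "D", nodes)
print(progress)                                  # [3, 2, 1, 0]
print(all(len(c) == 2 for c in final.values()))  # True: no partition lost its replica
print(sorted(node_load(final).values()))         # [5, 5, 6]
```

The shrinking `progress` list is the point: node D is drained gradually rather than dropped, so at no step does any partition run with fewer copies than it started with.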
When you fail over a node, it is immediately marked dead, and the replica partitions on the other servers in the cluster become active immediately. When this happens, you will have one less replica for the failed-over partitions, and your data may no longer be spread evenly across the cluster. The second step of a failover is a rebalance, which fixes the issues I just mentioned.
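For comparison, the same kind of toy model can sketch failover followed by rebalance. Again this is a hypothetical illustration, not the real implementation: the `[active_node, replica_node]` layout, the replica-restoring rule and the role-swapping rule for evening out active partitions are assumptions chosen for clarity. Step 1 drops the failed node at once, promoting the surviving replica; step 2 recreates the missing replicas and spreads the active partitions out again.

```python
from collections import Counter

# Toy model only (hypothetical, not the cluster manager's real algorithm):
# each partition maps to [active_node, replica_node].

def node_load(layout):
    """Count how many partition copies (active + replica) each node holds."""
    return Counter(node for copies in layout.values() for node in copies)

def active_load(layout):
    """Count how many active partitions each node serves."""
    return Counter(copies[0] for copies in layout.values())

def failover(layout, failed):
    """Step 1: mark `failed` dead at once. Where it held the active copy, the
    surviving replica moves to the front of the list, i.e. becomes active."""
    return {p: [n for n in copies if n != failed] for p, copies in layout.items()}

def rebalance(layout, nodes, copies_wanted=2):
    """Step 2: recreate missing replicas, then even out the active copies."""
    layout = {p: list(c) for p, c in layout.items()}
    for p in sorted(layout):
        copies = layout[p]
        while len(copies) < copies_wanted:
            counts = node_load(layout)
            target = min((n for n in nodes if n not in copies),
                         key=lambda n: (counts[n], n))
            copies.append(target)
    for p in sorted(layout):
        copies = layout[p]
        actives = active_load(layout)
        # Swap active/replica roles when the active's node serves noticeably more.
        if actives[copies[0]] - actives[copies[1]] >= 2:
            copies[0], copies[1] = copies[1], copies[0]
    return layout

nodes = ["A", "B", "C", "D"]
layout = {p: [nodes[p % 4], nodes[(p + 1) % 4]] for p in range(8)}

after = failover(layout, "D")
print(sum(len(c) < 2 for c in after.values()))  # 4 partitions now have no replica
print(sorted(active_load(after).items()))       # [('A', 4), ('B', 2), ('C', 2)]

fixed = rebalance(after, ["A", "B", "C"])
print(sum(len(c) < 2 for c in fixed.values()))  # 0
print(sorted(active_load(fixed).items()))       # [('A', 3), ('B', 3), ('C', 2)]
```

Between the two steps the cluster is both under-replicated and lopsided (node A serves twice as many active partitions as B or C), which is why the rebalance after a failover matters.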
In general, you should never fail over a node unless it is down or so unhealthy that you can't remove it and rebalance.