Rebalancing a large Couchbase cluster

Hi, we are running Couchbase as a KV store with around 43 TB of data on a 16-node cluster, with a replication factor of 1.
Whenever we need to scale up or down we have to rebalance, and rebalancing has become painful for us.
Once we start rebalancing we suffer around two days of downtime: our clients are unable to read from or write to Couchbase.
Is there any way to avoid this?


Rebalance is a completely online operation. It can take a long time, but there should be no downtime. Can you explain what you mean by 'downtime of around 2 days'?
Also, 5 TB of data per node (including the replica data) is more than we currently recommend. How much RAM do you have allocated on each node?
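Since rebalance is meant to be online, one way to verify it is actually making progress is to poll the REST API endpoint `GET /pools/default/rebalanceProgress` on any cluster node. Below is a minimal sketch of parsing its response; the exact response shape is an assumption based on older server versions (per-node objects with a `progress` fraction), so verify it against your version's documentation:

```python
import json

def parse_rebalance_progress(body: str) -> dict:
    """Extract per-node progress fractions from the body returned by
    GET /pools/default/rebalanceProgress. The response shape (a top-level
    "status" string plus one {"progress": <0..1>} object per node) is an
    assumption; check it against your server version."""
    doc = json.loads(body)
    return {
        node: info["progress"]
        for node, info in doc.items()
        if isinstance(info, dict) and "progress" in info
    }

# Illustrative response body (made-up values, not from a real cluster):
sample = ('{"status": "running", '
          '"ns_1@10.0.0.1": {"progress": 0.42}, '
          '"ns_1@10.0.0.2": {"progress": 0.87}}')
print(parse_rebalance_progress(sample))
# -> {'ns_1@10.0.0.1': 0.42, 'ns_1@10.0.0.2': 0.87}
```

If progress keeps advancing while your application cannot read or write, the downtime is more likely caused by resource pressure (memory, disk I/O) than by the rebalance mechanism itself.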

Hi Shivani_g,
We are using nodes with 32 GB of RAM, with 25 GB allocated to Couchbase on each node.
A full rebalance takes approximately 2-2.5 days. Couchbase does not allow reads or writes (which causes application downtime) until rebalancing on a node is 100% complete. As soon as the first node is fully rebalanced, reads and writes on Couchbase resume while rebalancing continues on the other nodes.

Is there a specific config for rebalancing without downtime?

@shivani_g, any update on this?

Based on the details you have provided, your memory-to-data ratio is well below 1%.
Couchbase recommends at least a 10% memory-to-data ratio for operational stability, including during rebalance.
Do you see errors during rebalance? If you do, can you share them?

@shivani_g, is there any doc related to this? Also, is this a default feature in the Community Edition as well, or do we need to configure it?

Rebalance is available in the Community Edition and there is nothing to configure; it works out of the box. One rule of thumb for a stable rebalance is to keep the ratio of memory to data on disk above 10%. If you go lower than that, you can run into issues.
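To make that rule of thumb concrete, here is the arithmetic for the cluster described in this thread. This is a rough sketch: it treats 1 TB as 1000 GB and counts the replica copies toward data on disk, which may differ slightly from how your disk usage is reported:

```python
# Numbers from this thread: 43 TB of data, replication factor 1,
# 16 nodes, 25 GB RAM quota per node.
data_tb = 43
total_data_gb = data_tb * 2 * 1000   # primary + 1 replica copy, in GB
ram_quota_gb = 16 * 25               # total RAM quota across the cluster

ratio = ram_quota_gb / total_data_gb
print(f"memory-to-data ratio: {ratio:.2%}")
# -> memory-to-data ratio: 0.47%

# RAM quota needed to reach the recommended 10% ratio:
needed_gb = 0.10 * total_data_gb
print(f"RAM quota needed for 10%: {needed_gb:.0f} GB")
# -> RAM quota needed for 10%: 8600 GB
```

At roughly 0.47%, the cluster is more than 20x below the recommended ratio, which is consistent with rebalances taking days and the cluster struggling under load while they run.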