Couchbase index definitions mysteriously lost after delta recovery + rebalance

index

#1

Hi, we’re using Community Edition 5.0.1 build 5003, and we frequently see a problem where one or more replicas of an index stop replicating: the items_remaining metric climbs to ~1M or so and stays there, with the drain rate at 0. To fix it, we perform a graceful failover, restart Couchbase, and then delta recovery + rebalance the node back into the cluster. This usually solves the problem, as the restart is a sufficient kick in the pants to get CB to realize it needs to start replicating to that index again. However, sometimes all the index definitions that were on that node mysteriously disappear.

From the documentation:

Index nodes retain their index definitions when being failed over and recovered using delta recovery, although the data within the indexes is deleted. When they are re-added to the cluster, the indexes will be automatically rebuilt by the Index service.

What we’re seeing is that all index definitions on the node have disappeared when we perform our delta recovery. They’re not rebuilt automatically, so we have to rebuild them manually.
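One way to pin down exactly which definitions went missing is to snapshot the index list before the failover and again after the rebalance, then compare the two. The sketch below assumes each snapshot is a list of rows such as those returned by a N1QL query like `SELECT name, keyspace_id FROM system:indexes`. The function name `missing_indexes` and the sample rows are hypothetical, and fetching the rows from the cluster is left out.

```python
def missing_indexes(before, after):
    """Return sorted (keyspace_id, name) pairs present in `before` but not in `after`.

    Both arguments are lists of dicts with "keyspace_id" and "name" keys
    (an assumed snapshot format, not something Couchbase mandates).
    """
    before_keys = {(row["keyspace_id"], row["name"]) for row in before}
    after_keys = {(row["keyspace_id"], row["name"]) for row in after}
    return sorted(before_keys - after_keys)


# Hypothetical snapshots taken before failover and after delta recovery + rebalance.
before = [
    {"keyspace_id": "orders", "name": "idx_customer"},
    {"keyspace_id": "orders", "name": "idx_status"},
    {"keyspace_id": "users", "name": "idx_email"},
]
after = [
    {"keyspace_id": "orders", "name": "idx_customer"},
]

for keyspace, name in missing_indexes(before, after):
    print(f"missing: {name} on {keyspace}")
```

Keeping the "before" snapshot around also gives you the list of definitions to recreate if they do disappear.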

Any ideas why that might be happening? It’s only some of the time, and we’ve only seen it on certain hardware configurations. I can provide more details of the hardware if needed, but I’m a little wary of that actually affecting things because we’re running Couchbase inside Docker. It seems like it could be a configuration issue since we’ve recently switched to a new cluster, but I’ve gone through our configs and 1) they look exactly the same as on our old cluster and 2) I’m not aware of any config setting that would even remotely affect index builds after graceful failover.

Could this be a bug? Is there a configuration setting I’m missing? Has anyone experienced this before, and if so, what can we do to fix it?

Thanks!

Edit: We found that this problem happens more frequently when we actually restart the Docker container that Couchbase is running in, and can be avoided as long as we don’t restart it. However, restarting is sometimes necessary for certain types of maintenance. Just thought I’d add this as a bit of additional information.


#2

Requesting @deepkaran.salooja for inputs.


#3

@janpaulb, index replicas are all master copies and do not replicate from each other. If items remaining goes high, it means that the index replica is not able to keep up with the data service. As the drain rate is 0, it could be a storage-related issue. Which storage mode are you using? CE or EE?
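To make the symptom concrete, here is a minimal sketch of flagging an index as stalled from periodic samples of the two metrics mentioned above. The sample format, the `is_stalled` name, and the thresholds are all assumptions for illustration; how you collect items remaining and drain rate depends on your monitoring setup.

```python
def is_stalled(samples, min_backlog=100_000, min_samples=3):
    """Return True if the index looks stuck.

    samples: list of (items_remaining, drain_rate) tuples, oldest first.
    An index is treated as stalled when each of the most recent
    `min_samples` readings shows a backlog of at least `min_backlog`
    items and a drain rate of exactly zero. Both thresholds are
    arbitrary illustrative defaults.
    """
    recent = samples[-min_samples:]
    if len(recent) < min_samples:
        return False  # not enough data to decide
    return all(
        remaining >= min_backlog and rate == 0
        for remaining, rate in recent
    )
```

A large backlog with a nonzero drain rate means the index is behind but still catching up; a large backlog with a drain rate of zero over several readings is the case described in this thread.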

With delta recovery + rebalance, the server will recover all the indexes on that node. If you can share the indexer logs of the node after you have lost indexes, we can investigate it further. I don’t think it can be configuration related. The behavior you describe matches what would happen if you rebalanced a node out after failover and then rebalanced it back in.