Couchbase Cluster 2.2.0 - Data Issues Post Failover

shwelzen · February 26, 2015, 12:21am

Hi,

After testing Couchbase 2.2.0 (update, read and delete) for a few days, I still see some issues when one node in a cluster of 3 nodes goes down.

These are the steps I perform in Java code:

1> I create a cluster of 3 nodes and about 8 buckets with 2 replicas (from the Admin console for now).
2> I create separate threads to upsert 1000 records, read them, and then delete all of them.
3> While the upsert, read and delete operations are running, I bring 1 node down and then receive an exception in Java for an operation that did not succeed.
4> When I receive the exception in Java, I initiate a failover from code for the node that is down. Then I issue a rebalance from code and continuously check the progress of the rebalance from code.
5> Once the rebalance has completed, I continue with the upsert / read / delete.
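
For reference, steps 4 and 5 could be sketched roughly as below if the failover and rebalance are driven through the REST API. This is only an illustration under assumptions, not my actual code: the class `FailoverSketch` and its method names are made up, the HTTP calls themselves are left out, and the endpoints and parameters in the comments (`/controller/failOver`, `/controller/rebalance`, `/pools/default/rebalanceProgress`, `otpNode`, `knownNodes`, `ejectedNodes`) are as I understand them from the Couchbase REST documentation and should be checked against the server version in use.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical helper for steps 4 and 5; the HTTP transport is intentionally omitted. */
final class FailoverSketch {

    /** Form body for POST /controller/failOver; otpNode looks like "ns_1@10.0.0.2". */
    static String failoverBody(String otpNode) {
        return "otpNode=" + encode(otpNode);
    }

    /**
     * Form body for POST /controller/rebalance.
     * Assumption: knownNodes lists every node in the cluster, ejectedNodes the ones to remove.
     */
    static String rebalanceBody(List<String> knownNodes, List<String> ejectedNodes) {
        return "knownNodes=" + encode(String.join(",", knownNodes))
                + "&ejectedNodes=" + encode(String.join(",", ejectedNodes));
    }

    /**
     * Crude check of the JSON from GET /pools/default/rebalanceProgress.
     * Assumption: {"status":"none"} means no rebalance is running. That can also be
     * the case after a failed rebalance, so it is not proof of success on its own.
     */
    static boolean rebalanceFinished(String progressJson) {
        return progressJson.replaceAll("\\s", "").contains("\"status\":\"none\"");
    }

    /** Polls progressFetcher until the rebalance is no longer running or maxPolls is reached. */
    static boolean awaitRebalance(Supplier<String> progressFetcher, int maxPolls, long sleepMillis)
            throws InterruptedException {
        for (int i = 0; i < maxPolls; i++) {
            if (rebalanceFinished(progressFetcher.get())) {
                return true;
            }
            Thread.sleep(sleepMillis);
        }
        return false; // still running after maxPolls; the caller decides what to do
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

The worker threads would resume the upsert / read / delete only after `awaitRebalance` returns true; a false result means the rebalance was still running when polling gave up.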

I have been noticing issues, especially with delete. After the failover and rebalance, many records throw a timeout exception, which causes a huge performance delay. Is this because of a bug in 2.2.0?
If not, why are some records not getting deleted? (I observe at least 25-100 out of 1000 records not being deleted.)
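
One way the timed-out deletes could be handled on the client side is a bounded retry with a growing delay, sketched below. This is a generic illustration, not a fix for the root cause: `RetrySketch` and `retryOnTimeout` are made-up names, and `java.util.concurrent.TimeoutException` is only a stand-in for whatever timeout exception the SDK in use actually throws.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

/** Hypothetical retry helper for operations that time out after a failover. */
final class RetrySketch {

    /**
     * Runs op, retrying on TimeoutException up to maxAttempts times in total.
     * The delay between attempts starts at initialDelayMillis and doubles each time.
     * Any other exception, or a timeout on the last attempt, is rethrown to the caller.
     */
    static <T> T retryOnTimeout(Callable<T> op, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (TimeoutException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and let the caller log the key that failed
                }
                Thread.sleep(delay);
                delay *= 2;
            }
        }
    }
}
```

A delete would then be wrapped as `RetrySketch.retryOnTimeout(() -> deleteRecord(key), 5, 100L)`, where `deleteRecord` is a placeholder for the actual SDK call. Keys that still fail after the last attempt can be collected and retried once the cluster has settled.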

If I migrate to 3.0.2, will these issues be resolved?
Can someone verify the steps I need to take in code when a node goes down, to ensure no data loss and successful data deletion?

Thanks,
Sheldon