.NET SDK fails to recover after cluster automatic failover

I am following up on this issue on behalf of maximv and I am running into the same problem with timeouts in the SDK.

I have created a simple standalone ASP.NET Core app with a single GET endpoint, and I am load testing this endpoint with Loadster. The app connects to a Couchbase cluster with a single bucket. On each request, the endpoint picks a key from a pool of random keys and tries to Get it, or Upserts it if the key is not found.
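The endpoint logic described above can be sketched roughly as follows. This is only an illustration, not the actual app code: `IKeyValueStore`, `InMemoryStore`, and `GetOrUpsertSketch` are hypothetical names, and the in-memory dictionary stands in for the bucket so the sketch runs without a cluster. In the real app the two store calls would be the SDK's Get and Upsert operations.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the two bucket operations the endpoint uses.
public interface IKeyValueStore
{
    bool TryGet(string key, out string value);
    void Upsert(string key, string value);
}

// In-memory implementation so the sketch runs without a Couchbase cluster.
public sealed class InMemoryStore : IKeyValueStore
{
    private readonly Dictionary<string, string> _items = new Dictionary<string, string>();

    public bool TryGet(string key, out string value) => _items.TryGetValue(key, out value);

    public void Upsert(string key, string value) => _items[key] = value;
}

public static class GetOrUpsertSketch
{
    // Picks a random key from the pool, returns its value if present,
    // and otherwise writes a new value for that key and returns it.
    public static string Handle(IKeyValueStore store, IReadOnlyList<string> keyPool, Random random)
    {
        string key = keyPool[random.Next(keyPool.Count)];

        if (store.TryGet(key, out string value))
        {
            return value;
        }

        value = "value-for-" + key; // placeholder payload, assumed for illustration
        store.Upsert(key, value);
        return value;
    }
}
```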

  • My cluster is composed of 3 Couchbase servers with auto-failover set to 60 seconds.
  • At the beginning of my test, response time is almost instantaneous and errors are nonexistent under a full load test.
  • A couple of minutes into my load test, I simulate a server outage by abruptly killing one of the Couchbase nodes.
  • I am timing how long each IBucket.Get(key) operation takes and logging when this operation takes longer than 1 second.
  • For the first 60 seconds after taking the node offline, I see a flood of timeouts, some taking more than 20 seconds before timing out.
  • At the 1 minute mark, failover happens.
  • I’ve waited 5-10 minutes after the failover while the load test kept running, and I still see constant timeouts of 20+ seconds.
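For reference, the timing described in the list above can be done with a small wrapper along these lines. This is a minimal sketch rather than the exact code from my app: `SlowOperationTimer` and its parameters are hypothetical names, and it only relies on `System.Diagnostics.Stopwatch`.

```csharp
using System;
using System.Diagnostics;

public static class SlowOperationTimer
{
    // Runs the operation and calls onSlow with the elapsed time
    // if it took longer than the threshold (also when it throws).
    public static T Measure<T>(
        string name,
        TimeSpan threshold,
        Func<T> operation,
        Action<string, TimeSpan> onSlow)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            return operation();
        }
        finally
        {
            stopwatch.Stop();
            if (stopwatch.Elapsed > threshold)
            {
                onSlow(name, stopwatch.Elapsed);
            }
        }
    }
}
```

In the app, the operation passed in would be the bucket Get call, the threshold would be `TimeSpan.FromSeconds(1)`, and `onSlow` would write to the log. Because the check sits in a `finally` block, operations that end in an exception are still timed.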

This means that when any cache node goes offline, the app becomes unusable and stays that way even after failover has completed.

To test the theory that the problem lies in the SDK's in-memory state, I waited 10 minutes and then quickly closed and relaunched my ASP.NET Core app while the load test was still running. The timeouts stopped immediately, and the response times logged in the load test returned to near instantaneous.

Here is a snapshot of the graph of response times, with the node going offline at the 2.5 minute mark and the app restart at the 15 minute mark:

My cluster configuration is pretty simple and looks like this:

            new ClientConfiguration
            {
                BucketConfigs = new Dictionary<string, BucketConfiguration>
                {
                    {
                        "DataBucket",
                        new BucketConfiguration
                        {
                            BucketName = "DataBucket",
                            Password = "testpassword",
                            UseSsl = false,
                            PoolConfiguration = new PoolConfiguration
                            {
                                MinSize = 5,
                                MaxSize = 10
                            }
                        }
                    }
                },
                Servers = new List<Uri>(new[]
                {
                    new Uri("http://192.168.0.12:8091"),
                    new Uri("http://192.168.0.13:8091"),
                    new Uri("http://192.168.0.19:8091")
                })
            }