Couchbase .NET SDK 3.x performance vs 2.7.26

Hello,

After upgrading to the 3.x SDK, our performance tests show some degradation compared to the results we have for 2.7.26. I may be doing something wrong (e.g. in how I initialize the SDK), so I prepared two simple projects that just query the cluster.

I query the cluster concurrently from 5 threads.

The cluster initialization for 2.x:

var clientDef = new CouchbaseClientDefinition
{
   Servers = [...],
   Buckets = new List<BucketDefinition>
   {
      new BucketDefinition
      {
         Name = "bucket-name",
         Password = "password",
         UseSsl = true,
         ConnectionPool = new ConnectionPoolDefinition
         {
            MinSize = 1,
            MaxSize = 3
         }
      }
   }
};

var clusterConfiguration = new ClientConfiguration(clientDef);

var serializationSettings = new JsonSerializerSettings()
{
   ContractResolver = new CamelCasePropertyNamesContractResolver(),
   TypeNameHandling = TypeNameHandling.All
};

clusterConfiguration.Transcoder = () => new DefaultTranscoder(new DefaultConverter(), new DefaultSerializer(serializationSettings, serializationSettings));
clusterConfiguration.Serializer = () => new DefaultSerializer(serializationSettings, serializationSettings);

ClusterHelper.Initialize(clusterConfiguration);

and the cluster initialization for SDK 3.x:

var clusterOptions = new ClusterOptions()
   .WithConnectionString("connection-string")
   .WithCredentials("username", "password");

clusterOptions.EnableDnsSrvResolution = true;
clusterOptions.ForceIpAsTargetHost = false;
clusterOptions.EnableTls = true;

var cluster = await Cluster.ConnectAsync(clusterOptions);

The test code is almost the same for both. The only differences are in the QueryItem method and how the cluster is queried (via a bucket in 2.x and via a collection in 3.x).

var bucket = ClusterHelper.Get().OpenBucket("bucket-name");
...
async Task QueryItem(string key)
{
   var result = await bucket.GetAsync<string>(key);

   if (!result.Success)
      throw new Exception("Error");

   var doc = JObject.Parse(result.Value);
}

And for 3.x:

var bucket = await cluster.BucketAsync("bucket-name");
var collection = await bucket.DefaultCollectionAsync();

async Task QueryItem(string key)
{
   // IGetResult is IDisposable in SDK 3.x; disposing it releases the underlying buffer
   using var result = await collection.GetAsync(key);
   var content = result.ContentAs<JObject>();
}

And the general code for the testing:

var requests = new ConcurrentQueue<string>();
for (int req = 0; req < numOfRequests; req++)
{
   // Fill the queue with the requests (document keys)
}

var stopwatch = Stopwatch.StartNew();

// 5 concurrent workers drain the shared queue until it is empty
var tasks = Enumerable.Range(0, 5).Select(async _ =>
{
   while (requests.TryDequeue(out string request))
   {
      await QueryItem(request);
   }
});

await Task.WhenAll(tasks);
stopwatch.Stop();

Here is what stopwatch.ElapsedMilliseconds reports:

NumOfRequests   Old SDK (ms)   New SDK (ms)
10000           3246           4017
50000           15025          18903
100000          30084          37723

Could someone please take a look and see whether there is something wrong in the way I initialize the cluster? The new SDK seems to be more than 20% slower (roughly 24% to 26% across these runs).

There are some new features in SDK 3 that I would recommend turning off if you're not using them; they may account for the performance degradation you're seeing. There are also some other performance features worth enabling. It's worth an experiment, at least.

// Turn on the newer, more efficient channel connection pool system
clusterOptions.Experiments.ChannelConnectionPools = true;

// Turn off .NET Activity tracing and logging of slow operations
clusterOptions.TracingOptions.Enabled = false;
clusterOptions.ThresholdOptions.Enabled = false;

// Turn off tracking and logging of orphaned K/V responses
clusterOptions.OrphanTracingOptions.Enabled = false;

// Turn off tracking of operation metrics to the logs
clusterOptions.LoggingMeterOptions.Enabled(false);

Thank you very much for the suggestions. I'll try these and get back with the results.

Disabling tracing and logging unfortunately doesn't make much difference. What really helped was enabling ChannelConnectionPools.

NumOfRequests   New SDK with ChannelConnectionPools (ms)
10000           3462
50000           16796
100000          32682

This is better, but still slower than what we have with the 2.x SDK. I'm also worried about this feature, since it's experimental. Are there any risks in using it?


ChannelConnectionPools is simply a different approach for distributing outgoing requests across multiple TCP connections to the server. The default method uses the TPL Dataflow library, while the new approach uses System.Threading.Channels. The Dataflow approach suffers from poor distribution (especially when receiving large documents) and a lot of unnecessary overhead.
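To illustrate the distribution idea, here is a minimal, self-contained model of the pattern using System.Threading.Channels: each simulated "connection" is a consumer reading from one shared channel, so whichever connection is free picks up the next request. This is a hypothetical sketch, not the SDK's actual ChannelConnectionPool code; connectionCount, requestCount and the simulated delays are invented for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical model of the pattern only; this is NOT the SDK's ChannelConnectionPool code.
const int connectionCount = 3; // illustrative pool size
const int requestCount = 30;   // illustrative number of outgoing requests

// One shared queue of outgoing requests: many readers (connections), one writer.
var channel = Channel.CreateBounded<int>(new BoundedChannelOptions(100)
{
    SingleReader = false,
    SingleWriter = true
});

// How many requests each simulated connection ended up sending.
var handled = new ConcurrentDictionary<int, int>();

// Each "connection" pulls the next request only when it is free, so a connection
// busy with a slow (e.g. large) request does not hold up the requests queued behind it.
var connections = Enumerable.Range(0, connectionCount).Select(async connectionId =>
{
    await foreach (var request in channel.Reader.ReadAllAsync())
    {
        await Task.Delay(request % 5 == 0 ? 20 : 2); // every 5th request is "large"
        handled.AddOrUpdate(connectionId, 1, (_, count) => count + 1);
    }
}).ToArray();

for (var i = 0; i < requestCount; i++)
{
    await channel.Writer.WriteAsync(i);
}
channel.Writer.Complete();
await Task.WhenAll(connections);

foreach (var (id, total) in handled.OrderBy(kv => kv.Key))
{
    Console.WriteLine($"connection {id}: {total} requests");
}
```

The per-connection counts will vary from run to run; the point is that no request is pinned to a particular connection up front.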

As to risks, it’s certainly not risk-free. However, I’d qualify it as low-risk. We’ve been using it at my company in production for several months without any issues. At some point, I suspect it will be made the default.


Got it, thank you.

Regarding performance, I saw your answer in another thread (Net SDK 3 performance - #28 by btburnett3) where you confirmed that SDK 3.1.1 is much faster for gets.

Do you think there is something wrong in the way I query the cluster? Or perhaps you set other options for the benchmark that produced those results?

I ask because I ran the same code against 3.1.1 (before enabling channels) and the results were about the same for that version.

My first guess is that limiting the test to 5 simultaneous threads may be the difference. SDK 3 is designed with a lot of async/await logic. This adds some overhead, meaning individual requests may be slightly slower, but it generally increases throughput under load. So you might try increasing the threads from 5 to 10 or 20 and see what that does. The optimum number depends on many factors; for example, if Couchbase Server is running on your local machine, the optimum number will be much lower since there is no network latency.
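A sketch of how the original harness could be parameterized by worker count, so the suggested sweep (5, 10, 20) runs in one go. QueryItemAsync and RunAsync are hypothetical names; QueryItemAsync only simulates latency with Task.Delay and would need to be replaced with the real 2.x or 3.x QueryItem to measure either SDK.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Sweep the worker count, reusing the queue-draining loop from the original test.
foreach (var workers in new[] { 5, 10, 20 })
{
    var (elapsedMs, processed) = await RunAsync(workers, numOfRequests: 500);
    Console.WriteLine($"{workers,2} workers: {processed} requests in {elapsedMs} ms");
}

// Hypothetical stand-in for QueryItem: only simulates network latency.
// Replace the body with the real SDK get call to benchmark an SDK.
static Task QueryItemAsync(string key) => Task.Delay(1);

static async Task<(long ElapsedMs, int Processed)> RunAsync(int workerCount, int numOfRequests)
{
    var requests = new ConcurrentQueue<string>(
        Enumerable.Range(0, numOfRequests).Select(i => $"key-{i}"));
    var processed = 0;

    var stopwatch = Stopwatch.StartNew();
    var tasks = Enumerable.Range(0, workerCount).Select(async _ =>
    {
        while (requests.TryDequeue(out var request))
        {
            await QueryItemAsync(request);
            Interlocked.Increment(ref processed); // workers run concurrently
        }
    });
    await Task.WhenAll(tasks);
    stopwatch.Stop();

    return (stopwatch.ElapsedMilliseconds, processed);
}
```

With a simulated fixed latency, more workers mainly overlap the waiting; against a real cluster the curve flattens once the server or the connection pool becomes the bottleneck.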

Another thing is obvious but still managed to bite me when I was benchmarking the SDK early on: make sure you're compiling in Release mode AND don't attach the debugger when you run.
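One way to avoid that trap is a small guard at the start of the benchmark program that warns when the build is Debug or a debugger is attached. The function name IsRepresentativeRun is hypothetical; the check relies only on the standard DEBUG symbol and System.Diagnostics.Debugger.IsAttached.

```csharp
using System;
using System.Diagnostics;

if (!IsRepresentativeRun())
{
    Console.WriteLine("Results from this run should not be compared with other runs.");
}

// Returns false (and prints why) when the timings would not be representative.
static bool IsRepresentativeRun()
{
    var representative = true;
#if DEBUG
    // DEBUG is defined by default only in the Debug configuration.
    Console.WriteLine("Warning: Debug build. Rebuild with -c Release before benchmarking.");
    representative = false;
#endif
    if (Debugger.IsAttached)
    {
        Console.WriteLine("Warning: debugger attached. Run without it, e.g. dotnet run -c Release.");
        representative = false;
    }
    return representative;
}
```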


Yeah, increasing the number of threads makes the difference. The 2.x SDK doesn't scale as well with the number of threads, and eventually 3.x outperforms it.

Thank you very much for the help.


@btburnett3 Hello

I noticed an interesting thing: no matter how many active threads I have, increasing the pool size for the 2.x client makes it perform better than 3.x.

For 15 threads:

  • sending 100k requests takes about 13 sec with 3.x (default NumKvConnections value)
  • about 11.7 sec with 3.x and NumKvConnections set to 5
  • about 9.7 sec with 2.x configured with a max pool size of 10

So it seems 2.x also scales pretty well when the pool size is scaled along with the threads, and it appears to be faster than 3.x.

Are there settings in SDK 3.x similar to MinPoolSize and MaxPoolSize from 2.x? I guess NumKvConnections is something a bit different.

@eugene-shcherbo The equivalent settings are the NumKvConnections (minimum) and MaxKvConnections (maximum).
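A minimal configuration sketch, assuming the same placeholder connection string and credentials as the original 3.x setup. Setting NumKvConnections and MaxKvConnections to the same value pins the pool size; the value 10 is only an example, chosen to match the 2.x max pool size used in the test above.

```csharp
using Couchbase;

var clusterOptions = new ClusterOptions()
    .WithConnectionString("connection-string")
    .WithCredentials("username", "password");

clusterOptions.EnableTls = true;

// Rough 3.x counterparts of the 2.x ConnectionPool MinSize/MaxSize settings.
// 10 is an illustrative value; use the pool size you want to test.
clusterOptions.NumKvConnections = 10; // minimum number of KV connections
clusterOptions.MaxKvConnections = 10; // maximum number of KV connections

var cluster = await Cluster.ConnectAsync(clusterOptions);
```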

However, there are some other behavioral differences. In 2.x, the connection pool scales very aggressively: if all connections are busy sending requests, it immediately opens a new connection. (Note: receiving is not a factor, only sending.) In 3.x, the scaling algorithm monitors backpressure from queued requests and decides whether to scale up, but this happens only every 30 seconds. Also, 2.x never scales down except due to connection failure, whereas 3.x scales down based on connection idle time and never due to connection failure.

What this means is that for a short test, 3.x will probably run only the minimum number of connections; it's tuned more toward continuous load than bursts. For benchmarking, I manually set the min and max to the same value, the pool size I want for the benchmark.
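To make the consequence concrete, here is a deliberately simplified, hypothetical model of the two scale-up policies described in this reply (scale-down is not modelled). NextSize2x, NextSize3x and their parameters are invented for illustration and are not SDK code. Driving both with constant backpressure for a 13-second run shows why a short test keeps 3.x at its minimum: the 30-second evaluation interval never elapses.

```csharp
using System;

// Hypothetical model, not SDK code: 2.x-style scaling opens another connection
// as soon as every existing connection is busy sending.
static int NextSize2x(int current, int busySending, int max) =>
    busySending >= current && current < max ? current + 1 : current;

// Hypothetical model, not SDK code: 3.x-style scaling is only evaluated once the
// 30-second interval has elapsed, and grows only if requests are queued (backpressure).
static int NextSize3x(int current, int queuedRequests, TimeSpan sinceLastCheck, int max) =>
    sinceLastCheck >= TimeSpan.FromSeconds(30) && queuedRequests > 0 && current < max
        ? current + 1
        : current;

const int minConnections = 1;
const int maxConnections = 5;
var size2x = minConnections;
var size3x = minConnections;

// Simulate a 13-second burst, checked once per second, with every connection busy
// and requests always queued. No 3.x-style check has happened yet, so the time
// since the last check is simply the elapsed time.
for (var second = 1; second <= 13; second++)
{
    size2x = NextSize2x(size2x, busySending: size2x, maxConnections);
    size3x = NextSize3x(size3x, queuedRequests: 100, TimeSpan.FromSeconds(second), maxConnections);
}

// Prints: after 13 s: 2.x-style pool = 5, 3.x-style pool = 1
Console.WriteLine($"after 13 s: 2.x-style pool = {size2x}, 3.x-style pool = {size3x}");
```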


@btburnett3 Got it. Thanks for clarifying the differences in the pool scaling. That’s interesting.