Performance degradation after upgrading to v3.3.6

Hi everyone,
We recently upgraded a mission-critical application to SDK v3.3.6 and noticed a considerable performance loss, around 50%.
We did some profiling and noticed that the SDK throws an exception when the server responds with DocumentNotExist. We are worried that this is causing the performance loss we detected.
Here is the benchmark we ran. In the benchmark report you will notice a “TryGetAsync” entry: for comparison, this is a test in which I refactored some parts of the SDK to avoid throwing the exception.

Would it be possible for the SDK to provide a TryGetAsync method? Something like:
Task<(bool, IGetResult)> TryGetAsync(string id, GetOptions? options = null);
Additional reading: Exceptions and Performance - Framework Design Guidelines | Microsoft Learn
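
To illustrate the point from that guideline, here is a minimal sketch comparing an exception-based miss with a Try-pattern miss. It uses a plain Dictionary as a stand-in rather than the SDK, so it says nothing about Couchbase's actual numbers; it only shows the shape of the two patterns and lets you time them locally.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Stand-in store (not the SDK): it is empty, so every lookup is a miss.
var cache = new Dictionary<string, string>();
const int Iterations = 10_000;

// Exception-based miss: the indexer throws KeyNotFoundException.
var misses = 0;
var sw = Stopwatch.StartNew();
for (var i = 0; i < Iterations; i++)
{
    try { _ = cache["my-document-key"]; }
    catch (KeyNotFoundException) { misses++; }
}
var throwingMs = sw.Elapsed.TotalMilliseconds;

// Try-pattern miss: TryGetValue reports the miss through its return value.
var tryMisses = 0;
sw.Restart();
for (var i = 0; i < Iterations; i++)
{
    if (!cache.TryGetValue("my-document-key", out _)) tryMisses++;
}
var tryMs = sw.Elapsed.TotalMilliseconds;

Console.WriteLine($"throwing: {throwingMs:F1} ms, try-pattern: {tryMs:F1} ms");
```

The absolute timings depend on the machine and on whether a debugger is attached, so treat the output as a rough indication only.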

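        [Benchmark(Baseline = true)]
        public void GetAsync()
        {
            var tasks = new Task[GetPerOperation];

            // Start GetPerOperation concurrent gets for the same key.
            for (var i = 0; i < GetPerOperation; i++)
                tasks[i] = Task.Run(async () =>
                {
                    try
                    {
                        await _couchbaseCollection.GetAsync("my-document-key");
                    }
                    catch (Exception)
                    {
                        // Exceptions (e.g. DocumentNotFoundException) are swallowed
                        // so the benchmark keeps running.
                    }
                });

            Task.WaitAll(tasks);
        }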


        [Benchmark]
        public void TryGetAsync()
        {
            var tasks = new Task[GetPerOperation];

            // Same workload as GetAsync, using the proposed TryGetAsync instead.
            for (var i = 0; i < GetPerOperation; i++)
                tasks[i] = Task.Run(async () =>
                {
                    try
                    {
                        await _couchbaseCollection.TryGetAsync("my-document-key");
                    }
                    catch (Exception)
                    {
                        // Any remaining exceptions are swallowed, as in the baseline.
                    }
                });

            Task.WaitAll(tasks);
        }

Profiling with JetBrains dotTrace (occurrences of DocumentNotFoundException in only 1 request):


It seems that with the TryGetAsync approach we recover some performance, mainly in higher-concurrency scenarios.

A few questions:

  1. What version of the SDK were you using previous to 3.3.6? (You said 3.6.6, so I’m assuming you meant 3.3.6)
  2. Can you provide the code for TryGetAsync?

Also, note that spooling up large numbers of simultaneous operations is a known performance bottleneck in some cases, as it floods the connections and thread pool and can result in timeouts. We generally recommend limiting the degree of parallelism, which usually gives better throughput. We have started experimenting with an in-flight operation limit to reduce this somewhat, but nothing is complete yet.
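
As a rough illustration of limiting the degree of parallelism, here is a minimal sketch that caps the number of in-flight operations with a SemaphoreSlim. The names RunThrottledAsync and maxInFlight are made up for this example and are not part of the SDK; the operation is passed in as a delegate, so in the benchmark it could be something like key => _couchbaseCollection.GetAsync(key).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not an SDK API): runs `operation` once per key,
// with at most `maxInFlight` operations running at the same time.
static async Task RunThrottledAsync(
    IEnumerable<string> keys, int maxInFlight, Func<string, Task> operation)
{
    using var gate = new SemaphoreSlim(maxInFlight);

    var tasks = keys.Select(async key =>
    {
        await gate.WaitAsync();      // wait for a free slot
        try { await operation(key); }
        finally { gate.Release(); }  // free the slot even if the operation throws
    }).ToList();                     // materialize so every task is started

    await Task.WhenAll(tasks);
}

// Example: 200 simulated operations, at most 16 in flight at once.
var keys = Enumerable.Range(0, 200).Select(i => $"key-{i}");
await RunThrottledAsync(keys, maxInFlight: 16, operation: _ => Task.Delay(1));
```

The right limit depends on the workload and cluster, so the value 16 here is only a placeholder to tune against your own measurements.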

Hi, @btburnett3
Sorry, you are correct, it is version 3.3.6. The previous SDK was v2.7.11.
I’ve opened a PR with the proposed solution, for discussion purposes => TryGetAsync proposed solution by kaiohenrique · Pull Request #120 · couchbase/couchbase-net-client · GitHub
But please look at the exception counts in the benchmark; they are alarming. I have also seen other SDK users in this forum who are worried about this exception-based implementation. Shouldn’t we reconsider this design?

Please also notice that with the TryGetAsync approach we got very good numbers. Being 3x faster in the case of 200 async gets per benchmark iteration is very promising. What do you think?

@kaiohenrique -

Hi, thanks for posting! After reading through the thread and doing a quick review of the PR that you pushed (thanks!), I’m pretty sure what you’re seeing is a side effect of the API changes that were made between SDK 2 and SDK 3.

There is some agreement that throwing exceptions is not the most performant way of handling errors, and that it doesn’t make sense in certain cases such as KeyNotFound, which is not really an exception but possibly an expected state. The good news is that we have already refactored the internals to support something like what you did in your PR. These changes were introduced in 3.4.2 (see NCBC-2167) and do improve performance overall, but lack the support for handling the KeyNotFound case.

The idea was not to widen the API (ICouchbaseCollection in .NET), but to provide extension methods that allow you to handle certain responses by bubbling up the status rather than the exception, or perhaps by providing a binary response like your PR. The design here is still in flux, though.

Hi, @jmorris
It is not clear to me how I can extend CouchbaseCollection. Can you please provide an example?
Also, is it necessary to extend RetryOrchestrator, as it still throws exceptions (when non-retryable)? [reference]

@kaiohenrique

I don’t mean providing a new implementation of CouchbaseCollection; I mean providing extension methods that offer an alternative interface which is non-breaking for existing clients. To do this, internal changes would need to be made so that the behavior changes only when these methods are called.

Yes, some exceptions are still thrown internally, especially if they are not retriable. Changing this would be a breaking change for existing consumers and isn’t an option.

@jmorris
Can we have a hotfix for that? Perhaps a new interface with a method similar to TryGetAsync?
It would be terrible for us to downgrade to 2.7, and we are in a hurry to go to production.


I have benchmarked 3.4.2 and 3.3.6 against 2.7.11 (Couchbase Server 6.6).