We are using the .NET client (3.2.0). When registered via the DI extensions, this automatically hooks up the ILoggerFactory.
When using the AddCouchbase() extension, however, options.WithLogging() is ignored, so there is no way to override the loggers being used.
The reason I want to do this is that our custom logger uses AsyncLocal, and there are several parts in the SDK that create long-running tasks or start timers without suppressing the ExecutionContext.
For example:
and
I believe such tasks and timers should be started in a suppressed context, like:

using (ExecutionContext.SuppressFlow())
{
    _timer = new Timer(GenerateAndLogReport, logger, thresholdOptions1.EmitInterval, thresholdOptions1.EmitInterval);
}
Is anyone else experiencing the same problems?
For me specifically, this causes ObjectDisposedExceptions during logging operations, which crash the timers / long-running tasks.
The reason is that the AsyncLocal holds a reference to these objects, which were created in a different context (hence the context should be suppressed).
Also, there can potentially be memory leaks in these parts of the SDK implementation, as the tasks keep references to any AsyncLocals that existed when the task started.
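To illustrate both the crash and the leak, here is a minimal sketch (the names are mine, not from the SDK). A Timer created without suppressing flow captures the current ExecutionContext, so the callback both sees and roots any AsyncLocal values alive at creation time:

```csharp
using System;
using System.Threading;

var local = new AsyncLocal<byte[]>();
local.Value = new byte[1024 * 1024]; // rooted by the current ExecutionContext

// Without SuppressFlow, the timer captures the current ExecutionContext,
// keeping the AsyncLocal value alive for the timer's whole lifetime.
var leaky = new Timer(_ => { }, null, Timeout.Infinite, Timeout.Infinite);

Timer clean;
using (ExecutionContext.SuppressFlow())
{
    // With flow suppressed, the callback neither sees nor retains AsyncLocals.
    clean = new Timer(_ => { }, null, Timeout.Infinite, Timeout.Infinite);
}
```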
Then for the second issue: if I manually set an ILoggerFactory via the Couchbase options and use the DI extensions, my setup is ignored. Reference:
We are definitely using ExecutionContext.SuppressFlow() in several places due to the exact problem you mentioned, seems like we have missed a few spots.
As to your logging problem, I’d like to understand it a bit better. It’s currently working as we designed it, which is to use the same logger from DI for Couchbase. Are you saying that you want to wire in a different ILoggerFactory than the one used by the rest of the application? If so, is this because of the above ExecutionContext bug? Would fixing that bug suffice, or are there other reasons?
I've indeed seen that the ExecutionContext is suppressed in several spots, but not in all the places where it should be.
A way for me to work around it would have been something like:

services.AddCouchbase(options => options.WithLogging(LoggerFactory.Create(builder => { })));

Basically an empty factory, mitigating the problem by discarding all Couchbase logs.
But the ClusterProvider always overrides this option with the default ILoggerFactory, regardless of what is set as above. It would arguably be better to only do this when nothing has been set at all.
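A fix on the SDK side could be as simple as only falling back to the DI logger factory when none was configured. A sketch (the `Logging` property name on ClusterOptions is my assumption, not verified against the source):

```csharp
// Hypothetical sketch of ClusterProvider behavior: fall back to the DI
// ILoggerFactory only when the options haven't already configured one.
if (options.Logging is null)
{
    options.WithLogging(serviceProvider.GetRequiredService<ILoggerFactory>());
}
```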
What currently happens in my applications is that they run just fine until the server cluster has an issue.
Then the background tasks start logging errors/exceptions, which crash because of disposed scopes (this is of course due to the way we do our logging, but it would not have been an issue if the context were suppressed).
This puts these tasks into an exception state and stops the entire application.
There are several logging implementations using AsyncLocals, which I presume all have this problem.
Also, any AsyncLocal set somewhere in the call stacks that open Couchbase (create the initial instances) will live forever in those async-local contexts (a kind of memory leak, as they are never released).
So I think it's really 2 issues, not sure if you agree?:
1. Context suppression is missing in several spots
2. AddCouchbase allows you to set a LoggerFactory, but it is ignored regardless
Okay, this makes sense. In your case you don’t need to override if the first issue is fixed. But, logically, it makes sense to offer the ability to override.
Also, while you’re waiting for these changes to be officially released, I do have a workaround for you. If you copy/paste the ClusterProvider code into your project, you can then change it as you need and register it on DI using services.AddSingleton<IClusterProvider, ClusterProvider>(). So long as you make this call BEFORE you call AddCouchbase it should take precedence over the internal implementation.
I’ve pushed up the fix for your first issue. For the second issue, I’ve been staring at it for a while, and I only have one explanation I’ve arrived at so far.
That particular branch of DataFlowConnectionPool is being awaited correctly, so there shouldn’t (normally) be any case where it logs after the request is out of scope. However, that branch of code is specifically applicable when there is an issue with the cluster causing connections to die and need to be reopened. This scenario would also tend to result in circuit breakers being open and needing to go half-open for testing.
My theory is that the only time you’re having a problem in DataFlowConnectionPool is actually when operations are being sent to the pool from the same branch of code in ClusterNode you identified. Therefore, I’m hoping that my fix to that logic also fixes DataFlowConnectionPool.
On another, unrelated note, I'd really love it if you could try the new ChannelConnectionPool. It gives some significant performance and throughput gains over the DataFlowConnectionPool. It's especially noticeable if you're reading large documents, but my benchmarks show it improving across all operations. You can enable it in your configuration like this:
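A sketch, assuming the experimental options are exposed via ClusterOptions.Experiments in 3.2:

```csharp
// Opt in to the experimental channel-based connection pool
// (assumes the ClusterOptions.Experiments surface in the 3.2 client).
var options = new ClusterOptions
{
    ConnectionString = "couchbase://localhost",
    UserName = "user",
    Password = "password",
};
options.Experiments.ChannelConnectionPools = true;
```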
@btburnett3 I've tried out the experimental ChannelConnectionPools.
In my test setup, I run just 1 Couchbase server. When I restart it (which I do to trigger the logging errors), it never reconnects, however.
Issue getting Cluster Map on server [::1]:11210!

message: "Issue getting Cluster Map on server [::1]:11210!"
exceptionmessage: "The operation /488 timed out after 00:00:02.5000000. It was retried 0 times using Couchbase.Core.Retry.BestEffortRetryStrategy."
rootexceptionmessage: "[Couchbase.Core.Exceptions.UnambiguousTimeoutException]: The operation /488 timed out after 00:00:02.5000000. It was retried 0 times using Couchbase.Core.Retry.BestEffortRetryStrategy."
stacktrace:
   at Couchbase.Utils.ThrowHelper.ThrowTimeoutException(IOperation operation, IErrorContext context) in D:\dotnet\couchbase-net-client\src\Couchbase\Utils\ThrowHelper.cs:line 74
   at Couchbase.Core.ClusterNode.ExecuteOp(Func`4 sender, IOperation op, Object state, CancellationTokenPair tokenPair) in D:\dotnet\couchbase-net-client\src\Couchbase\Core\ClusterNode.cs:line 591
   at Couchbase.Core.ClusterNode.GetClusterMap() in D:\dotnet\couchbase-net-client\src\Couchbase\Core\ClusterNode.cs:line 317
   at Couchbase.Core.Configuration.Server.ConfigHandler.<Poll>b__13_0() in D:\dotnet\couchbase-net-client\src\Couchbase\Core\Configuration\Server\ConfigHandler.cs:line 109
With the normal pool, it does reconnect automatically after failures
Do you have that call stack going any further back to trace back to what was calling DataFlowConnectionPool.SendAsync? That way I can review the call sites and see if I can find some other spot we’re choosing not to await.
I'm a bit stumped by this; however, I realized that the execution context doesn't really need to flow into CleanupDeadConnectionsAsync either, since it isn't going to flow into sends. So I changed the code there to wrap it all in one global suppress, so it applies to both CleanupDeadConnectionsAsync and the ContinueWith. I also made it a bit smarter, recognizing whether flow was already suppressed, which could happen in the circuit-breaker scenario mentioned earlier.
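The pattern I mean is roughly this (a self-contained sketch; the cleanup method here is a stand-in, not the SDK code):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the pool's cleanup work.
static Task CleanupDeadConnectionsAsync() => Task.CompletedTask;

bool restoreFlow = false;
try
{
    // Only suppress if not already suppressed (e.g. the circuit-breaker path),
    // since calling SuppressFlow twice throws.
    if (!ExecutionContext.IsFlowSuppressed())
    {
        ExecutionContext.SuppressFlow();
        restoreFlow = true;
    }

    // Both the cleanup task and its continuation now start without a flowing context.
    _ = CleanupDeadConnectionsAsync()
        .ContinueWith(t => { /* observe/log failures */ }, TaskScheduler.Default);
}
finally
{
    if (restoreFlow)
    {
        ExecutionContext.RestoreFlow();
    }
}
```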
I considered something like that, but it has performance implications. ExecutionContext.SuppressFlow returns a structure, so it’s stack allocated. But if you cast it to IDisposable it will box it onto the heap. It also prevents inlining of the Dispose method call. Since SendAsync is such a hot path, I’d prefer to keep it on the stack.
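For comparison, the difference looks like this (a sketch; AsyncFlowControl, the struct returned by SuppressFlow, implements IDisposable):

```csharp
using System;
using System.Threading;

// 'using' on the struct directly: stays on the stack, Dispose can be inlined.
using (AsyncFlowControl flow = ExecutionContext.SuppressFlow())
{
    // flow is suppressed here
}

// Casting to IDisposable boxes the struct onto the heap and forces a
// virtual Dispose() call - undesirable on a hot path like SendAsync.
IDisposable boxed = ExecutionContext.SuppressFlow();
boxed.Dispose();
```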
@btburnett3
I can confirm it now reconnects correctly with the ChannelConnectionPools.
And indeed, I agree that a helper suppressing the context is less optimal for hot code paths.