SendTimeoutExpiredException does not close the connection

mdomashchenko · December 3, 2019, 9:42pm

Any exception from Socket.Send() closes the connection:

couchbase/couchbase-net-client/blob/3f4213abd92dc4d59c00764ff0d00f06ed4ff26a/Src/Couchbase/IO/MultiplexingConnection.cs#L125-L128


catch (Exception e)
{
    HandleDisconnect(e);
}

SendTimeoutExpiredException does not do that.

couchbase/couchbase-net-client/blob/3f4213abd92dc4d59c00764ff0d00f06ed4ff26a/Src/Couchbase/IO/MultiplexingConnection.cs#L140




    var didComplete = state.SyncWait.WaitOne(Configuration.SendTimeout);
    var response = state.Response;

    _statesInFlight.TryRemove(opaque, out _);

    ReleaseState(state);

    if (!didComplete)
    {
        throw CreateTimeoutException(opaque);
    }

    return response;
}

/// <summary>
/// Gets a <see cref="SyncState"/> object if one exists in the pool or creates and returns a new one.
/// </summary>
/// <returns>An <see cref="SyncState"/> object representing the state of the request.</returns>
private SyncState AcquireState()

This leaves the connection in the pool for quite a while. In our case the socket never recovers, and it never throws an exception either. So the connection stays in the pool and keeps causing timeout exceptions.
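To make the requested behavior concrete, here is a minimal sketch of a timeout path that flags the connection before throwing, so that a pool could discard it. It is not the SDK's code: `SketchConnection`, `IsDead`, and `WaitForResponse` are hypothetical names used only for illustration, and a plain `TimeoutException` stands in for `SendTimeoutExpiredException`.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for a pooled connection; NOT the SDK's MultiplexingConnection.
public sealed class SketchConnection
{
    // Assumption: a pool would check this flag and replace the connection when it is true.
    public bool IsDead { get; private set; }

    // Waits for the response signal. If the wait times out, the connection is
    // marked dead before the exception is thrown, instead of being left in the pool.
    public void WaitForResponse(ManualResetEventSlim signal, TimeSpan sendTimeout)
    {
        if (signal.Wait(sendTimeout))
        {
            return; // response arrived in time
        }

        IsDead = true;
        throw new TimeoutException("The operation has timed out.");
    }
}
```

The only point of the sketch is the ordering: the connection is flagged first, and the exception is thrown afterwards.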

This is a pretty urgent issue for us. Could you suggest any workarounds that I can implement in my code until you fix it?

jmorris · December 3, 2019, 10:40pm

Hi @mdomashchenko -

Thanks for reporting. This does seem to be a bug; I am surprised it wasn’t identified sooner. I created ticket NCBC-2200 for tracking, and a fix should be in one of the next two releases.

-Jeff

mdomashchenko · December 4, 2019, 1:53am

@jmorris, I do understand release schedules, since I have my own, and that is why I need a workaround.

Is there any way I can supply my own socket factory through the configuration object?

jmorris · December 4, 2019, 3:12am

@mdomashchenko -

It might be easier to simply pull from Gerrit and verify the patch: http://review.couchbase.org/c/118836/

You can configure the SDK to use a custom IConnection implementation or pool, but it’s fairly tricky. I would suggest pulling the patch from the link above (you’ll need to create an account) and then seeing how well it works. Code in Gerrit has not yet been merged into GitHub, so it’s only partially tested.

-Jeff

mdomashchenko · December 4, 2019, 2:10pm

@jmorris, I checked the patch, and it fixes just one place. There are more variations of Send() in that file, and they all seem to have the same bug, including the async versions.

simopala · March 3, 2020, 9:14am

We are having the same issue as described above. We updated the client to version 2.7.16, but we still get the same errors and the client doesn’t handle the disconnection:

The operation has timed out. [“s”:“kv”,“i”:“11ba6”,“c”:“296b20a52cd96057/6a54c9fffca3f628”,“b”:“pricing”,“l”:“10.70.2.11:52232”,“r”:“10.70.4.22:11210”,“t”:15000000] ckey: Cruise.Domain.Rule.Model.SaleRule68_DEP
Couchbase.IO.SendTimeoutExpiredException: The operation has timed out. [“s”:“kv”,“i”:“11ba6”,“c”:“296b20a52cd96057/6a54c9fffca3f628”,“b”:“pricing”,“l”:“10.70.2.11:52232”,“r”:“10.70.4.22:11210”,“t”:15000000]
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at Cruise.Cache.Base.Cache.CouchbaseCache.d__19`1.MoveNext()

As mdomashchenko said, can you check that the bug is resolved?

jmorris · March 4, 2020, 3:48am

What makes you think that the connection was not closed and recreated after the timeout? TBH, in the original issue the connection was closed; it just happened a few milliseconds later, in the calling method. The fix only ensured that it was closed sooner.

simopala · March 4, 2020, 7:44am

Because I have a lot of logs similar to the previous one, and unless I restart the application, the connection does not start working again. We implemented a workaround: closing and reinitializing the ClusterHelper as soon as we catch a SendTimeoutExpiredException.
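A workaround of this kind might look roughly like the sketch below. `ResetOnTimeout` and `Run` are hypothetical names introduced for illustration, not part of the SDK. The helper is kept generic so it compiles without the Couchbase package, and the wiring to `ClusterHelper` shown in the comment is an assumption about how an application would plug it in.

```csharp
using System;

// Hypothetical helper: runs an operation and, if it fails with the given
// timeout exception type, runs a reset action before rethrowing.
public static class ResetOnTimeout
{
    // Serializes resets so two threads do not tear down the client at the same time.
    private static readonly object ResetLock = new object();

    public static T Run<TTimeout, T>(Func<T> operation, Action reset)
        where TTimeout : Exception
    {
        try
        {
            return operation();
        }
        catch (TTimeout)
        {
            lock (ResetLock)
            {
                reset();
            }
            throw; // the caller still sees the original timeout and can retry
        }
    }
}

// Assumed wiring with the 2.x SDK (requires the Couchbase package; "config" and
// "ReadDocument" are placeholders for the application's own objects):
//
//   var value = ResetOnTimeout.Run<SendTimeoutExpiredException, MyDoc>(
//       () => ReadDocument("some-key"),
//       () => { ClusterHelper.Close(); ClusterHelper.Initialize(config); });
```

Note that every timed-out call triggers a full reset here, so a burst of timeouts would reinitialize the client several times in a row. A production version would probably want to debounce that.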

jmorris · March 4, 2020, 8:00am

@simopala, why are the connections closing? The SDK only reacts to the connection being closed, so the question is what is closing the connection and why.

simopala · March 4, 2020, 8:15am

That’s what I’m trying to understand.

jmorris · March 6, 2020, 9:04pm

@simopala -

You’ll need to do some diagnosis; the SDK Doctor should be able to help you, assuming the timeouts are consistent and reproducible. If nothing surprising is found, you can probably assume the problem is ephemeral or random. From there I would check whether something on the network is closing the connection (common in cloud environments), and whether TCP Keep-Alives are enabled and ClientConfiguration.TcpKeepAliveTime and ClientConfiguration.TcpKeepAliveInterval are tuned to the environment.

That being said, connections could be closed locally, by the OS, by a network appliance, or by the server itself, so it’s a matter of isolating the cause, as timeouts are only a symptom. Furthermore, I suspect that if you look into the logs you’ll see that connections are being recreated after the timeout.
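For reference, the keep-alive settings mentioned above could be set along these lines. This is a configuration sketch only: the values are illustrative rather than recommendations, and `EnableTcpKeepAlives` is recalled from the 2.x `ClientConfiguration` rather than taken from this thread, so verify it against the SDK version in use.

```csharp
using Couchbase;
using Couchbase.Configuration.Client;

var config = new ClientConfiguration
{
    // Assumption: this switch exists on the 2.x ClientConfiguration.
    EnableTcpKeepAlives = true,

    // Idle time in milliseconds before the first keep-alive probe (illustrative value).
    TcpKeepAliveTime = 60 * 1000,

    // Interval in milliseconds between subsequent probes (illustrative value).
    TcpKeepAliveInterval = 1000
};

ClusterHelper.Initialize(config);
```

A keep-alive time shorter than the idle timeout of any load balancer or firewall in the path is the usual goal; what that timeout is depends on the environment.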

-Jeff