Dead connections


#1

From time to time, some of the clients lose their connection to one of the nodes in the cluster. This time it seems to have been caused by a node being elected as master; this is from the log: “Haven’t heard from a higher priority node or a master, so I’m taking over.” I have a tool querying the REST endpoint for node status continuously, and it returned the status “warmup” for that node for a while, which caused NodeUnavailable exceptions, killing k/v access to that node for a second or so.
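For reference, the kind of status check described above can be sketched roughly as follows. This is a minimal illustration, not the actual tool: it assumes the cluster's `/pools/default` REST endpoint (normally on port 8091) and its `nodes[].status` field, and the function names, base URL, and credentials are hypothetical placeholders.

```python
import base64
import json
import urllib.request


def fetch_pool_info(base_url, user, password, timeout=5.0):
    """GET <base_url>/pools/default with basic auth and return the parsed JSON.

    base_url, user and password are placeholders (assumptions), e.g.
    "http://<server ip>:8091" and a read-only admin account.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        base_url.rstrip("/") + "/pools/default",
        headers={"Authorization": "Basic " + token},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def unhealthy_nodes(pool_info):
    """Return (hostname, status) pairs for every node not reported as 'healthy'."""
    return [
        (node.get("hostname", "?"), node.get("status", "unknown"))
        for node in pool_info.get("nodes", [])
        if node.get("status") != "healthy"
    ]
```

Calling `unhealthy_nodes(fetch_pool_info(...))` in a loop with a short sleep and logging a timestamp whenever the list is non-empty would give a rough timeline of when a node reported “warmup”.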

I enabled logging hoping it could help, but it does not look like it contains any useful information. The current level is set to INFO.

Here is a sample of the log:

2016-10-26 15:48:44,072 [184] INFO Couchbase.IO.ConnectionBase - System.ObjectDisposedException: Cannot access a disposed object.
Object name: ‘System.Net.Sockets.SocketAsyncEventArgs’.
at System.Net.Sockets.SocketAsyncEventArgs.StartConfiguring()
at System.Net.Sockets.SocketAsyncEventArgs.SetBufferInternal(Byte[] buffer, Int32 offset, Int32 count)
at Couchbase.IO.BufferAllocator.ReleaseBuffer(SocketAsyncEventArgs eventArgs)
at Couchbase.IO.Connection.Dispose()
2016-10-26 15:48:44,087 [184] INFO Couchbase.IO.ConnectionPool1[[Couchbase.IO.IConnection, Couchbase.NetClient, Version=2.3.8.0, Culture=neutral, PublicKeyToken=05e9c6b5a9ec94c2]] - Connection is dead: 15e015cc-cc83-450d-aad8-6d7074b7e16b on <server ip>:11210 - 502a7c8d-fc2b-47e1-81be-50b4c5ea56ca - [0, 19]
2016-10-26 15:48:44,072 [176] INFO Couchbase.IO.ConnectionBase - System.ObjectDisposedException: Cannot access a disposed object.
Object name: 'System.Net.Sockets.SocketAsyncEventArgs'.
at System.Net.Sockets.SocketAsyncEventArgs.StartConfiguring()
at System.Net.Sockets.SocketAsyncEventArgs.SetBufferInternal(Byte[] buffer, Int32 offset, Int32 count)
at Couchbase.IO.BufferAllocator.ReleaseBuffer(SocketAsyncEventArgs eventArgs)
at Couchbase.IO.Connection.Dispose()
2016-10-26 15:48:44,087 [176] INFO Couchbase.IO.ConnectionPool1[[Couchbase.IO.IConnection, Couchbase.NetClient, Version=2.3.8.0, Culture=neutral, PublicKeyToken=05e9c6b5a9ec94c2]] - Connection is dead: 91c42731-baea-4998-b6f1-5aa16b979284 on :11210 - 502a7c8d-fc2b-47e1-81be-50b4c5ea56ca - [0, 18]
2016-10-26 15:48:44,103 [184] INFO Couchbase.Core.Server - Checking if node :11210 should be down - last: 13:35:53.2719979, current: 15:48:44.1033204, count: 1
2016-10-26 15:48:44,103 [176] INFO Couchbase.Core.Server - Checking if node :11210 should be down - last: 13:35:53.2719979, current: 15:48:44.1033204, count: 2
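To see how many connections were dropped per endpoint over a longer log, the “Connection is dead” entries can be pulled out with a small script. This is only a rough sketch that assumes the message format shown in the sample above; the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches the "Connection is dead: <connection id> on <endpoint>:<port>" part
# of the client log entries shown above (format assumed from this sample only).
DEAD_CONNECTION = re.compile(
    r"Connection is dead: (?P<conn>[0-9a-fA-F-]{36}) on (?P<endpoint>[^\n]*?:\d+)"
)


def dead_connections(log_text):
    """Return (connection id, endpoint) pairs for each dead-connection entry."""
    return [
        (match.group("conn"), match.group("endpoint"))
        for match in DEAD_CONNECTION.finditer(log_text)
    ]


def dead_per_endpoint(log_text):
    """Count dead-connection entries per endpoint."""
    return Counter(endpoint for _, endpoint in dead_connections(log_text))
```

Comparing the timestamps of these entries with the time of the “taking over” message in the server log may help confirm whether the two events line up.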

What's the take on this?


#2

@jacob_michaelsen -

In this case it appears that the SDK is simply reacting (correctly) to the state of the cluster. Is this happening during normal operations (not during a swap, failover, rebalance, etc)?

-Jeff


#3

@jmorris -

Yes, it was during normal operations. Is there any way I can verify that the node was down on the server side? There do not seem to be any log entries in the web interface besides “Haven’t heard from a higher priority node or a master, so I’m taking over.”


#4

Perhaps @pvarley can provide some insight here?