GetJson stops retrieving documents

Couchbase Server: 3.0.1 Community
.NET SDK: 1.3.9

I’m having a weird situation.
I run a 3-node cluster, and my app uses a View to get the IDs of documents and GetJson to get the actual documents.
After about an hour of running, or an unknown number of calls to the DB, GetJson suddenly can’t find the document in the DB.
I know for sure that the document is there, since I’m getting it from the View, and when I do a lookup from the web interface I can see the document.
It’s as if after a while the map gets messed up and the client doesn’t know where the document is located.
When I recycle the application pool or just restart the web service, I’m able to get the documents back. Also, this only affects GetJson; Views keep working when GetJson gets tired.
Does anyone know what it could be?
Thanks in advance.

@jesuson

Does this happen to a specific key, all keys, or randomly? Also, there is an ExecuteGetJson method that returns a GetOperationResult object containing more information about the request. If you swap your calls for that method, you can check the StatusCode and Message fields to see why the operation failed.
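
For reference, a minimal sketch of that swap, assuming the 1.3.x client with the `Couchbase.Extensions` namespace imported. The names `GetJsonDiagnostics`, `Describe` and `Fetch` are hypothetical and only used for illustration; the fields read from the result are the same ones that show up in the log output later in this thread.

```csharp
using System;
using Couchbase;
using Couchbase.Extensions; // assumed home of the ExecuteGetJson<T> extension method

public static class GetJsonDiagnostics
{
    // Builds a one-line summary from the fields of an operation result.
    public static string Describe(string key, bool success, int? statusCode,
                                  string message, Exception exception)
    {
        return string.Format(
            "key={0} success={1} status={2} message={3} exception={4}",
            key,
            success,
            statusCode.HasValue ? statusCode.Value.ToString() : "n/a",
            message ?? "n/a",
            exception == null ? "n/a" : exception.Message);
    }

    // Fetches a document and logs why the fetch failed instead of
    // silently returning null the way GetJson does.
    public static T Fetch<T>(CouchbaseClient client, string key) where T : class
    {
        var result = client.ExecuteGetJson<T>(key);
        if (!result.Success || !result.HasValue)
        {
            Console.Error.WriteLine(Describe(
                key, result.Success, result.StatusCode, result.Message, result.Exception));
            return null;
        }
        return result.Value;
    }
}
```

Calling `GetJsonDiagnostics.Fetch<MyDoc>(client, id)` in place of `client.GetJson<MyDoc>(id)` (with `MyDoc` standing in for your own document type) keeps the calling code unchanged while recording the status code, message and exception for every miss.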

Also, if you enable logging and provide the logs, that will help diagnose the issue as well.

-Jeff

@jmorris
Hey Jeff,
Thank you for your answer.
This one happens for all keys.
I forgot to mention that I have a second web service that suffers from this as well, but one can work while the other doesn’t.
What I’m trying to say is that one app gets the document and displays it, but when I send the same ID to the other app, which is supposed to copy it into a relational DB, it fails to get it. Both run on the same server in different application pools.

I will try ExecuteGetJson + verbose logging and let you know.
Thanks.

@jmorris
I wasn’t able to enable logging, but I got the errors from ExecuteGetJson.

Statuscode: 132
Message: Exception reading response

@jmorris

Updated to 1.3.10, still happening.
This is the full log with the exception:

status>>>>>
Success?
[ False ],
HasValue?
[ False ],
StatusCode?
[ 132 ],
Message?
[ Exception reading response - xxx.xxx.xxx.xxx:11210 ],

Exception?
[ Unable to read data from the transport connection: A blocking operation was interrupted by a call to WSACancelBlockingCall. ],

Stacktrace?
[ at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
at Couchbase.CouchbasePooledSocket.Read(Byte[] buffer, Int32 offset, Int32 count)
at Enyim.Caching.Memcached.Protocol.Binary.BinaryResponse.Read(IPooledSocket socket)
at Enyim.Caching.Memcached.Protocol.Binary.BinarySingleItemOperation.ReadResponse(IPooledSocket socket)
at Couchbase.CouchbaseNode.Execute(IOperation op) ].

@jesuson -

You are probably experiencing port exhaustion: you’re simply using more ephemeral ports than your OS is configured to support. You can try changing the following registry settings:

  1. Increasing the MaxUserPort setting from 5k to 65k
  2. Decreasing the TcpTimeWaitDelay setting from the default of 240 seconds to something lower

The first increases the number of temporary (ephemeral) ports that outgoing connections can use; the second decreases the amount of time a socket spends in TIME_WAIT after it is closed. You can read more about this here (ignore the BizTalk bit): http://msdn.microsoft.com/en-us/library/aa560610(v=bts.20).aspx
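
To see why those two settings matter together, here is a rough back-of-the-envelope sketch. It assumes the usable ephemeral range runs from port 1025 up to MaxUserPort and that every closed connection holds its port for the full TcpTimeWaitDelay; both are simplifications, and newer Windows versions manage the dynamic range differently. The class and method names are made up for illustration, and the registry path is the standard `Tcpip\Parameters` key.

```csharp
using System;
using Microsoft.Win32;

public static class PortBudget
{
    const string TcpipParameters =
        @"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters";

    // Ports 1025..maxUserPort are treated as usable for outgoing connections (assumption).
    public static int UsablePorts(int maxUserPort)
    {
        return Math.Max(0, maxUserPort - 1024);
    }

    // Each closed socket keeps its port for timeWaitSeconds, so this is the
    // highest steady rate of new connections that never runs out of ports.
    public static double MaxConnectsPerSecond(int maxUserPort, int timeWaitSeconds)
    {
        return (double)UsablePorts(maxUserPort) / timeWaitSeconds;
    }

    // Reads a DWORD under Tcpip\Parameters, or returns fallback when it is not set.
    public static int ReadSetting(string name, int fallback)
    {
        object value = Registry.GetValue(TcpipParameters, name, fallback);
        return value is int ? (int)value : fallback;
    }
}
```

Under those assumptions, `PortBudget.MaxConnectsPerSecond(5000, 240)` works out to roughly 16 new connections per second, while `PortBudget.MaxConnectsPerSecond(65534, 30)` allows about 2150. `PortBudget.ReadSetting("MaxUserPort", 5000)` and `PortBudget.ReadSetting("TcpTimeWaitDelay", 240)` show what a given server is currently configured with, falling back to the values quoted above when nothing is set.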

Also, whenever I see this, I wonder how many client instances are being opened and whether you are caching them appropriately. In general, you should create a single client instance per bucket when the application starts (Application_Start) and close that connection via Dispose when the application shuts down (Application_End). You might want to check that you’re not creating and destroying lots of client objects.
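
A rough sketch of that lifecycle in Global.asax.cs, assuming the 1.3.x `CouchbaseClient` configured through the default `couchbase` section in web.config. The class name `Global` and the `Client` property are illustrative and not taken from the original application.

```csharp
using System;
using System.Web;
using Couchbase;

public class Global : HttpApplication
{
    // One shared client for the whole application (one per bucket).
    public static CouchbaseClient Client { get; private set; }

    protected void Application_Start(object sender, EventArgs e)
    {
        // Reads the "couchbase" configuration section; created once per app domain.
        Client = new CouchbaseClient();
    }

    protected void Application_End(object sender, EventArgs e)
    {
        // Releases the pooled sockets when the application pool shuts down or recycles.
        if (Client != null)
        {
            Client.Dispose();
            Client = null;
        }
    }
}
```

Request handlers would then go through `Global.Client` rather than constructing a client of their own, so the number of open sockets stays bounded by the pool settings of that single instance.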

-Jeff

Thank you Jeff,
I tried the solution and so far everything is working OK.

I am only using one client for all the requests. I used the static client, as mentioned in the SDK documentation.

What do you mean by "caching them appropriately"?

Thanks.

@jesuson

By caching I mean maintaining a reference for the duration of the application. If you’re creating a static client or using a singleton wrapper, you’re good.

Good to hear that the changes seem to be working!

-Jeff

@jmorris

Hey Jeff,

I just found out this didn’t fix the issue for me.
I’m still getting errors, and they keep happening even after waiting 5 minutes.

25/12/2014 11:27:47 21515b63-9dcc-4fd3-9400-f58f5d765c77 snapshot status>>>>>Success?[ False ], HasValue?[ False ], StatusCode?[ 132 ],Message?[ Exception reading response - 130.211.160.162:11210 ],Exception?[ Unable to read data from the transport connection: A blocking operation was interrupted by a call to WSACancelBlockingCall. ],Exception?[ at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
at Couchbase.CouchbasePooledSocket.Read(Byte[] buffer, Int32 offset, Int32 count)
at Enyim.Caching.Memcached.Protocol.Binary.BinaryResponse.Read(IPooledSocket socket)
at Enyim.Caching.Memcached.Protocol.Binary.BinarySingleItemOperation.ReadResponse(IPooledSocket socket)
at Couchbase.CouchbaseNode.Execute(IOperation op) ].

Do you have any other solution for this one?
Thanks.

@jesuson -

Check out this post: .NET client error. Unable to read data from the transport connection

You may need to tune the cluster OS as well as the App server.

-Jeff

Thank you Jeff,
I actually already found this post and did the tweaks to the OS.

I’m really close to the deadline on this project and this is a show stopper for sure.

If I switch my code to the 2.0 SDK, do you think I’ll get the same results? I know it was built from the ground up, moving away from the memcached driver, so I’m hoping it will be a good solution.

Thanks again.

@jesuson -

It really depends on whether the issue is the driver, the environment, or the usage. If you are doing new development, I would definitely go with the 2.0 client, because that is the future and where almost all of the development is taking place.

-Jeff

@jmorris

The development is not new, and I’m not so sure I can update one of my services to .NET 4.5 so easily.
I saw that the handle count is not high when I’m getting this error, so the MaxUserPort + TcpTimeWaitDelay changes did the trick, but I don’t understand why I keep seeing these errors.

Can it be related to the fact that the documents weigh about 900-1500k?

Thanks.

@jesuson -

OK, that would make the switch to 2.0 a bit difficult. Could you enable logging and provide the logs (I need logs from startup to exception)? Also, if you can create an example and attach it to an NCBC ticket, that would help as well. You can create an NCBC here: https://issues.couchbase.com/browse/NCBC

I doubt the doc size has anything to do with the exception.

-Jeff

@jmorris

Well, I’m having trouble setting up the logging, and I don’t know why.
I ended up spending the day moving all the code to the new 2.0 SDK.
I hope using the IDisposable interface will help me solve the issue. I will keep you updated.
Thanks for everything.

EDIT:

2 months later the code is running flawlessly! The new SDK really made the difference.
Thank you for everything!