I’m using the latest .NET SDK v2.x with Couchbase Server v3.x. I’m doing some perf tests and hitting timeouts on the bucket.GetDocument call. I’m finding that when this happens, the bucket can’t be used again: every subsequent call times out. I’m using the ClusterHelper class for my cluster and bucket management. Is this the expected behavior? When I remove the bucket and get a new one, everything works fine on subsequent calls.
I’m using 2.2.2. All I get from the GetDocument call is my result object with Success=false and the message: “The operation has timed out.” Again, once this occurs, all subsequent calls using that bucket object time out. If I reinitialize the Cluster or use a new bucket, it works fine.
You’re not calling Dispose on your buckets or clusters, are you? If you’re using ClusterHelper, you shouldn’t use using statements or call Dispose on either of them.
I am having to call ClusterHelper.RemoveBucket(), which in turn disposes the bucket. I’m only calling it because of the original issue of the bucket timeouts.
Previously you mentioned that you were hitting timeouts during some perf testing. Out of curiosity, during this perf testing were you doing GetDocument or GetDocumentAsync calls? If GetDocument, how many simultaneous threads were you using?
GetDocument. I’ve ramped it up to 180 threads per second, but previously it was as low as 10. I’ve also noticed that if I call ClusterHelper.GetBucket() in my app start-up and don’t actually use the bucket, it will time out as well. Is that expected?
I really need to get to the bottom of this; otherwise I’m going to have to abandon the use of Couchbase for my project. Here’s an example of the Cluster disposing after the perf test is done, without me removing the bucket or closing the cluster myself.
On the Dispose problem, I’m thinking it’s related to disposing and reopening the bucket regularly; the SDK just isn’t designed to handle that use case. So I think we should focus on the original issue, which was getting timeout errors running your bulk insert.
Here’s my suspicion as to what your original problem was, though I don’t have enough facts to say for certain. I’m just going to throw out one possible way the bulk insert could have been written that may have caused your problem:
Start up multiple threads to run the bulk insert
Each thread then runs a series of InsertAsync commands, but doesn’t “await” their completion
This queues insert commands as fast as the processor can reasonably do so, potentially queueing hundreds or thousands at once
The SDK tries to send these inserts through the connection pool to your cluster. Under the default configuration, that is a maximum of 10 connections per server. On a 3 node cluster, that means only 30 operations can be pending at once.
Depending on the speed of your cluster, it is quite possible that some of the later operations out of the very large queue in the SDK begin timing out because they are waiting on free ports in the ConnectionPool.
Increasing timeouts, the connection pool size, or server speed would help mitigate this, but all that would do is raise the threshold before timeouts start occurring. The core problem would still remain.
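To illustrate, a bulk insert written in that fire-and-forget style might look roughly like this (a hypothetical sketch; the bucket name, keys, and document values are assumptions, not your actual code):

```csharp
using System.Collections.Generic;
using Couchbase;

public static class BulkInsertAntiPattern
{
    // Anti-pattern: queue every insert without awaiting any of them.
    // Each InsertAsync call returns immediately, so the SDK's internal
    // queue can grow far beyond what the connection pool can service
    // before the operation lifespan expires.
    public static void InsertAll(IDictionary<string, object> docs)
    {
        var bucket = ClusterHelper.GetBucket("default"); // bucket name assumed

        foreach (var doc in docs)
        {
            // No await: operations pile up waiting for one of the
            // (by default) 10 connections per server to come free.
            bucket.InsertAsync(doc.Key, doc.Value);
        }
    }
}
```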
The approach I would recommend would be to make the inserts run in parallel, but with a concurrency cap. I’d cap it at maybe double the total size of your connection pool. That way you’ll be getting the next batch of inserts ready while the previous batch is waiting on a response from the server. So if you have 3 servers and 10 connections per server, run 60 tasks, each awaiting its previous insert before attempting another.
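A sketch of that capped-parallelism approach, using SemaphoreSlim to hold in-flight operations at 60 (matching the 3-server, 10-connections-per-server example above; the bucket name and error handling are illustrative):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Couchbase;

public static class ThrottledBulkInsert
{
    // Cap in-flight inserts so the SDK's queue never outruns the
    // connection pool. 60 = 3 servers x 10 connections x 2.
    private static readonly SemaphoreSlim Throttle = new SemaphoreSlim(60);

    public static async Task InsertAllAsync(IDictionary<string, object> docs)
    {
        var bucket = ClusterHelper.GetBucket("default"); // bucket name assumed

        var tasks = docs.Select(async doc =>
        {
            await Throttle.WaitAsync();
            try
            {
                // Each task awaits its insert before releasing a slot,
                // so at most 60 operations are ever pending at once.
                var result = await bucket.InsertAsync(doc.Key, doc.Value);
                if (!result.Success)
                {
                    // Retry or log the failed insert here.
                }
            }
            finally
            {
                Throttle.Release();
            }
        });

        await Task.WhenAll(tasks);
    }
}
```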
If this wasn’t your implementation, please let me know what the differences are.
In a nutshell, an OperationTimeoutException is a generic exception returned when the operation cannot be completed within the timespan defined by ClientConfiguration.DefaultOperationLifespan, which defaults to 2.5s. There are many, many reasons why an operation might fail, so in general each failure should be handled independently. That being said, the cause of one OperationTimeout, if not resolved, could cause future OperationTimeouts; this is likely what you are seeing here. As a rule, though, the SDK should recover.
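For reference, both the operation lifespan and the connection pool size mentioned above are tunable through ClientConfiguration at startup (a minimal sketch; the server URI and the specific values are illustrative, not recommendations):

```csharp
using System;
using System.Collections.Generic;
using Couchbase;
using Couchbase.Configuration.Client;

public static class CouchbaseSetup
{
    public static void Configure()
    {
        var config = new ClientConfiguration
        {
            Servers = new List<Uri> { new Uri("http://localhost:8091/pools") },
            // Raise the per-operation lifespan from the 2.5s default (ms).
            DefaultOperationLifespan = 5000,
            PoolConfiguration = new PoolConfiguration
            {
                MinSize = 5,
                MaxSize = 20 // default is 10 connections per server
            }
        };

        // Initialize once; reuse via ClusterHelper.GetBucket thereafter.
        ClusterHelper.Initialize(config);
    }
}
```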
In terms of objects disposing, this is generally tied to your application: the Cluster and Bucket objects you open must be maintained for the entire application lifespan. The ClusterHelper should help you here, but you still need to ensure they are not going out of scope or being closed/disposed by your application. Additionally, I suspect you may need to do some tuning to match your performance expectations.
I am not sure I understand what you mean here, can you elaborate?
If you dispose of the Cluster, most definitely GetBucket will fail!
In order to help, I would need to see your code. If you can create an NCBC and upload an example project, I’ll take a look at it.
Also, feel free to reach out to me on twitter: @jeffrysmorris
As I mentioned above, I figured it out. There was a ClusterHelper.Close() in an HttpModule’s Dispose, when it should have been in the Global.asax Application_End().
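For anyone hitting the same symptom, the corrected placement looks roughly like this in Global.asax.cs (a minimal sketch; the ClientConfiguration details are assumed):

```csharp
using System;
using System.Web;
using Couchbase;
using Couchbase.Configuration.Client;

public class Global : HttpApplication
{
    protected void Application_Start(object sender, EventArgs e)
    {
        // Open the cluster once for the whole application lifetime.
        ClusterHelper.Initialize(new ClientConfiguration());
    }

    protected void Application_End(object sender, EventArgs e)
    {
        // Dispose the cluster only when the application itself shuts
        // down -- not in an HttpModule's Dispose, which can fire while
        // the app (and its buckets) are still in use.
        ClusterHelper.Close();
    }
}
```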