Somehow the connection is getting closed; this could be on the server side or the client side. If you can post logs that would help.
Note that if you are using the overload of IBucket.Get which takes an IList (the bulk overloads), then you probably don’t want to be doing this within a parallel loop, because internally the client is already using the TPL; it’s simply too much parallelism. If you’re using regular threads (Thread.Start), then I would use one bucket per thread as opposed to sharing a bucket between threads.
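A minimal sketch of the one-bucket-per-thread pattern with Thread.Start (bucket name, keys, and thread count are placeholders; this assumes a running Couchbase server and the 2.x .NET SDK):

```csharp
using System.Collections.Generic;
using System.Threading;
using Couchbase;
using Couchbase.Core;

class OneBucketPerThread
{
    static void Main()
    {
        const int threadCount = 4; // placeholder thread count
        var cluster = new Cluster(); // defaults to localhost; pass a ClientConfiguration for a real cluster

        for (var i = 0; i < threadCount; i++)
        {
            var thread = new Thread(() =>
            {
                // Each thread opens its own bucket reference instead of sharing one.
                using (IBucket bucket = cluster.OpenBucket("default"))
                {
                    var keys = new List<string> { "key1", "key2" }; // placeholder keys
                    // The bulk Get overload already parallelizes internally via the TPL,
                    // so don't also wrap it in Parallel.For/ForEach.
                    var results = bucket.Get<dynamic>(keys);
                }
            });
            thread.Start();
        }
    }
}
```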
I am using batch get. I am not using parallel but Thread.Start.
Assuming that I have 5 buckets and 10 threads, using a different bucket per thread actually means opening 50 buckets on start. At 5 seconds to open a bucket from localhost, that’s about 4 minutes just to start up.
Anyways, that doesn’t make sense, as calling Get and Update should be thread-safe AFAIK. Besides, web apps should handle many concurrent ops per sec from different users (= on different threads = many threads).
On top of that, things that worked great before are now starting to fail with many ClientFailure and OperationTimeout results. It happens even with a single thread, a single bucket, and a single POCO type.
Update1: A batch update with 10 docs usually throws for 3-6 of them. Some are saved, some are not, and the failed docs are different every time.
Update2: When the batch size is in the hundreds, Bucket.Upsert(items) hangs. It worked perfectly before, both from local and remote.
Yes, you have to rebalance after adding, removing or failing over a node. The rebalance will distribute the keys equally across the nodes and update the cluster map and vbucket mappings on the client so that the client can retrieve them.
There is a bug in 2.0.3 and it will be fixed in 2.1.0 that addresses this replica read issue.
A specific doc is accessible from console
JSON is valid
Get() returns ClientFailure with VBucketBelongsToAnotherServer in log4net
I tried again to fail over a node and then do a “Full recovery”. After an hour, it is still not working.
Now log4net also says:
Couchbase.Authentication.SASL.CramMd5Mechanism - Authenticating socket xxx
Couchbase.Authentication.SASL.CramMd5Mechanism - Authentication for socket xxx failed: Auth failure
Couchbase.IO.Strategies.DefaultIOStrategy - Could not authenticate aaa using Couchbase.Authentication.SASL.CramMd5Mechanism - xxx.
I don’t understand the issue with authentication as an adjacent doc can be read successfully.
I’m terrified about using Couchbase on a production system
I can’t be sure exactly what is going on here; probably the best course of action would be to create an NCBC (a ticket in the .NET client’s issue tracker) and provide a list of steps to reproduce and a sample project.
This doesn’t exactly make sense to me; an NMV is always a server error and never a ClientFailure. ClientFailures are errors where the client cannot receive a response from the server or a serialization/deserialization error was raised; the error itself manifested in the client and was not a server response.
With NMV, you should see them propagate to OperationTimedOut if the client cannot resolve the NMV for a Get. Prepend and Append may return an NMV, but this is changing in 2.1.0 (soon to be released).
This can mean one of several things:
a) You provided the wrong credentials
b) The bucket does not exist on the server
c) The bucket was just created and it has not been completely initialized on the cluster (takes a few seconds, so if you are programmatically creating buckets, it may fail in short term unless you delay).
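For point (c), a hedged sketch of programmatically creating a bucket and waiting for it to initialize before opening it (credentials, bucket name, and retry counts are placeholders; assumes the 2.x SDK’s ClusterManager API and a running server):

```csharp
using System;
using System.Threading;
using Couchbase;
using Couchbase.Core;

class CreateBucketWithRetry
{
    static void Main()
    {
        var cluster = new Cluster();
        // Admin credentials are placeholders.
        var manager = cluster.CreateManager("Administrator", "password");
        manager.CreateBucket("newbucket");

        // A new bucket takes a few seconds to initialize across the cluster,
        // so retry OpenBucket with a short delay instead of failing immediately.
        IBucket bucket = null;
        for (var attempt = 0; attempt < 10 && bucket == null; attempt++)
        {
            try
            {
                bucket = cluster.OpenBucket("newbucket");
            }
            catch (Exception)
            {
                Thread.Sleep(1000);
            }
        }
    }
}
```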
In general, you don’t want to be swapping nodes in and out of a cluster willy-nilly, since it’s a fairly intensive process. It’s very easy in CB to do so, but it should be reserved as an operational task for when load is not at its peak (if possible). I would set up my cluster and then just leave it alone, unless you must change it for some operational reason.
I am not sure; I create, tear down, rebalance etc. every day while developing and load testing, and in general things work as expected. Let’s get the NCBC going and take it from there.
Thanks for your effort to help me solve this issue.
What’s NMV ?
Usually ClientFailure does mean a de-serialization error that I eventually find and fix. In this scenario, perhaps it is possible that the data was not returned, or was returned null or corrupted, and thus caused a serialization error. I can also say that the doc’s JSON integrity is good and that the doc is accessible through the console.
These authentication messages occur only for that specific doc, while the same connection easily returns other docs in the same bucket; hence the bucket, connection, credentials, etc. are working properly.
Usually removing a node and then restoring it takes an hour of work from me and about the same downtime for the system. So, no, I don’t want to do it at all. Anyway, it doesn’t always help.
Waiting anxiously for 2.1.0
Also waiting for the next community version
I’m trying to hold in there, but it is very challenging
I thought of another reason that might cause this issue: I’m using 2 web servers concurrently against the cluster, so perhaps there is a collision. Check this.
1-Not My VBucket
2-It’s possible; you should be able to trace this back in the logs. I highly doubt the server would be the cause of any null, empty or corrupted data…I haven’t heard of any such thing.
3-The connection to a bucket is authenticated, not the data going across the connection, so one doc couldn’t succeed while another fails. Also, the authentication occurs when the connection is created, not when it’s used.
8-You can open literally thousands of clients (var cluster = new Cluster()) against a Couchbase server cluster; however, the best practice is to use the bare minimum necessary. Running two web applications (separate processes, thus separate client instances) is fine. Note that each client instance (new Cluster()) and each bucket you open will create a pool of TCP connections. This is controlled by the ClientConfiguration.PoolConfiguration.MaxSize and MinSize properties.
If you are using a single Cluster instance (ClusterHelper will ensure this), you shouldn’t run into any issues. Even if you are using two or more Cluster instances per process, it shouldn’t be a problem.
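As a sketch, capping the connection pool and sharing one Cluster instance via ClusterHelper might look like this (node address and pool sizes are placeholders):

```csharp
using System;
using System.Collections.Generic;
using Couchbase;
using Couchbase.Configuration.Client;

class SingleClusterSetup
{
    static void Main()
    {
        var config = new ClientConfiguration
        {
            Servers = new List<Uri> { new Uri("http://node1:8091/pools") }, // placeholder node
            PoolConfiguration = new PoolConfiguration
            {
                MinSize = 2,  // TCP connections opened per bucket up front
                MaxSize = 10  // upper bound on TCP connections per bucket
            }
        };

        // ClusterHelper guarantees a single Cluster instance per process.
        ClusterHelper.Initialize(config);
        var bucket = ClusterHelper.GetBucket("default");
    }
}
```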
After a lot of time and effort I think that most of my difficulties arose from a collision between the office local network and the cluster’s network in azure connected through VPN. This resulted in nodes being unreachable.
I’m not sure that everything is OK now, but rest assured that I’ll let you know.
I hope that this post will help others.
Now I need a vacation