Client-Side timeout exceeded for operation. Inspect network conditions or increase the timeout


#1

We are seeing a lot of these errors in our Rails application, which uses the Couchbase Ruby client (v1.3.11) to talk to our test cluster (v3.0.1, 3 nodes, 72 GB total RAM quota, handling barely 300 to 400 ops/sec). What would be a good way to figure out the underlying cause?


#2

I’m having a similar issue. Running Couchbase 3.0.2. I have 6 servers running on cloud-based virtual machines split across two hosting companies for redundancy. Each has 4 GB of RAM and 2 or 4 cores (one hosting company provides more cores at the 4 GB level than the other). Really light-duty use so far, maybe 10 req per sec on each server. CPU usage stays below 50% and load average below 3.
Because of the light load, we are still running the front-end and back-end processing on those servers.
The point of the extra info is that we get the timeout errors only on the front end, while the back end does many times the number of requests/updates. Also, it usually does not happen on more than one host at a time.
The front end is not Rails, though it does use some of the same building blocks. The front end consists of Apache -> Passenger -> Sinatra.
The back end does not run under any framework, just plain Ruby.
We kept the front end as simple as possible to reduce response times, so it only does 2 DB requests per service request: get an account config and decrement a balance counter, both by doc ID, not from a view. Very lightweight.
The requests to us are very steady, but the timeouts happen in bursts at uneven intervals. That makes me think it is tied to indexing, but I haven’t been able to confirm that hypothesis.
Because of the timeouts, I put in a cache of the account’s config that lasts for 10 sec. That drastically reduced the lookups, but I still get the timeout errors.
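For anyone wanting to try the same workaround, here is a minimal sketch of a 10-second cache like the one described above. It is not the poster's actual code: the class name `TtlCache`, the `fetch` method, and the injectable clock are all assumptions made for illustration, and the block passed to `fetch` stands in for the real Couchbase lookup.

```ruby
# Minimal time-based cache: an entry expires `ttl` seconds after it is stored.
# TtlCache is a hypothetical helper, not part of the couchbase gem.
class TtlCache
  def initialize(ttl, clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl
    @clock = clock   # injectable so time can be controlled in tests
    @entries = {}    # key => [value, expires_at]
  end

  # Return the cached value for `key` if it is still fresh. Otherwise run the
  # block (for example, the real document lookup) and cache its result.
  def fetch(key)
    value, expires_at = @entries[key]
    return value if expires_at && @clock.call < expires_at

    value = yield
    @entries[key] = [value, @clock.call + @ttl]
    value
  end
end
```

With something like `CONFIG_CACHE = TtlCache.new(10)`, the front end would call `CONFIG_CACHE.fetch(doc_id) { ... }` and only hit the database when the cached entry is older than 10 seconds. Note that this only reduces reads; the counter decrement still goes to the cluster on every request.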
When there is a timeout, I rescue it and retry up to 20 times, a limit it has actually reached a few times. Each retry sleeps first, and the sleep period increases with each retry to give the DB some backoff time.
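The retry logic described above can be sketched roughly as follows. This is an illustration, not the poster's code: `with_retries`, `FakeTimeout`, the `base_delay` value, and the linearly growing sleep are all assumptions (the post only says the sleep increases on each retry). `FakeTimeout` stands in for `Couchbase::Error::Timeout` so the sketch runs without the gem.

```ruby
# Stand-in for Couchbase::Error::Timeout so this runs without the couchbase gem.
class FakeTimeout < StandardError; end

# Run the block, retrying when `error_class` is raised, up to `max_retries` times.
# The sleep before retry n is base_delay * n (linear growth, an assumption).
# `sleeper` is injectable so tests do not have to wait for real sleeps.
def with_retries(error_class, max_retries: 20, base_delay: 0.05, sleeper: ->(s) { sleep(s) })
  attempt = 0
  begin
    yield
  rescue error_class
    attempt += 1
    raise if attempt > max_retries   # give up and re-raise the last timeout
    sleeper.call(base_delay * attempt)
    retry
  end
end
```

In the real application the first argument would be `Couchbase::Error::Timeout` and the block would contain the get or decrement call. One thing to keep in mind: a decrement that timed out on the client may still have been applied on the server, so blindly retrying it can decrement the balance more than once.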
I’m pretty sure that Passenger is not running threads, and my front-end code is not threaded.
Here is a list of the public gems I use in the front end and back end:
rubygems, json, bunny, couchbase, thread, uuidtools, digest, time, sinatra/hashfix
Hope this helps in solving for both of us.
Any hints would be greatly appreciated!!
Thanks


#3

Could you tell me your libcouchbase version? If possible, make sure you have the latest one.


#4

Current version is
couchbase (1.3.11)
I see that 1.3.12 is now available. Will update.
Unfortunately, I will probably not be able to see if that makes a difference until Tuesday next week, when the next batch will probably be sent. I haven’t noticed the timeouts when doing load testing, but I wasn’t looking for them at the time.
I’ll see if I can reproduce with load test.


#5

I was asking about libcouchbase, the C library which does most of the networking.

If you are using Fedora, for example, you can check it like this:

$ yum list installed | grep libcouchbase
libcouchbase-devel.x86_64            2.4.9-1.el6                         @System
libcouchbase2-bin.x86_64             2.4.9-1.el6                         @System
libcouchbase2-core.x86_64            2.4.9-1.el6                         @System

#6

We are on v2.4.8. I see that the latest version is now 2.4.9.

During these timeouts, the disk write queue tends to go up to about 100K for us. What does that indicate? Nothing else seems to stand out in the web console graphs.


#7
libcouchbase-devel.x86_64   2.4.7-1.el6  @couchbase
libcouchbase2-core.x86_64   2.4.7-1.el6  @couchbase

Are those used by Couchbase or the client?
Does Couchbase need to be restarted after updating them?

Thanks!!


#8

libcouchbase is used only by the Ruby client, so after the update you only have to restart your application.


#9

Upgrading libcouchbase has not helped. I am easily able to reproduce with my load testing. I tested before upgrade and after upgrade. No change in timeout errors.
Current versions:
# rpm -qa | grep couchbase
couchbase-server-3.0.3-1716.x86_64
libcouchbase2-core-2.4.9-1.el6.x86_64
libcouchbase-devel-2.4.9-1.el6.x86_64

After upgrading the libs, I upgraded to couchbase-1.3.12.gem. Again I restarted my apps, and again there was no improvement in the timeouts.

I get two errors: the first for decrement and the second for a simple get. These operations are repeated on the same two docs many times a second on all 6 servers.

Couchbase::Error::Timeout Message: failed to perform arithmetic operation, Client-Side timeout exceeded for operation. Inspect network conditions or increase the timeout

Couchbase::Error::Timeout Message: failed to get value, Client-Side timeout exceeded for operation. Inspect network conditions or increase the timeout

#10

We’ve tried hitting our test cluster from Java, Python, and Ruby based client-side simulation scripts, and we are seeing these timeouts in all of them. It tends to coincide with growth in the disk write queue (but not always). Are SSDs/RAID essential for Couchbase to perform well? Is it normal for a 3-node cluster with 72 GB of RAM allocated not to be able to handle a couple of hundred ops/sec?


#11

I moved the decr call out of the front end (under Passenger and Sinatra) to the back end, a simple Ruby daemon not running under any larger framework. I have only seen 1 timeout on the front end and none on the back end since the move.
The back end was already doing most of the work, with many more Couchbase reads and writes for each request than the front end.
So there seems to be some problem with the combination of Couchbase, Passenger, and Sinatra.
My attention has to be focused on other issues now, and I don’t know when or if I will come back to this issue.
Hopefully, this will be helpful to someone to find and fix this bug/compatibility problem.


#12

Yes, that will probably help. I will build a larger sandbox to test it.