Timeouts on a Couchbase cluster using gevent on simple get calls


#1

I’m using the latest Python SDK (1.2) with gevent and I’m getting timeout errors on simple get operations. The cluster has 3 Couchbase nodes hosted on Amazon, and the client runs locally on one of them. When the client is restarted, the errors no longer appear.
The exception I get is:

Traceback (most recent call last):
  File "/home/cep/cep/wamp/cra.py", line 67, in rpc_call
    result = self.procedures.call(uri, args)
  File "/usr/lib/python2.7/site-packages/geventwebsocket/protocols/wamp.py", line 62, in call
    return proc[1](*args)
  File "/home/cep/cep/users.py", line 180, in linkFacebookByAccessToken
    user = User(response['email'])
  File "/home/cep/cep/couchbasekit/document.py", line 55, in __init__
    if self._fetch_data(get_lock) is False:
  File "/home/cep/cep/couchbasekit/document.py", line 172, in _fetch_data
    result = self.bucket.get(self.doc_id)
  File "/usr/lib64/python2.7/site-packages/gcouchbase/connection.py", line 106, in ret
    return self._waitwrap(meth(self, *args, **kwargs))
  File "/usr/lib64/python2.7/site-packages/gcouchbase/connection.py", line 102, in _waitwrap
    return get_hub().switch()
  File "/usr/lib64/python2.7/site-packages/gevent/hub.py", line 331, in switch
    return greenlet.switch(self)
TimeoutError: 

#2

Hi

How often are you getting these timeouts and how long does it take to time out? If you use the same code without gevent, how does it function?


#3

Hi Mark,

I get these timeouts about once every 24 hours. Once the client goes into this state, every DB operation times out. If I restart the client, it seems to be stable again, but only until the next incident. Just a small note: we are using m1.medium Amazon instances (I do not know whether this has to do with Couchbase minimum requirements). I haven’t been able to test without gevent yet.


#4

I can’t say I know yet what might be the cause of this issue, so we’ll need to narrow down the triggers for this:

(1) If you restart the server instead of the client, how does it respond?
(2) Is there any notable correlation between memory and/or CPU usage once you get these timeouts?
(3) Do you notice any particular delay when the timeout takes place? (instant? 2.5 seconds? etc.)
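
For (3), one way to capture the delay is to wrap the failing call in a small timing helper. This is only a sketch: `timed_call` is a hypothetical helper, not part of the SDK, and `bucket` in the usage comment stands for whatever connection object your code already uses (the one behind `self.bucket.get` in the traceback).

```python
import time


def timed_call(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds, error).

    Hypothetical helper: it only measures wall-clock time around the call
    and captures any exception instead of letting it propagate.
    """
    start = time.time()
    try:
        result, error = fn(*args, **kwargs), None
    except Exception as exc:
        result, error = None, exc
    elapsed = time.time() - start
    return result, elapsed, error


# Assumed usage; `bucket` and the document id are placeholders:
# result, elapsed, error = timed_call(bucket.get, "some_doc_id")
# print("get took %.3fs, error=%r" % (elapsed, error))
```

Logging the elapsed time for each failing get should show whether the error is raised immediately or only after the full operation timeout has passed.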


#5

Hi Mark,

I’m still waiting for the next incident to happen in order to provide more feedback. This has not happened again. One thing I can tell you for sure is that the CPU usage is pretty low. I’ll keep you posted.

Thanks