If the Couchbase cluster referenced by a bucket is unexpectedly unavailable when an operation is attempted (e.g. as a result of a network failure), the SDK raises a code 16 error through the ‘error’ event attached to each Bucket instance. Requests made after the connection times out raise the code 23 timeout error. It appears that even if the Couchbase cluster becomes available again, the Bucket connection will not attempt to reconnect once these errors have been encountered, and there is no way to configure automatic reconnection in the Node.js SDK in this particular scenario.
I’ve seen several examples where others detect the code 16 and/or 23 error and attempt to recover via a variety of methods. It seems easy enough to call ‘openBucket’ again under these circumstances and substitute the previous bucket instance with a new instance if the connection is successful. Is this a recommended methodology for reconnecting after a failure? Furthermore, are there any limitations or issues I should be aware of with this approach?
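For anyone wanting to see what that substitution approach might look like, here is a minimal sketch. It is not an officially recommended pattern, and createBucketHolder is a hypothetical helper, not part of the SDK. It assumes you pass in your own openBucket(cb) function that calls cb(err, bucket); with the Node.js SDK 2.x that function would typically wrap cluster.openBucket(name, callback).

```javascript
// Error codes named in the thread: 16 (network error) and 23 (timeout).
const RECONNECT_CODES = [16, 23];

// Hypothetical helper, not part of the SDK. `openBucket(cb)` is supplied by
// the caller and must invoke cb(err, bucket).
function createBucketHolder(openBucket) {
  const holder = { bucket: null, reconnecting: false, reconnect: connect };

  function onError(err) {
    // React only to the two codes discussed, and never run two attempts at once.
    if (err && RECONNECT_CODES.indexOf(err.code) !== -1 && !holder.reconnecting) {
      connect();
    }
  }

  function connect() {
    holder.reconnecting = true;
    openBucket(function (err, bucket) {
      holder.reconnecting = false;
      if (err) {
        return; // keep the previous instance; a later error can trigger a retry
      }
      const previous = holder.bucket;
      bucket.on('error', onError);
      holder.bucket = bucket; // callers should always read holder.bucket
      if (previous && typeof previous.disconnect === 'function') {
        previous.disconnect(); // release the stale connection if the API allows it
      }
    });
  }

  connect();
  return holder;
}
```

The rest of the application would read holder.bucket on every request instead of caching the bucket, so a swapped-in instance is picked up automatically. Operations in flight on the old instance are not retried in this sketch.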
It should reconnect; we test this scenario and many like it regularly. We do use traffic to drive the reconnect, though. Is there constant traffic? Can you describe how you simulate the failure?
@brett19 may have other ideas too.
Interesting, now that you’ve said you use traffic to drive the reconnect I’ve modified some of my tests and discovered that with a reasonable request rate it does seem to reconnect automatically, but the more time between requests, the less reliable the reconnect becomes.
This is the script I am using to test: https://gist.github.com/tjdavey/20afd01e866465c652e80c66a2f39bd8
My test environment is as follows:
Client: MacOS 10.10.4, Node.js 4.4.2, Couchbase Node.js SDK 2.2.1
Server: MacOS 10.10.4, Couchbase Server 3.0.2-1603
I am disconnecting the network interface the Couchbase host is on to simulate a network failure. I’ve also tried terminating the Couchbase Server process, with similar results. If I run this script side-by-side with an interval of 500ms and an interval of 10000ms, the 500ms script will consistently reconnect, while the 10000ms script will only reconnect sometimes. I’ve left the scripts running for upwards of 20 minutes, and a process that doesn’t reconnect near-immediately never appears to recover.
Is there a certain threshold or limit that needs to be met in order to detect that the cluster is available again? If so, what is that threshold?
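Since the reconnect seems to be driven by traffic, one possible stopgap is to make sure the client is never idle by issuing a cheap operation on a timer. The sketch below is only an illustration of that idea: startHeartbeat is a hypothetical helper, and ping(cb) is whatever lightweight call you choose (for example a bucket.get on a known key). The 500ms figure is simply the interval that reconnected consistently in my tests, not a documented threshold.

```javascript
// Hypothetical helper: keeps some traffic flowing so the client has requests
// to drive a reconnect. `ping(cb)` is supplied by the caller and must call
// cb(err) when the lightweight operation completes.
function startHeartbeat(ping, intervalMs, onResult) {
  function tick() {
    ping(function (err) {
      if (onResult) {
        onResult(err || null); // report null on success, the error otherwise
      }
    });
  }

  const timer = setInterval(tick, intervalMs);
  if (typeof timer.unref === 'function') {
    timer.unref(); // the heartbeat alone should not keep the process alive
  }

  return {
    tick: tick, // exposed so a beat can be triggered manually
    stop: function () { clearInterval(timer); }
  };
}
```

Usage would be something like startHeartbeat(myPing, 500, logResult), calling stop() on shutdown. This only works around the symptom; it does not address whatever stops the idle client from reconnecting.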
Just an update for others seeing similar behaviour:
After escalating this through Couchbase’s support channels an issue was identified in the C SDK, libcouchbase, which resulted in this behaviour. Updating to libcouchbase 2.6.2 solved the issue in our test cases. The Couchbase Node.js SDK 2.2.2 packaged libcouchbase 2.6.2 and has been released.
TL;DR: Upgrade to Couchbase Node.js SDK 2.2.2 or higher to resolve.
Couchbase is a desirable database compared to MongoDB, but so far I haven’t found an implementation that avoids the headache of how to reconnect, so I can’t risk using it in a real application.
haven’t used it myself but might be interesting for others (?)