Timeout error when querying Couchbase

Good day. I've run into the following issue. I have a cluster with two nodes, each with:

  • Ubuntu 14.04 LTS
  • 16 GB RAM
  • 6 vCPUs
  • 256 GB SSD disk

I have a C# application that queries Couchbase using N1QL. After some time, I receive the following error:
[screenshot: timeout error returned to the client]
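
For context, the application issues its queries roughly like this (a minimal sketch, assuming the 2.x .NET SDK; the node hostnames, bucket name, and statement here are placeholders, not my real ones):

```csharp
using System;
using System.Collections.Generic;
using Couchbase;
using Couchbase.Configuration.Client;
using Couchbase.N1QL;

class QueryExample
{
    static void Main()
    {
        var config = new ClientConfiguration
        {
            // Bootstrap against both nodes so the SDK can still reach the
            // cluster when one of them misbehaves.
            Servers = new List<Uri>
            {
                new Uri("http://node1:8091/"),
                new Uri("http://node2:8091/")
            },
            // Client-side N1QL timeout in milliseconds (75000 is the default).
            QueryRequestTimeout = 75000
        };

        using (var cluster = new Cluster(config))
        using (var bucket = cluster.OpenBucket("my-bucket"))
        {
            var request = QueryRequest.Create(
                "SELECT d.* FROM `my-bucket` d WHERE d.type = $1 LIMIT 10");
            request.AddPositionalParameter("order");

            var result = bucket.Query<dynamic>(request);
            if (!result.Success)
            {
                // This is where the timeout shows up: Success is false and
                // the underlying error/exception is attached to the result.
                Console.WriteLine("Query failed: {0}", result.Exception);
                foreach (var error in result.Errors)
                {
                    Console.WriteLine("  {0}", error.Message);
                }
                return;
            }

            foreach (var row in result.Rows)
            {
                Console.WriteLine(row);
            }
        }
    }
}
```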

I've realized that this error happens because one of the nodes stops responding to client requests. After failing over that node and rebalancing the cluster, Couchbase starts responding successfully again, until one of the nodes stops responding to N1QL queries and the issue comes back.

I've checked, from the GUI, all the indexes that support my queries, and they all seem to be OK (all indexes are ready and 100% built).
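
For completeness, the same check can be done from code rather than the GUI; here's a sketch against the same assumed SDK setup as above, where `bucket` is an already-opened bucket:

```csharp
using System;
using Couchbase.Core;
using Couchbase.N1QL;

static class IndexCheck
{
    // Lists every index together with its state; a fully built, usable
    // index reports the state "online" in system:indexes.
    public static void PrintIndexStates(IBucket bucket)
    {
        var result = bucket.Query<dynamic>(
            QueryRequest.Create("SELECT name, keyspace_id, state FROM system:indexes"));
        foreach (var row in result.Rows)
        {
            Console.WriteLine(row);
        }
    }
}
```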

No errors are captured in the Couchbase logs (I've reviewed indexer.log, query.log, and error.log, and nothing strange is logged).

I'd appreciate any help anyone can give me to solve this problem.
Thanks

Hi @pablo.calderon,

Are you asking why the nodes are going down? Or are you taking them down yourself to test?

Hi @matthew.groves. Yes, I'm asking why the nodes stop working normally and what the cause could be. As you can see in the image, when this error appears I can't query using N1QL at all, not even from the N1QL Workbench. The only solution I've found is to remove the failed node from my cluster and rebalance, after which it responds normally. But after some time, the problem comes back. Thanks.
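
For what it's worth, the failure as seen from the application looks like a plain client-side timeout; a sketch like this shows where it surfaces and how one could retry with a longer timeout (again assuming the 2.x .NET SDK; `QueryWithRetry` is a hypothetical helper, not an SDK API):

```csharp
using System;
using Couchbase.Core;
using Couchbase.N1QL;

static class QueryRetry
{
    // Hypothetical helper: run a N1QL statement and, if the first attempt
    // reports a timeout, retry once with a longer per-request timeout.
    public static IQueryResult<dynamic> QueryWithRetry(IBucket bucket, string statement)
    {
        var request = QueryRequest.Create(statement)
            .Timeout(TimeSpan.FromSeconds(30)); // per-request client timeout

        var result = bucket.Query<dynamic>(request);
        if (!result.Success && result.Status == QueryStatus.Timeout)
        {
            Console.WriteLine("Attempt timed out; retrying with a 120s timeout...");
            result = bucket.Query<dynamic>(
                QueryRequest.Create(statement).Timeout(TimeSpan.FromSeconds(120)));
        }
        return result;
    }
}
```

Of course this only works around the symptom; the bad node still has to be failed over and rebalanced out.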

Thanks @pablo.calderon1,

What version of Couchbase Server are you using?

I noticed you’re running on Ubuntu. Have you looked at any of these overlooked Linux OS tweaks? https://blog.couchbase.com/often-overlooked-linux-os-tweaks/

I'm using the official Couchbase Server Community build for Ubuntu 14.04: couchbase-server-community_4.5.1-ubuntu14.04_amd64.deb. Following that link, I applied those recommendations to my cluster: I've turned off swappiness and I've disabled THP. At some point I started to wonder whether one of these tunings could be the cause of my problems.
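
Concretely, the tweaks amount to roughly this (the commands as commonly given for Ubuntu; run as root, and the exact sysfs paths can vary by kernel):

```bash
# Stop the kernel from swapping Couchbase memory out (runtime + persistent).
sysctl vm.swappiness=0
echo 'vm.swappiness = 0' >> /etc/sysctl.conf

# Disable transparent huge pages on the running system.
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
```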

Good day, everybody. I finally had to start from scratch, setting up a new cluster on a Debian distro, and it's been stable so far. The question is still open for anyone who can determine why this kind of issue happens.