Error retrieving list of buckets, contacting query service returned status: 500

We are frequently getting the following error:
“Error retrieving list of buckets, contacting query service returned status: 500”
We also see the following in the indexer.log file:
2019-12-24T05:15:08.963+05:30 [Info] DATP[->dataport “:9105”] DATP -> Indexer 99.697092% blocked
This prevents us from querying the data as well.
The only workaround we found was to restart the service, delete the index, and rebuild it.
This is blocking our entire application.

Couchbase: 6
OS: Ubuntu 16.03
Data size: 2-3 TB (200 million records)

Quick help is appreciated! Thank you.

@sarat641 If you are using Couchbase Enterprise Edition and have a support contract, could you please reach out to Couchbase Support? That will get very quick attention to your issue.
If you do not have a support contract, could you please share cbcollect from your cluster?

We are evaluating Couchbase Community Edition.
The cbcollect_info output is around 400 MB. Can you tell me which particular files you need? I will share those.

Hi @sarat641,

Could you please provide more information on the following so that we can guide you better?

  1. Number of nodes in the cluster.
  2. Which nodes are running which services.
  3. CPU and memory consumption on the nodes. In particular, which Couchbase process is consuming how many resources.
  4. Can you elaborate on what you meant by “This blocks us from querying the data as well.”? Is the query timing out, or are you seeing service unavailability?
  5. Please attach indexer.log, projector.log and query.log files.

Regarding the following log message:
2019-12-24T05:15:08.963+05:30 [Info] DATP[->dataport “:9105”] DATP -> Indexer 99.697092% blocked

One of the possible reasons for these messages is slow index storage. The index storage backend in Couchbase Community Edition has some performance and scalability limitations. With Enterprise Edition, you should get better performance.
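To see how often and how badly the dataport is blocked over time, the "blocked" lines can be pulled out of indexer.log. The following is a minimal sketch, assuming the log lines follow the same shape as the message quoted above; the function name `blocked_samples` and the 90% threshold are illustrative choices, not part of any Couchbase tooling.

```python
import re

# Matches lines shaped like the one quoted in this thread, for example:
# 2019-12-24T05:15:08.963+05:30 [Info] DATP[->dataport ":9105"] DATP -> Indexer 99.697092% blocked
BLOCKED_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\S+)\s+\[Info\]\s+DATP\[.*?\]\s+"
    r"DATP -> Indexer\s+(?P<pct>\d+(?:\.\d+)?)% blocked"
)


def blocked_samples(lines, threshold=90.0):
    """Yield (timestamp, percent) for every 'blocked' line at or above threshold.

    `threshold` is a hypothetical cut-off chosen for illustration.
    """
    for line in lines:
        match = BLOCKED_RE.search(line)
        if match is None:
            continue  # not a dataport 'blocked' message
        pct = float(match.group("pct"))
        if pct >= threshold:
            yield match.group("ts"), pct


if __name__ == "__main__":
    import sys

    # Usage: python blocked.py /path/to/indexer.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for ts, pct in blocked_samples(handle):
            print(f"{ts}\t{pct:.2f}% blocked")
```

If the output shows sustained values close to 100%, that is consistent with the indexer not keeping up with incoming mutations, though it does not by itself identify the cause.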

Having said this, if you are seeing CPU and memory saturation on the nodes, you can try provisioning more nodes in the cluster to distribute the database workload better.

  1. Number of nodes in the cluster.
    A 6-node cluster; each node has 128 GB RAM, a 16-core CPU, and a 2 TB SSD.
  2. Which nodes are running which services.
    All nodes are running the query, index, and data services (Community Edition).
  3. CPU and memory consumption on nodes. Especially, which couchbase process is consuming how much resources.
    All nodes except one are at less than 50% CPU and RAM utilization.
  4. Can you elaborate your point when you said “This blocks us from querying the data as well.”. Does it mean query is timing out? Or you are seeing service unavailability.
    The query service is frequently unavailable. We get an error: unable to retrieve list of buckets.
  5. Please attach indexer.log, projector.log and query.log files.
    indexer.14.zip (3.6 MB) projector.14.zip (1.9 MB) query.14.zip (1.6 MB)

Hi @sarat641,

I don’t know why there was an error in retrieving the list of buckets. I don’t see any logging in query.log regarding this. The attached indexer.log does not contain the indexer blocked messages. Maybe the problem got resolved by the index deletion and rebuild. I can see a lot of “Index scan timed out” errors in the query logs.

Looking at the index stats, it looks like a lot of documents are queued on the indexer side but are not yet indexed. The average drain rate is zero. Very few items (fewer than 20) are getting indexed (flushed to index storage) every minute. At this time, there are around 168 million docs already indexed.
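The same check (documents queued while the drain rate is zero) can be automated once the index stats are available as JSON. This is only a sketch under assumptions: it assumes the stats arrive as a flat mapping with keys of the form `bucket:index:stat`, and that the stat names `num_docs_queued` and `avg_drain_rate` match what your server version reports. The function `stalled_indexes` is a hypothetical helper, and the sample values below are made up for illustration.

```python
from collections import defaultdict


def stalled_indexes(stats, min_queued=1):
    """Return 'bucket:index' names that have queued docs but a zero drain rate.

    `stats` is assumed to be a flat dict keyed as 'bucket:index:stat_name'.
    Keys that do not have exactly three parts (for example node-level
    stats) are ignored.
    """
    per_index = defaultdict(dict)
    for key, value in stats.items():
        parts = key.split(":")
        if len(parts) != 3:
            continue
        bucket, index, stat_name = parts
        per_index[(bucket, index)][stat_name] = value

    stalled = []
    for (bucket, index), values in sorted(per_index.items()):
        queued = values.get("num_docs_queued", 0)
        drain = values.get("avg_drain_rate", 0)
        if queued >= min_queued and drain == 0:
            stalled.append(f"{bucket}:{index}")
    return stalled


if __name__ == "__main__":
    # Made-up sample values, only to show the expected input shape.
    sample = {
        "mybucket:idx_a:num_docs_queued": 250000,
        "mybucket:idx_a:avg_drain_rate": 0,
        "mybucket:idx_b:num_docs_queued": 40,
        "mybucket:idx_b:avg_drain_rate": 1200,
    }
    print(stalled_indexes(sample))
```

An index that stays in this list across several samples taken a minute or so apart would match the symptom described above.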

On this node, do you see CPU or disk I/O saturation? If yes, then that can be the bottleneck. Please check which process is consuming most of the CPU and disk I/O. Also, please let me know your Couchbase Server version (including the minor version). If the CPU is under pressure and the indexer process is taking most of it, you can collect a CPU profile of the indexer using the following command.

curl -X GET -u <couchbase-admin-username>:<password> -o indexer_cpu.pprof http://<host-ip>:9102/debug/pprof/profile

Meanwhile, I will check if I can find any numbers around scalability of the indexes in community edition.