Couchbase Indexer Service - High CPU

Team,

We are seeing high CPU utilization (peaking at 13-14 of 16 CPUs) and a very high load average (>25) on a few nodes in the cluster, consumed mainly by the indexer service.

Node1: INDEX_1 (id, time)
Node2: INDEX_2 (id, time) → duplicate/backup index

PEAK load avg: top - 23:03:32 up 4 days, 17:38, 2 users, load average: 57.55, 46.01, 28.72
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6982 couchba+ 20 0 11.2g 1.5g 13000 R 1206 9.9 6560:23 indexer
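
To put the snapshot above in perspective, here is a quick calculation using only the figures posted here. It assumes `top` is running in its default mode, where 100 in the %CPU column corresponds to one fully busy core.

```python
# Figures copied from the top output and the node configuration in this post.
CORES = 16
indexer_cpu_percent = 1206   # %CPU column for the indexer process
load_avg_1min = 57.55        # first of the three load-average figures

# ASSUMPTION: top's default mode, where 100% means one fully busy core.
cores_used = indexer_cpu_percent / 100
share_of_node = cores_used / CORES
load_per_core = load_avg_1min / CORES

print(f"indexer uses ~{cores_used:.1f} of {CORES} cores ({share_of_node:.0%} of the node)")
print(f"1-minute load per core: {load_per_core:.1f}")
```

A load per core well above 1 suggests that many more tasks are runnable (or blocked on I/O) than the node has cores for.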

We created a duplicate index to distribute the load, and the load is being distributed, but Node1, which holds the original index, shows high CPU usage during our load run and crashes after a few hours. Requests/sec are the same across both nodes.

Any thoughts or suggestions would be much appreciated.

Couchbase version: Community Edition 6.0.0 build 1693
Node configuration: 16-core CPU, 16 GB memory, SSD.

Output of the stats API, from: curl -X GET -u USER:PASSWD "http://localhost:9102/api/v1/stats?pretty=true"
Node 1:
"data_size": -13205983,
"disk_size": 539983872,
"frag_percent": 102,
"items_count": 3742,
"num_docs_indexed": 28115581,
"num_docs_pending": 0,
"num_docs_queued": 39,
"num_requests": 2547447,
"num_rows_returned": 312173501,
"recs_in_mem": 0,
"recs_on_disk": 0,
"resident_percent": 0,
"scan_bytes_read": 20563134922,
"total_scan_duration": 166302472291019

Node 2:
"data_size": -12836315,
"disk_size": 573878272,
"frag_percent": 102,
"items_count": 3743,
"num_docs_indexed": 28121280,
"num_docs_pending": 1,
"num_docs_queued": 17,
"num_requests": 2548545,
"num_rows_returned": 313383368,
"recs_in_mem": 0,
"recs_on_disk": 0,
"resident_percent": 0,
"scan_bytes_read": 20628740525,
"total_scan_duration": 75286299992726
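
The two outputs above allow a rough per-request comparison between the nodes. The sketch below divides the cumulative counters by `num_requests`. It assumes `total_scan_duration` is reported in nanoseconds; that unit is not stated anywhere in this thread, but the Node1/Node2 ratio does not depend on it.

```python
# Counters copied from the two stats outputs in this post.
node1 = {"num_requests": 2547447, "num_rows_returned": 312173501,
         "total_scan_duration": 166302472291019}
node2 = {"num_requests": 2548545, "num_rows_returned": 313383368,
         "total_scan_duration": 75286299992726}

def per_request(stats):
    """Return (average scan duration in ms, average rows returned) per request."""
    # ASSUMPTION: total_scan_duration is in nanoseconds.
    avg_ms = stats["total_scan_duration"] / stats["num_requests"] / 1e6
    avg_rows = stats["num_rows_returned"] / stats["num_requests"]
    return avg_ms, avg_rows

n1_ms, n1_rows = per_request(node1)
n2_ms, n2_rows = per_request(node2)
ratio = n1_ms / n2_ms  # unit-independent comparison of the two nodes

print(f"Node1: {n1_ms:.1f} ms and {n1_rows:.1f} rows per request")
print(f"Node2: {n2_ms:.1f} ms and {n2_rows:.1f} rows per request")
print(f"Node1 scans take about {ratio:.1f}x as long as Node2 scans")
```

With nearly identical request counts and rows per request, the cumulative scan time on Node1 is roughly twice that of Node2, which is consistent with Node1 being the slower node rather than the busier one.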

By “Node1 is crashing”, do you mean the index service is crashing, or does the node itself go down after the high CPU usage? Please check indexer.log for errors, or collect the logs and share them.

Is there only a single index on Node1? There are only 3742 items in the index.

Just adding to santhoshkc11's comment: the index service is not crashing. Instead, CPU utilization on that node climbs to 100% and stays there, which impacts overall cluster performance.

Also, deletes are very high, roughly equal to additions. Index fragmentation reaches 100% quickly, and CPU utilization gradually increases to 100%. We also observed that at exactly 00:00 UTC there is a drop in index data size and disk size, which results in a drop in CPU usage. After that, CPU utilization starts to increase gradually again, reaches 100% before the next 00:00 UTC, and goes out of control.
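
To line up the fragmentation growth and the midnight drop against CPU usage, the stats endpoint can be sampled on a schedule. The following is a minimal sketch, not a tested monitoring tool. It reuses the URL from the curl command earlier in this thread; the helper names (`build_request`, `pick_stats`, `sample`) are made up for illustration, and the shape of the JSON response (one flat stats object, or a mapping of names to stats objects) is an assumption.

```python
import base64
import json
import urllib.request

# Endpoint taken from the curl command earlier in this thread.
STATS_URL = "http://localhost:9102/api/v1/stats"
WATCHED = ("frag_percent", "data_size", "disk_size")

def build_request(user, password, url=STATS_URL):
    """Build a GET request carrying HTTP basic-auth credentials."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})

def pick_stats(payload, keys=WATCHED):
    """Collect the watched counters from each stats object in the payload."""
    # ASSUMPTION: the payload is either one flat stats object or a mapping
    # of names to stats objects; the real layout may differ by version.
    nested = all(isinstance(v, dict) for v in payload.values())
    groups = payload if nested else {"stats": payload}
    return {name: {k: s[k] for k in keys if k in s}
            for name, s in groups.items() if any(k in s for k in keys)}

def sample(user, password):
    """Fetch the stats once and return only the watched counters."""
    with urllib.request.urlopen(build_request(user, password), timeout=10) as resp:
        return pick_stats(json.load(resp))
```

Calling `sample("USER", "PASSWD")` every few minutes (for example from cron) and logging the result with a timestamp would show whether `frag_percent` and `disk_size` track the CPU curve described above.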

Could someone help here?

@yessara, please check whether sufficient RAM quota is assigned to the index service (UI -> Settings).

Also, switch to circular write mode for the index service, if you have not already:
https://docs.couchbase.com/server/6.5/manage/manage-settings/configure-compact-settings.html#configure-auto-compaction-with-the-ui

Thanks for your response. We have allocated 3 GB RAM for index service. We are already using circular write mode. Please let us know if you need more details.

You can check the total index data size on the node and allocate 20% of that as RAM quota. Also, if disk I/O is saturated, use SSDs if you are not already.
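
As a worked example of the 20% guideline, using numbers already posted in this thread: the sketch below treats the `disk_size` of the single index reported for Node1 as the node's total index data, which is an assumption, since other indexes on the node would add to it.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# ASSUMPTION: Node1's reported disk_size stands in for the total index data on the node.
disk_size_bytes = 539983872
ram_quota_bytes = 3 * GIB  # index service quota mentioned earlier in the thread

suggested_quota = 0.20 * disk_size_bytes  # the 20% guideline

print(f"index data on disk: {disk_size_bytes / MIB:.0f} MiB")
print(f"20% guideline:      {suggested_quota / MIB:.0f} MiB")
print(f"configured quota:   {ram_quota_bytes / MIB:.0f} MiB")
print("quota meets guideline:", ram_quota_bytes >= suggested_quota)
```

Under that assumption, the configured quota is far above what the guideline asks for.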

Couchbase EE has a much better storage engine for higher workloads. You can evaluate that as well.

Total index data size did not grow by more than a few MB. We have given 3 GB of RAM per server, which should already be sufficient. We are already using SSDs.

Sure, I will try the EE version soon and post an update.

Meanwhile, if there is anything that can help us on the Community Edition, that would be really great!