Index mutations keep getting stuck

In our test environment we keep seeing index mutations become stuck, mostly on one bucket (ticket_bucket), but as you can see it's also starting to occur on other buckets:

It's a single-node Couchbase 6.5.1 deployment, managed by the Autonomous Operator on an Azure Kubernetes cluster.

Relevant logs:
logs.zip (3.9 MB)

This is a quiet, low-volume system, so it isn't a performance issue. It's the second time we've seen it (the first time, we blew the environment away and started again). Any help will be greatly appreciated.

I dropped the ticket_bucket primary index and tried to recreate it, but got this:
“GSI CreatePrimaryIndex() - cause: Create index or Alter replica cannot proceed due to rebalance in progress, another concurrent create index request, network partition, node failover, indexer failure, or presence of duplicate index name.”
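Since the drop succeeded but the create fails, it can help to check what the indexer itself thinks the index state is before retrying. A minimal sketch, assuming the default indexer admin port (9102), that you run this from inside the Couchbase pod, and placeholder cluster credentials:

```shell
# Sketch: ask the index service for its view of all index definitions
# and their states (building, stuck, duplicate names, etc.).
# Assumptions: default indexer HTTP port 9102, run from inside the pod,
# Administrator/password replaced with your real cluster credentials.
curl -s -u Administrator:password http://localhost:9102/getIndexStatus
```

If the old primary index still shows up here in a non-ready state, that would explain the "duplicate index name" part of the error.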

@pc, thanks for sharing the logs. It looks like one of the projectors (the process that forwards mutations from the data service to the index service) is stuck and not responding. You can kill the projector process to unblock it. Also, if you can share the projector log file, we can check what the issue is with the projector.
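One way to do that on a Kubernetes deployment is to kill the process inside the pod and let Couchbase's babysitter respawn it. A sketch, where the pod name (cb-0000) and namespace (test-tdm) are assumptions based on the hostnames in the logs below:

```shell
# Sketch: kill only the projector process inside the Couchbase pod;
# the babysitter should restart it automatically.
# Assumptions: pod cb-0000 and namespace test-tdm, inferred from the
# hostname cb-0000.cb.test-tdm.svc in the warnings.
kubectl exec -n test-tdm cb-0000 -- pkill -f projector
```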

2021-06-04T10:25:41.221+00:00 [Warn] Slow/Hung Operation: KVSender::sendMutationTopicRequest did not respond for 68h36m33.258388772s for projector cb-0000.cb.test-tdm.svc:9999 topic MAINT_STREAM_TOPIC_f55e55d41c45ea1ff7ff9824dad0b3f2 bucket location_bucket
2021-06-04T10:25:41.221+00:00 [Warn] Slow/Hung Operation: KVSender::sendMutationTopicRequest did not respond for 68h36m32.640886572s for projector cb-0000.cb.test-tdm.svc:9999 topic MAINT_STREAM_TOPIC_f55e55d41c45ea1ff7ff9824dad0b3f2 bucket prediction_bucket
2021-06-04T10:25:42.221+00:00 [Warn] Slow/Hung Operation: KVSender::sendMutationTopicRequest did not respond for 68h36m33.047651308s for projector cb-0000.cb.test-tdm.svc:9999 topic MAINT_STREAM_TOPIC_f55e55d41c45ea1ff7ff9824dad0b3f2 bucket weather_bucket
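When scanning a large indexer.log for these, a quick filter can pull out which buckets are stalled and for how long. A sketch over one of the warning lines above (pipe your real log file in instead of the here-doc):

```shell
# Sketch: extract "<bucket> <stall duration>" pairs from the
# Slow/Hung Operation warnings in indexer.log. The here-doc stands in
# for the real log file, e.g.:  grep 'Slow/Hung Operation' indexer.log | sed ...
grep 'Slow/Hung Operation' <<'EOF' | sed -E 's/.*did not respond for ([^ ]+) for.*bucket (.*)/\2 \1/'
2021-06-04T10:25:41.221+00:00 [Warn] Slow/Hung Operation: KVSender::sendMutationTopicRequest did not respond for 68h36m33.258388772s for projector cb-0000.cb.test-tdm.svc:9999 topic MAINT_STREAM_TOPIC_f55e55d41c45ea1ff7ff9824dad0b3f2 bucket location_bucket
EOF
```

For the line above this prints `location_bucket 68h36m33.258388772s`, i.e. the projector has not responded for almost three days.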

Thanks for your response.

I've killed the projector a couple of times and left it for about 10 minutes, but it hasn't made any difference to the outstanding index mutations.

The projector log which includes one of the projector restarts is here:
projector.zip (1.3 MB)

If I kill the whole Kubernetes pod and wait for it to come back, it processes all mutations before becoming stuck again.
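For reference, restarting the whole pod (rather than just the projector process) looks like this; the Operator recreates the pod. The pod name and namespace are assumptions based on the hostnames in the logs:

```shell
# Sketch: delete the Couchbase pod and let the Autonomous Operator
# recreate it. Unlike pkill-ing the projector, this restarts every
# Couchbase service in the pod.
# Assumptions: pod cb-0000, namespace test-tdm.
kubectl delete pod cb-0000 -n test-tdm
```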