High indexer CPU on index server after move to different VMware cluster


#1

We have a 3-node cluster running 4.5.1-2884. After restoring the VM from a failed VMware cluster to a new cluster, we are experiencing constant problems with the indexer service consuming high CPU and eventually grinding to a halt. So far our only solution has been to restart the server hosting the indexer service (running Global Indexes), after which the problem slowly recurs, usually within four hours. This configuration had been rock-solid stable for over a year before the move.

The servers are provisioned with 6 CPUs, 16 GB of RAM, and 500 GB drives, which have roughly 90% free space.

The indexer.log shows a few somewhat generic errors:
couchbase Err projector.topicMissing
status:(error = Index scan timed out)
2018-04-11T09:35:18.788-06:00 [Error] StartDcpFeedOver(): MCResponse status=KEY_ENOENT, opcode=0x89, opaque=0, msg: Not found

What additional information would be helpful to troubleshoot?

Thanks,


#2

Please make sure all the required ports in your new cluster are open. https://developer.couchbase.com/documentation/server/current/install/install-ports.html

The above errors seem to point to configuration issues related to ports not being open. If that doesn’t help, you can share the full log.
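
A quick way to check whether a given port is reachable from one node to another is something like this (the hostname and ports below are just placeholders; substitute the nodes and ports from the page above):
nc -zv cb-node2.example.com 8091
nc -zv cb-node2.example.com 9999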


#3

The firewalls on all servers are currently disabled while we are troubleshooting the situation. Should all of these ports be visible on each Couchbase server? It seems that only the node running the indexer role exposes them, while the others only have TCP port 999 for cluster communication visible.

Since I am new to the forum, I am unable to upload the indexer.log at this time.

Thanks,


#4

From the index service perspective, all the ports listed for “Indexer Service” need to be open on nodes where the index service is enabled, with the exception of port 9999, which needs to be open on all data service nodes.
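
One way to confirm which of those ports are actually listening on a given node is to list the listening TCP sockets, for example (the process names shown by -p may vary by platform):
sudo ss -tlnp | grep -E 'indexer|projector'
netstat -tlnp is an equivalent alternative if ss is not available.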

Has the memory quota for index service been set correctly on the new cluster?
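
If you want to double-check the current quota over the REST API, something along these lines should work (the credentials are placeholders; the value is in MB):
curl -s -u Administrator:password http://localhost:8091/pools/default | grep -o '"indexMemoryQuota":[0-9]*'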

You can upload the log file using:
curl -X PUT -T indexer.log https://forumlogs.s3-us-west-1.amazonaws.com/indexer.log


#5

I just uploaded the indexer.log file.


#6

I don’t see a lot of activity in this log snippet. Do you know the time window when the indexer process was consuming a lot of CPU?
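
If you know roughly when it happened, one way to pull just that window out of indexer.log is a simple sed range; the timestamps below are hypothetical, so adjust them to your window:
sed -n '/2018-04-12T14:0/,/2018-04-12T16:0/p' indexer.log > indexer-window.log
It would also help to capture what top reports for the indexer process while it is spinning, e.g. top -b -n 1 -p "$(pgrep indexer)".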

A couple of things you can try:

  1. Change the compaction setting in the UI so it runs only on Sunday rather than every day, as it is currently set.
  2. Increase the RAM quota to 2 GB (see the sketch after this list).
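
For item 2, assuming the quota in question is the index service RAM quota, it can also be changed over the REST API instead of the UI; a minimal sketch (credentials are placeholders; the value is in MB):
curl -X POST -u Administrator:password http://localhost:8091/pools/default -d 'indexMemoryQuota=2048'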

#7

We had another failure today, so I’ll get the relevant logs and add them to the case early tomorrow MDT.


#8

I just uploaded yesterday’s indexer.log