Indexes fail and must be recreated

Hi, we are running Couchbase Community Edition 5.1.1 and are seeing indexes fail and need to be rebuilt.

In a number of cases, across a number of different machines, our application fails in a variety of ways because queries that are expected to work return no data.

If we drop all of the indexes and recreate them, the problem is fixed. But we really want to know why this is happening and how to prevent it in the first place.
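For anyone needing the same workaround, here is a minimal sketch of generating the drop-and-recreate statements, assuming simple indexes on top-level fields. The bucket name, index name, and field name are hypothetical placeholders, and `rebuild_statements` is just an illustrative helper, not part of any Couchbase SDK. Real index definitions may carry `WHERE` clauses or nested paths that this does not handle.

```python
def rebuild_statements(bucket, indexes):
    """Build N1QL DROP/CREATE statements for a {index_name: [field, ...]} map.

    Only handles plain top-level field names; nested paths or filtered
    indexes would need their full original definitions instead.
    """
    statements = []
    for name, fields in indexes.items():
        cols = ", ".join(f"`{field}`" for field in fields)
        statements.append(f"DROP INDEX `{bucket}`.`{name}`;")
        statements.append(f"CREATE INDEX `{name}` ON `{bucket}`({cols});")
    return statements


if __name__ == "__main__":
    # Hypothetical bucket and index names, for illustration only.
    for stmt in rebuild_statements("my-bucket", {"idx_example": ["someField"]}):
        print(stmt)
```

The generated statements can then be pasted into the query workbench or run through `cbq`.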

Are there any known issues that might be contributing to this problem?

We have had another occurrence of this problem, and I’ve been able to find some potentially relevant errors in the Couchbase log files. The key snippet of the error log seems to be:

Service ‘indexer’ exited with status 2. Restarting. Messages:
'Error status code: -1, Error in REMOVE on a database file ‘c:\Program Files\Couchbase\Server\var\lib\couchbase\data@2i\kakapo-bird_idx_record_chips_birdID_5621171458353600489_0.index\data.fdb.0’, errno = 32: 'The process cannot access the file because it is being used by another process.
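To see how often this happens across a log file, something like the following could be used. It is only a rough sketch: the two patterns are taken from the snippet above, `scan_log` is a hypothetical helper name, and the exact wording of the messages in other Couchbase versions is an assumption.

```python
import re

# Patterns based on the log snippet above. The quotes around the service
# name may be straight or curly depending on how the log was copied.
EXIT_RE = re.compile(r"Service\s+['‘’]?indexer['‘’]?\s+exited with status\s+(-?\d+)")
ERRNO_RE = re.compile(r"Error in (\w+) on a database file.*?errno\s*=\s*(\d+)")


def scan_log(lines):
    """Return (exit_statuses, file_errors) found in an iterable of log lines.

    exit_statuses is a list of ints; file_errors is a list of
    (operation, errno) tuples such as ("REMOVE", 32).
    """
    exits, file_errors = [], []
    for line in lines:
        match = EXIT_RE.search(line)
        if match:
            exits.append(int(match.group(1)))
        match = ERRNO_RE.search(line)
        if match:
            file_errors.append((match.group(1), int(match.group(2))))
    return exits, file_errors
```

Calling `scan_log(open(path, encoding="utf-8", errors="replace"))` on a log file gives a quick count of indexer restarts and the file operations that failed.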

Is this a “normal” error message or does it indicate the problem we are having with indexes failing?

Google found this issue in the bug database: https://issues.couchbase.com/browse/MB-17215

That seems to indicate that this sort of error is a major issue, but the bug report says that it was fixed in 4.5, and we’re using 5.1.1.

Hi @richard.perfect, it would help us investigate these failures if you could provide the following information:

  1. OS Version
  2. How many nodes do you have in the cluster, and what is the layout of services?
  3. What kind of operations were going on when you hit this issue? Were the indexes being deleted?
  4. How many documents are there in the bucket on which the above index is built?
  5. Can you upload logs?

  1. The machines run various flavours of Windows: Windows 10, Windows Server 2016, and Windows 7. Each of these types of machine has shown the problem.

  2. The application runs on one node at a time, but we make extensive use of Sync Gateway (SG-to-SG) replication to replicate between the single-node databases.

  3. The key trigger seems to be related to hibernation or shutdown/restart. Many of the machines are laptops that are regularly hibernated by closing their lids, and the problem seems to appear after this. It was working fine yesterday; you come in, restart or wake up the machine, and it doesn’t work. But we’ve also had instances on Windows Server 2016 VMs where the server is restarted and then the problem occurs.

  4. I don’t think the indexes are being deleted or rebuilt by the application specifically, though there may be some background level of activity occurring within the application during these events.

  5. The application has about 480,000 documents and 52 indexes.

  6. If you email me at richard.perfect “at” fronde.com, I can send you a Google Drive link to the log files (they’re about 100 MB zipped).

Thanks @richard.perfect. I have sent you an email. Please send me the Google Drive link to the log files.

Regards
Mihir Kamdar

Hi Mihir, I sent through a link to the log files a few days ago. Were you able to download them OK?

Hi @richard.perfect, I was able to download the logs fine. I have gone through them and I do see the errors you mentioned. Unfortunately, the indexer logs have rotated, so I cannot see the logs for the period when the indexer service was restarted. It would be great if you could take a cbcollect and share the logs if the issue is seen again. In the meantime, I am also trying to reproduce the issue in-house. I will keep you posted. Thanks.

Regards
Mihir