Strange behavior in clustered environment while creating new index

nidhks · June 4, 2018, 1:33pm

Hi All,

We hit a strange issue in our production environment while creating a new index: multiple nodes went down and the instance became inaccessible over SSH.

Environment:
Couchbase 4.5.1
3-node cluster
Number of buckets: 7
6 of these buckets hold only a small amount of data (fewer than 50K items)

1 bucket holds more than 2M items and has 6 indexes on it.
We tried to build a new index on that bucket; within minutes a node went down, and the instance then became inaccessible.

I am attaching the cbcollect_info dump here.

CB_Info-dump

After 4-5 hours and several stops and starts of the instance, everything returned to normal.
We also tried to drop the new index during that time.

Please help us find any clue as to what happened and how we can avoid it in the future.

Any help would be greatly appreciated.

Thanks,
Nidheesh

deepkaran.salooja · June 5, 2018, 12:39am

@nidhks, the system is running out of memory. The indexer process crashed because there was not enough memory to allocate from. There are multiple services on each node (data, query, index), and you have views defined as well (the logs also show some view-related errors). If you need all of these services, try adding more nodes and moving the index service to a dedicated node. You may also want to consider getting rid of views to improve cluster stability; much of the view functionality can be achieved via query/index.

nidhks · June 5, 2018, 6:20am

@deepkaran.salooja ,

Thanks a lot for your quick reply.

We were able to reproduce the issue in our test environment using the cb_dump from production.
In the test environment we tried increasing the Index RAM Quota from 1GB to 2.8GB, but the issue persists.

Can you please let me know how we can determine the amount of RAM required?
Also, is there any difference between the Index RAM Quota and the RAM used by the index service (indexer) to create indexes?

deepkaran.salooja · June 5, 2018, 8:31pm

The problem is not really with the index memory quota. The system as a whole is running out of memory, so memory allocation within the indexer fails. You will need to decongest the system by moving some of the services to additional nodes.

The query, views, index, and data services are each taking a roughly similar amount of memory. It would be a good idea to move index/query to a separate node, try it with a 2-3GB index memory quota, and increase the quota if required.

"Also is there any difference between Index RAM Quota and the RAM used by index service (indexer) to perform index creations?"

No, the index service uses a single quota to manage both existing and new indexes.
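
To spot the kind of system-wide memory pressure described above before building an index, one option is to inspect per-node memory figures from the cluster REST endpoint (`/pools/default` on port 8091). The sketch below only processes an already parsed response. It assumes each node entry carries `hostname` and `systemStats` with `mem_total` and `mem_free` in bytes; please verify those field names against your 4.5.1 output. The function name, the 20% threshold, and the sample values are hypothetical.

```python
from typing import Any, Dict, List


def low_memory_nodes(pool_info: Dict[str, Any],
                     min_free_fraction: float = 0.2) -> List[str]:
    """Return hostnames of nodes whose free-memory fraction is below the threshold.

    pool_info is the parsed JSON from /pools/default (field names assumed).
    """
    flagged = []
    for node in pool_info.get("nodes", []):
        stats = node.get("systemStats", {})
        total = stats.get("mem_total", 0)
        free = stats.get("mem_free", 0)
        # Skip nodes that report no total to avoid dividing by zero.
        if total > 0 and free / total < min_free_fraction:
            flagged.append(node.get("hostname", "unknown"))
    return flagged


# Made-up sample data, not taken from the attached dump.
sample = {
    "nodes": [
        {"hostname": "10.0.0.1:8091",
         "systemStats": {"mem_total": 8_000_000_000, "mem_free": 400_000_000}},
        {"hostname": "10.0.0.2:8091",
         "systemStats": {"mem_total": 8_000_000_000, "mem_free": 4_000_000_000}},
    ]
}
print(low_memory_nodes(sample))  # ['10.0.0.1:8091']
```

A node that shows up here while it runs data, query, index, and views together is a candidate for moving the index service elsewhere.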

nidhks · June 6, 2018, 6:07am

@deepkaran.salooja, thanks for the update. We will try upgrading the RAM and share the result once it's done.

In the meantime, we ran into a situation where N1QL queries returned empty results without any error while the indexes were not working properly.
Is there any way to differentiate between a genuinely empty result and an empty result caused by the indexer not working properly?

deepkaran.salooja · June 6, 2018, 11:52pm

You could use the REQUEST_PLUS consistency level for the queries. When the indexer is not working properly, the queries would then time out instead of returning empty results.
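
As a rough illustration of the suggestion above, here is a minimal sketch that sends a statement to the query service REST endpoint (`/query/service` on port 8093) with `scan_consistency=request_plus` and a `timeout`, then checks the `status` field of the response. The helper names, the 10s timeout, and the use of basic auth are assumptions made for this example, not part of the original advice.

```python
import base64
import json
import urllib.error
import urllib.parse
import urllib.request


def build_query_params(statement, timeout="10s"):
    """Request parameters that make the query wait for the index to catch up."""
    return {
        "statement": statement,
        "scan_consistency": "request_plus",
        "timeout": timeout,
    }


def classify_result(response):
    """Separate a genuinely empty result from a failed or timed-out query."""
    if response.get("status") != "success" or response.get("errors"):
        return "failed"
    return "rows" if response.get("results") else "empty"


def run_query(host, statement, user=None, password=None, timeout="10s"):
    """POST the statement to the query service and return the parsed JSON body."""
    data = urllib.parse.urlencode(build_query_params(statement, timeout)).encode()
    req = urllib.request.Request(f"http://{host}:8093/query/service", data=data)
    if user is not None:
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        req.add_header("Authorization", f"Basic {token}")
    try:
        with urllib.request.urlopen(req) as resp:
            body = resp.read()
    except urllib.error.HTTPError as err:
        # The service may report errors with a non-200 status; keep the JSON body.
        body = err.read()
    return json.loads(body)
```

With this, `classify_result(run_query("localhost", "SELECT * FROM mybucket LIMIT 1"))` (bucket name hypothetical) would report "failed" when the request times out and "empty" only when the query succeeded with no rows.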