Node gets killed by OOM killer due to indexer memory exceeded

query
n1ql

#1

Hello,

I have 4 Couchbase nodes running version 5.0.1-5003 Community Edition with the following configuration:

  • 32 GB memory
  • 3 buckets with 2 replicas
  • Data service: 19 GB
  • Index service: 6 GB
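
For reference, a minimal sketch of the memory budget this configuration implies. It only subtracts the two service quotas from the node's RAM, using the figures from the list above; the function name is illustrative, and everything not covered by a quota (query engine, projector, cluster manager, OS) has to fit in the remainder.

```python
# Figures taken from the configuration list above (GB, as quoted in the post).
TOTAL_RAM_GB = 32
DATA_QUOTA_GB = 19
INDEX_QUOTA_GB = 6


def headroom_gb(total_ram, *quotas):
    """Return the RAM left over once every service quota is fully used."""
    return total_ram - sum(quotas)


remaining = headroom_gb(TOTAL_RAM_GB, DATA_QUOTA_GB, INDEX_QUOTA_GB)
print(f"Quotas reserve {DATA_QUOTA_GB + INDEX_QUOTA_GB} GB, "
      f"leaving {remaining} GB for everything else")
```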

Since we started using N1QL, we have had a very serious issue: Couchbase nodes get killed by the OOM killer because the indexer and/or cbq-engine goes far beyond the configured memory size.

OOM Killer

Jun  7 07:46:50 node2 kernel: [243379.269217] Out of memory: Kill process 9185 (memcached) score 393 or sacrifice child
Jun  7 07:46:50 node2 kernel: [243379.270200] Killed process 9185 (memcached) total-vm:21640716kB, anon-rss:10934144kB, file-rss:0kB, shmem-rss:4kB

Node 2

1379536 19952 /opt/couchbase/lib/erlang/erts-5.10.4.0.0.1/bin/beam.smp
1440156 219200 /opt/couchbase/bin/projector
1828192 67160 /tools/dbadm/gdat/jre/Linux_x86_64/bin/java
2506496 659796 /opt/couchbase/lib/erlang/erts-5.10.4.0.0.1/bin/beam.smp
7344692 607204 /opt/couchbase/lib/erlang/erts-5.10.4.0.0.1/bin/beam.smp
12111440 4324436 /opt/couchbase/bin/indexer
20407828 15425296 /opt/couchbase/bin/memcached

Node 3

1380052 11216 /opt/couchbase/lib/erlang/erts-5.10.4.0.0.1/bin/beam.smp
1828192 70644 /tools/dbadm/gdat/jre/Linux_x86_64/bin/java
2310208 591900 /opt/couchbase/lib/erlang/erts-5.10.4.0.0.1/bin/beam.smp
2603532 475052 /opt/couchbase/bin/projector
7241612 418580 /opt/couchbase/lib/erlang/erts-5.10.4.0.0.1/bin/beam.smp
15923680 720376 /opt/couchbase/bin/cbq-engine
21214732 15679948 /opt/couchbase/bin/memcached
29247132 5763240 /opt/couchbase/bin/indexer
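
To make the listing easier to compare against the quotas, here is a small sketch that sums the second column per process for Node 3. It assumes the two numeric columns are virtual size and resident size in kB (as `ps -eo vsz,rss,cmd` would print); the post does not say which command produced them. The helper names are illustrative.

```python
# Node 3 listing copied from above.
# ASSUMPTION: columns are VSZ (kB), RSS (kB), command path.
NODE3_LISTING = """\
1380052 11216 /opt/couchbase/lib/erlang/erts-5.10.4.0.0.1/bin/beam.smp
1828192 70644 /tools/dbadm/gdat/jre/Linux_x86_64/bin/java
2310208 591900 /opt/couchbase/lib/erlang/erts-5.10.4.0.0.1/bin/beam.smp
2603532 475052 /opt/couchbase/bin/projector
7241612 418580 /opt/couchbase/lib/erlang/erts-5.10.4.0.0.1/bin/beam.smp
15923680 720376 /opt/couchbase/bin/cbq-engine
21214732 15679948 /opt/couchbase/bin/memcached
29247132 5763240 /opt/couchbase/bin/indexer
"""

KB_PER_GIB = 1024 * 1024


def parse_listing(listing):
    """Turn each non-empty line into a (vsz_kb, rss_kb, command) tuple."""
    rows = []
    for line in listing.splitlines():
        if not line.strip():
            continue
        vsz_kb, rss_kb, cmd = line.split(None, 2)
        rows.append((int(vsz_kb), int(rss_kb), cmd))
    return rows


def rss_by_process(rows):
    """Sum the second column per executable name (last path component)."""
    totals = {}
    for _vsz_kb, rss_kb, cmd in rows:
        name = cmd.rsplit("/", 1)[-1]
        totals[name] = totals.get(name, 0) + rss_kb
    return totals


totals = rss_by_process(parse_listing(NODE3_LISTING))
for name, rss_kb in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:12s} {rss_kb / KB_PER_GIB:6.2f} GiB")
print(f"{'total':12s} {sum(totals.values()) / KB_PER_GIB:6.2f} GiB")
```

If the column assumption holds, the per-process totals can be read directly against the 19 GB data and 6 GB index quotas.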

We were using N1QL indexes before, but they were simple ones and not heavily used. The new index that seems to trigger the excessive memory consumption is the one in this post.

crash logs (122.8 KB)


#2

This seems to match issue MB-20178, but that was supposed to be fixed in version 4.5.1.


#3

Hello @tchlyah,
Can you please give details about:
Number of documents?
Document size (average)?
Working set residency required (e.g. 80% of data needs to be resident)?

Typically, if the cluster is undersized and under memory pressure, the OS will invoke the OOM killer, and in Couchbase's case memcached is usually the biggest memory consumer, so it is the one that gets killed.


#4

We have 3 buckets. The main bucket, where we run N1QL requests, has:

  • ~9 million documents
  • Average size: 4 KB
  • 50K documents of 100 KB

The other 2 each contain ~4 million documents with the same average size.

Every day we reload all the data from CB, so we need 100% of the data resident in memory, which is what CB shows in production.
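
As a rough cross-check of these numbers, the sketch below estimates the raw document volume per node. It is only an estimate under stated assumptions: "kb" is read as kilobytes, the 50K large documents are counted on top of the ~9 million, replicas are counted as full copies, and metadata and key overhead are ignored. The function name is illustrative.

```python
KB = 1000  # ASSUMPTION: "kb" in the post means kilobytes (decimal)
NODES = 4
REPLICAS = 2


def raw_size_gb(doc_count, avg_doc_kb):
    """Raw size of doc_count documents of avg_doc_kb each, in decimal GB."""
    return doc_count * avg_doc_kb * KB / 1e9


# Main bucket: ~9 million docs at 4 KB, plus 50K docs at 100 KB
# (ASSUMPTION: the large docs are not already included in the 9 million).
main_bucket_gb = raw_size_gb(9_000_000, 4) + raw_size_gb(50_000, 100)

# Two other buckets: ~4 million docs each, same average size.
other_buckets_gb = 2 * raw_size_gb(4_000_000, 4)

active_gb = main_bucket_gb + other_buckets_gb
total_gb = active_gb * (1 + REPLICAS)  # active copy plus 2 replicas
per_node_gb = total_gb / NODES

print(f"active data:   {active_gb:.2f} GB")
print(f"with replicas: {total_gb:.2f} GB")
print(f"per node:      {per_node_gb:.2f} GB")
```

The per-node figure is the one to compare against the data service quota; the approximate counts and sizes above make it indicative only.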

From my perspective, we do not have a very large database, and before the new N1QL requests we didn’t have any issues!


#5

Thanks for giving details about the setup. I appreciate it.

Yes, the number of documents is not large in this case.

I see that you are using the N1QL/Query service. By any chance, do you have a primary index on your index nodes?
Is the Index service running separately on its own node, or is it shared with other services?
Are your current SLAs being met? And what are they?

We don’t recommend using primary indexes on production clusters.


#6

No, we don’t have any primary index! All our requests use indexes created specifically for them.

No, unfortunately we do not have Enterprise Edition yet, and I can’t do anything about it for now, so we can’t separate the index/query services from the data service.

Until now we haven’t had any issues with Couchbase, and our SLA is being met.

To me, it must be linked to the newly created index (an array covering index with UNNEST and a condition). Maybe it is too complicated?

I can’t go to production with these issues!