Couchbase nodes' RAM is getting full frequently

#1

We have a 4-node cluster with 24GB RAM per node, of which 18GB has been given to Couchbase, with zero replication. We have approximately 10M records in this cluster, with ~2.5M/hour written while old items expire. Our RAM usage, which is ~72GB across the cluster, fills up every ~12 days, and I need to restart the cluster to fix this. After a restart the RAM usage is back to ~20GB.

Can someone please help me understand the reason for this?

FYI: Auto-Compaction is set to a 40% fragmentation level and the Metadata Purge Interval is set to 1 day, which we reduced to 2 hours. But it didn’t help.

#2

What do you mean by “full”? Couchbase by design will attempt to use as much of the Server Quota as you allocate to it for caching recently-accessed data. You mention you have set an 18GB Server Quota, and 4x that is indeed 72GB.

How much of the Server Quota have you allocated to Buckets? If you’ve allocated all of it, then Couchbase using 72GB to cache data is exactly expected.

Note also that there will be some additional usage over the Server Quota for general cluster management, but 6GB (24GB-18GB) is probably sufficient in most cases.
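As a quick sanity check on those numbers, the arithmetic can be sketched in a few lines of shell (all figures taken from this thread):

```shell
nodes=4
node_ram_gb=24         # physical RAM per node
server_quota_gb=18     # Couchbase Server Quota per node

cluster_quota_gb=$((nodes * server_quota_gb))   # total RAM Couchbase may use for caching
headroom_gb=$((node_ram_gb - server_quota_gb))  # per-node room for cluster management, OS, etc.
echo "cluster quota: ${cluster_quota_gb}GB, per-node headroom: ${headroom_gb}GB"
```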

#3

We have created a bucket of 70GB out of the 72GB of RAM. So here “full” means all 70GB of memory is getting full, while the disk data size remains ~22GB for the same cluster.

#4

Are you sure that’s Couchbase data and not OS / disk caches etc.? What’s the RSS of the memcached process, and what does free -m report?

#5

Currently we have sufficient free memory available, as we restarted the node 4 days back…
free -m
             total       used       free     shared    buffers     cached
Mem:         23934      15908       8026          0        227       4282
-/+ buffers/cache:      11398      12536
Swap:         4095         12       4083

ps -o rss,vsz 19419
     RSS      VSZ
10466268 12859580
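For context, ps reports RSS and VSZ in KiB (1024-byte units), so the memcached figures above convert as follows:

```shell
rss_kib=10466268   # RSS of the memcached process from the ps output above
vsz_kib=12859580   # VSZ from the same output

echo "RSS: $((rss_kib / 1024)) MiB resident"
echo "VSZ: $((vsz_kib / 1024)) MiB of virtual address space"
```

That is roughly 10GiB resident for memcached alone, which matches the ep_kv_size figure quoted below.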

The main issue is the difference between the two Couchbase metrics below, given that we don’t have any replicas:

vb_active_itm_memory --> 3G
ep_kv_size --> 10.5G

In a few days this ep_kv_size will reach 15G, and then the node will die…
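To put a number on that gap, here is a quick sketch; the stats snippet is hypothetical, built from the rounded figures above (on a live node these lines would come from something like `cbstats <host>:11210 all`):

```shell
# Hypothetical stats snippet using the rounded figures from this post, in bytes
stats='vb_active_itm_memory: 3221225472
ep_kv_size: 11274289152'

itm=$(echo "$stats" | awk '/vb_active_itm_memory/ {print $2}')
kv=$(echo "$stats" | awk '/ep_kv_size/ {print $2}')

# ep_kv_size covers keys, metadata and values, while vb_active_itm_memory is
# just the active items' values, so the gap is roughly key/metadata overhead.
gap_mib=$(( (kv - itm) / 1024 / 1024 ))
echo "gap: ${gap_mib} MiB"
```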

#6

@drigby, can you please explain what the line below means in the cbstats allocator output?

MALLOC: 13211296632 ( 12599.5 MiB) Bytes in use by application
It looks like most of the memory usage comes from this line alone…

#7

You need to look at that in the context of the whole report. If you can paste the output of cbstats allocator here, I can take a look.

#8

Below is the complete report of cbstats allocator

MALLOC:    11324902704 (10800.3 MiB) Bytes in use by application
MALLOC: +   1119019008 ( 1067.2 MiB) Bytes in page heap freelist
MALLOC: +    292244776 (  278.7 MiB) Bytes in central cache freelist
MALLOC: +      8535392 (    8.1 MiB) Bytes in transfer cache freelist
MALLOC: +     28149832 (   26.8 MiB) Bytes in thread cache freelists
MALLOC: +     29814944 (   28.4 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =  12802666656 (12209.6 MiB) Actual memory used (physical + swap)
MALLOC: +     60792832 (   58.0 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =  12863459488 (12267.6 MiB) Virtual address space used
MALLOC:
MALLOC:         234654 Spans in use
MALLOC:             16 Thread heaps in use
MALLOC:           8192 Tcmalloc page size

Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.

Total size of freelists for per-thread caches,
transfer cache, and central cache, by size class

class 1 [ 8 bytes ] : 13090 objs; 0.1 MiB; 0.1 cum MiB
class 2 [ 16 bytes ] : 44695 objs; 0.7 MiB; 0.8 cum MiB
class 3 [ 32 bytes ] : 219591 objs; 6.7 MiB; 7.5 cum MiB
class 4 [ 48 bytes ] : 12820 objs; 0.6 MiB; 8.1 cum MiB
class 5 [ 64 bytes ] : 18694 objs; 1.1 MiB; 9.2 cum MiB
class 6 [ 80 bytes ] : 119907 objs; 9.1 MiB; 18.4 cum MiB
class 7 [ 96 bytes ] : 93802 objs; 8.6 MiB; 26.9 cum MiB
class 8 [ 112 bytes ] : 65668 objs; 7.0 MiB; 34.0 cum MiB
class 9 [ 128 bytes ] : 171914 objs; 21.0 MiB; 54.9 cum MiB
class 10 [ 144 bytes ] : 119318 objs; 16.4 MiB; 71.3 cum MiB

PageHeap: 48 sizes; 1067.2 MiB free; 58.0 MiB unmapped

 1 pages *   8505 spans ~   66.4 MiB;   66.4 MiB cum; unmapped:   54.4 MiB;   54.4 MiB cum
 2 pages *   3254 spans ~   50.8 MiB;  117.3 MiB cum; unmapped:    0.0 MiB;   54.4 MiB cum
 3 pages *   1394 spans ~   32.7 MiB;  150.0 MiB cum; unmapped:    0.1 MiB;   54.6 MiB cum
 4 pages *   8303 spans ~  259.5 MiB;  409.4 MiB cum; unmapped:    0.1 MiB;   54.6 MiB cum

255 large *      4 spans ~   14.1 MiB; 1125.2 MiB cum; unmapped:    0.0 MiB;   58.0 MiB cum

The first line accounts for the ~10GB of the usage…

#9

[quote=“viru243, post:8, topic:5308”]

MALLOC:    11324902704 (10800.3 MiB) Bytes in use by application
MALLOC: +   1119019008 ( 1067.2 MiB) Bytes in page heap freelist
MALLOC: +    292244776 (  278.7 MiB) Bytes in central cache freelist
MALLOC: +      8535392 (    8.1 MiB) Bytes in transfer cache freelist
MALLOC: +     28149832 (   26.8 MiB) Bytes in thread cache freelists
MALLOC: +     29814944 (   28.4 MiB) Bytes in malloc metadata
[/quote]


Ok, so this means that Couchbase's kv-engine (`memcached`) on that one node has requested 10.54GB, and adding on memory allocator overhead it's actually using 11.92GB of RAM. That's not an uncommon / unusual amount of overhead with TCMalloc.
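That overhead can be computed directly from the totals in the report above, e.g.:

```shell
# Totals from the cbstats allocator report above, in bytes
in_use=11324902704    # "Bytes in use by application"
actual=12802666656    # "Actual memory used (physical + swap)"

overhead=$((actual - in_use))
overhead_mib=$((overhead / 1024 / 1024))
overhead_pct=$((overhead * 100 / in_use))
echo "allocator overhead: ${overhead_mib} MiB (~${overhead_pct}% of application bytes)"
```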

I assume these are from when your cluster is "good". Can you re-post when you're actually seeing the unexpectedly high usage?

Additionally, you might want to start testing the 4.0 RC: it has a different memory allocator (jemalloc) and a defragmenter (on Linux), which should reduce memory allocator overheads.

#10

Note - 4.0 is now GA if you want to see if your workload performs better with it.