We have been seeing a high level of ep_overhead on one of our buckets, and yesterday we witnessed a strangely high growth in that metric while running a script to decrease the TTL of several documents in the bucket (it now reaches about 40% of mem_used).
Even though the number of items and the metadata in memory were decreasing (ep_kv_size, in red on the graph below, from 35 GB to 29 GB), the total used memory (mem_used, in blue) was increasing (38 GB to 48 GB), following the trend of ep_overhead (in green on the graph, from 0.5 GB to 16 GB).
(Note: We also don’t understand why the mem_used metric shows spikes of +20 GB, but we assume those are only a reporting artifact.)
We had disk write queue spikes of up to 200K items, but they were very short-lived and the queue dropped back to 0 quite often. However, memory usage went above the high water mark and the bucket quota, which ended up triggering replication back-offs and growing the replication TAP queues, which had already been increasing for some weeks. We had run the TTL script in the first place because we were getting warnings about the metadata percentage in memory being above the normal threshold.
We ended up having to disable replication on that bucket in order to get out of that situation, but we still see ep_overhead increase whenever we run the TTL script. Are there any reports of similar situations on Couchbase Server 2.2?
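For context, the TTL-reduction pass is conceptually similar to the sketch below (function names and the cap value are placeholders, not our exact script). One subtlety worth noting is that Couchbase encodes a document expiry up to 30 days as a relative TTL in seconds, and anything larger as an absolute Unix timestamp, so both forms have to be normalised before comparing:

```python
import time

# Couchbase expiry convention: a raw value <= 30 days (2592000 s) is a
# relative TTL in seconds; a larger value is an absolute Unix timestamp;
# 0 means "never expires".
THIRTY_DAYS = 30 * 24 * 3600

def absolute_expiry(raw, now):
    """Convert a raw expiry value to an absolute Unix timestamp (0 = never)."""
    if raw == 0:
        return 0
    return raw if raw > THIRTY_DAYS else now + raw

def clamped_expiry(raw, max_ttl, now):
    """Cap a document's remaining lifetime at max_ttl seconds.

    Returns the new absolute expiry to set, or None when the document
    already expires soon enough and no touch is needed.
    """
    cap = now + max_ttl
    current = absolute_expiry(raw, now)
    if current == 0 or current > cap:
        return cap
    return None
```

The real pass then iterates over the candidate keys and issues a `touch` per document that needs a shorter expiry (e.g. via the Couchbase Python SDK's touch operation), which is what seems to correlate with the ep_overhead growth we observe.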