cbbackup - memory usage

korekontrol · February 5, 2015, 11:21am

I’ve set up XDCR from the production cluster to a backup node and use cbbackup to dump all buckets daily. The Couchbase data directory is around 55GB. When I run cbbackup, its memory usage goes crazy: it uses up to 60GB, which is far beyond the machine's RAM (16GB, of which 7GB is used by Couchbase itself), so it ends up swapping heavily. This makes the backup process very, very slow, around 20 hours. Disk I/O (on the data partition) and CPU usage are fairly low; memory usage (and, because of swapping, I/O on the swap drive) is definitely the bottleneck.
Is there any way to reduce cbbackup's memory usage?

Looking at other datastores I’ve used before (*SQL, MongoDB), backing up 50GB of data was never a problem…

I’m running on Amazon EC2. The backup host is an m3.xlarge instance, which provides two 40GB ephemeral drives; I’m using a RAID0 of them as the swap volume.
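A back-of-the-envelope check using only the figures in this post shows why the host survives but swaps so hard. This is a rough sketch: it assumes cbbackup can only use the RAM Couchbase leaves free and ignores OS and page-cache overhead.

```python
# Figures as stated in the post above; OS and page-cache overhead are ignored
RAM_GB = 16             # backup host RAM
COUCHBASE_GB = 7        # used by Couchbase Server itself
CBBACKUP_PEAK_GB = 60   # observed peak memory use of cbbackup
SWAP_GB = 2 * 40        # RAID0 of the two 40GB ephemeral drives

def memory_budget(ram, couchbase, peak, swap):
    available = ram - couchbase        # RAM left over for cbbackup
    spilled = max(0, peak - available) # portion that has to live in swap
    return {
        "available_ram": available,
        "swapped": spilled,
        "fits_in_swap": spilled <= swap,
    }

budget = memory_budget(RAM_GB, COUCHBASE_GB, CBBACKUP_PEAK_GB, SWAP_GB)
print(budget)
```

Roughly 51GB of cbbackup's working set has to be paged to the swap RAID, which still fits in 80GB of swap. That is consistent with a slow but completed backup here, as opposed to an OOM kill on a host with no swap.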

Output of the dstat command is available on a gist. xvdb and xvdc are the ephemeral local SSD drives used for swap; xvdf holds the Couchbase data and the backup output (with an I/O capacity of 1500 IOPS):


https://gist.github.com/marek-obuchowicz/bdaa42278cf784f108bf


 #   dstat -cmdr -D xvdb,xvdc,xvdf 5
        ----total-cpu-usage---- ------memory-usage----- --dsk/xvdb----dsk/xvdc----dsk/xvdf- --io/xvdb-----io/xvdc-----io/xvdf--
        usr sys idl wai hiq siq| used  buff  cach  free| read  writ: read  writ: read  writ| read  writ: read  writ: read  writ
         16  10  60  13   0   0|14.6G 2132k 8100k 76.0M|3686k 3336k:3686k 3335k:4018k 4175k| 733   613 : 733   612 : 285   261 
         15  24  38  22   0   0|14.6G 2512k 7532k 80.0M|  12M 4606k:  12M 4758k:3254k 3612k|2342   724 :2379   735 : 574   356 
         21  31  23  25   0   0|14.6G 2312k 8668k 76.2M|  11M   14M:  10M   13M:2009k 3666k|2388  2290 :2315  2256 : 381   292 
         14  29  34  23   0   0|14.6G 1932k 7108k 73.8M|  10M   12M:  10M   11M:2386k 2957k|2324  1678 :2342  1665 : 449   296
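To back up the claim that swap, not the data volume, is where the disk traffic goes, the dstat rows above can be totalled per device. This is a minimal sketch: `parse_size` and `disk_throughput` are hypothetical helpers, and it assumes dstat's k/M/G suffixes are powers of 1024 (the ratio printed at the end does not depend on that choice).

```python
import re

# dstat unit suffixes, assumed here to be powers of 1024
UNITS = {"": 1, "B": 1, "k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def parse_size(token):
    m = re.fullmatch(r"([\d.]+)([BkMG]?)", token.strip())
    if m is None:
        raise ValueError(f"unrecognised dstat value: {token!r}")
    return float(m.group(1)) * UNITS[m.group(2)]

def disk_throughput(row, devices=("xvdb", "xvdc", "xvdf")):
    """Map each device to (read, write) bytes/s for one `dstat -cmdr` row."""
    disk_section = row.split("|")[2]  # sections: cpu | memory | dsk | io
    result = {}
    for dev, pair in zip(devices, disk_section.split(":")):
        read, write = pair.split()
        result[dev] = (parse_size(read), parse_size(write))
    return result

# The four sample rows from the gist above
ROWS = [
    " 16  10  60  13   0   0|14.6G 2132k 8100k 76.0M|3686k 3336k:3686k 3335k:4018k 4175k| 733   613 : 733   612 : 285   261 ",
    " 15  24  38  22   0   0|14.6G 2512k 7532k 80.0M|  12M 4606k:  12M 4758k:3254k 3612k|2342   724 :2379   735 : 574   356 ",
    " 21  31  23  25   0   0|14.6G 2312k 8668k 76.2M|  11M   14M:  10M   13M:2009k 3666k|2388  2290 :2315  2256 : 381   292 ",
    " 14  29  34  23   0   0|14.6G 1932k 7108k 73.8M|  10M   12M:  10M   11M:2386k 2957k|2324  1678 :2342  1665 : 449   296",
]

swap_total = data_total = 0.0
for row in ROWS:
    t = disk_throughput(row)
    swap_total += sum(t["xvdb"]) + sum(t["xvdc"])  # the two swap RAID0 members
    data_total += sum(t["xvdf"])                   # Couchbase data + backup output
print(f"swap I/O is {swap_total / data_total:.1f}x the data-volume I/O")
```

Over these four samples the swap drives move about 5.4 times as many bytes as the data volume, which fits the description of memory pressure rather than data reads as the bottleneck.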

emccormick · March 10, 2015, 6:38am

I am experiencing the same issue with version 3.0.1 Community on CentOS 6.5 in Rackspace, Python version “python.x86_64 2.6.6-52.el6”. I run it on one of the Couchbase nodes with the --bucket option to back up a specific bucket from all nodes’ data (not single-node mode).

The cbbackup utility just consumes all available memory until the VM’s kernel OOM-kills it, along with memcached.

Tasks: 242 total,   1 running, 241 sleeping,   0 stopped,   0 zombie
Cpu(s):  6.7%us,  5.2%sy,  0.0%ni, 86.1%id,  0.8%wa,  0.0%hi,  0.0%si,  1.2%st
Mem:  30822556k total, 28809180k used,  2013376k free,   114604k buffers
Swap:        0k total,        0k used,        0k free,  5510528k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
25979 couchbas  20   0 16.9g  16g 6468 S 14.6 56.4  17:07.67 memcached
26840 root      20   0 5474m 4.6g 8012 S 200.8 15.7  75:39.92 python
14072 couchbas  20   0 2148m 331m 1964 S  7.0  1.1 939:51.42 beam.smp
14043 couchbas  20   0 1275m  23m 1232 S  0.3  0.1  18:50.06 beam.smp

Mar 9 20:44:10 couchdbwhois1113r kernel: Out of memory: Kill process 14119 (memcached) score 567 or sacrifice child
Mar 9 20:44:10 couchdbwhois1113r kernel: Killed process 14119, UID 497, (memcached) total-vm:17794644kB, anon-rss:17453500kB, file-rss:8kB
Mar 9 20:44:10 couchdbwhois1113r kernel: Out of memory: Kill process 4630 (python) score 320 or sacrifice child
Mar 9 20:44:10 couchdbwhois1113r kernel: Killed process 4630, UID 0, (python) total-vm:11581564kB, anon-rss:10773124kB, file-rss:4kB
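A quick way to read the kernel log above is to pull out the resident memory (anon-rss) of each killed process and compare it with the total RAM from the top output. This is a rough sketch: `killed_processes` is a hypothetical helper, the log lines are quoted only up to the anon-rss field, and the top snapshot and the kernel log come from different moments (the PIDs differ), so the comparison is approximate.

```python
import re

# Matches the kernel's "Killed process ..." lines quoted above
KILLED = re.compile(
    r"Killed process (\d+), UID (\d+), \((\S+)\) "
    r"total-vm:(\d+)kB, anon-rss:(\d+)kB"
)

def killed_processes(log_lines):
    """Return pid, name and anonymous RSS (kB) for each OOM-killed process."""
    victims = []
    for line in log_lines:
        m = KILLED.search(line)
        if m:
            pid, _uid, name, _total_vm_kb, anon_rss_kb = m.groups()
            victims.append({"pid": int(pid), "name": name,
                            "anon_rss_kb": int(anon_rss_kb)})
    return victims

# Kernel log lines from above (trailing fields omitted)
LOG = [
    "Mar 9 20:44:10 couchdbwhois1113r kernel: Killed process 14119, UID 497, (memcached) total-vm:17794644kB, anon-rss:17453500kB",
    "Mar 9 20:44:10 couchdbwhois1113r kernel: Killed process 4630, UID 0, (python) total-vm:11581564kB, anon-rss:10773124kB",
]
MEM_TOTAL_KB = 30822556  # "Mem: 30822556k total" from the top output

victims = killed_processes(LOG)
rss_kb = sum(v["anon_rss_kb"] for v in victims)
print(f"{rss_kb / 1024 ** 2:.1f} GiB resident = {rss_kb / MEM_TOTAL_KB:.0%} of RAM")
```

memcached and the cbbackup python process together hold about 92% of the VM's RAM, and with 0k of swap configured there is nowhere else for the excess to go, so the OOM killer steps in.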

asingh · March 10, 2015, 6:54am

Sorry to hear that you’re running into OOM on the memcached process. It would be great if you could share the cbcollect_info log from the node at the time the OOM killer kicked in.

korekontrol · June 24, 2015, 2:41pm

I don’t think it has anything to do with Couchbase Server itself, as it happened to both me and emccormick. The problem is that the cbbackup process starts to consume all available memory, which eventually causes an out-of-memory situation; at that point, triggering the OOM killer is normal.

From my experiments, I have figured out that cbbackup loads the whole contents of a bucket into memory when dumping it. On large clusters this renders cbbackup useless, which is a real pity because it becomes impossible to dump the data. I have been offered other solutions (like XDCR to another cluster for backup plus volume snapshots), but they’re not perfect for many reasons, and we would love to be able to use the cbbackup tool anyway!
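For contrast with the behaviour described above, here is a generic sketch (not cbbackup's actual code) of a dump that streams items through a bounded buffer. Memory held in flight is capped by the buffer size rather than the bucket size, because the reader blocks whenever the writer falls behind. `stream_dump` and `max_buffered` are hypothetical names chosen for illustration.

```python
import queue
import threading

def stream_dump(items, write, max_buffered=1000):
    """Copy `items` to `write` through a queue holding at most `max_buffered` items.

    Returns the peak queue depth observed, which can never exceed max_buffered.
    """
    q = queue.Queue(maxsize=max_buffered)
    sentinel = object()
    peak = 0

    def producer():
        for item in items:
            q.put(item)   # blocks while the buffer is full (back-pressure)
        q.put(sentinel)   # signal end of stream

    reader = threading.Thread(target=producer)
    reader.start()
    while True:
        peak = max(peak, q.qsize())
        item = q.get()
        if item is sentinel:
            break
        write(item)       # e.g. append to a backup file
    reader.join()
    return peak

# Usage: a generator stands in for the source, so nothing is materialised up front
written = []
peak = stream_dump((i for i in range(5000)), written.append, max_buffered=50)
```

The design trade-off is that a small buffer caps memory but lets the slower side set the pace; it never forces the whole data set into RAM.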

asingh · June 24, 2015, 3:40pm

@korekontrol please refer to our release notes for v3.0.1, where we have documented steps to mitigate “hard OOM” on the server side: Release notes v3.x

The new --sequential option allows you to control the number of vbucket backfills at any given time, so it mitigates “hard OOMs”, but it’s a bit slow.
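The idea behind capping backfills can be illustrated with a small sketch: run one backfill per vbucket, but never more than a fixed number at once, so peak memory scales with the cap rather than with the number of vbuckets. This is a generic illustration, not how cbbackup implements --sequential; `backfill_all`, `backfill` and `max_concurrent` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def backfill_all(vbuckets, backfill, max_concurrent=1):
    """Run `backfill(vb)` for every vbucket with at most `max_concurrent` in flight.

    Returns (results in vbucket order, peak number of concurrent backfills).
    """
    lock = threading.Lock()
    active = 0
    peak = 0

    def run(vb):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        try:
            return backfill(vb)
        finally:
            with lock:
                active -= 1

    # The worker count is the concurrency cap; lower means less memory, slower run
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        results = list(pool.map(run, vbuckets))
    return results, peak

# Usage: 1024 vbuckets, at most 4 backfills at a time
results, peak = backfill_all(range(1024), lambda vb: vb * 2, max_concurrent=4)
```

With the cap at 1 the backfills run strictly one after another, which matches the trade-off mentioned above: far lower peak memory at the cost of a slower backup.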

korekontrol · June 24, 2015, 4:24pm

@asingh, thank you for this note. We’ll try it next time.