CB 4.1 restore is ~5 times slower than 2.5 restore


#1

Hi

I ran some tests after migrating from 2.5 to 4.1, and it turned out that the whole restore process is now at least 5 times slower than it was on the 2.5 server.
There is no visible difference for small buckets with a few thousand documents, but buckets with more than 100 thousand documents restore much more slowly.

As a baseline, a backup with ~2 million documents spread over a few buckets was restored in 7 minutes on CB 2.5.
Then I removed CB 2.5 and installed 4.1, using the same memory size for CB and the same bucket sizes,
and restored the same backup as before. It took 55 minutes (almost 8 times slower).

I thought there might be some kind of conversion involved when restoring a 2.5 backup on a 4.1 server, so I created a backup on the new 4.1 server and, after recreating all buckets, restored it back onto the 4.1 server. It took 37 minutes (5 times slower).
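
As a quick sanity check, the slowdown factors quoted above follow directly from the reported times. A minimal Python sketch (the dictionary keys and variable names are made up for illustration; the minutes are the ones reported in this post):

```python
# Restore times reported in this post, in minutes.
restore_minutes = {
    "2.5 backup on 2.5": 7,
    "2.5 backup on 4.1": 55,
    "4.1 backup on 4.1": 37,
}

baseline = restore_minutes["2.5 backup on 2.5"]

# Slowdown of each 4.1 restore relative to the 2.5 restore.
slowdown = {
    name: minutes / baseline
    for name, minutes in restore_minutes.items()
    if name != "2.5 backup on 2.5"
}

for name, factor in slowdown.items():
    print(f"{name}: {factor:.1f}x slower")
```

This gives roughly 7.9x for the 2.5 backup and 5.3x for the 4.1 backup, which matches the "almost 8 times" and "5 times" figures.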

I repeated this twice, plus once on a different Windows machine, with very similar results.
We have the same issues with our CBs on Linux. The restore time has increased significantly.

So I just want to know: is this a bug or a new CB feature?


#2

Sad to say, but it’s a “well-known feature”:

  • Using cbbackupmgr Tool
    Couchbase Backup Manager is an enterprise-grade backup and restore utility that is available in the Couchbase Server Enterprise Edition only. Designed for the Enterprise Edition, it replaces the cbbackup and cbrestore tools as the primary and recommended means of backup and restore for Enterprise customers from version 4.5 and above. It enables backup and restore of data, indexes, and bucket configurations at a dramatically increased speed over the previous generation tools.

http://developer.couchbase.com/documentation/server/4.5/backup-restore/backup-restore.html
And, as you may guess, the “previous generation tools” are cbbackup/cbrestore (and the -wrapper variants) from 4.0/4.1.X.
The saddest part is that even the 4.5 CE release won’t help in this case …


#3

I am using the Enterprise Edition for both 2.5 and 4.1, so it looks like I will have to live with this, as with a few other CB features :confused:

Thank you for the explanation.


#4

@arkadiusz.zan,
:slight_smile: The “Solution” mark looks funny on my answer


#5

@arkadiusz.zan

I’m unsure why the cbrestore from 2.5 is working so much faster for you than the 4.1 cbrestore. It is true that we have added an enterprise version for backup/restore called cbbackupmgr, but there should not be any performance regressions from our current tools. Also, this isn’t an issue I’ve heard about previously from other users so I’m not sure why this is running so slowly for you.

One option is that you can try using cbbackupwrapper and cbrestorewrapper instead of cbbackup/cbrestore. These commands allow running “multi-threaded” versions of cbbackup/cbrestore and should improve performance since cbbackup and cbrestore are only single threaded and always have been.

Also, you can file a ticket at couchbase.com/issues with your server logs and backup logs if you like and I can look at the logs to see if there is anything that stands out. If you do this please run the backup with -vv to get verbose output.


#6

@mikew,
nothing can help cbbackup/cbbackupwrapper “globally”.
For example (4.5.1, 1 node, 2 buckets with 300 MB of data):

/opt/couchbase/bin$ time ./cbbackupwrapper http://host:8091 ~/cbw1 -v -P 1 -u Administrator -p password

real 7m26.565s
user 4m25.197s
sys 2m30.558s

/opt/couchbase/bin$ time ./cbbackupwrapper http://host:8091 ~/cbw4p -v -P 4 -u Administrator -p password -x dcp_consumer_queue_length=10000,batch_max_size=10000

real 1m32.015s
user 4m34.214s
sys 2m19.162s

/opt/couchbase/bin$ time ./cbbackupwrapper http://host:8091 ~/cbw8p -v -P 8 -u Administrator -p password -x dcp_consumer_queue_length=10000,batch_max_size=10000

real 1m5.663s
user 4m24.007s
sys 1m50.771s

/opt/couchbase/bin$ time ./cbbackup http://host:8091 ~/cb8t -v -t 8 -u Administrator -p password -x dcp_consumer_queue_length=10000,batch_max_size=10000

real 4m46.429s
user 5m5.781s
sys 2m8.419s
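
One way to read the four timings above is to compare wall-clock speedup against the `-P 1` run and to estimate how many cores the client side kept busy on average, using the usual (user + sys) / real ratio. A rough Python sketch; the function names are made up, and the ratio only covers the backup process and its children as seen by `time`, not the Couchbase server itself:

```python
# Timings copied from the `time` output above, converted to seconds.
runs = {
    "cbbackupwrapper -P 1": {"real": 7 * 60 + 26.565, "user": 4 * 60 + 25.197, "sys": 2 * 60 + 30.558},
    "cbbackupwrapper -P 4": {"real": 1 * 60 + 32.015, "user": 4 * 60 + 34.214, "sys": 2 * 60 + 19.162},
    "cbbackupwrapper -P 8": {"real": 1 * 60 + 5.663, "user": 4 * 60 + 24.007, "sys": 1 * 60 + 50.771},
    "cbbackup -t 8": {"real": 4 * 60 + 46.429, "user": 5 * 60 + 5.781, "sys": 2 * 60 + 8.419},
}

baseline_real = runs["cbbackupwrapper -P 1"]["real"]


def speedup(run):
    """Wall-clock speedup relative to the -P 1 run."""
    return baseline_real / run["real"]


def avg_busy_cores(run):
    """Average number of cores kept busy by the client process tree."""
    return (run["user"] + run["sys"]) / run["real"]


for name, run in runs.items():
    print(f"{name}: {speedup(run):.2f}x speedup, {avg_busy_cores(run):.2f} cores busy")
```

Under this reading, total CPU time stays at roughly 6 to 7.5 CPU-minutes in every run; more processes shorten the wall-clock time mainly by spreading the same work over more cores.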

Yep, you can increase the speed, but your system becomes “almost unresponsive” during the backup.
And that’s about 1 minute for 300 MB.

5 Mbytes per second, Carl!

What about 10G ? … 30 mins ? 100G ? 5 hours ? 1T ?..(never-ending-stale-data-backup ?)
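
For what it is worth, the extrapolation can be worked out from the fastest run above (300 MB in 1m5.663s). This is only a back-of-the-envelope sketch: it assumes throughput stays constant as the data set grows, uses decimal units (1 GB = 1000 MB), and the names are made up for illustration:

```python
BACKUP_MB = 300                      # data size quoted in this post
BEST_REAL_SECONDS = 1 * 60 + 5.663   # fastest run (-P 8)

rate_mb_per_s = BACKUP_MB / BEST_REAL_SECONDS


def estimated_hours(size_mb):
    """Projected backup time in hours, assuming linear scaling."""
    return size_mb / rate_mb_per_s / 3600


print(f"rate: {rate_mb_per_s:.1f} MB/s")
for label, size_mb in [("10 GB", 10_000), ("100 GB", 100_000), ("1 TB", 1_000_000)]:
    print(f"{label}: {estimated_hours(size_mb):.1f} h")
```

At about 4.6 MB/s this comes out to roughly 36 minutes for 10 GB, about 6 hours for 100 GB, and about 61 hours for 1 TB, so the guesses above are, if anything, slightly optimistic.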

It would be nice if there were kinda a CLI-tool for “direct-dump-from-memory-to-[…] in the way XDCR does it from one server to another”. Kinda

xdcr_dump http://host:8091 source_nozzles=2 out_nozzles=2 destination=memdump.file border_timestamp=`date +%s`

@mikew, is there something like that ? :wink:


#7

cbbackup actually works in a very similar way to XDCR. Both use DCP in order to stream data out of the server and this is probably the closest thing you can get to just having a direct dump from memory since DCP will stream items from the in-memory cache and avoid going to disk when possible. In your examples, where are you running your cluster and what type of disk are you writing to? I suspect that you are using a disk that has a 5MB/sec max IO throughput. Can you run iostat while the backup is taking place to confirm whether or not this is the case?

Another thing that I’d be interested in knowing is why the system is unresponsive. Is it due to memory, CPU, or disk IO usage?


#8

@mikew,

“cbbackup actually works in a very similar way to XDCR”

As always, “the devil is in the details”: “Python sucks”? A buggy implementation? Code that does not use the new XMEM (v2) capabilities? As seen in /opt/couchbase/lib/python/cbbackupwrapper:

Written by Daniel Owen owend@couchbase.com on 27 June 2014
Version 1.4 Last updated 10 July 2014

[UPDATE] :slight_smile: Hmm, if “Daniel Owen” is the same one as in https://issues.couchbase.com/browse/MB-20943, I think “Buggy implementation” should be removed from this list :wink: @owend, is that you?

Disk throughput is definitely not a problem, even for very small chunks (128 bytes):

dd if=/dev/zero of=/home/user/dd.1 bs=128 count=100000 oflag=nocache,nonblock
100000+0 records in
100000+0 records out
12800000 bytes (13 MB) copied, 0,548441 s, 23,3 MB/s
dd if=/dev/zero of=/home/user/dd.2 bs=128 count=100000 oflag=nocache,nonblock
100000+0 records in
100000+0 records out
12800000 bytes (13 MB) copied, 0,534216 s, 24,0 MB/s
dd if=/dev/zero of=/home/user/dd.3 bs=256 count=100000 oflag=nocache,nonblock
100000+0 records in
100000+0 records out
25600000 bytes (26 MB) copied, 0,606981 s, 42,2 MB/s
dd if=/dev/zero of=/home/user/dd.4 bs=512 count=100000 oflag=nocache,nonblock
100000+0 records in
100000+0 records out
51200000 bytes (51 MB) copied, 0,753886 s, 67,9 MB/s
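
To put these dd figures next to the backup rate, the numbers can be recomputed and compared with the roughly 4.6 MB/s implied by the fastest run in post #6 (300 MB in 1m5.663s). A small Python sketch with made-up variable names; it only reworks the figures printed above and does not measure anything itself:

```python
# (bytes copied, elapsed seconds) from the four dd runs above.
dd_runs = [
    (12_800_000, 0.548441),
    (12_800_000, 0.534216),
    (25_600_000, 0.606981),
    (51_200_000, 0.753886),
]

# dd reports decimal megabytes (1 MB = 10**6 bytes).
dd_mb_per_s = [nbytes / seconds / 1e6 for nbytes, seconds in dd_runs]

# Backup rate implied by the fastest run in post #6.
backup_mb_per_s = 300 / (1 * 60 + 5.663)

# How much faster the slowest dd run is than the backup.
headroom = min(dd_mb_per_s) / backup_mb_per_s

print([round(rate, 1) for rate in dd_mb_per_s])
print(f"slowest dd run is {headroom:.1f}x the backup rate")
```

Even the slowest dd run, with 128-byte writes, is about 5 times faster than the backup, which supports the point that raw disk throughput is not the limiting factor here.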

It is “unresponsive” because of CPU usage with -P 8. In fact, it depends on the number of vCPUs (2 = unresponsive, 16 = works with ~50% load per core).

[UPDATE 2] ~100% load with 2 vCPUs (watched via htop during the run):

/opt/couchbase/bin$ time ./cbbackupwrapper http://host:8091 ~/cbw8p -v -P 8 -u Administrator -p password -x dcp_consumer_queue_length=10000,batch_max_size=10000

real 1m7.145s
user 1m5.198s
sys 0m14.560s


#9

@mikew

Thanks, I will report this once I find some time to repeat these steps with the -vv option, and I will post a link to the issue here.

BTW,
cbbackupwrapper doesn’t work on my 4.1.1-5914 Enterprise Windows installation. No matter what options I pass, I get the same error:

“Error: please provide both cluster IP and backup directory path.”

I’ve read the manual and the command-line help and tried many parameter combinations.


#10

@arkadiusz.zan,
try changing directory to couchbase_install_dir/bin and running it from there.


#11

My mistake. The error message in my last post appeared only after I started to play with the options.
The standard message was

Error with backup for running c:\Program^ Files\Couchbase\Server\bin\cbbackup.exe -v -t 1 --vbucket-list=[1000,100 … 2>c:\temp\xx\logs\1000-1023.err

and the content of the 1000-1023.err file was “Error: please provide both a source and a backup_dir”
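
The log file name 1000-1023.err and the truncated --vbucket-list argument suggest that the wrapper runs cbbackup once per range of vbuckets. A minimal Python sketch of such a split, assuming 1024 vbuckets and a chunk size of 100 (the chunk size and the function name are assumptions for illustration, not taken from the wrapper's source):

```python
def chunk_vbuckets(total=1024, chunk_size=100):
    """Split vbucket ids 0..total-1 into inclusive (first, last) ranges.

    chunk_size=100 is an assumption chosen so that the last range
    matches the 1000-1023 log file name seen above.
    """
    return [
        (start, min(start + chunk_size, total) - 1)
        for start in range(0, total, chunk_size)
    ]


ranges = chunk_vbuckets()
log_names = [f"{first}-{last}.err" for first, last in ranges]

print(len(ranges), "ranges; last log file:", log_names[-1])
```

With these assumptions the split yields 11 ranges, and the last one is 1000-1023, the range whose cbbackup invocation failed above.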

Of course I ran this from the bin directory, but your post gave me an idea. So I copied the bin folder from couchbase_install_dir to a different path without spaces,

like c:\temp\bin, and voilà! It is working. :slight_smile:
So it seems to be good practice not to install CB in a path with spaces, as it doesn’t play well with them.

I will do some tests with the wrappers tomorrow and let you know.


#12

:slight_smile: Try using quotes, something like “c:\Program Files\Couchbase\Server\bin\cbbackup.exe” [params]
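
For anyone scripting the tools themselves, one way to sidestep manual quoting is to pass the command as a list of arguments, so the path with a space stays a single argument. A hedged Python sketch: the backup directory c:\temp\backup is a made-up example, and this only illustrates quoting when calling cbbackup directly; it does not change how cbbackupwrapper builds its own command line.

```python
import subprocess

# Install path as quoted above; the backup directory is hypothetical.
cbbackup = r"c:\Program Files\Couchbase\Server\bin\cbbackup.exe"
args = [cbbackup, "http://host:8091", r"c:\temp\backup", "-v"]

# list2cmdline shows how subprocess quotes an argument list on Windows:
# only the argument containing a space gets wrapped in double quotes.
command_line = subprocess.list2cmdline(args)
print(command_line)

# To actually run the backup on a Windows host:
# subprocess.run(args, check=True)
```

The printed command line has the executable path wrapped in double quotes and the remaining arguments left as they are.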


#13

@egrep

cbbackup.exe works fine from a directory with or without a space in its name.

cbbackupwrapper.exe doesn’t work from a directory with a space in its name, and the error message above comes from cbbackupwrapper.exe.

Anyway, once moved to a directory without spaces, it works fine.

I compared the times for standard cbbackup and cbbackupwrapper over 5 attempts.
cbbackupwrapper completes the backup in 21 minutes.
cbbackup completes the backup in 26 minutes.

So there is a small improvement with the wrapper.

Unfortunately, I wasn’t able to test cbrestorewrapper, as it just hangs at the end.

I can see on the CB server that all buckets have been restored, but the command doesn’t return. I waited for 2 hours before I killed the process. It hangs every time.

@mikew what kind of server log would you need to analyse this issue?

As soon as I find a new server for testing I will create the ticket. Right now I cannot install 2.5.2 on my local machine: I am hitting the “Computing space requirements” issue, and the registry workaround doesn’t work for me.
This is only a 2.5.2 installation issue; 4.1.1 can be installed and uninstalled without any problems.

I am afraid my general feeling about CB is not very positive.


#14

@amarantha (if you are the one who made an initial commit with words “a dramatically increased speed over the previous generation tools”, https://github.com/couchbase/docs-cb4/commit/1556a76ca60e43b37a17b7c23fe6b198975a865f ) could you please clarify something about “dramatically increased speed” ?

Or, @anil ( https://github.com/couchbase/docs-cb4/commit/1cc654ea3eae666161145d46afffe5fe3bff2401 ), maybe you could shed some light on this ?

Of course, maybe this is just “documentation poetry”, “literary hyperbole” or something like that :wink: