Cbbackupmgr merge does not complete (no errors)


#1

Hi,

I am testing out cbbackupmgr, and I’m running into “random” issues. I don’t have the logs for the previous ones, but here is one that I’m stuck on right now. When I issue a cbbackupmgr merge (along with the other parameters), the merge does it job but eventually get stuck:




2016-07-28T10:16:32.037-07:00 Merging data from 2016-07-28T10_10_12.4825425-07_00
2016-07-28T10:16:32.037-07:00 (Plan) Executing transfer plan
2016-07-28T10:16:32.037-07:00 (Plan) Transfering bucket configuration for CIR
2016-07-28T10:16:32.038-07:00 (Plan) Transfering views definitions for CIR
2016-07-28T10:16:32.038-07:00 (Plan) Transfering GSI index definitions for CIR
2016-07-28T10:16:32.039-07:00 (Plan) Transfering full text index definitions for CIR
2016-07-28T10:16:32.040-07:00 (Plan) Deciding which key value data to transfer for CIR
2016-07-28T10:16:32.160-07:00 (Plan) Transfering new key value data for CIR
2016-07-28T10:16:33.009-07:00 (Plan) Data backup completed after 825.0869ms
2016-07-28T10:16:33.010-07:00 (Plan) Transfer of all data complete
2016-07-28T10:16:33.010-07:00 (Plan) Transfer plan finished successfully
2016-07-28T10:16:33.090-07:00 Merging data from 2016-07-28T10_10_37.7169631-07_00
2016-07-28T10:16:33.090-07:00 (Plan) Executing transfer plan
2016-07-28T10:16:33.090-07:00 (Plan) Transfering bucket configuration for CIR
2016-07-28T10:16:33.091-07:00 (Plan) Transfering views definitions for CIR
2016-07-28T10:16:33.092-07:00 (Plan) Transfering GSI index definitions for CIR
2016-07-28T10:16:33.094-07:00 (Plan) Transfering full text index definitions for CIR
2016-07-28T10:16:33.095-07:00 (Plan) Deciding which key value data to transfer for CIR

Then the merge exits without any errors (or completion message). When I rerun the merge with the same start/end, it then returns without any message, but the log has errors/exception:

2016-07-28T10:23:14.193-07:00 (Cmd) merge --archive i:\dbbackup --repo cir-backup --start 2016-07-28T09_29_11.1285502-07_00 --end 2016-07-28T10_16_18.0604728-07_00
2016-07-28T10:23:14.194-07:00 Merging backups [2016-07-28T09_29_11.1285502-07_00 2016-07-28T09_43_21.9372698-07_00 2016-07-28T10_09_07.2107684-07_00 2016-07-28T10_10_12.4825425-07_00 2016-07-28T10_10_37.7169631-07_00 2016-07-28T10_11_04.7654939-07_00 2016-07-28T10_11_12.5439512-07_00 2016-07-28T10_12_27.7022462-07_00 2016-07-28T10_12_54.5257911-07_00 2016-07-28T10_16_18.0604728-07_00]
2016-07-28T10:23:14.195-07:00 Creating the merge directory for backup cir-backup
2016-07-28T10:23:15.841-07:00 Merging data from 2016-07-28T09_43_21.9372698-07_00
2016-07-28T10:23:15.841-07:00 (Plan) Executing transfer plan
2016-07-28T10:23:15.841-07:00 (Plan) Transfering bucket configuration for CIR
2016-07-28T10:23:15.842-07:00 (Plan) Transfering views definitions for CIR
2016-07-28T10:23:15.843-07:00 (Plan) Transfering GSI index definitions for CIR
2016-07-28T10:23:15.846-07:00 (Plan) Transfering full text index definitions for CIR
2016-07-28T10:23:15.847-07:00 (Plan) Deciding which key value data to transfer for CIR
2016-07-28T10:23:16.078-07:00 (Plan) Transfering new key value data for CIR
Exception 0xc0000005 0x0 0x2 0x7ffcc4e2eea2
PC=0x7ffcc4e2eea2
signal arrived during external code execution

github.com/couchbase/goforestdb._Cfunc_fdb_open(0xc082674ab8, 0x3957c0, 0xc08215b520, 0xc000000000)
??:0 +0x40
github.com/couchbase/goforestdb.Open(0xc0826d0fc0, 0x6e, 0xc082674ab0, 0x0, 0x0, 0x0)
C:/Jenkins/workspace/watson-windows/couchbase/godeps/src/github.com/couchbase/goforestdb/file.go:51 +0x228
github.com/couchbase/backup/storage.(*ForestDBFile).Open(0xc08211e480, 0x0, 0x0)
C:/Jenkins/workspace/watson-windows/couchbase/goproj/src/github.com/couchbase/backup/storage/forestdb.go:104 +0x79
github.com/couchbase/backup/storage.OpenDataBackup(0xc0826d0070, 0x62, 0x900640, 0x4, 0x32, 0x0, 0x0)
C:/Jenkins/workspace/watson-windows/couchbase/goproj/src/github.com/couchbase/backup/storage/databackup.go:147 +0x75e
github.com/couchbase/backup/archive.(*bucketDir).openDataDirectory(0xc082006bc0, 0x3, 0x0, 0x0)
C:/Jenkins/workspace/watson-windows/couchbase/goproj/src/github.com/couchbase/backup/archive/dir_bucket.go:75 +0x57
github.com/couchbase/backup/archive.createArchiveDump(0x3c05f8, 0xc082153ff0, 0xc082006bc0, 0xc0826cc000, 0x400, 0x400, 0xc0826ce000, 0x400, 0x400, 0xc0820ac690, …)
C:/Jenkins/workspace/watson-windows/couchbase/goproj/src/github.com/couchbase/backup/archive/source.go:246 +0x49
github.com/couchbase/backup/archive.(*archiveSource).Data(0xc0820a1140, 0xc082051e30, 0x3, 0xc082051e34, 0x20, 0xc082003ec0, 0x3c05f8, 0xc082153ff0, 0x0, 0x0)
C:/Jenkins/workspace/watson-windows/couchbase/goproj/src/github.com/couchbase/backup/archive/source.go:199 +0xc3f
github.com/couchbase/backup/plan.(*data).execute(0xc0820a8740, 0x3b4eb8, 0xc0820a1140, 0x3b4f30, 0xc0820a1100, 0xc0820a2b90, 0x0, 0x0)
C:/Jenkins/workspace/watson-windows/couchbase/goproj/src/github.com/couchbase/backup/plan/data.go:62 +0x354
github.com/couchbase/backup/plan.(*serial).execute(0xc082091b00, 0x3b4eb8, 0xc0820a1140, 0x3b4f30, 0xc0820a1100, 0xc0820a2b90, 0x0, 0x0)
C:/Jenkins/workspace/watson-windows/couchbase/goproj/src/github.com/couchbase/backup/plan/serial.go:65 +0x12e
github.com/couchbase/backup/plan.(*parallel).execute(0xc082091ad0, 0x3b4eb8, 0xc0820a1140, 0x3b4f30, 0xc0820a1100, 0xc0820a2b90, 0x0, 0x0)
C:/Jenkins/workspace/watson-windows/couchbase/goproj/src/github.com/couchbase/backup/plan/parallel.go:65 +0x12e
github.com/couchbase/backup/plan.(*serial).execute(0xc082091aa0, 0x3b4eb8, 0xc0820a1140, 0x3b4f30, 0xc0820a1100, 0xc0820a2b90, 0x0, 0x0)
C:/Jenkins/workspace/watson-windows/couchbase/goproj/src/github.com/couchbase/backup/plan/serial.go:65 +0x12e
github.com/couchbase/backup/plan.(*Plan).ExecutePlan(0xc0820a2b90, 0x3b4eb8, 0xc0820a1140, 0x3b4f30, 0xc0820a1100, 0x0, 0x0)
C:/Jenkins/workspace/watson-windows/couchbase/goproj/src/github.com/couchbase/backup/plan/plan.go:98 +0xc8
github.com/couchbase/backup/archive.(*Archive).MergeIncrBackups(0xc082090de0, 0xc082008270, 0xa, 0xc082003f80, 0x21, 0xc082003fb0, 0x21, 0x0, 0x0, 0x0)
C:/Jenkins/workspace/watson-windows/couchbase/goproj/src/github.com/couchbase/backup/archive/archive.go:349 +0xa0b
main.parseMergeCommand()
C:/Jenkins/workspace/watson-windows/couchbase/goproj/src/github.com/couchbase/backup/cmd/cbbackupmgr/merge.go:58 +0x540
main.main()
C:/Jenkins/workspace/watson-windows/couchbase/goproj/src/github.com/couchbase/backup/cmd/cbbackupmgr/main.go:100 +0x50b

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
C:/Users/vagrant/cbdepscache/exploded/amd64/go-1.5.2/go/src/runtime/asm_amd64.s:1721 +0x1

goroutine 5 [syscall]:
os/signal.loop()
C:/Users/vagrant/cbdepscache/exploded/amd64/go-1.5.2/go/src/os/signal/signal_unix.go:22 +0x1f
created by os/signal.init.1
C:/Users/vagrant/cbdepscache/exploded/amd64/go-1.5.2/go/src/os/signal/signal_unix.go:28 +0x3e

goroutine 6 [chan receive]:
main.sigHandler()
C:/Jenkins/workspace/watson-windows/couchbase/goproj/src/github.com/couchbase/backup/cmd/cbbackupmgr/main.go:307 +0x178
created by main.parseMergeCommand
C:/Jenkins/workspace/watson-windows/couchbase/goproj/src/github.com/couchbase/backup/cmd/cbbackupmgr/merge.go:56 +0x4ea
rax 0x0
rbx 0xf974a0
rcx 0x643d860
rdi 0x10b
rsi 0x643a990
rbp 0x23e110
rsp 0x23dfe0
r8 0x360
r9 0x6f6a3c0
r10 0x7ffccf240000
r11 0x246
r12 0x7ffcc4e47860
r13 0x0
r14 0x23fd70
r15 0x0
rip 0x7ffcc4e2eea2
rflags 0x10202
cs 0x33
fs 0x53
gs 0x2b

I created my backup with cbbackupmgr backup. I then create a few incremental backup (with no new documents inserted or changes made) and merge them together. Sometimes the merge works. Sometimes the merge fails with the above characteristics. Any idea?


#2

scheok,

First off, when cbbackupmgr doesn’t finish with a success message that almost always means there was a crash. Instead of dumping the stack to stdout we put it in the logs which is why you are finding it there. Since it’s a crash I cannot print anything to stdout. The one thing I can see is that the crash appears to happen on the boundary between cbbackupmgr and the forestdb kvstore. Exception 0xc0000005 0x0 0x2 0x7ffcc4e2eea2 points to an access violation error which is thrown when opening the data files. I haven’t seen this type of error before the way you are. Usually when I see an access violation error it is because the forestdb library isn’t linked properly. That can’t be the case here though since cbbackupmgr does usually work correctly for you. Can you let me know what version of windows you are running on and any other information about your system and setup so I can try to reproduce this? In the meantime I have filed an issue here: MB-20402.


#3

Thanks Mike for the reply.
I also filed a request with Support (http://support.couchbase.com/requests/13630) and have been working with Jeff D. on this. In my test, it seems the problem occurs when I use CBBACKUPMGR COMPACT. If I don’t compact the backup, the problem doesn’t seem to occur.

I’m using Windows 2012 R2. This is a VM. Host hypervisor is VMware ESXi and also Windows 7 running VirtualBox. (I’m able to reproduce this on two different environments very consistently when using COMPACT.)

More information is available in Request #13630 if you have the access.

Thanks for creating the issue and looking into this.


#4

MB-20402 was updated by Support with my comments and testing with regards to COMPACT. I also had uploaded logs in the Support ticket request 13630.