Couchbase fails to back up and reports write commit failures


#1

Hi there, I’m currently using Couchbase 3.0.1.

I have 3 buckets on the server, with only 1 node.

Just now I ran into a problem: the server keeps raising a “Write Commit Failure” alert.
I tried to cbbackup the data out. One of the buckets hangs at “100.0% (58461/estimated 58475 msgs)”, while the other 2 hang at the very beginning (even before the folder is created).

Here are some logs I found:
Mon Jun 8 22:22:09.972486 HKT 3: (iptb) Warning: couchstore_save_local_document failed error=error reading file [errno = 0: ‘Success’]
Mon Jun 8 22:22:09.972702 HKT 3: (iptb) Warning: failed to save local doc, name=/vol1/couchbase_data/iptb/491.couch.491
Mon Jun 8 22:22:09.972912 HKT 3: (iptb) Warning: failed to set new state, active, for vbucket 491
Mon Jun 8 22:22:09.972994 HKT 3: (iptb) VBucket snapshot task failed!!! Rescheduling
Mon Jun 8 22:22:09.973943 HKT 3: (iptb) Warning: couchstore_save_local_document failed error=error reading file [errno = 0: ‘Success’]
Mon Jun 8 22:22:09.974074 HKT 3: (iptb) Warning: failed to save local doc, name=/vol1/couchbase_data/iptb/311.couch.311
Mon Jun 8 22:22:09.974167 HKT 3: (iptb) Warning: failed to set new state, active, for vbucket 311
Mon Jun 8 22:22:09.974238 HKT 3: (iptb) VBucket snapshot task failed!!! Rescheduling
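
To see which vBuckets are affected without reading the log by eye, something like the following might help. It is only a sketch: it assumes the warning lines keep the exact wording shown above, and the helper name `failing_vbuckets` is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... (iptb) Warning: failed to set new state, active, for vbucket 491
# The pattern is an assumption based on the log excerpt above.
VBUCKET_RE = re.compile(
    r"\((?P<bucket>[^)]+)\) Warning: failed to set new state, \w+, "
    r"for vbucket (?P<vb>\d+)"
)

def failing_vbuckets(lines):
    """Count 'failed to set new state' warnings per (bucket, vbucket)."""
    counts = Counter()
    for line in lines:
        match = VBUCKET_RE.search(line)
        if match:
            counts[(match.group("bucket"), int(match.group("vb")))] += 1
    return counts

sample = [
    "Mon Jun 8 22:22:09.972912 HKT 3: (iptb) Warning: failed to set new state, active, for vbucket 491",
    "Mon Jun 8 22:22:09.972994 HKT 3: (iptb) VBucket snapshot task failed!!! Rescheduling",
    "Mon Jun 8 22:22:09.974167 HKT 3: (iptb) Warning: failed to set new state, active, for vbucket 311",
]

for (bucket, vb), n in sorted(failing_vbuckets(sample).items()):
    print(f"bucket={bucket} vbucket={vb} warnings={n}")
```

If only a handful of vBucket IDs ever show up (here 311 and 491), that points at a few specific .couch files rather than the whole data directory.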

Now I can’t start my service (the write commit failures would lead to data loss).
Please tell me how I can recover my service!


#2

I know this is a basic question, but how much free disk space do you have?

Also, what operating system are you running?
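
If it is easier than pasting df output, free space on the data volume can be checked with a few lines of Python. This is a minimal sketch; the path "." is a placeholder and should be replaced with the actual Couchbase data path.

```python
import shutil

def free_space_report(path):
    """Return (free bytes, percent used) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    pct_used = 100.0 * usage.used / usage.total
    return usage.free, pct_used

# "." is a placeholder; point this at the Couchbase data directory.
free, pct = free_space_report(".")
print(f"free: {free / 2**30:.1f} GiB, used: {pct:.0f}%")
```

The percentage can differ slightly from what df prints, because df excludes blocks reserved for root from its calculation.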


#3

Original environment:
CentOS 6.4
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_seed8gram2core-lv_root
6.6G 5.1G 1.3G 81% /
tmpfs 3.9G 0 3.9G 0% /dev/shm
/dev/sda1 485M 32M 428M 7% /boot
/dev/sdb1 20G 9.8G 9.0G 53% /vol1

Then I copied the data to these environments:
CentOS 6.4
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root 14G 2.9G 11G 22% /
tmpfs 3.9G 0 3.9G 0% /dev/shm
/dev/sda1 485M 32M 428M 7% /boot
/dev/sdb1 50G 897M 46G 2% /vol1

CentOS 6.5
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_tw-lv_root 50G 12G 35G 26% /
tmpfs 3.9G 0 3.9G 0% /dev/shm
/dev/sda1 485M 32M 428M 7% /boot
/dev/mapper/vg_tw-lv_home 172G 8.9G 155G 6% /home

The Couchbase data is in /vol1/.
All gave the same result.

Thanks


#5

Also, when I try to list the Documents of the bucket that hangs at 100%, it shows:
“Error: internal (memcached_error)”

I also found this in the log:
[ns_server:info,2015-06-09T11:04:59.429,babysitter_of_ns_1@127.0.0.1:<0.78.0>:ns_port_server:log:169]memcached<0.78.0>: Tue Jun 9 11:04:59.228044 HKT 3: (iptb) couchstore_all_docs failed for database file of vbucket = 311 rev = 146, errCode = 4294967294
memcached<0.78.0>: Tue Jun 9 11:04:59.230594 HKT 3: (iptb) couchstore_all_docs failed for database file of vbucket = 311 rev = 146, errCode = 4294967294
memcached<0.78.0>: Tue Jun 9 11:04:59.247649 HKT 3: (iptb) couchstore_all_docs failed for database file of vbucket = 491 rev = 273, errCode = 4294967294
memcached<0.78.0>: Tue Jun 9 11:04:59.251020 HKT 3: (iptb) couchstore_all_docs failed for database file of vbucket = 491 rev = 273, errCode = 4294967294
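
The errCode value 4294967294 looks like a negative status code printed as an unsigned 32-bit number. A quick sketch of the conversion (the helper name `as_signed32` is made up for illustration):

```python
def as_signed32(value):
    """Reinterpret an unsigned 32-bit integer as signed (two's complement)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value

print(as_signed32(4294967294))  # -2
```

So the underlying code is -2. Which error that corresponds to should be looked up in the couchstore error enum (couchstore_error_t) for the version in use, rather than taken from memory.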


#6

Is no one going to look into this case?


#7

When you say you copied the data to the other environments, how did you achieve that? I’m wondering if you copied/rsynced, restored from backup, used XDCR, or something else.
The issue could be file permissions, corrupted data files, hardware failure, etc…
Is this a single node or a cluster? What is the health state of CB? Are some of your processes being killed by the OS for lack of RAM?
Please provide more details :smile:


#8

To get your data out of your original node (the one with the write commit failure), did you try cbbackup with couchstore-files://[path to data dir] as the source instead of http://[cluster name]:8091?
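
Since cbbackup has been hanging, it may be worth running the file-based backup under a timeout. The sketch below assumes the source scheme is `couchstore-files://` (as I remember the cbbackup documentation naming it) and that `cbbackup` is on the PATH. The backup directory `/tmp/cb_backup` and both function names are placeholders.

```python
import subprocess

def build_cbbackup_cmd(data_dir, backup_dir):
    """Build a cbbackup command that reads directly from the data files."""
    # Assumption: cbbackup accepts couchstore-files://<data dir> as its source.
    return ["cbbackup", "couchstore-files://" + data_dir, backup_dir]

def run_backup(data_dir, backup_dir, timeout_s=600):
    """Run cbbackup; return its exit code, or None if it timed out."""
    cmd = build_cbbackup_cmd(data_dir, backup_dir)
    try:
        # Raises FileNotFoundError if cbbackup is not on the PATH.
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return None

# Print the command for inspection; call run_backup(...) to actually run it.
print(" ".join(build_cbbackup_cmd("/vol1/couchbase_data", "/tmp/cb_backup")))
```

A `None` result means the backup hung past the timeout, which would at least confirm that the problem is in reading the files and not in the cluster connection.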


#9

I just copied the data and configuration files. File permissions were set the same as on a freshly installed Couchbase server.
The CB status was healthy (“running” from service couchbase-server status).
I don’t see any killed services in the kernel log, and nothing indicates a lack of RAM.

I’ve listed all the keys with views and inserted them into a brand new CB server. The new server is running fine, but the old server’s problem is still a mystery. At least I want to know how to prevent this from happening again…


#10

It also failed with couchfiles-store://[path to data dir].