Rebalance failed between two new servers


#1

I am testing a two-node Couchbase Server 2.5.1 cluster (Windows Server 2012 R2, on Azure).

After Couchbase was installed on the second server, it joined the first server's cluster without a problem. However, when I attempt to rebalance the cluster, after some time it fails with

Rebalance failed. See logs for detailed reason. You can try rebalance again.

Looking at the logs the relevant entries appear to be

Rebalance exited with reason {unexpected_exit,
{'EXIT',<0.2835.2>,
{badmatch,
[{'EXIT',
{{badmatch,{error,closed}},
{gen_server,call,
[<18017.12639.0>,had_backfill,
infinity]}}}]}}}

ns_orchestrator002 ns_1@10.32.0.5 05:14:35 - Wed Oct 15, 2014

<0.2828.2> exited with {unexpected_exit,
{'EXIT',<0.2835.2>,
{badmatch,
[{'EXIT',
{{badmatch,{error,closed}},
{gen_server,call,
[<18017.12639.0>,had_backfill,infinity]}}}]}}} ns_vbucket_mover000

ns_1@10.32.0.5 05:14:33 - Wed Oct 15, 2014

What does this mean, and how can I correct the problem?


#2

After many, many attempts and failures the rebalance finally completed.

According to the explanation at

http://blog.couchbase.com/rebalancing-couchbase-part-ii

a rebalance picks up where the last (failed) attempt left off; so it seems a single-shot rebalance was somehow not possible between the two nodes, but incrementally, attempt by attempt, they managed to sync up all the vBuckets in the buckets.

The weird thing is, this is a brand-new setup with empty buckets and no client activity; it still took a mighty long time.
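Since each attempt resumes where the previous one stopped, one way to lean on that behavior is to simply retry in a loop until the rebalance goes through. A rough sketch (the host is the node IP from the logs above; the Administrator/password credentials are placeholders, and I'm assuming couchbase-cli returns a non-zero exit status when the rebalance fails):

```shell
#!/bin/sh
# Retry the rebalance until it succeeds. Host, user, and password
# are placeholders -- substitute your own cluster details.
until couchbase-cli rebalance -c 10.32.0.5:8091 \
      -u Administrator -p password; do
    echo "Rebalance failed; retrying in 30s..." >&2
    sleep 30
done
echo "Rebalance completed."
```

On Windows the same couchbase-cli tool ships under the Couchbase install directory, so the equivalent loop can be scripted in PowerShell or a batch file.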


#3

It is interesting that it succeeded in the end.
Curious: are the machines in the same VNet on Azure, and are they talking to each other through public IPs or private IPs?
-cihan


#4

The Couchbase cluster servers all belong to the same private VNet subnet and are not accessible from the public internet.


#5

Trying out a brand-new cluster setup today, and the old rebalance failure is back again. This time around, each attempt typically fails in under a minute, moving what seems to be a few replica items in each stint. I am curious what those hundreds of replica items really are, given that these are empty buckets with zero items.

This is going to be a long day…


#6

Got around the problem by deleting the Couchbase bucket in question; the nodes then didn't need to rebalance anything, and adding the bucket back afterwards was easy and swift.
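For anyone wanting to script this workaround, a sketch with couchbase-cli (bucket name, credentials, and sizing here are placeholders -- use your own bucket's settings, and note that deleting a bucket destroys its data, which only made sense here because the bucket was empty):

```shell
#!/bin/sh
# Workaround sketch: drop the problem bucket, rebalance the now
# bucket-less cluster, then re-create the bucket.
HOST=10.32.0.5:8091          # placeholder cluster address
AUTH="-u Administrator -p password"   # placeholder credentials

# 1. Delete the bucket (destroys its data -- it was empty here).
couchbase-cli bucket-delete -c $HOST $AUTH --bucket=default

# 2. Rebalance; with no buckets there are no vBuckets to move.
couchbase-cli rebalance -c $HOST $AUTH

# 3. Re-create the bucket with the desired settings.
couchbase-cli bucket-create -c $HOST $AUTH \
    --bucket=default --bucket-type=couchbase \
    --bucket-ramsize=512 --bucket-replica=1
```

The re-created bucket's vBuckets are then laid out across both nodes from the start, so no further rebalance of that bucket is needed.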


#7

I am still super curious exactly what Active# and Replica# vBucket items are being transferred back and forth when there are no actual data items in the bucket.


#8

I have exactly the same issue. Did you find the reason?

Rebalance exited with reason {unexpected_exit,
{'EXIT',<0.26347.6215>,
{badmatch,
[{'EXIT',
{{badmatch,{error,closed}},
{gen_server,call,
[<19618.28584.1>,had_backfill,
infinity]}}}]}}}

#9

In the log file I see the same error description. How can I determine what is going wrong?