Hi,
I’m trying to do a swap rebalance to upgrade the servers in the cluster. I’m running 2.2.0 with the default configuration.
The cluster runs on three m1.medium instances in AWS, and I would like to upgrade these. However, when I try to do the rebalance, it runs for a little while and then fails with error messages like
“Rebalance failed. See logs for detailed reason. You can try rebalance again.”
In the log I find messages like:
Rebalance exited with reason {badmatch,
[{<0.27282.134>,
{{badmatch,{error,nxdomain}},
[{ns_replicas_builder_utils,
kill_a_bunch_of_tap_names,3},
{misc,try_with_maybe_ignorant_after,2},
{gen_server,terminate,6},
{proc_lib,init_p_do_apply,3}]}}]}
ns_orchestrator002 ns_1@production.couchbase.node.4 13:55:12 - Thu Nov 14, 2013
<0.27255.134> exited with {badmatch,
[{<0.27282.134>,
{{badmatch,{error,nxdomain}},
[{ns_replicas_builder_utils,
kill_a_bunch_of_tap_names,3},
{misc,try_with_maybe_ignorant_after,2},
{gen_server,terminate,6},
{proc_lib,init_p_do_apply,3}]}}]}
or
Rebalance exited with reason {unexpected_exit,
{'EXIT',<0.28389.133>,
{badmatch,
[{'EXIT',
{shutdown,
{gen_server,call,
[<18927.20805.0>,had_backfill,30000]}}},
{'EXIT',
{{badmatch,{error,nxdomain}},
{gen_server,call,
[<12941.16232.16>,had_backfill,
30000]}}}]}}}
ns_orchestrator002 ns_1@production.couchbase.node.4 13:50:17 - Thu Nov 14, 2013
<0.28352.133> exited with {unexpected_exit,
{'EXIT',<0.28389.133>,
{badmatch,
[{'EXIT',
{shutdown,
{gen_server,call,
[<18927.20805.0>,had_backfill,30000]}}},
{'EXIT',
{{badmatch,{error,nxdomain}},
{gen_server,call,
[<12941.16232.16>,had_backfill,30000]}}}]}}}
or
Rebalance exited with reason {unexpected_exit,
{'EXIT',<0.24050.133>,
{badmatch,
[{'EXIT',
{noproc,
{gen_server,call,
[<18927.20351.0>,had_backfill,
30000]}}}]}}}
ns_orchestrator002 ns_1@production.couchbase.node.4 13:49:43 - Thu Nov 14, 2013
<0.24040.133> exited with {unexpected_exit,
{'EXIT',<0.24050.133>,
{badmatch,
[{'EXIT',
{noproc,
{gen_server,call,
[<18927.20351.0>,had_backfill,30000]}}}]}}}
or
Rebalance exited with reason {{{badmatch,[{<18927.7378.0>,noproc}]},
[{misc,sync_shutdown_many_i_am_trapping_exits,
1},
{misc,try_with_maybe_ignorant_after,2},
{gen_server,terminate,6},
{proc_lib,init_p_do_apply,3}]},
{gen_server,call,
[<0.8013.132>,get_replicators,infinity]}}
ns_orchestrator002 ns_1@production.couchbase.node.4 13:39:45 - Thu Nov 14, 2013
<0.7999.132> exited with {{{badmatch,[{<18927.7378.0>,noproc}]},
[{misc,sync_shutdown_many_i_am_trapping_exits,1},
{misc,try_with_maybe_ignorant_after,2},
{gen_server,terminate,6},
{proc_lib,init_p_do_apply,3}]},
{gen_server,call,
[<0.8013.132>,get_replicators,
infinity]}}
The rebalances are done while the cluster is under load, and CPU on the instances is pretty close to 100%.
I can get through the entire rebalance if I just start a new rebalance each time the previous one fails.
Are the failures caused by high load? Could they be minimized if the clients put less pressure on the cluster? Are there other things I could do to try to minimize the failures?
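In case it helps anyone hitting the same thing: the restart-on-failure workaround above is easy to automate. Below is a minimal, hedged sketch of the retry loop I use, written generically — the `start_rebalance` callable is a stand-in, and in practice it would trigger the rebalance via the Couchbase REST API (e.g. POST to `/controller/rebalance` and poll `/pools/default/rebalanceProgress`); the helper name, attempt counts, and delay are my own choices, not anything from Couchbase.

```python
import time

def retry_rebalance(start_rebalance, max_attempts=5, delay_s=30):
    """Re-issue a rebalance until it succeeds or attempts run out.

    start_rebalance: callable returning True if the rebalance completed,
    False if it failed (e.g. with one of the errors quoted above).
    Returns the number of attempts it took.
    """
    for attempt in range(1, max_attempts + 1):
        if start_rebalance():
            return attempt
        # Back off briefly before re-issuing the rebalance.
        time.sleep(delay_s)
    raise RuntimeError("rebalance did not complete after %d attempts" % max_attempts)

# Demo with a stand-in that fails twice and then succeeds, mimicking
# the "just start it again until it goes through" behaviour.
outcomes = iter([False, False, True])
attempts_used = retry_rebalance(lambda: next(outcomes), max_attempts=5, delay_s=0)
print(attempts_used)
```

This doesn’t address the root cause, of course — it just papers over the intermittent failures the same way restarting by hand does.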
Best Regards
Niels