Q: Rebalance error on Operator (k8s)

hi
I used the Operator to deploy a 3-node Couchbase Server cluster. The nodes ran normally for 16 days, but today I can’t sync data from Couchbase Lite. When I logged in to the web console, I found that a rebalance had failed with an error at 40%.

Rebalance exited with reason {service_rebalance_failed,cbas, {badmatch, {error, {bad_nodes,cbas,get_agent, [{'ns_1@ydd-cbs-0002.ydd-cbs.default.svc', {exit, {{{case_clause, {error, {unknown_error, <<"failed_to_cancel">>}}}, [{service_agent,cancel_task,2, [{file,"src/service_agent.erl"}, {line,474}]}, {service_agent, '-cancel_tasks/2-fun-0-',2, [{file,"src/service_agent.erl"}, {line,484}]}, {lists,foreach,2, [{file,"lists.erl"},{line,1323}]}, {service_agent,cleanup_service,1, [{file,"src/service_agent.erl"}, {line,502}]}, {service_agent,do_handle_connection,2, [{file,"src/service_agent.erl"}, {line,331}]}, {service_agent,handle_connection,2, [{file,"src/service_agent.erl"}, {line,308}]}, {service_agent,handle_cast,2, [{file,"src/service_agent.erl"}, {line,192}]}, {gen_server,handle_msg,5, [{file,"gen_server.erl"}, {line,604}]}]}, {gen_server,call, [{'service_agent-cbas', 'ns_1@ydd-cbs-0002.ydd-cbs.default.svc'}, get_agent,infinity]}}}}]}}}}

Based on the error, I deleted the ydd-cbs-0002 pod, and Kubernetes recreated it after about 2 minutes. I then clicked Rebalance in the Couchbase Server web console, and the UI displayed this error:

Unexpected server error, request logged.

The Couchbase Server logs are below:

[ns_server:error,2019-03-09T03:38:10.081Z,ns_1@ydd-cbs-0000.ydd-cbs.default.svc:<0.22129.7>:menelaus_web:loop:143]Server error during processing: ["web request failed",
{path,"/controller/rebalance"},
{method,'POST'},
{type,exit},
{what,
{timeout,
{gen_fsm,sync_send_all_state_event,
[{via,leader_registry,ns_orchestrator},
{maybe_start_rebalance,
['ns_1@ydd-cbs-0000.ydd-cbs.default.svc',
'ns_1@ydd-cbs-0001.ydd-cbs.default.svc',
'ns_1@ydd-cbs-0002.ydd-cbs.default.svc',
'ns_1@ydd-cbs-0003.ydd-cbs.default.svc'],
,all}]}}},
{trace,
[{gen_fsm,sync_send_all_state_event,2,
[{file,"gen_fsm.erl"},{line,232}]},
{menelaus_web_cluster,do_handle_rebalance,
4,
[{file,"src/menelaus_web_cluster.erl"},
{line,737}]},
{request_throttler,do_request,3,
[{file,"src/request_throttler.erl"},
{line,59}]},
{menelaus_web,loop,2,
[{file,"src/menelaus_web.erl"},
{line,121}]},
{mochiweb_http,headers,5,
[{file,
"/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/mochiweb/mochiweb_http.erl"},
{line,94}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,239}]}]}]

This problem often occurs.

How can I resolve this issue?

thanks!
angular

Hi @angular,

Thanks for reporting this issue. We need some more info to debug it.

Thanks

Hi @anil,

Thanks for your reply.

Where are you running the K8S cluster: on physical machines, VMs, or a public cloud?

  • public cloud

Did you follow the steps here for deploying the Sync Gateway cluster?

  • Yes, I followed the official step-by-step setup, and Sync Gateway is operating normally.

Can you run the cbopinfo tool to collect logs and attach them to JIRA issue K8S-906?

  • Sorry. When the error occurred, I tried to fail over the node, but it did not succeed. The project is still in the development stage, so I deleted the Couchbase cluster and rebuilt it.

I am having the same issue. How can it be fixed? This is a brand new Couchbase cluster in AKS: no documents, no indexes, nothing. The rebalance failed, stuck at 40%.

totally disappointing

Rebalance exited with reason {service_rebalance_failed,cbas,
{badmatch,
{error,
{bad_nodes,cbas,get_agent,
[{'ns_1@couchbase-cluster-couchbase-cluster-0001.couchbase-cluster-couchbase-cluster.couchbase.svc',
{exit,
{{{case_clause,
{error,
{unknown_error,
<<"failed_to_cancel">>}}},
[{service_agent,cancel_task,2,
[{file,"src/service_agent.erl"},
{line,474}]},
{service_agent,
'-cancel_tasks/2-fun-0-',2,
[{file,"src/service_agent.erl"},
{line,484}]},
{lists,foreach,2,
[{file,"lists.erl"},{line,1323}]},
{service_agent,cleanup_service,1,
[{file,"src/service_agent.erl"},
{line,502}]},
{service_agent,do_handle_connection,2,
[{file,"src/service_agent.erl"},
{line,331}]},
{service_agent,handle_connection,2,
[{file,"src/service_agent.erl"},
{line,308}]},
{service_agent,handle_cast,2,
[{file,"src/service_agent.erl"},
{line,192}]},
{gen_server,handle_msg,5,
[{file,"gen_server.erl"},
{line,604}]}]},
{gen_server,call,
[{'service_agent-cbas',
'ns_1@couchbase-cluster-couchbase-cluster-0001.couchbase-cluster-couchbase-cluster.couchbase.svc'},
get_agent,infinity]}}}}]}}}}

This is almost certainly a problem with the Analytics service (cbas in the error) and not the Operator/cloud part. If you can live without Analytics, the best option is to just create a cluster without that service enabled. Another option may be to use the latest version of Couchbase Server: Analytics went GA in 6.0.x, so that is a good place to start. If neither of those works, I’d open a support case against the Analytics component and see what the relevant team suggests.
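For anyone triaging a similar trace: the service code right after service_rebalance_failed is what points at the failing component, and cbas is the internal code for Analytics. Below is a minimal sketch, in Python, of pulling the service and node names out of a pasted failure reason. The function name summarize_rebalance_failure and the SERVICE_NAMES table are illustrative only and are not part of any Couchbase tooling; the regular expressions assume the message shape seen in this thread.

```python
import re

# Internal service codes as they appear in ns_server logs, mapped to the
# service names shown in the web console (illustrative lookup table).
SERVICE_NAMES = {
    "kv": "Data",
    "n1ql": "Query",
    "index": "Index",
    "fts": "Search",
    "cbas": "Analytics",
    "eventing": "Eventing",
}


def summarize_rebalance_failure(reason: str) -> dict:
    """Extract the failed service and the node names from a failure reason."""
    # Pasted logs sometimes carry curly quotes; normalise them to straight ones.
    text = reason.translate(str.maketrans("‘’“”", "''\"\""))
    # The service code follows 'service_rebalance_failed,' in the error tuple.
    match = re.search(r"service_rebalance_failed,\s*(\w+)", text)
    code = match.group(1) if match else None
    # Node names are quoted Erlang atoms of the form 'ns_1@<hostname>'.
    nodes = sorted(set(re.findall(r"'(ns_1@[^']+)'", text)))
    return {
        "service_code": code,
        "service": SERVICE_NAMES.get(code, "unknown"),
        "nodes": nodes,
    }
```

Run against the first error in this thread, it reports the cbas (Analytics) service and the ns_1@ydd-cbs-0002.ydd-cbs.default.svc node, which is consistent with the problem being in Analytics rather than in the Operator.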

Hi @simon.murray,

Thanks for your reply. I’ll upgrade Couchbase Server to 6.0.

Best regards!

angular