Couchbase web console error & cpu load high when rebalance & rebalance failed

sfxu · June 17, 2019, 10:12am

1. We have a Couchbase cluster of 5 servers, all running CE version 4.1.0. After one server failed over, the web console on the remaining 4 servers shows no information after login. It looks like this:

[screenshot: image.png]

Some URLs, such as http://10.49.56.202:8091/pools/default?uuid=5434c08006aca68113f479e53418008c&waitChange=3000&_=1560762055766, returned status code 500 with the message ["Unexpected server error, request logged."].

2. So we added several servers to the cluster, and the web console on the newly added servers works.
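To compare the old and new nodes, the status of the /pools/default request can be checked per node. This is only a minimal sketch: the host list and credentials are placeholders, and it requests the plain /pools/default path with basic auth rather than the exact query string the console sends.

```python
import base64
import urllib.error
import urllib.request


def pools_default_url(host, port=8091):
    """Build the cluster-details URL for one node (same path as in the post)."""
    return f"http://{host}:{port}/pools/default"


def http_status(url, user, password, timeout=5.0):
    """Return the HTTP status code for a GET on url, or None if unreachable."""
    request = urllib.request.Request(url)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # HTTP errors such as the 500 seen on the old nodes end up here.
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def check_nodes(hosts, get_status):
    """Map each host to whatever get_status(url) returns for it."""
    return {host: get_status(pools_default_url(host)) for host in hosts}


# Hypothetical usage (placeholder credentials):
# check_nodes(["10.49.56.202"],
#             lambda url: http_status(url, "Administrator", "password"))
```

Passing the status function in as an argument keeps the per-node loop separate from the network call, so it can be tried against a stub first.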

3. We then rebalanced the cluster to remove the old 4 servers. During the rebalance, the CPU load on the old 4 servers became very high, reaching about 40 to 50, the old 4 servers went into the pending state, and the rebalance failed.
4. The logs say:

[user:info,2019-06-17T16:55:47.164+08:00,ns_1@10.49.56.202:<0.20265.1924>:ns_orchestrator:handle_info:493]Rebalance exited with reason {timeout,
                              {gen_server,call,
                               [ns_config,
                                {update_with_changes,
                                 #Fun<ns_config.6.55748145>}]}}

and

{none,<<"Rebalance stopped by janitor.">>}]},

I wonder what this "janitor" is. Why did it stop our rebalance?
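To pull every such rebalance exit out of a large log file, the entry header can be split into fields. This is a rough sketch: the header layout (category, level, timestamp, node, pid, module, function, line) is only inferred from the single entry quoted above, and the function names are made up for illustration.

```python
import re

# Layout assumed from the one entry quoted in this post:
# [category:level,timestamp,node:<pid>:module:function:line]message
HEADER = re.compile(
    r"^\[(?P<category>\w+):(?P<level>\w+),(?P<time>[^,]+),"
    r"(?P<node>[^:]+):(?P<pid><[^>]+>):"
    r"(?P<module>\w+):(?P<function>\w+):(?P<line>\d+)\]"
    r"(?P<message>.*)$"
)


def parse_header(line):
    """Return the header fields as a dict, or None for continuation lines."""
    match = HEADER.match(line.strip())
    return match.groupdict() if match else None


def rebalance_exits(lines):
    """Yield (time, node, message) for entries reporting a rebalance exit."""
    for line in lines:
        entry = parse_header(line)
        if entry and entry["message"].startswith("Rebalance exited"):
            yield entry["time"], entry["node"], entry["message"]
```

Only the first line of each entry is matched; the multi-line Erlang term that follows (the actual reason) would still have to be read from the lines after it.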
5. When the CPU load rose, I found many beam.smp processes with the status Dsl in top. Do these "dead" beam.smp processes raise the CPU load?
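In ps STAT notation, D means uninterruptible sleep (usually waiting on I/O), s a session leader, and l multi-threaded, and Linux counts tasks in the D state toward the load average. A tally of beam.smp states during the rebalance might therefore help. A minimal sketch, assuming the input is the text output of `ps -eo stat,comm`; the function name is made up for illustration.

```python
from collections import Counter


def count_states(ps_output, command="beam.smp"):
    """Count the first STAT letter of each `ps -eo stat,comm` row for command."""
    counts = Counter()
    for row in ps_output.splitlines():
        fields = row.split(None, 1)
        if len(fields) != 2:
            continue  # skip blank or malformed rows
        stat, comm = fields
        if comm.strip() == command:
            counts[stat[0]] += 1  # first letter is the process state
    return counts


# Hypothetical usage on a live node:
# import subprocess
# out = subprocess.run(["ps", "-eo", "stat,comm"],
#                      capture_output=True, text=True).stdout
# print(count_states(out))
```

The header row is skipped automatically because its command column does not equal the command being counted.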

Any help would be appreciated.