Rebalancing/Add bucket is not working with current Firewall settings on Ubuntu

Hi,

I’m on Ubuntu 14.04, Couchbase Server Community 4.0.

One server acts as a reverse proxy (Nginx) and routes incoming traffic to 3 Couchbase servers that form a single Couchbase cluster. I followed this guide and additionally opened ports 4984 and 4985 to all 4 servers, including the Couchbase servers themselves as a last resort. The ufw status is the same on all 3 Couchbase servers.

ufw status
Status: active

To                         Action      From
--                         ------      ----
22                         ALLOW       Anywhere
8091                       ALLOW       Anywhere
8092                       ALLOW       Anywhere
8093                       ALLOW       Anywhere
9100:9105/tcp              ALLOW       Anywhere
9998                       ALLOW       Anywhere
9999                       ALLOW       Anywhere
11207                      ALLOW       Anywhere
18091                      ALLOW       Anywhere
18092                      ALLOW       Anywhere
4369                       ALLOW       Anywhere
21100:21299/tcp            ALLOW       Anywhere
4984                       ALLOW       Reverse Proxy IP
4985                       ALLOW       Reverse Proxy IP
4984                       ALLOW       IP of this server (Couch Server 1)
4985                       ALLOW       IP of this server (Couch Server 1)
4984                       ALLOW       IP of Couch Server 2
4985                       ALLOW       IP of Couch Server 2
4984                       ALLOW       IP of Couch Server 3
4985                       ALLOW       IP of Couch Server 3
22 (v6)                    ALLOW       Anywhere (v6)
8091 (v6)                  ALLOW       Anywhere (v6)
8092 (v6)                  ALLOW       Anywhere (v6)
8093 (v6)                  ALLOW       Anywhere (v6)
9100:9105/tcp (v6)         ALLOW       Anywhere (v6)
9998 (v6)                  ALLOW       Anywhere (v6)
9999 (v6)                  ALLOW       Anywhere (v6)
11207 (v6)                 ALLOW       Anywhere (v6)
18091 (v6)                 ALLOW       Anywhere (v6)
18092 (v6)                 ALLOW       Anywhere (v6)
4369 (v6)                  ALLOW       Anywhere (v6)
21100:21299/tcp (v6)       ALLOW       Anywhere (v6)

Rebalancing or adding a new bucket does not work: the operation takes too long and is eventually stopped. If I disable ufw and repeat the operation, it works. Am I missing other ports?

Here is the log output when adding a new server and then rebalancing:

Log 1:

Bucket "bucket_name" loaded on node 'ns_1@Couchbase-Server-3-IP(this server was added to cluster)' in 0 seconds.    ns_memcached000    ns_1@Couchbase-Server-3-IP(this server was added to cluster)    11:03:41 - Thu Feb 25, 2016

Log 2:

Bucket "bucket_name" rebalance does not seem to be swap rebalance    ns_vbucket_mover000    ns_1@Couchbase-Server-1-IP(Master)    11:03:41 - Thu Feb 25, 2016 

Log 3:

<0.22538.2> exited with {unexpected_exit,
{'EXIT',<0.22545.2>,
{bulk_set_vbucket_state_failed,
[{'ns_1@Couchbase-Server-3-IP(this server was added to cluster)',
{'EXIT',
{{{{case_clause,
{error,
{{{badmatch,
{error,
{{badmatch,{error,etimedout}},
[{mc_replication,connect,1,
[{file,"src/mc_replication.erl"},
{line,30}]},
{mc_replication,connect,1,
[{file,"src/mc_replication.erl"},
{line,49}]},
{dcp_proxy,connect,4,
[{file,"src/dcp_proxy.erl"},
{line,174}]},
{dcp_proxy,maybe_connect,1,
[{file,"src/dcp_proxy.erl"},
{line,161}]},
{dcp_producer_conn,init,2,
[{file,"src/dcp_producer_conn.erl"},
{line,30}]},
{dcp_proxy,init,1,
[{file,"src/dcp_proxy.erl"},
{line,46}]},
{gen_server,init_it,6,
[{file,"gen_server.erl"},
{line,304}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},
{line,239}]}]}}},
[{dcp_replicator,init,1,
[{file,"src/dcp_replicator.erl"},
{line,49}]},
{gen_server,init_it,6,
[{file,"gen_server.erl"},{line,304}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,239}]}]},
{child,undefined,'ns_1@Couchbase-Server-1-IP(Master)',
{dcp_replicator,start_link,
['ns_1@Couchbase-Server-1-IP(Master)',
"bucket_name"]},
temporary,60000,worker,
[dcp_replicator]}}}},
[{dcp_sup,start_replicator,2,
[{file,"src/dcp_sup.erl"},{line,53}]},
{dcp_sup,
'-manage_replicators/2-lc$^2/1-2-',2,
[{file,"src/dcp_sup.erl"},{line,69}]},
{dcp_replication_manager,handle_call,3,
[{file,"src/dcp_replication_manager.erl"},
{line,87}]},
{gen_server,handle_msg,5,
[{file,"gen_server.erl"},{line,585}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,239}]}]},
{gen_server,call,
['dcp_replication_manager-bucket_name',
{get_replicator_pid,1023},
infinity]}},
{gen_server,call,
[{'janitor_agent-bucket_name',
'ns_1@Couchbase-Server-3-IP(this server was added to cluster)'},
{if_rebalance,<0.22504.2>,
{update_vbucket_state,1022,replica,passive,
'ns_1@Couchbase-Server-2-IP(Already in cluster)'}},
infinity]}}}}]}}}    ns_vbucket_mover000    ns_1@Couchbase-Server-1-IP(Master)    11:05:49 - Thu Feb 25, 2016

Log 4:

Rebalance exited with reason {unexpected_exit,
{'EXIT',<0.22545.2>,
{bulk_set_vbucket_state_failed,
[{'ns_1@Couchbase-Server-3-IP(this server was added to cluster)',
{'EXIT',
{{{{case_clause,
{error,
{{{badmatch,
{error,
{{badmatch,{error,etimedout}},
[{mc_replication,connect,1,
[{file,
"src/mc_replication.erl"},
{line,30}]},
{mc_replication,connect,1,
[{file,
"src/mc_replication.erl"},
{line,49}]},
{dcp_proxy,connect,4,
[{file,"src/dcp_proxy.erl"},
{line,174}]},
{dcp_proxy,maybe_connect,1,
[{file,"src/dcp_proxy.erl"},
{line,161}]},
{dcp_producer_conn,init,2,
[{file,
"src/dcp_producer_conn.erl"},
{line,30}]},
{dcp_proxy,init,1,
[{file,"src/dcp_proxy.erl"},
{line,46}]},
{gen_server,init_it,6,
[{file,"gen_server.erl"},
{line,304}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},
{line,239}]}]}}},
[{dcp_replicator,init,1,
[{file,"src/dcp_replicator.erl"},
{line,49}]},
{gen_server,init_it,6,
[{file,"gen_server.erl"},
{line,304}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},
{line,239}]}]},
{child,undefined,
'ns_1@Couchbase-Server-1-IP(Master)',
{dcp_replicator,start_link,
['ns_1@Couchbase-Server-1-IP(Master)',
"bucket_name"]},
temporary,60000,worker,
[dcp_replicator]}}}},
[{dcp_sup,start_replicator,2,
[{file,"src/dcp_sup.erl"},{line,53}]},
{dcp_sup,
'-manage_replicators/2-lc$^2/1-2-',2,
[{file,"src/dcp_sup.erl"},{line,69}]},
{dcp_replication_manager,handle_call,
3,
[{file,
"src/dcp_replication_manager.erl"},
{line,87}]},
{gen_server,handle_msg,5,
[{file,"gen_server.erl"},{line,585}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,239}]}]},
{gen_server,call,
['dcp_replication_manager-bucket_name',
{get_replicator_pid,1023},
infinity]}},
{gen_server,call,
[{'janitor_agent-bucket_name',
'ns_1@Couchbase-Server-3-IP(this server was added to cluster)'},
{if_rebalance,<0.22504.2>,
{update_vbucket_state,1022,replica,
passive,'ns_1@Couchbase-Server-2-IP(Already in cluster)'}},
infinity]}}}}]}}}
    ns_orchestrator002    ns_1@Couchbase-Server-1-IP(Master)    11:05:49 - Thu Feb 25, 2016 
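
Both traces bottom out in `{error,etimedout}` inside `mc_replication:connect`, which suggests a TCP connection between nodes is being dropped rather than refused. A minimal sketch for checking this from one Couchbase server against the others is below. The function names `can_connect` and `probe` are hypothetical helpers, not part of any Couchbase tool, and the hosts and ports to test are yours to fill in.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts (typical for a firewall DROP), refused
        # connections, and name resolution failures.
        return False


def probe(hosts, ports, timeout=3.0):
    """Map each (host, port) pair to whether it accepted a connection."""
    return {(h, p): can_connect(h, p, timeout) for h in hosts for p in ports}


# Example call (placeholder addresses, replace with the real node IPs):
# for (host, port), ok in sorted(probe(["10.0.0.2", "10.0.0.3"], [8091, 4369]).items()):
#     print(host, port, "open" if ok else "blocked or closed")
```

Running it once with ufw enabled and once with ufw disabled should show which ports change from blocked to open. A port that is closed in both runs is simply not listening on that node.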

Regards, Ben

Hello,

I’m in production now. 2 of the 3 Couchbase servers were at 100% CPU usage for days. It looked like they were in a deadlock. I restarted them one after another and then restarted SG (Sync Gateway), after which the reverse proxy server could no longer talk to the Couchbase cluster. After I turned off ufw on the three Couchbase servers, it worked again.

Is there any reliable documentation of which ports need to be opened, so that I do not have to worry about this anymore? If there is no support, I’m happy to get a system administrator to take care of the issue. Please advise.

Thanks, Ben

More info on Couchbase ports here: http://developer.couchbase.com/documentation/server/4.1/install/install-ports.html
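
To compare a node's firewall against that page, one option is to parse the `ufw status` output and list the required ports that no ALLOW rule covers. The sketch below does that. `REQUIRED_PORTS` is an assumption based on my reading of the linked page for Couchbase Server 4.x and should be checked against it before use. `allowed_ports` and `missing_ports` are hypothetical helper names.

```python
import re

# Assumption: TCP ports taken from the linked install-ports page.
# Verify this list against the documentation for your version.
REQUIRED_PORTS = (
    [4369, 8091, 8092, 8093, 9998, 9999,
     11207, 11209, 11210, 11211, 18091, 18092]
    + list(range(9100, 9106))      # 9100-9105
    + list(range(21100, 21300))    # 21100-21299
)

# Matches the "To" column of an ALLOW rule: "8091", "9100:9105/tcp", "22 (v6)".
RULE = re.compile(r"\s*(\d+)(?::(\d+))?(?:/\w+)?\s+(?:\(v6\)\s+)?ALLOW\b")


def allowed_ports(ufw_status):
    """Collect every port opened by an ALLOW rule in `ufw status` output.

    The "From" column is ignored, so a port allowed only for one source
    address still counts as open here.
    """
    ports = set()
    for line in ufw_status.splitlines():
        m = RULE.match(line)
        if not m:
            continue
        low = int(m.group(1))
        high = int(m.group(2) or low)
        ports.update(range(low, high + 1))
    return ports


def missing_ports(ufw_status, required=REQUIRED_PORTS):
    """Return the required ports that no ALLOW rule covers, sorted."""
    return sorted(set(required) - allowed_ports(ufw_status))


# Example: feed it the text printed by `sudo ufw status`.
# import subprocess
# status = subprocess.run(["ufw", "status"], capture_output=True, text=True).stdout
# print(missing_ports(status))
```

With the assumed list above, the status output from the first post would come back with 11209, 11210 and 11211 missing. Those are data-service ports, which would be consistent with the `etimedout` in `mc_replication:connect`, though that link is an inference and not something the logs state directly.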