Cannot add node after Fail Over/Rebalance - Node doesn't support requested services: [<<"kv">>]

40-rc

#1

I ran into a rather odd issue this morning.

We have an internal Couchbase testing environment: one cluster containing three server nodes. This morning when I checked the Couchbase console, one node was stuck on pending and another was down. I checked the two machines in question and they were doing fine. So I checked the logs, and all they mentioned was something like: ‘Shutting down bucket “XXXXX” on ‘ns_1@xxx.xxx.xxx.xxx’ for server shutdown’. So I failed that node over, and immediately the pending node came back up and ran perfectly.

I figured something like a scheduled task might have hogged too much memory and crashed the ‘down’ node, so I went ahead and rebalanced the cluster and then tried to re-add that third node. All I’m getting is this error:

Attention - Prepare join failed. Could not connect to “xxx.xxx.xxx.xxx” on port 8091. This could be due to an incorrect host/port combination or a firewall in place between the servers.

All firewalls are deactivated on all three machines internally to avoid exactly this kind of issue, and this has never happened before. I’m unsure whether I should try uninstalling Couchbase and reinstalling it instead.

Any help is appreciated.


#2

One thing you can do is try to ping the new name/IP you are adding from the existing nodes and make sure all traffic and name resolution get through.
-cihan
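Ping only proves ICMP reachability. A plain TCP connect to port 8091, the port named in the join error, checks whether the Couchbase admin port itself is accepting connections. Below is a minimal Python sketch. The function name `port_reachable` and the 3-second timeout are illustrative choices and are not part of any Couchbase tool:

```python
import socket


def port_reachable(host: str, port: int = 8091, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each resolved address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and DNS resolution failures
        return False
```

Run it from each existing node against the address being added, for example `port_reachable("xxx.xxx.xxx.xxx")`. If it returns `False` while ping succeeds, the cause is more likely the Couchbase service on that machine (not running, or not listening on 8091) than the network.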


#3

There are no issues pinging the IP from one of the other node machines. The whole issue seems to be only with Couchbase, since I have test websites and a bunch of other stuff all running fine on that machine.


#4

I’ve noticed the following:

A few weeks back I thought I had updated to Couchbase 4.0rc (specifically on the node that is now causing problems), and through some hocus pocus I believed it was all updated.

I now suspect I was wrong, and that when the node failed, it attempted to restart as 4.0rc (and now 4.1) while the rest of the cluster's nodes are still on 4.0dev.

Would I be correct in assuming this causes the problem I am having? And is there a way to update both remaining nodes without losing data? (Worst case, I know I can just do a backup, but copying all the views again is a 10-minute hassle. Aren’t we all lazy?)

This still doesn’t explain why I can’t reach the Couchbase console from the problematic node itself. It isn’t in the cluster at the moment, so shouldn’t it let me reach the Couchbase setup page?

UPDATE:
So I ended up doing a complete uninstall of Couchbase, rebooting, and reinstalling, and now it seems to be properly set up. When I try to add the node, I get this error:

Node doesn’t support requested services: [<<“kv”>>]. Supported services: [index, kv, n1ql]

This is most likely because the new node has Couchbase 4.1 while the nodes already in the cluster are on 4.0dev. So this comes back to my question: how can I update the cluster to 4.1 without losing the data (other than by doing a backup)?
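One way to test the mixed-version theory before reinstalling anything is to compare the version each node reports. The sketch below assumes the Couchbase REST endpoint `/pools/default` on port 8091, which lists cluster members with `hostname` and `version` fields. Verify those field names against your server version. The helper names are hypothetical, and only the `major.minor` prefix of each version string is compared:

```python
import base64
import json
import re
import urllib.request


def fetch_pool_info(host: str, user: str, password: str,
                    port: int = 8091, timeout: float = 5.0) -> dict:
    """GET /pools/default with HTTP Basic auth and return the parsed JSON (assumed endpoint)."""
    req = urllib.request.Request(f"http://{host}:{port}/pools/default")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def node_versions(pool_info: dict) -> dict:
    """Map each listed node's hostname to its reported version string."""
    return {n["hostname"]: n.get("version", "unknown") for n in pool_info.get("nodes", [])}


def major_minor(version: str):
    """Extract (major, minor) from strings like '4.1.0-...'; None if there is no such prefix."""
    m = re.match(r"(\d+)\.(\d+)", version)
    return (int(m.group(1)), int(m.group(2))) if m else None


def is_mixed(versions: dict) -> bool:
    """True if the nodes do not all share the same major.minor version."""
    return len({major_minor(v) for v in versions.values()}) > 1
```

A call would look like `is_mixed(node_versions(fetch_pool_info("xxx.xxx.xxx.xxx", user, password)))`, using your own admin credentials. The node being added is not yet a cluster member, so it would have to be queried on its own after its setup step, and its version compared by hand.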

Fully uninstalling Couchbase everywhere and doing a fresh installation plus cbrestore made everything work in the end. Not a pleasant solution, for sure.