All nodes remain in pending state, and all buckets are unhealthy


#1

I’m a new user to this forum so I can’t post images, but I asked the same question on Serverfault where you can see screenshots.

Problem

I have a test setup running Couchbase Server 3.0.3 with one node on a remote server (Windows Server 2012), and one node on my local machine (Windows 8.1). Last time I checked, which was a few days ago, all nodes were up and running with all buckets healthy.

Today I noticed the node on my local machine in the “Pending” state. I initially ignored it because it is my local machine, and I figured it was probably the firewall or port forwarding not being set up correctly.

I created a new bucket and tested it from my application (.NET) by adding a document to it. This failed with the error “Temporary failure”.

When checking the Couchbase webadmin I noticed the newly created bucket was in the “unhealthy” state. After searching online for a bit and going back to the webadmin, suddenly all buckets were “unhealthy” and both nodes were “Pending”.

There is nothing in the log between the time I created the new bucket and when I noticed both nodes were “Pending”.

This is the log of the bucket creation:

Created bucket "taenos" of type: membase
[{num_replicas,1},
{replica_index,false},
{ram_quota,104857600},
{auth_type,sasl},
{autocompaction,false},
{purge_interval,undefined},
{flush_enabled,false},
{num_threads,3},
{eviction_policy,value_only}]

What I tried so far

I tried removing one node, but rebalancing doesn’t work, I guess because there is no active node left:

Rebalance exited with reason {badmatch,
{error,
{failed_nodes,
['ns_1@couchbase.xxxxxxxxx.com']}}}

I restarted my local machine and the remote server.

I restarted the CouchbaseServer service.

I deleted some other buckets I didn’t use anymore, just in case (default & samples).

I removed the bucket I had created and tried the same steps.

There was a full disk warning so I freed up some disk space.

I’m not sure how to proceed now, so any help is appreciated.


#2

Hi bramdc,
Can you describe the network setup between the two nodes?


#3

Well, as this was just a test setup I thought I’d add my local machine as a second node, but apparently this was not a good idea. I just found that my local IP has changed, so of course the port forwarding is no longer working. After updating the forwarding rules, both nodes immediately became available again.

The second node (my local machine) was already in the “pending” state, but the first node remained available up until the point where I created a new bucket.
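For anyone hitting the same symptoms: since the root cause here turned out to be a node no longer being reachable after an IP change, a quick connectivity probe can save a lot of guesswork. A minimal sketch (not from this thread; the hostnames below are placeholders) that checks whether a node’s admin port, 8091 by default, accepts TCP connections:

```python
import socket

def node_reachable(host, port=8091, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Probe each cluster node (placeholder addresses, short timeout).
for node in ("couchbase.example.com", "192.168.1.50"):
    status = "reachable" if node_reachable(node, timeout=1.0) else "unreachable"
    print(node, status)
```

If the probe fails for a node that the cluster still lists, the problem is on the network side (firewall, NAT, or a stale forwarding rule) rather than in Couchbase itself.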


#4

Bramdc,
Glad to hear that the dynamic IP was the reason for the node failures.
Perhaps setting up the cluster using DNS names could help in your setup?


#5

Yes, I actually have it set up using DNS names as you propose. The problem was the IP in my local network. I recently re-installed this machine and forgot to give it a fixed IP; after a reboot it received a different one from DHCP, so the port forwarding rules were forwarding to the wrong internal IP.