Unable to create Couchbase cluster with k8s

Hello

I am trying to set up a Couchbase cluster in OpenShift using k8s, based on this blog: http://blog.kubernetes.io/2016/08/create-couchbase-cluster-using-kubernetes.html

The configs are available here.

Following the instructions, I am unable to attach the second node to the cluster. The server-add command in configure-node.sh passes “$COUCHBASE_MASTER:8091” as the value of the “--cluster” argument. At runtime, $COUCHBASE_MASTER is set to “couchbase-master-service”. I get the following error:
Unable to connect to host at http://couchbase-master-service:8091

If I manually run the server-add command from localhost, using the cluster host’s IP address as the “--cluster” argument, it works fine.

Any help is appreciated.

@vigneshb4u These instructions are for raw Kubernetes. OpenShift instructions are at https://blog.openshift.com/openshift-ecosystem-couchbase-openshift-nosql-applications/. Have you tried those?

What version of OpenShift are you using?

Yes, I have no issues getting a Couchbase container working in OpenShift. Both the master and the worker are up and running. The only issue is that the worker node is not getting attached to the cluster, because of the error while executing the “server-add” command.

Following are my versions:
openshift v1.3.0
kubernetes v1.3.0+52492b4

@arungupta Okay, I see that referencing the service name “couchbase-master-service” was possible in raw Kubernetes because of the DNS server add-on: https://kubernetes.io/docs/user-guide/services/#dns The DNS server resolves the service name to an IP address. Do we have a similar solution in OpenShift?

The OpenShift link you shared handles a single Couchbase node in OpenShift. Are there any instructions on setting up a Couchbase cluster in OpenShift?

@vigneshb4u This makes sense. The DNS service resolution should still work. https://github.com/arun-gupta/kubernetes-java-sample/tree/master/maven#using-kubernetes is a simple Spring Boot application that accesses Couchbase through a Kubernetes service. I’ve not tried this in OpenShift, but I will give it a shot and let you know.

@arungupta Update: I figured that the DNS resolution for the service is working fine. But the pod is unable to ping the service IP.

# ping couchbase-master-service.ns1.svc.cluster.local
PING couchbase-master-service.svc.cluster.local (172.30.156.57) 56(84) bytes of data
From 171.71.241.194: icmp_seq=1 Destination Host Unreachable 

I am not sure this is related, though, because the error I received from server-add says “Unable to connect to host at http://couchbase-master-service:8091”. In that case, I believe the name still wasn’t resolved to an IP address.
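
A side note on the two checks above: a Kubernetes service IP is a virtual address, and depending on how the proxy is set up it may not answer ICMP ping even when the service itself is healthy, so a failed ping alone may not say much. A rough way to separate "the name does not resolve" from "the port is not reachable" is sketched below. The helper names resolves and can_connect are made up for illustration, and the sketch assumes bash with /dev/tcp support plus the getent and timeout utilities inside the container.

```shell
#!/bin/bash

# Hypothetical helper: succeeds if the name resolves through the system resolver.
resolves() {
  local host="$1"
  timeout 5 getent hosts "$host" >/dev/null 2>&1
}

# Hypothetical helper: succeeds if a TCP connection to host:port opens within 5 seconds.
# Uses bash's /dev/tcp pseudo-device, so no extra tools are needed in the image.
can_connect() {
  local host="$1" port="$2"
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example usage (commented out so the helpers can be sourced without side effects):
# resolves couchbase-master-service || echo "DNS lookup failed"
# can_connect couchbase-master-service 8091 || echo "TCP connect to 8091 failed"
```

If resolves succeeds but can_connect fails, the problem is more likely in the network path to the service than in DNS.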

If you want to try a slightly different approach, I found this setup to work pretty well on Azure ACS. You can skip the sync gateway bits if you don’t need them, but you do need to build and push a docker image.

Minor hijack: @arungupta - still can’t access the web UI on the proxy as per your blog post. Proxied UI tries to access 8091/pools on the non-proxied URL. I left a comment on your Azure blogpost.

@wilsondy Thanks for the suggestion. I read through the code and I see that they’re using the ENV variable $COUCHBASE_SERVICE_HOST to refer back to the cluster service. I still have had no luck getting the service name CNAME DNS record to work. I’ll try this now and update on how it goes.

@wilsondy I tried the setup you referred to. I am getting an error while executing couchbase-cli server-add:

Error: Failed to add server 172.17.0.5:8091: Prepare join failed. Joining node to itself is not allowed

Is it because the virtual IP for the service is already mapped to the current pod’s IP? I see the current node’s IP in the service endpoints. I am not sure if this is a race condition.

# oc get endpoints
NAME           ENDPOINTS                         AGE
couchbase      172.17.0.2:8091,172.17.0.5:8091   5m
couchbase-ui   172.17.0.2:8091,172.17.0.5:8091   5m

The ENV variable COUCHBASE_UI_SERVICE_HOST=172.30.219.143 correctly resolves to the service IP.

# oc describe svc/couchbase-ui
Name:			couchbase-ui
Namespace:		petset
Labels:			app=couchbase-ui
Selector:		app=couchbase
Type:			LoadBalancer
IP:			172.30.185.207
LoadBalancer Ingress:	172.46.177.235
Port:			http-ui	8091/TCP
NodePort:		http-ui	30416/TCP
Endpoints:		172.17.0.2:8091,172.17.0.5:8091
Session Affinity:	ClientIP
No events.

I’m thinking you didn’t use their Docker image? The script checks to see if you’re the first on the service and, if so, doesn’t attempt to join itself.

@wilsondy When I use the script in the Docker image, it is unable to connect to the service IP while executing “couchbase-cli server-info”. I realized that it is unable to connect only while the container is being created. Once the script finishes running and I execute the same “couchbase-cli server-info” command inside the pod itself, it is able to connect to the service IP and works fine. I tried troubleshooting using the --debug option in the script; the following is the output:

INFO: running command: server-info
ERROR: command: server-info: 172.30.143.73:8091, [Errno 101] Network is unreachable

Any idea what could cause network connectivity issues during container creation?

I have not seen this. It sounds like you may need additional delays before running the script, while the infrastructure is being set up.
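
One way to add such a delay without hard-coding a single long sleep is to retry the connectivity check a few times before giving up. This is only a sketch: the retry function is a made-up helper, not part of the Docker image's script, and the attempt count and delay are arbitrary.

```shell
#!/bin/bash

# Hypothetical helper: run a command until it succeeds or the attempts run out.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example usage (commented out; ADMIN_USER and ADMIN_PASSWORD are placeholders):
# retry 12 5 couchbase-cli server-info \
#   --cluster="$COUCHBASE_SERVICE_HOST:8091" -u "$ADMIN_USER" -p "$ADMIN_PASSWORD"
```

With 12 attempts and a 5 second delay, the script would wait roughly a minute for the service IP to become reachable before failing.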

I’m now working on a Statefulset based approach and have made a lot of changes to the TopHatch stuff. If I get some time, I will put it up on Github.

The problem of being unable to access the service IP from the pod was specific to the OpenShift all-in-one setup using Vagrant. When I installed OpenShift in a VM, the pod was able to access the service IP without any issues. I could not troubleshoot the Vagrant issue, but it is irrelevant to this forum.

Also, starting from the TopHatch example, I extended the readiness probe with more logic, as I didn’t want to use a temp file (/tmp/joined_cluster) in production. Here is my code; please feel free to provide any feedback.

#!/bin/bash

# Check readiness of couchbase installation by listing buckets in localhost
if timeout 30 couchbase-cli bucket-list --cluster=localhost:8091 --user="$COUCHBASE_FULL_ADMIN_USER" --password="$COUCHBASE_FULL_ADMIN_PASSWORD" &>/dev/null ; then
  # If the current node lists bucket, local installation is ready.
  :
else
  exit 1
fi

# Check readiness of couchbase node in cluster by listing servers in cluster's service host.
# For the first node in cluster, the service host should not resolve, because the Node IP will 
# be part of service endpoint only after the pod is ready. Using that logic to detect first node.
timeout 30 couchbase-cli server-list --cluster="$COUCHBASE_CLUSTER_SERVICE_HOST:$COUCHBASE_CLUSTER_SERVICE_PORT" -u "$COUCHBASE_FULL_ADMIN_USER" -p "$COUCHBASE_FULL_ADMIN_PASSWORD"
retval=$?
if [ $retval -eq 0 ]; then

  # If the service host resolves, check if the current node is already added to the cluster.
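  # hostname -I can print several addresses; use only the first one so that
  # grep gets a single pattern instead of treating extra words as file names.
  THIS_NODE_IP=$(hostname -I | awk '{print $1}')
  if timeout 30 couchbase-cli server-list --cluster="$COUCHBASE_CLUSTER_SERVICE_HOST:$COUCHBASE_CLUSTER_SERVICE_PORT" -u "$COUCHBASE_FULL_ADMIN_USER" -p "$COUCHBASE_FULL_ADMIN_PASSWORD" | grep -qwF "$THIS_NODE_IP" ; then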
    # If the current node is part of the cluster, node is ready.
    :
  else
    # The readiness test will not pass until the node is added to the cluster.
    exit 1
  fi
fi

exit 0

Thanks @arungupta and @wilsondy for all the help.