Couchbase OpenShift operator problem - Couchbase cluster will not come up

  1. We recently applied network policies to our OpenShift project to enable multi-tenancy, per OpenShift's documentation: Configuring multitenant network policy - Network policy | Networking | OpenShift Container Platform 4.6.
  2. After doing this, when we created a new Couchbase instance with 4 pods, only one pod was created.
  3. We opened a ticket with Red Hat to diagnose the issue further, as we were not seeing any errors.
  4. While working with Red Hat, we noticed that the Couchbase Operator was installed in the openshift-operators project instead of the inf-auto project where we created the Couchbase cluster instance. I remember selecting inf-auto when I installed it the first time, so this was unexpected.
  5. We removed the operator and re-installed it in the inf-auto project.
  6. When we tried to create a new Couchbase cluster instance, no pods were created and we saw the following error:

{"level":"info","ts":1619812059.4318697,"logger":"cluster","msg":"Cluster does not exist so the operator is attempting to create it","cluster":"a-couchbase-test/cb-example-test4"}

{"level":"info","ts":1619812059.4931834,"logger":"cluster","msg":"Creating pod","cluster":"a-couchbase-test/cb-example-test4","name":"cb-example-test4-0000","image":"registry.connect.redhat.com/couchbase/server@sha256:fd6d9c0ef033009e76d60dc36f55ce7f3aaa942a7be9c2b66c335eabc8f5b11e"}

{"level":"info","ts":1619812059.515399,"logger":"cluster","msg":"Member creation failed","cluster":"a-couchbase-test/cb-example-test4","name":"cb-example-test4-0000","resource":""}

{"level":"info","ts":1619812059.5357425,"logger":"cluster","msg":"Pod deleted","cluster":"a-couchbase-test/cb-example-test4","name":"cb-example-test4-0000"}

{"level":"info","ts":1619812059.5357752,"logger":"cluster","msg":"Reconciliation failed","cluster":"a-couchbase-test/cb-example-test4","error":"fail to create member's pod (cb-example-test4-0000): pods \"cb-example-test4-0000\" is forbidden: unable to validate against any security context constraint: [provider restricted: .spec.securityContext.fsGroup: Invalid value: int64{1000}: 1000 is not an allowed group]","stack":"github.com/couchbase/couchbase-operator/pkg/util/k8sutil.CreatePod\n\t/home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/util/k8sutil/k8sutil.go:246\ngithub.com/couchbase/couchbase-operator/pkg/util/k8sutil.CreateCouchbasePod\n\t/home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/util/k8sutil/pod_util.go:104\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).createPod\n\t/home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/cluster/cluster.go:489\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).createMember\n\t/home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/cluster/reconcile.go:299\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).create\n\t/home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/cluster/cluster.go:289\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).reconcile\n\t/home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/cluster/reconcile.go:117\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).runReconcile\n\t/home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/cluster/cluster.go:398\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).Update\n\t/home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/cluster/cluster.go:429\ngithub.com/couchbase/couchbase-operator/pkg/controller.(*CouchbaseClusterReconciler).Reconcile\n\t/home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/controller/controller.go:90\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/couchbase/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.0/pkg/internal/controller/controller.go:256\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/couchbase/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.0/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/home/couchbase/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.0/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/home/couchbase/go/pkg/mod/k8s.io/apimachinery@v0.17.5-beta.0/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/home/couchbase/go/pkg/mod/k8s.io/apimachinery@v0.17.5-beta.0/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/home/couchbase/go/pkg/mod/k8s.io/apimachinery@v0.17.5-beta.0/pkg/util/wait/wait.go:88"}

Our OpenShift admin installed the operator for us, since we do not have permission to do so ourselves. The summary above is from them; however, we still do not have a working Couchbase cluster. We really do not understand why the network policy change could affect the Couchbase cluster.
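For reference, the isolation we applied follows the multitenant example in the OpenShift 4.6 docs. The core piece is an allow-same-namespace policy roughly like the sketch below (simplified; the docs add companion policies for ingress and monitoring traffic). Once any policy selects the pods, everything not explicitly allowed, including traffic from other projects such as openshift-operators, is denied.

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-same-namespace
spec:
  # select every pod in the project...
  podSelector: {}
  ingress:
    # ...and only admit traffic from pods in this same project
    - from:
        - podSelector: {}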

That's the hint we need… let me explain. In the distant past, you needed to fill in the fsGroup correctly or persistent volumes wouldn't work. After most users didn't fill this in, we decided to try to do it for you with the dynamic admission controller (DAC). On OCP this interrogates the namespace that the cluster lives in and extracts the fsGroup from its annotations, which makes me suspect that the dynamic admission controller isn't working correctly. You can manually set the fsGroup using these instructions: Persistent Volumes | Couchbase Docs
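As a rough sketch of the manual route (assuming Operator 2.x, where the CouchbaseCluster spec exposes a pod-level securityContext; the group ID below is purely illustrative): first read the group range OpenShift assigned to the project, then set an fsGroup from that range in the cluster spec.

$ oc describe namespace a-couchbase-test | grep supplemental-groups
# prints an annotation like openshift.io/sa.scc.supplemental-groups=1000630000/10000 (illustrative)

apiVersion: couchbase.com/v2
kind: CouchbaseCluster
metadata:
  name: cb-example-test4
  namespace: a-couchbase-test
spec:
  # ...image, security, servers, and other required fields omitted...
  securityContext:
    # must be a GID inside the project's supplemental-groups range above
    fsGroup: 1000630000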

I manually set the fsGroup, and all the pods came up. Thanks for your help.

I sent the following to our OpenShift admin, questioning why the DAC is not there.
Check the Status of the Operator
You can use the following command to check on the status of the deployments:
$ oc get deployments
NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
couchbase-operator             1/1     1            1           8s
couchbase-operator-admission   1/1     1            1           8s
The Operator is ready to deploy CouchbaseCluster resources when both the DAC and Operator deployments are fully ready and available.

root@usapprshilt100:/Automation/projects/openshift #oc project a-couchbase-test
Now using project "a-couchbase-test" on server "https://api.ivz-ocp-poc.ops.invesco.net:6443".
root@usapprshilt100:/Automation/projects/openshift #
root@usapprshilt100:/Automation/projects/openshift #oc get deployment
NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
couchbase-operator   1/1     1            1           3d3h
root@usapprshilt100:/Automation/projects/openshift #
In the PoC, I did not see the Couchbase DAC running; the couchbase-operator-admission deployment is missing.
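Since a cluster-mode DAC can legitimately live in any namespace, a quick way to check whether one is running anywhere else (this requires permission to list deployments across all namespaces):

$ oc get deployments --all-namespaces | grep admission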

But our admin mentioned that in our dev environment, when they installed Couchbase for all namespaces, they did not see the DAC running, yet everything worked fine. Now they want to install the Couchbase Operator only for our namespace, and this is where we hit the problem of pods not coming up. So when the operator is installed for all namespaces, you do not need the DAC?

No, the DAC always needs to be installed. We recommend running it in the default cluster mode, and therefore you only need one installed, in any namespace.

When we install it from the GUI, according to our OpenShift admin, the DAC is not installed after he clicks Install. I could try to install it using the YAML file per the instructions in the operator documentation; however, I think my permissions as admin of the namespace are not sufficient to finish the installation. Will it still need the OpenShift cluster-admin role to install?

That's correct: you need to install the DAC manually; it is not installed alongside the operator from the OpenShift UI.
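For completeness, a rough sketch of the manual install: the operator download ships a config generator that emits the DAC resources, and because these include cluster-scoped RBAC and webhook configuration, a cluster admin has to run it. The exact command varies by operator version (cbopcfg in 2.0/2.1, cao in later releases), so treat this as illustrative and check the docs bundled with your download:

# run as cluster-admin from the unpacked operator package (tool name is version-dependent)
$ bin/cbopcfg generate admission --namespace inf-auto | oc create -f -
# verify that couchbase-operator-admission now shows up:
$ oc get deployments --namespace inf-auto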