X509: certificate signed by unknown authority

Hi guys, I’m getting this error while provisioning Couchbase on K8s with the Couchbase Operator:

Error from server (InternalError): error when creating "cma/couchbase/deployment/06-couchbase-cluster.yaml": Internal error occurred: failed calling webhook "couchbase-operator-admission.couchbase-dev.svc": Post "https://couchbase-operator-admission.couchbase-dev.svc:443/couchbaseclusters/mutate?timeout=10s": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "couchbase-operator-admission CA")

Any idea what the issue is? Here is my cluster definition:

apiVersion: v1
kind: Secret
metadata:
  name: cb-auth
  namespace: couchbase-dev
type: Opaque
data:
  username: QWRtaW5pc3RyYXRvcg== # Administrator
  password: cGFzc3dvcmQ=         # password
---
apiVersion: couchbase.com/v2
kind: CouchbaseCluster
metadata:
  name: cb-cluster
  namespace: couchbase-dev
spec:
  image: couchbase/server:6.6.2
  security:
    adminSecret: cb-auth
  buckets:
    managed: false
  xdcr:
    managed: false
  servers:
  - name: data-service-2a
    size: 1
    services:
    - data
    volumeMounts:
      default: pvc-cb-data-2a
      data: pvc-cb-data-2a
    pod:
      spec:
        nodeSelector:
          kubernetes.io/hostname: cpt-tkc01-store-dev-large-nodepool-wdev-kg4df-5d598bbcbc-pwgqk
  - name: data-service-2b
    size: 1
    services:
    - data
    volumeMounts:
      default: pvc-cb-data-2b
      data: pvc-cb-data-2b
    pod:
      spec:
        nodeSelector:
          kubernetes.io/hostname: cpt-tkc01-store-dev-large-nodepool-wdev-kg4df-5d598bbcbc-r9jc7
  - name: index-query-service-2a
    size: 1
    services:
    - index
    - query
    volumeMounts:
      default: pvc-cb-index-query-2a
      index: pvc-cb-index-query-2a
    pod:
      spec:
        nodeSelector:
          kubernetes.io/hostname: cpt-tkc01-store-dev-large-nodepool-wdev-kg4df-5d598bbcbc-pwgqk
  - name: index-query-service-2b
    size: 1
    services:
    - index
    - query
    volumeMounts:
      default: pvc-cb-index-query-2b
      index: pvc-cb-index-query-2b
    pod:
      spec:
        nodeSelector:
          kubernetes.io/hostname: cpt-tkc01-store-dev-large-nodepool-wdev-kg4df-5d598bbcbc-r9jc7
  - name: others
    size: 1
    services:
    - search
    - eventing
    - analytics
    pod:
      spec:
        nodeSelector:
          kubernetes.io/hostname: cpt-tkc01-store-dev-small-nodepool-wdev-bkdpb-d549445cf-4djjc
    volumeMounts:
      default: pvc-cb-others
  volumeClaimTemplates: 
  - metadata:
      name: pvc-cb-data-2a
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: cpt-vshpere-with-tanzu-storage-policy
      resources:
        requests:
          storage: 100Gi
  - metadata:
      name: pvc-cb-data-2b
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: cpt-vshpere-with-tanzu-storage-policy
      resources:
        requests:
          storage: 100Gi
  - metadata:
      name: pvc-cb-index-query-2a
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: cpt-vshpere-with-tanzu-storage-policy
      resources: 
        requests:
          storage: 100Gi
  - metadata:
      name: pvc-cb-index-query-2b
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: cpt-vshpere-with-tanzu-storage-policy
      resources: 
        requests:
          storage: 100Gi
  - metadata:
      name: pvc-cb-others
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: cpt-vsphere-with-tanzu-storage-policy
      resources: 
        requests:
          storage: 100Gi

This has nothing to do with your cluster definition; it’s the dynamic admission controller (DAC).

What’s wrong is that the DAC is serving one certificate/key pair, while the webhook configurations have a completely different CA certificate installed, so the DAC’s certificate won’t validate and the webhook calls fail. As for how you got into this situation: the TLS material is randomly generated (rotated) on every run of the tool (there’s no point having predictable keys!). Somehow some parts were installed from one run and other parts from another, hence the mismatch.
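As a rough diagnostic for this kind of mismatch, here is a minimal Python sketch (not part of any Couchbase tooling) that compares SHA-256 fingerprints of the CA certificate in a webhook configuration’s `clientConfig.caBundle` with the CA that issued the DAC’s serving certificate. Comparing names is not enough: in the error above both sides are called "couchbase-operator-admission CA", yet verification fails with a crypto/rsa error, because each run of the tool generates a fresh key under the same name. The sketch assumes you have already extracted both values; where the issuing CA is stored depends on your Operator version and is an assumption here.

```python
import base64
import hashlib
import re
import ssl

# Matches each PEM certificate block in a bundle (a caBundle may hold several).
_PEM_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def pem_fingerprints(pem_text: str) -> set[str]:
    """Return the SHA-256 fingerprints (hex) of every certificate in a PEM string."""
    return {
        hashlib.sha256(ssl.PEM_cert_to_DER_cert(block)).hexdigest()
        for block in _PEM_RE.findall(pem_text)
    }


def ca_bundle_trusts(ca_bundle_b64: str, issuing_ca_pem: str) -> bool:
    """Return True if the webhook's caBundle contains the issuing CA certificate.

    ca_bundle_b64: the base64 value of clientConfig.caBundle, as shown by
        `kubectl get ... -o yaml` or `-o json`.
    issuing_ca_pem: PEM text of the CA that signed the DAC's serving
        certificate (how you obtain it is an assumption, see above).
    """
    bundle_pem = base64.b64decode(ca_bundle_b64).decode("ascii")
    # Same subject name with a different key gives a different fingerprint.
    return bool(pem_fingerprints(issuing_ca_pem) & pem_fingerprints(bundle_pem))
```

A `False` result corresponds to the situation described here: the webhook trusts a CA from one run of the tool while the DAC presents a certificate from another.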

You have to completely uninstall the DAC with:

cbopcfg delete admission

If this is part of an upgrade, you have to uninstall with the version of the toolchain you used to install it, not the upgraded version.

Then recreate it with:

cbopcfg create admission

Or whatever the analogous method is for your version of the Operator.

Thank you, this worked as you described. A clean uninstall and reinstall fixed it!

This did not work for me… deleting and re-creating the DAC via "cao" still results in "tls: bad certificate".

Is there any way I can get out of this without having to redeploy my cluster, including restoring from backup?

Are you upgrading to 2.3 by any chance? If so, note that we removed the MutatingWebhookConfiguration in that release, so the new tool doesn’t know about it and the old one (which still contains an old CA) is left behind. When doing upgrades, you need to uninstall with the old tool version (which knows about the old resources), then install with the new one.
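To spot a leftover configuration like this, one option is to list every webhook configuration that points at the DAC service and group them by CA bundle. The following is a hedged Python sketch, not an official tool: it parses the JSON output of `kubectl get mutatingwebhookconfigurations,validatingwebhookconfigurations -o json` and assumes the DAC service is named `couchbase-operator-admission`, as in the error message at the top of this thread. If the result has more than one group, the configurations disagree about the CA, and the odd one out is the likely stale resource.

```python
import hashlib
import json
from collections import defaultdict

# Service name taken from the error message in this thread; adjust to yours.
DAC_SERVICE = "couchbase-operator-admission"


def group_by_ca(kubectl_json: str, service: str = DAC_SERVICE) -> dict[str, list[str]]:
    """Group webhooks that call `service` by a short hash of their caBundle.

    kubectl_json is the output of:
      kubectl get mutatingwebhookconfigurations,validatingwebhookconfigurations -o json
    More than one key in the result means the configurations disagree on the CA.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for item in json.loads(kubectl_json).get("items", []):
        owner = f"{item['kind']}/{item['metadata']['name']}"
        for hook in item.get("webhooks", []):
            cfg = hook.get("clientConfig", {})
            # Skip webhooks that target a URL or a different service.
            if cfg.get("service", {}).get("name") != service:
                continue
            ca = cfg.get("caBundle", "")
            key = hashlib.sha256(ca.encode()).hexdigest()[:12]
            groups[key].append(f"{owner} ({hook['name']})")
    return dict(groups)
```

In the 2.3 upgrade case described here, you would expect the old MutatingWebhookConfiguration to land in its own group; removing it is still best done with the old tool version, as explained above.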

Over the past few releases I’ve been sneaking in annotations that record the version each resource was installed with. Once those are in place, we can infer what needs to be deleted and offer an automatic all-in-one upgrade command, like magic. Keep an eye out for that in the next few releases…


That was it, THANK YOU!!!