"failed to create persistent volume claim: context deadline exceeded" error on creating Couchbase operator with Persistent storage


#1

I was trying to set up the Couchbase Operator on IBM Cloud Kubernetes, but ran into an issue when adding persistent storage to the cluster. After running the cbopctl create command, the Couchbase services are created but the Couchbase pods are not. The Persistent Volume stays in the Pending state and then gets deleted on its own. Here’s the error in the operator logs:

time="2018-11-23T09:54:39Z" level=info msg="deleted pod (cb-example-0000)" cluster-name=cb-example module=cluster
time="2018-11-23T09:54:39Z" level=error msg="Cluster setup failed: fail to create member's pod (cb-example-0000): failed to create persistent volume claim: context deadline exceeded" cluster-name=cb-example module=cluster
time="2018-11-23T09:54:39Z" level=warning msg="Fail to handle event: ignore failed cluster (cb-example). Please delete its CR"

Here is the YAML that I used to create the cluster with the operator:

apiVersion: couchbase.com/v1
kind: CouchbaseCluster
metadata:
  name: cb-example
  namespace: test-op
spec:
  baseImage: couchbase/server
  version: enterprise-5.5.1
  authSecret: cb-example-auth
  exposeAdminConsole: true
  adminConsoleServices:
    - data
  cluster:
    dataServiceMemoryQuota: 256
    indexServiceMemoryQuota: 256
    searchServiceMemoryQuota: 256
    eventingServiceMemoryQuota: 256
    analyticsServiceMemoryQuota: 1024
    indexStorageSetting: memory_optimized
    autoFailoverTimeout: 120
    autoFailoverMaxCount: 3
    autoFailoverOnDataDiskIssues: true
    autoFailoverOnDataDiskIssuesTimePeriod: 120
    autoFailoverServerGroup: false
  buckets:
    - name: default
      type: couchbase
      memoryQuota: 128
      replicas: 1
      ioPriority: high
      evictionPolicy: fullEviction
      conflictResolution: seqno
      enableFlush: true
      enableIndexReplica: false
  servers:
    - size: 3
      name: all_services
      services:
        - data
        - index
        - query
        - search
        - eventing
        - analytics
      pod:
        volumeMounts:
          default: couchbase
          data: couchbase
          index: couchbase
  securityContext:
    fsGroup: 1000
  volumeClaimTemplates:
    - metadata:
        name: couchbase
      spec:
        storageClassName: "default"
        resources:
          requests:
            storage: 1Gi

The deployment works fine when Persistent Volumes are not added to the YAML. I tried with Couchbase Operator 1.0 and 1.1 and got the same error with both.


#2

This is kinda urgent. Any help here would be greatly appreciated.


#3

Hi Jerome.

This is a known issue for clouds/storage providers that have poor performance characteristics. At present in Operator <=1.1.0 we have a timeout set for 5 minutes, which is evidently not long enough for IBM Cloud.

We have a fix planned for Operator 1.2.0 (to be released early 2019) that will allow you to override the default timeout.

To my mind 5 minutes to create a persistent volume is somewhat excessive. I’d be interested to know IBM’s take on why this is taking so long. They may be able to offer some workarounds to improve performance in the short term and allow your deployment.


#4

Thanks for replying.

It’s a lot less than 5 minutes. Here’s the entire log:


time="2018-12-06T19:17:49Z" level=info msg="Janitor process starting" cluster-name=cb-example module=cluster
time="2018-12-06T19:17:49Z" level=info msg="Setting up client for operator communication with the cluster" cluster-name=cb-example module=cluster
time="2018-12-06T19:17:49Z" level=info msg="Cluster does not exist so the operator is attempting to create it" cluster-name=cb-example module=cluster
time="2018-12-06T19:17:49Z" level=info msg="Creating headless service for data nodes" cluster-name=cb-example module=cluster
time="2018-12-06T19:17:49Z" level=info msg="Creating NodePort UI service (cb-example-ui) for data nodes" cluster-name=cb-example module=cluster
time="2018-12-06T19:17:49Z" level=info msg="Creating a pod (cb-example-0000) running Couchbase enterprise-5.5.1" cluster-name=cb-example module=cluster
time="2018-12-06T19:19:49Z" level=info msg="deleted pod (cb-example-0000)" cluster-name=cb-example module=cluster
time="2018-12-06T19:19:49Z" level=error msg="Cluster setup failed: fail to create member's pod (cb-example-0000): failed to create persistent volume claim: context deadline exceeded for pvc-couchbase-cb-example-0000-00-index" cluster-name=cb-example module=cluster
time="2018-12-06T19:19:49Z" level=warning msg="Fail to handle event: ignore failed cluster (cb-example). Please delete its CR"

Looking at the logs it times out after exactly 2 minutes. Is this parameter configurable?
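
For anyone who wants to double-check that interval, here is a minimal sketch that subtracts the two timestamps copied from the log above (the "Creating a pod" line and the "context deadline exceeded" line). The variable and function names are just illustrative; nothing here comes from the operator itself.

```python
from datetime import datetime

# Timestamps copied from the operator log in this post.
pod_create_ts = "2018-12-06T19:17:49Z"   # "Creating a pod (cb-example-0000) ..."
pvc_failure_ts = "2018-12-06T19:19:49Z"  # "... context deadline exceeded ..."


def parse_log_time(ts: str) -> datetime:
    """Parse the UTC timestamp format used in the operator log lines."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


# Elapsed time between pod creation and the PVC failure, in seconds.
elapsed_seconds = (parse_log_time(pvc_failure_ts) - parse_log_time(pod_create_ts)).total_seconds()
print(f"elapsed: {elapsed_seconds:.0f} s")
```

That works out to 120 seconds, so the failure is landing at the 2-minute mark rather than at 5 minutes.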


#5

Has this issue been seen before? Should it be raised as an issue in Jira?


#6

@jerome Sorry you are having this problem. Yes, please raise an issue in Jira and we can follow up on it there.