Issue with file permissions setting up Couchbase Backup using Operator

I am trying to set up Couchbase cluster backup using the Operator on Google Cloud.

It seems that the jobs created by the Operator have a problem with file permissions:

Found 2 pods, using pod/ds-couchbase-backup-full-27681645-shmjf
Traceback (most recent call last):
  File "/usr/local/bin/backup.py", line 1213, in <module>
    Backup(context).run()
  File "/usr/local/bin/backup.py", line 378, in run
    self._setup_logging()
  File "/usr/local/bin/backup.py", line 1123, in _setup_logging
    os.makedirs(self.context.log_path, exist_ok=True)
  File "/usr/lib/python3.8/os.py", line 223, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/data/scriptlogs'

The documentation suggests that a security context may need to be set (CouchbaseCluster Resource | Couchbase Docs).

But for some reason the image the Operator uses for backup (couchbase/operator-backup:1.3.0) has a different ID for the couchbase user than the image used to run the cluster nodes (couchbase/server:7.1.1): 8453 instead of 1000.
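
One way to confirm a mismatch like this from inside a pod is to compare the owner of the mounted directory with the identity of the running process. The following is only a minimal sketch: `describe_access` is a hypothetical helper, not part of backup.py, and it uses nothing beyond the Python standard library.

```python
import os
import stat
import tempfile


def describe_access(path):
    """Report who owns `path` and whether the current process can write to it."""
    info = os.stat(path)
    return {
        "path": path,
        "owner_uid": info.st_uid,
        "owner_gid": info.st_gid,
        "mode": stat.filemode(info.st_mode),
        "process_uid": os.getuid(),
        # Primary group plus any supplementary groups (fsGroup ends up here).
        "process_gids": sorted(set(os.getgroups()) | {os.getgid()}),
        "writable": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    # /data is the mount point from the traceback; fall back to the temp dir elsewhere.
    target = "/data" if os.path.isdir("/data") else tempfile.gettempdir()
    for key, value in describe_access(target).items():
        print(f"{key}: {value}")
```

If `writable` is False while `owner_uid` differs from `process_uid`, that would be consistent with the PermissionError shown above.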

Can you suggest how to fix this issue?

It seems that the issue with disk permissions was solved in image couchbase/operator-backup:1.3.1.

But there is another issue: the backup pods have the TLS secret mounted, but it contains no CA certificate (which, according to the tutorial Configure TLS | Couchbase Docs, is in another k8s secret), and therefore cbbackupmgr returns this error:

2022-08-19T10:21:39.425+00:00 (Cmd) Error backing up cluster: open /var/run/secrets/couchbase.com/tls-mount/ca.crt: no such file or directory

Any ideas how to solve this?

UPD: the newer image wasn’t the solution. While looking for the backup logs, I created the job manually as described in Configure Automated Backup and Restore | Couchbase Docs, and I managed to run the backup by adding a security context:

...
      securityContext:
        fsGroup: 8453

After this, the backup job itself can also run successfully, even though it doesn’t have a security context configured.

In case anyone has the same issue - manually adding ca.crt to the couchbase-server-tls secret helped to solve this problem.
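
For anyone scripting this: values in a Kubernetes Secret's `data` field are base64-encoded, so the CA certificate has to be encoded before it is added under the `ca.crt` key. A minimal sketch, assuming the existing secret data is available as a plain dict; `with_ca_cert` is a hypothetical helper and the `tls.crt` key in the usage lines is only a placeholder, not necessarily what your secret contains.

```python
import base64


def with_ca_cert(secret_data, ca_pem):
    """Return a copy of a Secret's `data` mapping with a base64-encoded ca.crt added."""
    updated = dict(secret_data)  # leave the caller's mapping untouched
    updated["ca.crt"] = base64.b64encode(ca_pem).decode("ascii")
    return updated


if __name__ == "__main__":
    # Placeholder values; real data would come from the existing secret and CA file.
    existing = {"tls.crt": "AAAA"}
    ca_pem = b"-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n"
    print(sorted(with_ca_cert(existing, ca_pem)))
```

The resulting mapping would then be applied back to the couchbase-server-tls secret with whatever tooling you normally use.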

Thanks for your investigation and for sharing your knowledge.
I’m running into the same problem.
Operator-backup 1.3.1 has couchbase:x:8453:8453::/home/couchbase:/bin/false
while in 1.1.0 it was x:1000:1000

Server in 7.1.3 has couchbase:x:1000:1000::/home/couchbase:/bin/sh

How did you change the securityContext group at the cluster level?
For me, adding it inside the spec makes no difference, and the server pod returns this message:
“groups: cannot find name for group ID 8453”

Could you clarify a bit? Thank you.
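
That message suggests that GID 8453 has no entry in the server image's group database, which would be consistent with the server image defining couchbase as 1000:1000. A minimal sketch for checking this from inside a container; `group_name` is a hypothetical helper built on the standard `grp` module.

```python
import grp
import os


def group_name(gid):
    """Return the name for `gid` from the group database, or None if it has no entry."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return None


if __name__ == "__main__":
    # 8453 is the couchbase GID reported for the operator-backup image in this thread.
    for gid in sorted(set(os.getgroups()) | {os.getgid(), 8453}):
        print(gid, group_name(gid) or "<no name in group database>")
```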

Update
I managed to run the backup with runAsUser: 0.
/data is owned by root in the Docker image, but I can’t find scriptlogs there.
Update 2
I changed the ownership of /data/backups and /data/scriptlogs to couchbase and everything started working. No changes to the Kubernetes definition were needed. It comes down to Linux permissions on the folders not being set properly by the new backup.py script in versions >= 1.2.0.
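
The ownership change can be sketched as below. This is an illustration only: `own_backup_dirs` is a hypothetical helper, and changing ownership to a different user (for example 8453:8453 on /data) requires the process to run as root, as with runAsUser: 0 above.

```python
import os
import shutil
import tempfile


def own_backup_dirs(root, user, group, names=("backups", "scriptlogs")):
    """Create the backup directories under `root` if needed and set their owner."""
    paths = []
    for name in names:
        path = os.path.join(root, name)
        os.makedirs(path, exist_ok=True)
        # `user` and `group` may be names or numeric IDs.
        shutil.chown(path, user=user, group=group)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Demo on a temporary directory with the current identity, which needs no privileges.
    demo_root = tempfile.mkdtemp()
    for path in own_backup_dirs(demo_root, os.getuid(), os.getgid()):
        info = os.stat(path)
        print(path, info.st_uid, info.st_gid)
```

In the scenario from this thread, the call would look like `own_backup_dirs("/data", 8453, 8453)`, run once as root against the backup volume.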

I’m digging into the backup.py scripts from 1.1.0 and 1.3.1, extracted from the Docker images.
Because I’m a new user, I can’t upload any files.

I found that the 1.1.0 script uses mk_dir for /data/scriptlogs and /data/backups in case they don’t exist, while the 1.3.1 script does not handle this case, which I believe is why it fails.

Extract from the 1.1.0 script, found in /opt/couchbase/bin/

Both versions define the mount locations under /data, of course (lines 41-44):

MOUNT_LOCATION = os.path.join("/data")

BACKUPS_LOCATION = os.path.join(MOUNT_LOCATION, "backups")

LOGS_LOCATION = os.path.join(MOUNT_LOCATION, "scriptlogs")

STAGING_LOCATION = os.path.join(MOUNT_LOCATION, "staging")

And the function that creates the directory (lines 305-308):

def create_local_archive(context):
    """
    Creates a local archive if required, and initializes a repository
    if one is required.
    """

    if context.args.mode == MODE_RESTORE:
        return

    if context.args.s3_bucket:
        if context.args.config:
            config_repo(context)
        return

    archive_created = False
    if not os.access(BACKUPS_LOCATION, os.F_OK):
        archive_created = mk_dir(BACKUPS_LOCATION)

    # remove any lock leftover by a dangling cbbackupmgr process
    logging.debug("Removing stale lock file")
    if os.path.exists(os.path.join(BACKUPS_LOCATION, "lock.lk")):
        os.remove(os.path.join(BACKUPS_LOCATION, "lock.lk"))

    # if archive directory was created e.g. an incremental was scheduled
    # first, or we're forcing a new one, configure it.
    if archive_created or context.args.config:
        logging.info("Performing config as backup archive was just created")
        config_repo(context)

Extract from the 1.3.1 script, found in /usr/local/bin/ (lines 41-44):

MOUNT_LOCATION = os.path.join("/data")

BACKUPS_LOCATION = os.path.join(MOUNT_LOCATION, "backups")

LOGS_LOCATION = os.path.join(MOUNT_LOCATION, "scriptlogs")

STAGING_LOCATION = os.path.join(MOUNT_LOCATION, "staging")

And the initialization of the backup and log locations (lines 112-125):

    def __init__(self, **kwargs):
        """
        Initialialize defaults that cannot go wrong.
        Don't put any calls in here, they cannot be mocked during initialization.
        """
        self.log_path = LOGS_LOCATION
        if 'log_path' in kwargs:
            self.log_path = kwargs['log_path']

        self.archive = BACKUPS_LOCATION
        if 'archive' in kwargs:
            self.archive = kwargs['archive']

        self.timestamp = datetime.now()
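
For comparison, the traceback in the first post shows 1.3.1 calling os.makedirs(self.context.log_path, exist_ok=True) inside _setup_logging, so directory creation is attempted there and it is that call which hits the permission error. A minimal sketch of the same kind of guard with a more descriptive error; `ensure_dir` is a hypothetical name and is not taken from either version of backup.py.

```python
import os
import tempfile


def ensure_dir(path):
    """Create `path` if missing; return True if this call created it."""
    if os.path.isdir(path):
        return False
    try:
        os.makedirs(path, exist_ok=True)
    except PermissionError as exc:
        # Find the nearest existing ancestor to report who owns it.
        parent = os.path.dirname(os.path.abspath(path))
        while not os.path.isdir(parent):
            parent = os.path.dirname(parent)
        info = os.stat(parent)
        raise PermissionError(
            f"cannot create {path}: {parent} is owned by "
            f"{info.st_uid}:{info.st_gid}, process runs as "
            f"{os.getuid()}:{os.getgid()}"
        ) from exc
    return True


if __name__ == "__main__":
    demo = os.path.join(tempfile.mkdtemp(), "scriptlogs")
    print(ensure_dir(demo))  # first call creates the directory
    print(ensure_dir(demo))  # second call finds it already present
```

A check like this does not fix the ownership problem by itself; it only makes the cause visible in the job log.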