Java SDK client CAS help


#1

Hi,

Couchbase: 4.1.0.5005
Java SDK client: 2.4.2

I have multiple hosts trying to execute a cron job. Only one should win and continue execution.
To make this possible, I have put one document in Couchbase. Each host gets the document, modifies it, and replaces it using the CAS value. If the replace succeeds, the host starts the cron execution; if it gets a CAS mismatch exception, it exits (i.e. some other host won and is running the cron).

Following is my code:

```java
try {
    JsonDocument cronExecJD = couchManager.getBucket().get(CRON_EXEC_KEY);
    CronDoc cronDoc = mapper.readValue(cronExecJD.content().toString(), CronDoc.class);
    cron_cas = cronExecJD.cas();
    if (!cronDoc.isCronRunning()) {
        cronDoc.setCronRunning(true);
        JsonObject cronExecJO = JsonObject.fromJson(mapper.writeValueAsString(cronDoc));
        couchManager.getBucket().replace(JsonDocument.create(CRON_EXEC_KEY, cronExecJO, cron_cas));
        cronExecJD = couchManager.getBucket().get(CRON_EXEC_KEY);
        cron_cas = cronExecJD.cas();
        log.info(LogConstants.SINGLE_EXEC_PASSED);
        return true; // success
    } else {
        log.info(LogConstants.SINGLE_EXEC_FAILED);
        log.info(LogConstants.CRON_ALREADY_RUNNING);
        return false; // failed
    }
} catch (CASMismatchException e) {
    log.info(LogConstants.CAS_MISMATCH_ERROR);
    log.info(LogConstants.CRON_ALREADY_RUNNING, e);
    return false; // failed
} catch (CouchbaseException e) {
    log.error(LogConstants.DATABASE_ERROR, e);
    return false; // failed
} catch (IOException e) {
    log.error(LogConstants.INTERNAL_ERROR, e);
    return false; // failed
}
```

I am following this doc

Scan consistency is the default.

In this case, multiple hosts are able to execute the cron.

Is there any mistake here?

Thanks in advance,
Nihar Rathod


#2

The blog post is really helpful. Big thanks to @don

I will rewrite it with getAndLock.

Thanks
Nihar Rathod


#3

Hi,

So this is my understanding; please correct me if I am wrong.

  • CAS-based concurrent mutation is an optimistic approach to resolving concurrent update contention, but if both updaters pass the correct CAS value at the same time, both will be allowed to mutate. That is a very rare corner case, though.

  • getAndLock-based concurrent mutation is a pessimistic approach, where exactly one caller gets the lock and is allowed to update the doc within the given time window. After the time window passes, the lock is released by the server and anyone can get/update the document.

In my case, exactly one should win, so I should use getAndLock.
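A minimal in-memory model of the pessimistic window described in the second bullet may help make it concrete. This is NOT the Couchbase SDK: LockWindowModel and its methods are hypothetical, and the current time is passed in explicitly so the run is deterministic. It shows the lock being refused to a second host, then re-granted once the window has expired, at which point the first holder's CAS is stale. In the 2.x Java SDK the corresponding real calls are bucket.getAndLock(id, lockTime) followed by a replace carrying the locked document's CAS.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * In-memory model (NOT the Couchbase SDK) of a pessimistic lock window:
 * getAndLock grants a lock for lockMillis, other lockers are refused until it
 * expires, and a successful replace with the lock's CAS releases the lock.
 */
public class LockWindowModel {

    private static final class Entry {
        String value;
        long cas = 1;          // changes on every lock and every successful write
        long lockedUntil = 0;  // the lock is held while lockedUntil > now
        Entry(String value) { this.value = value; }
    }

    private final Map<String, Entry> docs = new HashMap<>();

    public synchronized void upsert(String key, String value) {
        docs.put(key, new Entry(value));
    }

    private Entry entry(String key) {
        Entry e = docs.get(key);
        if (e == null) throw new IllegalArgumentException("no document " + key);
        return e;
    }

    /** Returns the lock's CAS, or -1 if another caller holds an unexpired lock. */
    public synchronized long getAndLock(String key, long lockMillis, long nowMillis) {
        Entry e = entry(key);
        if (e.lockedUntil > nowMillis) {
            return -1;                          // refused: still locked
        }
        e.cas++;                                // any CAS read earlier is now stale
        e.lockedUntil = nowMillis + lockMillis;
        return e.cas;
    }

    /** Succeeds only with the current CAS; a successful write also releases the lock. */
    public synchronized boolean replace(String key, String value, long cas) {
        Entry e = entry(key);
        if (e.cas != cas) {
            return false;                       // stale CAS, e.g. lock expired and was re-taken
        }
        e.value = value;
        e.cas++;
        e.lockedUntil = 0;
        return true;
    }

    public static void main(String[] args) {
        LockWindowModel bucket = new LockWindowModel();
        bucket.upsert("cron_exec", "{\"cronRunning\":false}");
        long hostA = bucket.getAndLock("cron_exec", 15_000, 0);      // A locks at t=0
        long hostB = bucket.getAndLock("cron_exec", 15_000, 1_000);  // B refused at t=1s
        long hostC = bucket.getAndLock("cron_exec", 15_000, 20_000); // window over: C locks
        boolean aWrites = bucket.replace("cron_exec", "{\"cronRunning\":true}", hostA);
        System.out.println(hostA + " " + hostB + " " + hostC + " " + aWrites); // 2 -1 3 false
    }
}
```

The last step shows the trade-off of the pessimistic approach: a holder that takes longer than the lock window silently loses its exclusivity, so the window has to cover all the work done between getAndLock and replace.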

Thanks
Nihar Rathod


#4

With CAS mutation, only one would win. There should be no possibility that two mutations with the same CAS would apply, as the CAS is changed by the first mutation.
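To illustrate why, here is a minimal in-memory model of optimistic CAS. It is not the Couchbase SDK: CasModel is hypothetical and CasMismatch is a stand-in for CASMismatchException. Both hosts read the same CAS, but the first successful replace assigns a new CAS, so the second replace carrying the old value is rejected.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory model (NOT the Couchbase SDK) of optimistic CAS on a key-value store. */
public class CasModel {

    /** Hypothetical stand-in for the SDK's CASMismatchException. */
    public static class CasMismatch extends RuntimeException {
        public CasMismatch(String key) { super("CAS mismatch on " + key); }
    }

    /** A stored value together with the CAS it was written with. */
    public static final class Versioned {
        public final String value;
        public final long cas;
        Versioned(String value, long cas) { this.value = value; this.cas = cas; }
    }

    private final Map<String, Versioned> docs = new HashMap<>();
    private long nextCas = 1;

    public synchronized void upsert(String key, String value) {
        docs.put(key, new Versioned(value, nextCas++));
    }

    public synchronized Versioned get(String key) {
        return docs.get(key);
    }

    /** Writes only if the stored CAS still equals expectedCas; every write gets a fresh CAS. */
    public synchronized long replace(String key, String value, long expectedCas) {
        Versioned current = docs.get(key);
        if (current == null) throw new IllegalArgumentException("no document " + key);
        if (current.cas != expectedCas) throw new CasMismatch(key);
        Versioned updated = new Versioned(value, nextCas++);
        docs.put(key, updated);
        return updated.cas;
    }

    public static void main(String[] args) {
        CasModel bucket = new CasModel();
        bucket.upsert("cron_exec", "{\"cronRunning\":false}");
        long casA = bucket.get("cron_exec").cas;   // host A reads
        long casB = bucket.get("cron_exec").cas;   // host B reads the same CAS
        bucket.replace("cron_exec", "{\"cronRunning\":true}", casA); // A wins, CAS changes
        try {
            bucket.replace("cron_exec", "{\"cronRunning\":true}", casB);
            System.out.println("host B also won");
        } catch (CasMismatch e) {
            System.out.println("host B lost: " + e.getMessage()); // this branch runs
        }
    }
}
```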


#5

@ingenthr thanks.

Can you please check the code snippet above?
In my case, multiple updaters are able to update the doc with CAS.


#6

@nihar.rathod concurrent/racing access on the same document with different CAS is certainly not possible.

Are you sure they are not calling one after another? And how do you actually verify that they are racing each other?

Btw, this CAS issue aside, you can use RawJsonDocument to store and load the raw JSON string directly. There is no need to go through fromJson or toString(), and it saves some allocations.


#7

@daschl thanks for the response.

One question:
Doc in Couchbase:
{
flag: false
}

If two independent processes do the following at the same time:

  1. doc = bucket.get(key)
  2. long cas = doc.cas()
  3. doc.flag=true
  4. bucket.replace(doc,cas)

Both are trying to set the flag value to true.
The two processes are running on their own servers.

As per your comment above, only one should succeed and the second one should get a CAS mismatch.
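The four steps above can be reproduced in-process with java.util.concurrent, where AtomicReference.compareAndSet plays the role of replace-with-CAS. This is an analogy, not the Couchbase SDK; CasRace and Doc are hypothetical names. Hosts are released at the same instant by a CountDownLatch, and each reads the document, checks the flag, and tries to swap in its updated copy. Under these assumptions every round should have exactly one winner, however the threads interleave.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class CasRace {

    /** Immutable snapshot of the document { flag: ... } plus a version number. */
    static final class Doc {
        final boolean flag;
        final long version;
        Doc(boolean flag, long version) { this.flag = flag; this.version = version; }
    }

    /** One election: all hosts race at once; returns how many believed they won. */
    static int electionRound(int hosts) {
        AtomicReference<Doc> bucket = new AtomicReference<>(new Doc(false, 1));
        AtomicInteger winners = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(hosts);
        for (int i = 0; i < hosts; i++) {
            pool.submit(() -> {
                start.await();                                   // all hosts wake up together
                Doc read = bucket.get();                         // 1-2. get doc and its "CAS"
                if (!read.flag) {
                    Doc updated = new Doc(true, read.version + 1); // 3. flag = true
                    // 4. succeeds only if the doc is still the exact one this host read
                    if (bucket.compareAndSet(read, updated)) {
                        winners.incrementAndGet();
                    }                                            // else: the "CAS mismatch" path
                }
                return null;                                     // Callable, so await() may throw
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return winners.get();
    }

    public static void main(String[] args) {
        for (int round = 0; round < 200; round++) {
            int w = electionRound(4);
            if (w != 1) {
                throw new IllegalStateException("round " + round + ": " + w + " winners");
            }
        }
        System.out.println("every round had exactly one winner");
    }
}
```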


#8

Please help me here. In the case mentioned above, what is the expected behaviour?


#9

From my quick reading, you’re correct: one should succeed and the other should return a CAS mismatch.

Do you see something that deviates from this? If so, maybe you can share the program in a GitHub gist or similar and we can spot the problem. CAS is a heavily used feature (I’ve been using it in the underlying memcached for over a decade), so I’m pretty sure it’s solid, but I’m open to the idea that there could be something not yet found!


#10

@ingenthr

Thanks for the help.
Yes, it’s pretty solid. It was my mistake.


#11

For future reference (and others reading it) can you maybe share what the mistake was and how did you fix it?


#12

I had put a sleep before the CAS-based check, and I thought that might be the reason for this issue.
I removed the sleep, so now more clients try to modify the doc at the same time.

Actually, I hit this again without the sleep.

I will put the code snippet and infra details in the next comment.


#13

What do I want to achieve?

  • I want to run a cron job once daily at 9:45 AM.
  • My app has a Spring-based scheduler, which wakes up every day at 9:45 AM.
  • I have deployed my app on 4 servers (Couchbase Java client: 2.4.2).
  • I have a 3-node Couchbase cluster (version: 4.1.0.5005).
  • Important part: the job should be run by only one server. All servers will wake up at the same time, but only one should run the job; the others should silently exit.

Following is my doc:

```json
{
  "type": "cron_doc",
  "cronRunning": false
}
```

Following is my CAS-based check method:

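```java
// Imports used by this method (Couchbase Java SDK 2.x)
import com.couchbase.client.core.CouchbaseException;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.document.json.JsonObject;
import com.couchbase.client.java.error.CASMismatchException;
import java.io.IOException;

public boolean cronExecCheck() {
    try {
        // Read the document together with its current CAS value
        JsonDocument cronExecJD = couchManager.getBucket().get(CRON_EXEC_KEY);
        CronDoc cronDoc = mapper.readValue(cronExecJD.content().toString(), CronDoc.class);
        long cron_cas = cronExecJD.cas();
        if (!cronDoc.isCronRunning()) {
            cronDoc.setCronRunning(true);
            JsonObject cronExecJO = JsonObject.fromJson(mapper.writeValueAsString(cronDoc));
            // replace() succeeds only if the stored CAS still equals cron_cas;
            // otherwise it throws CASMismatchException
            couchManager.getBucket().replace(JsonDocument.create(CRON_EXEC_KEY, cronExecJO, cron_cas));
            cronExecJD = couchManager.getBucket().get(CRON_EXEC_KEY); // result is not used
            log.info("Updated with cas");
            return true;
        } else {
            return false; // flag already true: another server is running the job
        }
    } catch (CASMismatchException e) {
        // Another server replaced the document after our get()
        log.info("Exception", e);
        return false;
    } catch (CouchbaseException e) {
        log.error("Exception", e);
        return false;
    } catch (IOException e) {
        // Thrown by mapper.readValue / mapper.writeValueAsString
        log.error("Exception", e);
        return false;
    }
}
```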
  • If cronExecCheck returns true, I run the job.
  • If cronExecCheck returns false, I assume another server has started the cron job, and I silently exit.

The check does a CAS-based update on the doc, which sets the flag (cronRunning) to true.
After the job finishes, I set the flag back to false.

In my case, two servers get true from cronExecCheck and run the job.
On both servers I am getting the log “Updated with cas”.

Scan consistency is the default.
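One scenario that would produce two “Updated with cas” logs without any CAS failure is the one asked about in #6: the hosts running one after another rather than racing. Because the flag is reset to false when the job finishes, a host whose scheduler fires late (clock skew or a delayed thread is an assumption here, not something established in this thread) finds false and legitimately wins again. The sketch models this with AtomicBoolean standing in for the document; dateCheck is a hypothetical alternative that records the date of the last run instead of a running flag, so nothing is reset and a late host loses.

```java
import java.time.LocalDate;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

public class SequentialWinners {

    /** Mirrors cronExecCheck(): win if cronRunning is false by CAS-ing it to true. */
    static boolean flagCheck(AtomicBoolean cronRunning) {
        return cronRunning.compareAndSet(false, true);
    }

    /** Mirrors "after the job finishes, I set the flag back to false". */
    static void finishJob(AtomicBoolean cronRunning) {
        cronRunning.set(false);
    }

    /** Hypothetical alternative: win only if today's run has not been claimed yet. */
    static boolean dateCheck(AtomicReference<LocalDate> lastRun, LocalDate today) {
        LocalDate read = lastRun.get();
        // compareAndSet succeeds only if lastRun still holds the object this host read
        return read.isBefore(today) && lastRun.compareAndSet(read, today);
    }

    public static void main(String[] args) {
        AtomicBoolean cronRunning = new AtomicBoolean(false);
        boolean hostA = flagCheck(cronRunning);  // A wins and runs the job
        finishJob(cronRunning);                  // A finishes and resets the flag
        boolean hostB = flagCheck(cronRunning);  // B fires late and finds false
        System.out.println("flag: A=" + hostA + " B=" + hostB); // flag: A=true B=true

        LocalDate today = LocalDate.now();
        AtomicReference<LocalDate> lastRun = new AtomicReference<>(today.minusDays(1));
        boolean dateA = dateCheck(lastRun, today);
        boolean dateB = dateCheck(lastRun, today); // nothing was reset, so B loses
        System.out.println("date: A=" + dateA + " B=" + dateB); // date: A=true B=false
    }
}
```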


#14

Please let me know if any more details are needed.


#15

That is weird. Would it be possible for you to reproduce this in a standalone script that we can also try to run? Something might be amiss here, because many customers rely on CAS and we have tests in place to make sure this functionality works.


#16

Sure, I will try this and share the script if I am able to reproduce it.
We have hit this once.

Thanks