Request context id mismatch between request and response

I am using request context id to follow request / response and also use it to check that the response received is to the request sent in the Java SDK - We added this defensive code as we had some funny issues and just wanted to check we never had a mismatch!

But when using prepared statement, it looks like we get some mismatch - so it is like we receive a response with a modified context id from the query sent - and this is on a cluster of 1 node only!

We do not use prepared statement on a multi nodes cluster as there is known issue in Couchbase 4.5 when using multiple times the same index on multiple nodes… But this does not affect a cluster of 1 node, so that is why we tried to enable prepared statement on 1 node.

So is this a known issue that, maybe, because using prepared statement, the query sent is different from the one provided to SDK (as I guess the SDK replace it with encoded plan) and therefore we end up with a different context id in the response?

Hi Bertrand,
could I ask you to clarify the behaviour re clientContextId and prepared statements?
How exactly is the client context id mismatching? Could you give me specific examples?

I have done a bit of code inspection on the N1QL side and realistically what we take in, we spit out.
Just to be sure I tested it out both when using the prepared REST parameter and using an execute statement.
The only things to be aware are - we truncate lengths to 64 characters and if the context id contains quotes or colons we throw an error, but this applies to whether a statement is prepared or not.

Thx,
Marco

This is what we do in the code

// Builder is not immutable so we need to build a different one per request without reusing
private N1qlParams computeN1qlParams(boolean usePreparedStatement) {
    N1qlParams n1qlParams = N1qlParams.build()
            .consistency(ScanConsistency.REQUEST_PLUS)
            .adhoc(!usePreparedStatement);
    return n1qlParams;
}

private List<N1qlQueryRow> executeQuery(N1qlQuery query) {
    // Generate a client context id for sanity check on response
    String clientContextId = generateN1QLClientContextId();
    query.params().withContextId(clientContextId);

    // Do the query
    N1qlQueryResult queryResult = bucket.query(query);

    // Check response match query
    if (!clientContextId.equals(queryResult.clientContextId())) {
        throw new Exception(
                String.format("Error querying from couchbase with query [%s] with context id [%s]" +
                                " - Response context id [%s] does not match",
                        query.statement().toString(), clientContextId, queryResult.clientContextId()));
    }

    ...
}

So this is very simple, we just check whatever we generated and added to query matched the return one… the usePreparedStatement is never changed at runtime so when we run it is either true or false!

And do no think this is an issue of truncating as the string are really different…

Example:

Error querying from couchbase with query [aQuery] with context id [n1ql-085e46e5-51e8-47ef-879e-436535465cab] - Response context id [n1ql-aa893260-a39f-4238-be12-5e4ad5338878] does not match

Error querying from couchbase with query [anotherQuery] with context id [n1ql-f71e4309-505e-40de-a6eb-fd8c006c8bb7] - Response context id [n1ql-fbd71a5c-44bc-430d-8fbf-51594b4d3cc0] does not match

Thank you. So they are completely different.
Well, I can confirm that from the N1QL side, aside from truncation, it’s a simple in-out process and the prepared cache doesn’t even see the client context id.
I wonder if the SDK guys can shed some light as what happens on their side.

Unfortunately, the docker image that was always failing with this is not existing anymore so will not be able to get information out of it…

But when we created the container, we had issue / error creating some indexes, so we stopped and restarted it - it may be that we were in a very strange place as all the queries were failing with mismatch - it may not be related to prepared statement then!

Was just wanted to check if there was known limitation or not combining prepared statement and request context id

We managed to reproduce the error on 1 node running in a container - this has nothing to do with prepared statement as it is disable for our tests (renamed ticket)…

It looks like a threading issue as we can see the following in our logs
15:38:24 Executing N1QL query with context id n1ql-0433e2ae-21fa-4ca7-a0d1-07cd990d4b20 and statement [aQuery]
15:38:25 Executing N1QL query with context id n1ql-7a4dce93-7be1-499e-a352-c883785f305b and statement [anotherQuery]
e15:38:26 Error querying from datastore with query [aQuery] with context id [n1ql-0433e2ae-21fa-4ca7-a0d1-07cd990d4b20] - Response context id [n1ql-7a4dce93-7be1-499e-a352-c883785f305b] does not match

So we end up getting the response from another request…

Our test exercise our full system a lot and start fine, but at some point, every N1QL query will fail with mismatch and it does not seem to recover

Thanks @lbertrand. Do you have a test case that shows this which you can share? We can investigate further from there. If you have an account, it’d be great to post it to issues.couchbase.com, but we’ll take it in any format.

The issue is that for now, only managed to have this when exercising our full system (going through quite a few layer of services!), so do not have a simple test case that is creating this…

I know this is not great to try to get it resolved, it could be in the way we use the couchbase client, but as we are using the synchronous query method, and not the asynchronous one, I will think the mismatch could happen in the background threads inside the client - could it be that a response is missed somehow and therefore the latest response is associated with the current blocking query?

The responses are pipelined but also handled in order. Behind the synchronous query() method, it’s using the async method. We’ll check with the team to see if there is something amiss here. Any other description of the scenario you can offer would be great. For instance, was there some kind of connection disruption or other error in the log that correlated?

I know you may not have much, but the more about the scenario you have to offer, the more we can try to come up with something to look into this.

Looking at the logs, there is no obvious errors when the mismatch context id start!
I could run the couchbase java client with more logs:

  • which level could be useful for you? We are using SLF4J
  • I guess should be for the logger com.couchbase.client?

For completness of the example code, this one also shows the public entry point

public List<N1qlQueryRow> executeSimpleQuery(String statement) {
    return executeQuery(N1qlQuery.simple(statement, computeN1qlParams(false)));
}

public List<N1qlQueryRow> executeParameterisedQuery(String statement, JsonObject placeholderValues) {
    N1qlQuery query = N1qlQuery.parameterized(statement, placeholderValues, computeN1qlParams(usePreparedStatement));
    return executeQuery(query);
}

// Builder is not immutable so we need to build a different one per request without reusing
private N1qlParams computeN1qlParams(boolean usePreparedStatement) {
    N1qlParams n1qlParams = N1qlParams.build()
            .consistency(ScanConsistency.REQUEST_PLUS)
            .adhoc(!usePreparedStatement);
    return n1qlParams;
}

private List<N1qlQueryRow> executeQuery(N1qlQuery query) {
    // Generate a client context id for sanity check on response
    String clientContextId = uuidService.generateN1QLClientContextId();
    query.params().withContextId(clientContextId);

    LOG.infoFormatted("Executing N1QL query with context id %s and statement %s", new Object[]{clientContextId, query.statement().toString()});

    // Do the query
    N1qlQueryResult queryResult = bucket.query(query);

    // Check response match query
    if (!clientContextId.equals(queryResult.clientContextId())) {
        throw new Exception(
                String.format("Error querying from couchbase with query [%s] with context id [%s]" +
                                " - Response context id [%s] does not match",
                        query.statement().toString(), clientContextId, queryResult.clientContextId()));
    }

    // Check response success
    if (!queryResult.parseSuccess() || !queryResult.finalSuccess()) {
        throw new Exception(
                String.format("Error querying from couchbase with query [%s] - Errors are [%s]",
                        query.statement().toString(), queryResult.errors().toString()));
    }

    return queryResult.allRows();
}

The only thing I can think is finest level. @daschl or @subhashni, what would help us get some info in this case?

Sure, finest/trace level logs should be good. I did try out the code snippet posted here and was not able to reproduce. @lbertrand how long does it take for the errors to happen?

@subhashni Using SLF4j a logger at TRACE level of com.couchbase.client, is this what will interest you?

The code snippet is only the low level query we have, but as I say, we can reproduce it when we run our full system on our local docker environment and we can have it happening quickly - so it may be related to system issue too - which maybe possible to see with the low level logs of couchbse SDK client!

Yes, that should be good enough for now.

@subhashni @ingenthr I tried to gather debug logs for the SDK as well as trace logs but can’t post here. So I opened a support ticket #14493 if you would like to have a look at the logs. Thanks.

For some reason, I can see the exception when in debug level, but in trace level can’t see the exception in the logs but still the test fails.

I updated to Couchbase 4.5.1 on the server side, and also at the same time, upgrade to latest java sdk, 2.3.4 (from 2.3.2) and can’t reproduce the mismatch - no change of code on our side

Is there anything that has been corrected in the new versions that may explain this fact?

Many thanks.

@lbertrand at least not on the client side I think.

There has been some tightening of prepared statements code between 4.5 and 4.5.1 but no change in the way clientContextId gets handled by the n1ql service.