JsonStringDocument stored as binary - wrongly?

asarkar · April 14, 2016, 1:11am

Create one document using Java API:
JsonStringDocument.create(key, objectWriter.writeValueAsString(destinationToken));

Upsert:
Observable.from(list).flatMap(aclBucket::upsert).last().timeout(3, SECONDS).toBlocking().single();

where list is a list of JsonStringDocument created as above.

Query:
curl -v -u "xxx:yyy" http://localhost:8093/query/service -d 'statement=SELECT * FROM `my-bucket` USE KEYS ["key"]'

"results": [
        {
            "key": "\u003cbinary (197 b)\u003e"
        }
    ]

Why is the document returned as binary even though it was stored as a JsonStringDocument? Shouldn’t I get the JSON string back? The same thing happens in the admin console: when I click on the document, it gives the error “Warning: Editing of binary document is not allowed”.

simonbasle · April 14, 2016, 7:49am

Did you want to store a single “string” (as in a JSON string literal), or a JSON object represented as a raw String in Java?

The JsonStringDocument is for the first case, while the RawJsonDocument is for the second case.
So maybe switching to using RawJsonDocument will work better for you?

~~That said, if the JsonStringDocument is encoded with a binary flag that is indeed not normal.~~
Just checked the source code and the flags are correct (couldn’t reproduce storing a JsonStringDocument and seeing a warning when editing through webconsole).

asarkar · April 14, 2016, 4:22pm

Hi Simon,
As you can see in my post, the query returns a binary document instead of plain text. Did you run a similar query and get back plain text instead? What versions of the server and Java client did you test with? I’m using Java client 2.2.6 and server Enterprise 4.1.0.

One more thing - is there any difference between JsonStringDocument and RawJsonDocument, in terms of storage size or retrieval?

simonbasle · April 14, 2016, 5:13pm

Yes, I was able to reproduce the case, and using RawJsonDocument is the way to go. The query service and web console editor assume that if a doc is flagged as JSON, then its content is a JSON object (so, starting with {). Otherwise they won’t allow you to edit it.

The JsonStringDocument, as I said, is for string literals. The difference is that when encoding, it will actually put quotes around the Java String you passed in. If the ObjectMapper gives me:

{"major":1,"minor":2,"patch":3}

Here is what is persisted (notice the additional quotes around it):

"{"major":1,"minor":2,"patch":3}"

This could be considered a separate bug (as you can see, no internal escaping of quotes is performed), but once again this Document class is the wrong tool for the job in your case.
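The quoting behavior described above can be sketched without the SDK. This is a minimal illustration only: `QuotingSketch`, `naiveQuote` and `escapedQuote` are hypothetical names and are not part of the Couchbase client. `naiveQuote` mimics "wrap in quotes, escape nothing", while `escapedQuote` shows what a well-formed JSON string literal for the same input would look like.

```java
final class QuotingSketch {

    // Mimics the behavior described in the thread: add surrounding quotes,
    // perform no escaping of the quotes already inside the input.
    static String naiveQuote(String s) {
        return "\"" + s + "\"";
    }

    // Hypothetical helper: builds a properly escaped JSON string literal.
    static String escapedQuote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become a 4-digit hex escape.
                        sb.append('\\').append('u').append(String.format("%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        String json = "{\"major\":1,\"minor\":2,\"patch\":3}";
        // Not a valid JSON string literal: the literal "ends" at the second quote.
        System.out.println(naiveQuote(json));
        // A valid JSON string literal whose value is the original JSON text.
        System.out.println(escapedQuote(json));
    }
}
```

The naive form is 33 characters long for this input, which matches the `binary (33 b)` size that N1QL reports for the `jsonString` document.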

Here is what N1QL tells me when I use both methods and select both the content (SELECT *) and the metadata (id, flags, etc…):

"results": [
        {
            "$1": {
                "cas": 108277412986880,
                "flags": 33554432,
                "id": "rawJson",
                "type": "json"
            },
            "default": {
                "major": 1,
                "minor": 2,
                "patch": 3
            }
        },
        {
            "$1": {
                "cas": 108277403484160,
                "flags": 33554432,
                "id": "jsonString",
                "type": "base64"
            },
            "default": "\u003cbinary (33 b)\u003e"
        }
    ]

Notice the flags are the same in both cases but N1QL interprets the value from JsonStringDocument as base64.
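To see why both documents carry the same flags, the value 33554432 can be decoded by hand. The sketch below assumes the SDK's "common flags" convention, in which the top byte of the 32-bit flags holds a format code and the code 2 means JSON. That convention is not stated in this thread, so treat the constant and the `CommonFlagsSketch` class as assumptions for illustration.

```java
final class CommonFlagsSketch {

    // Assumed format code for JSON under the common-flags convention.
    static final int FORMAT_JSON = 2;

    // Extracts the format code from the low 4 bits of the top byte.
    static int commonFormat(int flags) {
        return (flags >>> 24) & 0x0F;
    }

    public static void main(String[] args) {
        int flags = 33554432; // value N1QL reported for both documents
        System.out.println(String.format("0x%08X", flags)); // 0x02000000
        System.out.println(commonFormat(flags) == FORMAT_JSON); // true
    }
}
```

Under that assumption both documents are flagged as JSON, so the different `type` values come from how the stored bytes parse, not from the flags.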

asarkar · April 14, 2016, 6:44pm

Asking my question about storage and retrieval again: if the JsonStringDocument is base64 encoded, isn’t it more compact than a RawJsonDocument, thus making storage and retrieval faster?

simonbasle · April 14, 2016, 7:19pm

It’s not really stored as base64, at least I don’t think so. It’s just wrongly detected as such because the JsonStringDocument expects a correctly escaped JSON string literal as input, but neither checks nor enforces this. Since what you effectively passed in was NOT such an escaped literal, it ended up confusing the query engine.

JsonStringDocument is meant to store something like “Hello” or “hello \" world”, but what you stored is more like “{"hello" : "world"}” (notice the inner quotes are not escaped).

ingenthr · April 14, 2016, 9:37pm

It’s correct that the documents are not stored as base64. In the map-reduce views and in N1QL, they’re converted to base64. When using the k-v APIs and on disk, it’s just an array of bytes.
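On the earlier question about compactness: base64 is a presentation of the bytes, and it is larger than the bytes it represents (4 output characters for every 3 input bytes). A minimal sketch using the JDK's `java.util.Base64` follows. Whether the server uses this exact encoder variant is an assumption, and `Base64SizeSketch` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Base64SizeSketch {

    // Encodes the UTF-8 bytes of the stored value with standard base64.
    static String toBase64(String stored) {
        return Base64.getEncoder()
                .encodeToString(stored.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // The 33-byte value from the thread: JSON text wrapped in unescaped quotes.
        String stored = "\"{\"major\":1,\"minor\":2,\"patch\":3}\"";
        int rawSize = stored.getBytes(StandardCharsets.UTF_8).length;
        String encoded = toBase64(stored);
        System.out.println("raw bytes:    " + rawSize);
        System.out.println("base64 chars: " + encoded.length());
    }
}
```

For this input the raw value is 33 bytes and its base64 form is 44 characters, so a base64 rendering would not make storage or retrieval any cheaper.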