LegacyDocument content is null

unhuman · March 31, 2016, 7:23pm

We have an existing object in Couchbase. It is JSON. Attempts to retrieve the document as a JsonDocument work fine.

The Couchbase UI shows the document fine. Retrieving the document as a LegacyDocument succeeds, but the content is null.

Retrieving the document as a StringDocument fails with the following error:
com.couchbase.client.java.error.TranscodingException: Flags (0x2000001) indicate non-String document for id XYZ, could not decode.

My understanding of LegacyDocument is that it should always work, yet it doesn’t and I’d certainly expect to be able to read a JSON document as a String.

This problem occurs in Java (v2.2.5). In .NET (v2.2.6), I’m able to read in the document as a String.

EDIT: Debugging the source code, it seems to blow up in LegacyTranscoder.deserialize(), where it logs:
5:36:43.717 [cb-computations-4] WARN c.c.c.j.transcoder.LegacyTranscoder - Caught IOException decoding %d bytes of data
java.io.StreamCorruptedException: invalid stream header: 7B226576

There’s also a bug in that logging statement: the %d placeholder is never populated.
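
For later readers, a minimal sketch of why that exception appears: the header bytes 7B 22 65 76 are the ASCII characters `{"ev`, the start of JSON text, while Java serialization expects a stream to begin with its own magic header. The class name and the sample payload below are hypothetical; only the first four bytes are known from the log above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.StreamCorruptedException;
import java.nio.charset.StandardCharsets;

class StreamHeaderSketch {

    /** Tries Java deserialization; returns the exception message on failure, or null on success. */
    static String tryJavaDeserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.readObject();
            return null;
        } catch (StreamCorruptedException e) {
            // Thrown by the ObjectInputStream constructor when the header is not a serialization header.
            return e.getMessage();
        } catch (IOException | ClassNotFoundException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        // Hypothetical JSON payload; it only needs to start with {"ev to match the log.
        byte[] json = "{\"event\":\"example\"}".getBytes(StandardCharsets.UTF_8);
        System.out.println(tryJavaDeserialize(json));                       // invalid stream header: 7B226576
        System.out.println(new String(json, 0, 4, StandardCharsets.UTF_8)); // {"ev
    }
}
```

So the bytes themselves are intact; they are just plain JSON text rather than a serialized Java object.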

EDIT: I would’ve thought that retrieving the document as BinaryDocument would allow me to just have the data, but nope.

unhuman · March 31, 2016, 10:08pm

I’ve got an admittedly pretty hacky solution:

```java
package com.cvent.couch.transcode;

import com.couchbase.client.java.transcoder.LegacyTranscoder;

import java.nio.charset.StandardCharsets;

/**
 * Created by unhuman on 3/31/16.
 *
 * LegacyTranscoder that falls back to a UTF-8 String when Java deserialization yields null.
 */
public class TolerantLegacyTranscoder extends LegacyTranscoder {

    @Override
    protected Object deserialize(byte[] in) {
        Object result = super.deserialize(in);
        if (result == null && in != null) {
            // super.deserialize() logged the IOException and returned null,
            // so treat the raw bytes as UTF-8 text instead.
            result = new String(in, StandardCharsets.UTF_8);
        }
        return result;
    }
}
```

Then, I open the bucket like so:
```java
// We override the LegacyDocument processing with our more tolerant implementation
List<Transcoder<? extends Document, ?>> transcoderOverrides = new ArrayList<>();
transcoderOverrides.add(new TolerantLegacyTranscoder());
// bucketName is the bucket's name as a String
Bucket bucket = cluster.openBucket(bucketName, password, transcoderOverrides);
```

Thanks for Open Source!

ingenthr · April 1, 2016, 5:12am

How was the document originally created? We have the transcoder interface there to handle a situation where perhaps it’s been placed by an older client or a memcached client through moxi.

LegacyDocument is more for reading things from the older 1.4 client or spymemcached, so your solution isn’t necessarily that much of a hack depending on how you got it in there.

It’d be good to know the circumstances that led to this if there is an issue we need to address.

simonbasle · April 1, 2016, 7:25am

And did you try to retrieve it as a JsonStringDocument or a RawJsonDocument?
The first is for retrieving JSON content that is a string (in the JSON sense); the second returns the raw representation of the JSON (whether a JSON string, boolean, array, dictionary…) as a Java String.

unhuman · April 1, 2016, 12:19pm

@simonbasle That particular document, as I mentioned initially, was readable via JsonDocument.

@ingenthr it was probably created by a Node.js client, but I’m not sure which version.

Basically we have some tooling where we try to read and present any document to our users. It seemed like LegacyDocument should handle the “anything” case.

It would be useful to just be able to retrieve any document and allow the application to decide how to present it. I think I’m pretty close with my implementation… It seems conceivable to me that there could be a document that’s not retrievable by any provided means.

But, I’m over the hump. Thanks for the feedback.

simonbasle · April 1, 2016, 12:30pm

LegacyDocument was never there to handle the “anything” case. It handles documents produced by the Java 1st gen SDK, that’s it.

Before flag uniformization in the second generation of SDKs (of which Java 2.x is part), each SDK would more or less do its own thing, transcoding-wise. You couldn’t share a document between languages, and the effort went into making the new generation cross-compatible within itself, not backward compatible with all the older SDKs.
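
To relate this to the flags value in the original error (0x2000001), here is a rough sketch of how uniform flags can be read. It assumes the "common flags" layout used by the second-generation SDKs, where the low four bits of the top byte carry the format (1 = private, 2 = JSON, 3 = binary, 4 = string) and the lower 24 bits are left for SDK-specific legacy values. The class, constants, and method names are hypothetical, and the SDK's real checks may differ in detail.

```java
class CommonFlagsSketch {

    // Assumed format codes of the common-flags layout (not copied from the SDK source).
    static final int PRIVATE = 1;
    static final int JSON = 2;
    static final int BINARY = 3;
    static final int STRING = 4;

    /** Extracts the format code from the top byte (assumed mask: 0x0F000000). */
    static int commonFormat(int flags) {
        return (flags >>> 24) & 0x0F;
    }

    /** Returns the lower 24 bits, which older SDKs used for their own purposes. */
    static int lowBits(int flags) {
        return flags & 0x00FFFFFF;
    }

    static String describe(int flags) {
        String[] names = {"RESERVED", "PRIVATE", "JSON", "BINARY", "STRING"};
        int format = commonFormat(flags);
        String name = format < names.length ? names[format] : "UNKNOWN";
        return name + " (low bits 0x" + Integer.toHexString(lowBits(flags)) + ")";
    }

    public static void main(String[] args) {
        int flags = 0x2000001; // value from the TranscodingException in the first post
        System.out.println(describe(flags));               // JSON (low bits 0x1)
        System.out.println(commonFormat(flags) == STRING); // false
    }
}
```

Under that assumption the document is flagged as JSON, which would explain why JsonDocument works while StringDocument rejects it as a non-String document.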