Data agnostic copy? Possible in Java?


#1

I’m trying to copy data from one cluster to another, but I don’t necessarily know the content type of the data I’m copying (JSON, binary, etc.). So, similar to what XDCR does, I suppose. While copying, I’d also like to control the TTL of the data being written (modify or preserve it).

Seems like all the supporting classes in Java require knowledge of the content.

Is there a way to do this?

Thanks - Howie


#2

Hi Howie,

It might be possible to achieve this using a custom document type.

When opening a bucket you can pass a list of custom document transcoders. You could define your own AgnosticDocument and AgnosticTranscoder classes using BinaryDocument and BinaryTranscoder as models. AgnosticDocument could have a flags field to remember the document type.

Two caveats with this approach:

  • Versions 2.4.4 through 2.5.0 of the Java client have a bug that prevents using custom transcoders. This bug will be fixed in the 2.5.1 release.

  • Document TTL (expiration) is not visible to transcoders. You’d need to get the expiration time some other way, from either a view or a N1QL query (expiration is available as metadata in N1QL starting with Couchbase Server 5.0).
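
A minimal sketch of the document side of this idea, written in plain Java so it stands alone. It is not SDK code: the class shape, the `byte[]` body (a real version modelled on BinaryDocument would hold a `ByteBuf`), and the `forTarget` helper are all hypothetical, and the matching transcoder plus its registration when opening the bucket are left out. Resetting CAS to 0 assumes that 0 means "no CAS check" on write.

```java
/**
 * Hypothetical "agnostic" document: the body is kept as raw bytes and the
 * flags travel with it, so the content type survives a cluster-to-cluster copy.
 * Illustration only; not part of the Couchbase SDK.
 */
final class AgnosticDocument {
    private final String id;
    private final int expiry;      // TTL to apply on write
    private final byte[] content;  // raw, uninterpreted body
    private final int flags;       // content-type flags as read from the source
    private final long cas;

    AgnosticDocument(String id, int expiry, byte[] content, int flags, long cas) {
        this.id = id;
        this.expiry = expiry;
        this.content = content.clone(); // defensive copy
        this.flags = flags;
        this.cas = cas;
    }

    String id() { return id; }
    int expiry() { return expiry; }
    byte[] content() { return content.clone(); }
    int flags() { return flags; }
    long cas() { return cas; }

    /**
     * Returns a copy intended for the target cluster: same id, bytes and flags,
     * a caller-supplied expiry, and CAS reset to 0 (assumed to mean "no CAS check").
     */
    AgnosticDocument forTarget(int newExpiry) {
        return new AgnosticDocument(id, newExpiry, content, flags, 0L);
    }
}
```

The expiry passed to `forTarget` still has to come from somewhere other than the transcoder, as noted in the second caveat.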


#3

Excellent! Thanks! This should work for what we need to do!


#4

This was easy as pie. @david.nault Do you know when 2.5.1 is expected to drop? Easier to support latest than to fork / workaround with what we have. The release notes don’t talk about custom transcoders being a known issue.

Thanks again!


#5

That’s great to hear!

You are correct, previous release notes don’t mention the issue; it was discovered only recently. A regression test is in the works.

I can’t make any promises, but the current plan is to release 2.5.1 tomorrow.


#6

Version 2.5.1 of the Java SDK has been released :tada:


#7

Hey @david.nault ,

I’ve noticed a few issues that might affect using an agnostic document type:

  1. Looks like I’m always getting flags = 0 in the response for a document created by the console… I’d figure it should have some JSON-indicating flags, right?

  2. Also, with this AgnosticDocument, what is the correct way to override the transcoder?

    public AgnosticDocument newDocument(String id, int expiry, ByteBuf content, long cas)
    public AgnosticDocument newDocument(String id, int expiry, ByteBuf content, long cas,
            MutationToken mutationToken)

Those methods don’t take a flags parameter, so it looks like we’d lose the document type being tracked. I’ve noticed this gets called by Upsert. I don’t think this is critically important in my case (I do not consume the response from an Upsert), but seems like there’s something amiss with that creation.


#8

  1. Which console are we talking about – the web UI, the cbc binary, the couchbase-cli tool, or something else I haven’t learned about yet?

  2. You are correct; I don’t see a good way to preserve the flags along that code path. Some plumbing changes would definitely be required. Let me know if this becomes important for your use case and we can definitely explore further.


#9

  1. The Web UI. It’s probably okay, but something to be aware of. Since the Web UI requires JSON, I’d expect there to be some flags.
  2. Yeah; just mentioning it. Again, we don’t consume that response. I was originally planning on not implementing those methods (just throwing an exception), but that fails because Upsert calls them. So I track that state in my entity and throw an exception only if the document is actually consumed (again, no plans to consume).

#10

Sounds like a bug if the Web UI isn’t setting the JSON flags properly… thanks for raising the issue! Works for me with Couchbase Server 5.0 beta. What version are you using?


#11

We’re on version 4.5.1 and it’s definitely doing it. Create a default document, load it up and see flags = 0


#12

Well, this was educational for me. Prior to Couchbase Server 5.0, the Web UI acts like a “legacy client” and does not set document flags.

As you may have already discovered, the modern Couchbase client libraries interoperate with legacy clients by treating flags=0 as a wildcard that can be read as any document type.

This is indeed, as you say, just something to be aware of.
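
For anyone writing their own agnostic transcoder, here is a rough sketch of the flags=0 wildcard rule described above. The constants reflect my understanding of the "common flags" convention (format code in the top byte, with JSON = 2, binary = 3, string = 4); treat them as assumptions and check them against the SDK source. The class and method names are made up for illustration, and compression bits are ignored.

```java
/**
 * Hypothetical helper illustrating how a reader might treat flags=0
 * (written by a legacy client) as compatible with any document type.
 * The format codes are assumptions; verify them against the SDK.
 */
final class FlagsSketch {
    static final int JSON = 2;
    static final int BINARY = 3;
    static final int STRING = 4;

    /** Extracts the assumed format code from the top byte of the 32-bit flags. */
    static int formatCode(int flags) {
        return flags >>> 24;
    }

    /** True if the flags match the wanted format, or are 0 (legacy wildcard). */
    static boolean readableAs(int flags, int wantedFormat) {
        return flags == 0 || formatCode(flags) == wantedFormat;
    }
}
```

Under these assumptions a document saved from the pre-5.0 Web UI (flags = 0) passes `readableAs` for every format, while a document carrying 0x02000000 passes only for `JSON`.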