How to get multiple buckets in one request


#1

Hi, I’m looking at ways of optimizing my requests. I would like to get multiple documents in one go. For example, instead of doing:
bucket1.get("a");
bucket1.get("b");
bucket1.get("c");

I would like to do (pseudo code):
buckets.get("a", "b", "c");

Is that possible? I read the documentation, but I could only find examples of getting many documents from the same bucket, not getting one (unique) document from three separate buckets.


#2

OK, so to get documents from different buckets, you have to open a connection to each of them:

Cluster cluster; //you should already have that part... remember to reuse Cluster and Bucket instances
Bucket bucket1 = cluster.openBucket("bucket1", "pass1");
Bucket bucket2 = cluster.openBucket("bucket2", "pass2");
Bucket bucket3 = cluster.openBucket("bucket3", "pass3");

Once you have these references, you can get your documents:

JsonDocument docFromB1 = bucket1.get("a");
JsonDocument docFromB2 = bucket2.get("b");
JsonDocument docFromB3 = bucket3.get("c");

The problem, as you may have guessed, is that this is done serially, in a blocking fashion.
If you don’t need each doc in its own meaningfully named variable, and would be happy with a list of the docs in any order, I can offer the following async optimization:

AsyncCluster asyncCluster = CouchbaseAsyncCluster.create(env, listOfIP);

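//prepare to open each bucket then get its document, asynchronously
//(openBucket on the AsyncCluster emits the opened bucket, so flatMap chains the get onto it; the lambdas need Java 8)
Observable<JsonDocument> doc1 = asyncCluster.openBucket("bucket1", "pass1").flatMap(b -> b.get("a"));
Observable<JsonDocument> doc2 = asyncCluster.openBucket("bucket2", "pass2").flatMap(b -> b.get("b"));
Observable<JsonDocument> doc3 = asyncCluster.openBucket("bucket3", "pass3").flatMap(b -> b.get("c"));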

//trigger the actual connection by merging the 3 individual bucket streams...
List<JsonDocument> allDocs = Observable.merge(doc1, doc2, doc3)
    //...then collecting each doc in a common List...
    .toList()
    //optionally set a timeout for the whole operation by chaining in ".timeout(duration, timeUnit)"
    //...then block, waiting for all of this to finish (it can execute in parallel)
    .toBlocking().single();

The optimization here is that each individual bucket stream can be executed in parallel, so it’ll open connections and retrieve documents in parallel, then aggregate all documents in a list that you wait for at the end.

The catch is that, for instance, bucket2 may be somehow quicker to respond and will serve doc2 first, so the documents in the list arrive in the order [doc2, doc1, doc3] in this case.
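
To make the ordering caveat concrete without needing a cluster, here is a minimal sketch that uses the JDK's CompletableFuture as a stand-in for the SDK's Observable. It is not Couchbase code: the class ParallelFetchSketch, the fetch helper (a fake lookup with an artificial delay) and both collect methods are hypothetical names made up for illustration. It contrasts collecting results as they complete, which behaves like the merge above, with reading them back in the order they were requested.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class ParallelFetchSketch {

    // Hypothetical stand-in for a bucket lookup: waits delayMillis, then returns a fake document.
    static CompletableFuture<String> fetch(String bucketName, String key, long delayMillis) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            return bucketName + ":" + key;
        });
    }

    // Comparable to merge + toList: each result is recorded when it completes,
    // so the list order depends on which lookup finishes first.
    static List<String> collectInCompletionOrder(List<CompletableFuture<String>> futures) {
        List<String> results = Collections.synchronizedList(new ArrayList<>());
        List<CompletableFuture<Void>> recorded = new ArrayList<>();
        for (CompletableFuture<String> future : futures) {
            recorded.add(future.thenAccept(results::add));
        }
        for (CompletableFuture<Void> done : recorded) {
            done.join();
        }
        return new ArrayList<>(results);
    }

    // The lookups still run concurrently, but results are read back in request order.
    static List<String> collectInRequestOrder(List<CompletableFuture<String>> futures) {
        List<String> results = new ArrayList<>();
        for (CompletableFuture<String> future : futures) {
            results.add(future.join());
        }
        return results;
    }

    // Three fake lookups; the first one is deliberately the slowest.
    static List<CompletableFuture<String>> sampleRequests() {
        return Arrays.asList(
            fetch("bucket1", "a", 200),
            fetch("bucket2", "b", 20),
            fetch("bucket3", "c", 100));
    }

    public static void main(String[] args) {
        // Order here depends on timing and may vary between runs.
        System.out.println(collectInCompletionOrder(sampleRequests()));
        // Always [bucket1:a, bucket2:b, bucket3:c]
        System.out.println(collectInRequestOrder(sampleRequests()));
    }
}
```

Both variants start all lookups before waiting on any of them, so the total wait is roughly that of the slowest lookup either way. The only difference is whether the position in the list tells you which bucket a document came from.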


#3

Cool, thanks! I’m very familiar with Rx etc., and I’m fine with replies (JsonDocuments) coming back in any order, but to narrow it down even further (with the async-ness), would it work just as well to do:

List<JsonDocument> allDocs = Observable
    .merge(bucket1.async().get("pass1"), bucket1.async().get("pass2"), bucket1.async().get("pass3")) 
    .toList()
    .toBlocking()
    .single();

or something? What I’m getting at is parallelizing the get() calls without using AsyncCluster.openBucket(). I’m thinking the code sample above should be equivalent, but let me know if it isn’t.

Cheers, Henrik


#4

If you only want to parallelize the gets, yes, that would work (except that, since you use several buckets, you’d have to use different bucket references; here you used bucket1 each time).

Note that “a”, “b” and “c” were the keys in my example, whereas “pass1”, “pass2” and “pass3” are the passwords to connect to each bucket (bucket1, bucket2, bucket3 - use the overload without a password if your buckets are not password protected).

Note: Usually when you want to do bulk loading, but inside a single bucket, you’d follow this pattern instead:

List<JsonDocument> foundDocs = Observable
    .just("key1", "key2", "key3", "key4", "key5")
    .flatMap(new Func1<String, Observable<JsonDocument>>() {
        @Override
        public Observable<JsonDocument> call(String id) {
            return bucket.async().get(id);
        }
    })
    .toList()
    .toBlocking()
    .single();

(as seen in the docs on bulk operations)


#5

Great, thanks for the quick reply. Yes, I would use different buckets, that was a typo (copy/paste error). Of course I would use bucket1, bucket2, and so on.

I will def try it out. The flatMap is also very handy, when retrieving many docs from the same bucket. I do that in other cases, but here I want to use different buckets, and the merge operator will allow me to do that (with the .async() get calls).

Thanks again.