Batch Read Latency

While Couchbase scales well for single-key reads, I am observing higher latency when reading a batch of 100 keys at a time. Is there a right way to use couchbase-client for batch reads? Currently, the code I have in Scala is:

Observable.from(keys)
  .flatMap { key =>
    rx.lang.scala.JavaConversions.toScalaObservable(
        cacheStorageBucket.async().get(StringDocument.create(key)))
      .filter(_ != null)
      .map(d => key -> d.content().getObj[T])
  }
  .toList.toBlocking.single.toMap

Couchbase client version:

<dependency>
  <groupId>com.couchbase.client</groupId>
  <artifactId>java-client</artifactId>
  <version>2.1.6</version>
</dependency>

Hi @Aparna_R

(Just a quick note to say that Java SDK 2.x is only about 6 months off EOL now, so I recommend starting to look at migrating to 3.x - or even better, to the Scala SDK we now have.)

That sort of reactive logic is broadly the right way to do batch reads with the Java & Scala SDKs, yes. It might be worth adding some debug logging to check whether it's actually executing the operations concurrently, as a sanity check.
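Something like this would do it - a minimal sketch against the SDK 2.x RxJava API, with bucket standing in for your cacheStorageBucket, logging a timestamp and thread name as each get() starts and finishes; overlapping timestamps confirm the operations really are in flight together:

    Observable.from(keys)
            .flatMap(key -> bucket.async()
                    .get(StringDocument.create(key))
                    // Log when this individual get() actually starts...
                    .doOnSubscribe(() -> System.out.println(
                            System.currentTimeMillis() + " get(" + key + ") started on "
                                    + Thread.currentThread().getName()))
                    // ...and when it completes
                    .doOnCompleted(() -> System.out.println(
                            System.currentTimeMillis() + " get(" + key + ") done")))
            .toList()
            .toBlocking()
            .single();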

I don’t see anything in the code there limiting it to 100 keys, so presumably the keys value itself is limited to 100? In that case the latency of the batch will basically equal the latency of the worst-case operation in it. If you’re after maximum throughput and the batching itself isn’t important to you, it may be better to feed in all the keys and let the reactive framework manage the concurrency - that should give more reliable and efficient throughput.

I’m a little rusty on SDK2, but in SDK3 (which exposes Project Reactor primitives rather than RxJava - they’re similar) it would look something like:

    import reactor.core.publisher.Flux;
    import reactor.core.scheduler.Schedulers;

    Flux.fromIterable(allKeys)                                 // Handle everything, all the keys
            .parallel(100)                                     // But only have 100 KV operations in flight at a time
            .runOn(Schedulers.elastic())
            .concatMap(key -> collection.reactive().get(key))  // reactive() gives the non-blocking SDK3 API
            .sequential()
            .collectList()
            .block();

If possible, it’s even better not to collectList().block() at all - collecting everything into one list will break down with a very large set of keys due to out-of-memory errors. Instead, try to extend the reactive streaming deeper into your code: e.g. streaming those results back over HTTP, or streaming them into a database, or whatever you need to do with them.
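As a rough sketch (processResult here is hypothetical - any non-blocking consumer that returns a Mono or other Publisher, such as a reactive database write or an HTTP response writer):

    Flux.fromIterable(allKeys)
            .parallel(100)
            .runOn(Schedulers.elastic())
            .concatMap(key -> collection.reactive().get(key))
            .sequential()
            // Hand each result to a downstream reactive consumer as it arrives,
            // instead of accumulating the whole batch in memory
            .flatMap(result -> processResult(result))
            .then()    // completes once every result has been handled
            .block();  // or better still, return the Mono and let the caller subscribe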