Batching performance: RxJava async APIs vs. regular multithreading

Lucas_Majerowicz · January 9, 2020, 8:51pm

Hi,

In terms of the performance of bulk operations (fetch, insert, etc.), is there a difference between using the async APIs as described here https://developer.couchbase.com/documentation/server/3.x/developer/java-2.0/documents-bulk.html (flatMap, etc.) and just calling the sync/blocking APIs from a bunch of threads in parallel?

Is the SDK able to efficiently batch requests only when the async APIs are used in conjunction with flatMap? Or can it also batch requests when they come from different threads using the sync APIs?

Thanks,
Lucas

ingenthr · January 11, 2020, 8:11pm

Welcome to the forums!

The SDK itself will be efficient in both cases. More threads tend to mean more context switching, stack space and lock contention. That does make things a little less efficient, but it’s less about the SDK and more about the rest of the application.

That said, the difference might not be measurable. In most cases, depending on your system, the tall pole in the tent is IO: either the network or reading in whatever you’re bulk loading.
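To make the two styles concrete, here is a rough sketch using only the JDK. It does not use the Couchbase SDK at all: `STORE`, `blockingGet` and `asyncGet` are hypothetical stand-ins for a bucket and its sync and async calls, and `CompletableFuture` is swapped in for the RxJava/Reactor types. It only shows the shape of each approach, not their relative speed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BulkFetchSketch {

    // Hypothetical in-memory stand-in for a document store; NOT the Couchbase SDK.
    static final Map<String, String> STORE = new ConcurrentHashMap<>();

    // Hypothetical blocking lookup, standing in for a sync SDK call.
    static String blockingGet(String id) {
        return STORE.get(id);
    }

    // Hypothetical non-blocking lookup, standing in for an async SDK call.
    static CompletableFuture<String> asyncGet(String id) {
        return CompletableFuture.supplyAsync(() -> STORE.get(id));
    }

    // Style 1: blocking calls issued from a fixed pool of worker threads.
    static List<String> fetchWithThreads(List<String> ids, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String id : ids) {
                Callable<String> task = () -> blockingGet(id);
                pending.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : pending) {
                results.add(f.get()); // waits for each worker in submission order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    // Style 2: start every request without blocking, then collect the results.
    static List<String> fetchAsync(List<String> ids) {
        List<CompletableFuture<String>> pending = new ArrayList<>();
        for (String id : ids) {
            pending.add(asyncGet(id));
        }
        List<String> results = new ArrayList<>();
        for (CompletableFuture<String> f : pending) {
            results.add(f.join());
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        STORE.put("user::1", "alice");
        STORE.put("user::2", "bob");
        List<String> ids = List.of("user::1", "user::2");
        System.out.println(fetchWithThreads(ids, 4));
        System.out.println(fetchAsync(ids));
    }
}
```

Both methods return the same results in the same order. The difference is who owns the threads: in the first style the application does, which is where the extra context switching and stack space come from.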

The other big advantage to bulk loading with the Reactor (in SDK 3.x) or RxJava (in SDK 2.x) APIs is that chaining in error handling is a lot easier than with more traditional Future<T> async code. Of course, synchronous code is even easier to reason about when it comes to error handling.
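As a rough illustration of the error-handling point, the sketch below again uses only the JDK, with `CompletableFuture` standing in for a Reactor or RxJava chain. `load` is a hypothetical lookup that fails for one id, and the "fallback:" value is just a placeholder for whatever recovery you would actually want.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ErrorHandlingSketch {

    // Hypothetical lookup that fails for the id "missing"; stands in for an SDK call.
    static String load(String id) {
        if (id.equals("missing")) {
            throw new IllegalStateException("no document for " + id);
        }
        return "doc:" + id;
    }

    // Plain Future<T>: the failure only surfaces at get(), wrapped in an
    // ExecutionException, so recovery lives in a try/catch at the call site.
    static String withFuture(ExecutorService pool, String id) throws InterruptedException {
        Callable<String> task = () -> load(id);
        Future<String> f = pool.submit(task);
        try {
            return f.get();
        } catch (ExecutionException e) {
            return "fallback:" + id;
        }
    }

    // Chained style: recovery is one more step in the pipeline.
    static CompletableFuture<String> withChaining(String id) {
        return CompletableFuture.supplyAsync(() -> load(id))
                .thenApply(String::toUpperCase)
                .exceptionally(ex -> "fallback:" + id);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println(withFuture(pool, "a"));
            System.out.println(withFuture(pool, "missing"));
        } finally {
            pool.shutdown();
        }
        System.out.println(withChaining("a").join());
        System.out.println(withChaining("missing").join());
    }
}
```

In Reactor the equivalent recovery step would be an operator such as `onErrorResume` on the `Mono` or `Flux`, placed in the chain the same way `exceptionally` is here.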

Not really intended as an example, but here is one bit of code from @daschl that you can probably modify to make a nice, efficient bulk loader.