Parallel inserts in Couchbase using multithreading in Java


#1

I have a huge dataset of around 40 million records, and I’m trying to insert/set the documents into Couchbase using multithreading (a Java thread pool). I’m not seeing any speedup: I currently get around 10K operations/second on Couchbase, and I want to push this write rate as high as possible. Is there a built-in Couchbase API that supports this, or a way in Java to open multiple connections and spawn, say, hundreds or a thousand threads for insertion? The machine has a decent configuration, with plenty of RAM.


#2

Are you using the 2.x SDK? If so, the SDK internally uses its own pools and parallelisation mechanisms, which may be why you’re not seeing performance improvements from adding more threads.

Take a look at the documentation page on bulk operations; I believe you’ll find good examples there of how to improve your performance.
If you’ve already done this, or if you’re using the 1.4.x line of the SDK, let’s work it out further.
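To illustrate the general bulk idea, here is a minimal sketch that issues a whole batch of writes asynchronously and then waits for the batch to finish, rather than having each pool thread block on one write at a time. `AsyncStore`, `FakeStore`, and `upsertInBatches` are hypothetical names, not Couchbase APIs; the fake store only stands in for an SDK’s non-blocking upsert so the example runs without a cluster. As far as I know, the 2.x documentation expresses the same pattern with RxJava `Observable`s rather than `CompletableFuture`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BatchedUpserts {

    /** Hypothetical stand-in for an SDK's non-blocking upsert; NOT a real Couchbase interface. */
    interface AsyncStore {
        CompletableFuture<Boolean> upsert(String key, String value);
    }

    /** In-memory fake whose writes complete later on a small I/O pool, like network writes would. */
    static final class FakeStore implements AsyncStore {
        final Map<String, String> data = new ConcurrentHashMap<>();
        final ExecutorService io = Executors.newFixedThreadPool(4);

        @Override
        public CompletableFuture<Boolean> upsert(String key, String value) {
            return CompletableFuture.supplyAsync(() -> {
                data.put(key, value);
                return true;
            }, io);
        }
    }

    /**
     * Fires batchSize async upserts at once, waits for that whole batch,
     * then starts the next one. Returns the number of successful writes.
     */
    static long upsertInBatches(AsyncStore store, Map<String, String> records, int batchSize) {
        long ok = 0;
        List<CompletableFuture<Boolean>> batch = new ArrayList<>(batchSize);
        for (Map.Entry<String, String> e : records.entrySet()) {
            batch.add(store.upsert(e.getKey(), e.getValue()));
            if (batch.size() == batchSize) {
                ok += awaitBatch(batch);
                batch.clear();
            }
        }
        ok += awaitBatch(batch); // flush the final, possibly partial, batch
        return ok;
    }

    /** Blocks until every write in the batch has completed and counts the successes. */
    static long awaitBatch(List<CompletableFuture<Boolean>> batch) {
        CompletableFuture.allOf(batch.toArray(new CompletableFuture[0])).join();
        return batch.stream().filter(f -> Boolean.TRUE.equals(f.join())).count();
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        Map<String, String> records = new LinkedHashMap<>();
        for (int i = 0; i < 10_000; i++) {
            records.put("doc::" + i, "{\"n\":" + i + "}");
        }
        long written = upsertInBatches(store, records, 1_000);
        store.io.shutdown();
        System.out.println("written=" + written + " stored=" + store.data.size());
    }
}
```

The batch size here is an assumed tuning knob, not a value from the docs: larger batches keep more requests on the wire, at the cost of holding more pending futures in memory.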

Also, a quick question: what do you mean by a decently good configuration (number of CPU cores, RAM, storage type, …)?


#3

A quick update: I’m using the 1.4.2 SDK.
Machine configuration:
CPU: Intel® Xeon® Processor E5620 (4 cores, 8 threads, 2.4 GHz)
Memory: ~48 GB

I’m having a look at the bulk operations option.


#4

Ah, if you’re using the 1.4 SDK, then unfortunately the link I provided doesn’t apply (it’s for SDK 2.x). In 1.4 we only had bulk get, not bulk set. For 1.4, see the bulk get documentation; you may also benefit from switching to the async methods if you’re not already using them (see the section of the documentation on async operations).
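To make the async suggestion concrete, here is a minimal sketch of one common way to use async writes: issue all sets from a single thread and cap the number of in-flight requests with a `Semaphore`, instead of tying up a pool thread per blocking write. `AsyncKv` and `FakeKv` are hypothetical stand-ins so the example runs without a cluster. In the 1.4 SDK, `set(key, exp, value)` also returns a future rather than blocking, but adapting that future’s completion notification to this shape is an assumption you would need to check against the 1.4 API docs. `maxInFlight` is an illustrative parameter to tune, not a recommended setting.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;

public class WindowedAsyncSets {

    /** Hypothetical stand-in for a non-blocking set(key, exp, value); NOT the real SDK interface. */
    interface AsyncKv {
        CompletableFuture<Boolean> set(String key, int exp, Object value);
    }

    /** In-memory fake whose sets complete later on a small I/O pool, like network writes would. */
    static final class FakeKv implements AsyncKv {
        final Map<String, Object> data = new ConcurrentHashMap<>();
        final ExecutorService io = Executors.newFixedThreadPool(4);

        @Override
        public CompletableFuture<Boolean> set(String key, int exp, Object value) {
            return CompletableFuture.supplyAsync(() -> {
                data.put(key, value); // exp is ignored by this fake
                return true;
            }, io);
        }
    }

    /**
     * Issues count sets from the calling thread, keeping at most maxInFlight
     * outstanding at any time. Returns the number of writes reported successful.
     */
    static long setAll(AsyncKv kv, int count, int maxInFlight) {
        Semaphore window = new Semaphore(maxInFlight);
        AtomicLong ok = new AtomicLong();
        for (int i = 0; i < count; i++) {
            window.acquireUninterruptibly(); // waits only when the window is full
            kv.set("doc::" + i, 0, "{\"n\":" + i + "}") // exp 0 = no expiry
              .whenComplete((success, err) -> {
                  if (err == null && Boolean.TRUE.equals(success)) {
                      ok.incrementAndGet();
                  }
                  window.release(); // frees a slot for the next write
              });
        }
        // Taking every permit means all outstanding writes have completed.
        window.acquireUninterruptibly(maxInFlight);
        window.release(maxInFlight);
        return ok.get();
    }

    public static void main(String[] args) {
        FakeKv kv = new FakeKv();
        long written = setAll(kv, 10_000, 256);
        kv.io.shutdown();
        System.out.println("written=" + written + " stored=" + kv.data.size());
    }
}
```

Calling `.get()` on each future right after issuing it would serialize the writes again; the window keeps many requests on the wire at once while still bounding memory use for a 40-million-record load.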