How are developers importing millions of documents to Couchbase?


#1

So far, my experience is that cbdocloader only deals with one document at a time, being unable to parse an array of objects (even if the array contains a few, 10, 100 or 1000). This amazes me, because knowing that Couchbase deals with millions of documents with ease, its importing facility (cbdocloader) is sub-par.

My question: how are developers importing many documents to Couchbase? In CouchDB I was able to POST an array of say 1K or 10K at once, multiple times and move entire databases in a few calls. I fail to see how Couchbase sees cbdocloader as a real solution. Littering the file system with millions of documents and the amount of I/O involved in dealing with them doesn’t seem right.

I must be overlooking something important. Please tell me I’m wrong (and how to fix it) :wink:


#2

"...being unable to parse an array of objects (even if the array contains a few, 10, 100 or 1000)."

Does the file you are loading store the JSON in format A or B?
A.
[{"data":1},{"data":2}]
B.
{"data":1}
{"data":2}


#3

Hi Fujio,

I formatted the file with a JSON array, that is, A:

[{"data":1},{"data":2}]

Thanks,

– Tito


#4

The main thing, @tito, is that cbdocloader was really designed for loading the sample databases rather than as a high-performance transfer tool. The tool later introduced for high-performance transfer is cbtransfer.

Just a bit more background: the original cbdocloader was constrained by our need to generate .exe files for the Windows platform and, in that era, generating these from pure Python. Now that we’re using golang extensively in Couchbase, the team is looking to use it as a basis for updates/replacement tools. It should give us much better out-of-the-box performance along with the ability to generate executables for more platforms. There is more detail in MB-17884.


#5

Hi @ingenthr, thanks for the help. I understand now, makes sense. Looking at cbtransfer’s documentation, it looks like it doesn’t support JSON. Are there any plans to enhance cbtransfer to handle JSON batch loading?


#6

This is the best way to use the cbdocloader tool. Start with a folder containing a bunch of JSON files.

Zip the folder into test.zip

/opt/couchbase/bin/cbdocloader -n (ip_address):8091 -u Username -p password -b bucket test.zip

The documents are now in Couchbase Server.
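
For anyone starting from a single JSON array file (format A earlier in the thread), here is a minimal sketch of one way to turn it into a zip of individual JSON files for the command above. It is not an official Couchbase utility. The function name `array_to_doc_zip`, the `docs` folder name, and the `doc_<n>.json` file naming are all assumptions made for illustration, so check the cbdocloader documentation for your server version to confirm the zip layout it expects and how document keys are derived.

```python
import json
import zipfile


def array_to_doc_zip(array_path, zip_path, folder="docs", key_prefix="doc"):
    """Split a JSON array file into one JSON file per element inside a zip.

    `folder` and `key_prefix` are illustrative defaults (assumptions),
    not values required by any Couchbase tool.
    Returns the number of documents written.
    """
    with open(array_path, encoding="utf-8") as f:
        docs = json.load(f)
    if not isinstance(docs, list):
        raise ValueError("expected a top-level JSON array")

    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for i, doc in enumerate(docs):
            # One archive entry per array element, e.g. docs/doc_0.json
            name = f"{folder}/{key_prefix}_{i}.json"
            zf.writestr(name, json.dumps(doc))
    return len(docs)


if __name__ == "__main__":
    # Hypothetical file names; replace with your own paths.
    count = array_to_doc_zip("export.json", "test.zip")
    print(f"wrote {count} documents to test.zip")
```

Because the entries are written straight into the archive, this avoids creating millions of small files on disk first. The resulting test.zip can then be passed to cbdocloader as in the command above.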