Fastest way to bulk load from REST API


#1

Given there’s no way to bulk load from the REST API, I’ve been using a view that indexes every document by its ID as a workaround, querying it with keys=myObjectIdsArray and include_docs=true. It works but feels slow: fetching around a hundred documents takes between 2 and 4 seconds on an iPhone 7. I’ve played around with issuing smaller queries and running them in parallel but haven’t noticed much of a difference. To complicate matters, the Android interface seems to have a fairly short maximum URL length, so I’m forced to partition the array and issue multiple smaller queries.
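For reference, the partitioning described above might look roughly like the sketch below. The base URL, the view name (by_id) and the 300-character limit are made-up placeholders, not values taken from Couchbase Lite or Android; substitute whatever limit you actually hit.

```python
import json
from urllib.parse import quote

def view_url(base, keys):
    """Build a view query URL with the keys passed as a URL-encoded JSON array."""
    encoded = quote(json.dumps(keys, separators=(",", ":")), safe="")
    return f"{base}?include_docs=true&keys={encoded}"

def partition_keys(base, keys, max_url_len):
    """Greedily group keys so that each generated URL fits in max_url_len.

    A single key that is too long on its own still gets its own batch;
    this sketch does not try to handle that case.
    """
    batches, current = [], []
    for key in keys:
        candidate = current + [key]
        if current and len(view_url(base, candidate)) > max_url_len:
            batches.append(current)   # current batch is full, start a new one
            current = [key]
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches

# Hypothetical endpoint and limit, for illustration only.
BASE = "http://localhost:59840/db/_design/docs/_view/by_id"
doc_ids = [f"doc-{i:03d}" for i in range(100)]
batches = partition_keys(BASE, doc_ids, max_url_len=300)
urls = [view_url(BASE, batch) for batch in batches]
```

Each entry in urls can then be issued as a separate GET, sequentially or in parallel.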

Is there a faster way to bulk load documents by id?


#2

You can use include_docs with an _all_docs query, so you shouldn’t need a special view for that.

20-40ms/doc is really slow! I’m not set up to run stuff on my phone right now, but in a Terminal window on my Mac I can run GET :59840/beer/_all_docs?include_docs=true on a 5000-document database in 0.12 seconds, which is 24µs/doc, i.e. 1000 times as fast. Obviously my MacBook Pro is faster than an iPhone, but at most by a factor of 5 or so.
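The per-document figures above come straight from the numbers quoted in this thread (2 to 4 seconds for about 100 documents, versus 0.12 seconds for 5000). A quick check of the arithmetic, nothing more:

```python
# Per-document cost computed from the timings quoted in this thread.
iphone_low = 2.0 / 100    # 2 s for ~100 docs
iphone_high = 4.0 / 100   # 4 s for ~100 docs
mac = 0.12 / 5000         # 0.12 s for 5000 docs

print(f"iPhone view query: {iphone_low * 1e3:.0f}-{iphone_high * 1e3:.0f} ms/doc")
print(f"Mac _all_docs:     {mac * 1e6:.0f} µs/doc")
print(f"Ratio: roughly {iphone_low / mac:.0f}x to {iphone_high / mac:.0f}x")
```

So "1000 times" is a round figure for a ratio somewhere between about 800x and 1700x.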

(It’s also possible your docs are larger than the ones in our canonical beer database, which are a few hundred bytes apiece. It’s also possible a lot of the time is being spent in WebKit parsing the JSON, or maybe in your own code?)


#3

Thanks Jens. Could you try using the keys parameter to select some documents via the REST API? I think that’s the crucial difference. My docs aren’t large, very small in fact.

Also, do you have any advice for doing this on Android? It seems that there’s a fairly restrictive limit on the length of the URL. So when I have a few hundred keys I have to issue multiple queries, otherwise I’ll exceed the max URL length. In terms of performance, is it best to do these sequentially, in parallel, or a mixture of both?

Any idea when _bulk_get will be implemented?

I can’t remember why I wasn’t using the _all_docs endpoint. I thought there was some problem with it?


#4

In addition, I query views with the include_docs parameter to ‘bulk get’ documents at times when I don’t have the doc IDs I need. It seems that no matter how few documents are returned, it’s rare for a query to take less than a second. The database I’m testing with is tiny: ~4000 docs (34624367 bytes).


#5

It seems that there’s a fairly restrictive limit on the length of the URL.

You can POST to the view’s URL, with a JSON body like {"keys": ["key1", ...]}. (I haven’t verified this works on Android, but it should, since it’s part of the original CouchDB API.)
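A minimal sketch of such a POST, with the same caveat that it is untested against Couchbase Lite on Android. The view URL is a hypothetical placeholder, and keeping include_docs=true in the query string follows the CouchDB convention and is assumed to carry over.

```python
import json
import urllib.request

def build_keys_request(view_url, keys, include_docs=True):
    """Prepare a POST query that carries the keys in the body, not the URL."""
    url = view_url + ("?include_docs=true" if include_docs else "")
    body = json.dumps({"keys": keys}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def fetch_docs(view_url, keys):
    """Send the query and return the documents from the rows that have one."""
    with urllib.request.urlopen(build_keys_request(view_url, keys)) as resp:
        rows = json.load(resp)["rows"]
    return [row["doc"] for row in rows if row.get("doc")]

# Hypothetical view URL, for illustration only; nothing is sent here.
request = build_keys_request(
    "http://localhost:59840/db/_design/docs/_view/by_id",
    ["key1", "key2"],
)
```

Because the keys travel in the request body, the URL stays the same length no matter how many keys are requested.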

Could you try using the keys parameter to select some documents via the REST API?

It would take longer to rig up a test because I’d have to extract the docIDs out of the database as a JSON array, so I may not have time in the near future. It’s a test you can do yourself using LiteServ.

Anyway, given the speed differential, you could always just get all the documents and then pick out the ones you want, and it would still be a lot faster, probably.
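As a sketch of that fallback: fetch _all_docs?include_docs=true once, then filter client-side. The response shape assumed here (a "rows" array whose entries have "id" and "doc") is the usual CouchDB one, and the sample data is invented for illustration.

```python
def pick_docs(all_docs_response, wanted_ids):
    """Return {id: doc} for the wanted IDs found in an _all_docs response."""
    wanted = set(wanted_ids)  # set lookup keeps the filter linear in row count
    return {
        row["id"]: row["doc"]
        for row in all_docs_response["rows"]
        if row["id"] in wanted and row.get("doc") is not None
    }

# Invented sample shaped like an _all_docs?include_docs=true response.
sample = {
    "total_rows": 3,
    "rows": [
        {"id": "a", "key": "a", "doc": {"_id": "a", "name": "first"}},
        {"id": "b", "key": "b", "doc": {"_id": "b", "name": "second"}},
        {"id": "c", "key": "c", "doc": {"_id": "c", "name": "third"}},
    ],
}
picked = pick_docs(sample, ["a", "c", "missing"])
```

IDs that are not in the database are simply absent from the result. This only makes sense while the whole database fits comfortably in memory.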


#6

I appreciate it would take longer to rig up. Could you just try it with a single document ID? Selecting every document from the database isn’t an option, as it will grow with time and I’d run out of memory. Plus, isn’t it a bit crazy that it’s possibly quicker to select the entire database than to select specific documents from it!?

I’ve tried loading the entire database from the _all_docs endpoint. It takes around half a second to load around 3000 documents in the iOS simulator running on a MacBook Pro. That’s much slower than what you are reporting, but maybe the overhead of the HTTP interface is causing the increase in time.