Iterating over all the keys in a view using multiple threads


#1

I would like to consume a view in a way that lets me split the work across different threads. My idea is to get the number of IDs in the view, split it into N chunks, and give each chunk to a thread.

Is there a better approach to consuming a view from multiple threads in a thread-safe way?

I could not find documentation that describes how to count the number of keys in a view with the Java API. Since the skip option is very slow for billions of keys, the only option is to use start-key and end-key for the multi-threaded pagination. I am wondering how I could get the Nth element of a view to use as the start-key or end-key.


#2

Hi @szopsz,

I guess there are multiple questions in your posting here.

First, you can use the _count reduce function to get the number of keys in the index that match your criteria. The Java SDK's query API lets you enable or disable the reduce function on demand. How you do that differs a little depending on which SDK version you are using (2.0 vs 1.4).

In the 1.4 SDK there is a paginator you can use, which does automatic pagination. For non-reduced views it uses the startkey (+ startkeyDocID) approach to reduce overhead. If you have a reduced view of some sort this approach doesn’t work, but I think it will in your case.

You could then get the page from each paginator call and move the computation on its results to a thread pool. The other way would be to do a _count reduce call first and then split the range up. The problem there, as you said, is that you don’t know the IDs at that point. Also keep in mind that the data might change between the calls.
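To illustrate the first approach, here is a minimal sketch of walking a sorted index page by page on one thread and handing each page to a thread pool. It does not use the Couchbase SDK: the view is stood in for by an in-memory `TreeMap`, and `PagedViewConsumer`, `nextPage`, `processPage` and `consume` are made-up names for this example. It also assumes keys are unique; a real view can emit the same key for several documents, which is why the paginator additionally tracks the document ID.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: the TreeMap is only a stand-in for a sorted view index.
class PagedViewConsumer {

    // Returns up to pageSize keys strictly after lastKey
    // (or from the beginning when lastKey is null).
    // This mimics start-key pagination instead of using skip.
    static List<String> nextPage(NavigableMap<String, String> view,
                                 String lastKey, int pageSize) {
        NavigableMap<String, String> rest =
                (lastKey == null) ? view : view.tailMap(lastKey, false);
        List<String> page = new ArrayList<>();
        for (String key : rest.keySet()) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(key);
        }
        return page;
    }

    // Placeholder for the real per-page work; here it just counts the keys.
    static int processPage(List<String> page) {
        return page.size();
    }

    // Pages are fetched sequentially; only the processing runs in parallel.
    static int consume(NavigableMap<String, String> view,
                       int pageSize, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            String lastKey = null;
            while (true) {
                final List<String> page = nextPage(view, lastKey, pageSize);
                if (page.isEmpty()) {
                    break;
                }
                Callable<Integer> task = () -> processPage(page);
                pending.add(pool.submit(task));
                // The last key of this page becomes the start key of the next one.
                lastKey = page.get(page.size() - 1);
            }
            int total = 0;
            for (Future<Integer> result : pending) {
                total += result.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        NavigableMap<String, String> view = new TreeMap<>();
        for (int i = 0; i < 25; i++) {
            view.put(String.format("key%03d", i), "doc" + i);
        }
        System.out.println("processed " + consume(view, 10, 4) + " keys");
    }
}
```

Because a single thread does the paging, no two workers ever see the same key, and there is no need to know the Nth key up front. The trade-off is that fetching pages stays sequential, so this only helps when the per-page processing is the expensive part.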

If you tell us what you want to achieve, maybe we can find a solution.