Getting keys from a View is O(n) and not O(1)


#1

Hi,

I have a view that returns only suitable keys. As I understand it is good practice to first fetch the keys and then fetch all the docs for those keys. I’m using:

function (doc, meta) {
if (doc.sType == “MyDocType”)
emit(meta.id, null);
}

Basically what I need returned is a list of keys but it appears that this query takes time proportional to the number of keys.

Is there a one for the view to return one long response ?
Perhaps using a reduce ?

Thanks.


#2

Hi @itay,

technically, processing a view needs to traverse the b-tree, so if I’m not mistaken it should be O(log(n)). If you want O(1), you should look at preprocessing your data so you can fetch it through a key/value “get” command. This will be much quicker, but of course requires more logic to implement.

Note that of course your query is proportional to the number of keys, because you are emitting one item into the index, for every document that matches.

If you need to fetch a list of keys very quickly, I recommend you caching them in a separate document and access it through key/value.


#3

Hi @daschl,

Obviously O(log(n)) but still not O(1) as I want the view only to create offline a list of keys.

Will a reduce function like this help ?

function(keys, values, rereduce) {
  return values.join(",");
} 

Today is all keys in bucket but tomorrow I would like, for example, to get all keys, in one get, for e.g., all beer keys from a brewery. I mean that the view should prepare values for each key where the key is the brewery and the value is a csv of all the beer keys from that brewery.
This will enable me to get all the keys in a quick get and batch get the beers’ documents in a second step


#4

I think even if you apply a reduce function, the code will traverse the B-Tree, but I could be wrong.

@vmx or @ingenthr might be able to provide more insight here. Do you really need O(1) or O(quick enough) for your use case?


#5

The reduce should result with one key, so only one get AFAIK
It doesn’t make sense to wait 1000 times more than just a simple get if the information is already indexed.


#6

Reduces are stored within the B-Tree. In case you have e.g. a simple count and query it without any parameters, it needs to access the root node only (basically a O(log n) collapsed to a O(1).

In case you define a start_key/end_key the B-Tree would be traversed. You probably won’t need to access as many nodes as without a reduce. Though it’s still O(log n).

Besides all that, I wouldn’t really bother and just use it the most convenient way and optimize later.


#7

Let me rephrase.
I want to use views for lookups (like in the beer example above). The view works offline preparing the relevant doc ids and when need arise, I call the view with a specific key and the view should return the pre-calculated IDs.
The return duration should be independent of the database size as it was already calculated.
Thus, reading the view should be as fast as accessing a single key and not a function of the return dataset size