Longest match on key

What would be the best way to implement ‘longest match’ functionality on keys?
E.g. if a bucket contains documents for keys “1”, “1234” and “123456”, a lookup for key “1234567890” should result in the document stored for key “123456”, but a lookup for key “1234098765” should result in that of key “1234”.

I see two ways of doing this:
a) brute force: try to retrieve “1234567890”, “123456789”, etc. until a document is found. The multiple gets can be combined in a single call to lcb_get() (see the sketch after this list).
b) use a view to retrieve the document IDs for the candidate keys [“1234567890”, “123456789”, etc.], determine the longest key present from the response, and retrieve that document via lcb_get().
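
For concreteness, here is a minimal sketch of what I mean by (a), assuming the libcouchbase 2.x “classic” API (lcb_get() taking an array of commands). Instance setup and error handling are reduced to the bare minimum, and the `longest_match` struct and `longest_match_get()` helper are just illustrative names of mine:

```c
#include <string.h>
#include <libcouchbase/couchbase.h>

/* Shared cookie: remembers the longest key that produced a hit. */
struct longest_match {
    char key[32];
    lcb_size_t nkey;
    char value[256];   /* our values are small */
    lcb_size_t nvalue;
};

static void get_callback(lcb_t instance, const void *cookie,
                         lcb_error_t error, const lcb_get_resp_t *resp)
{
    struct longest_match *lm = (struct longest_match *)cookie;
    (void)instance;
    if (error != LCB_SUCCESS) {
        return; /* e.g. LCB_KEY_ENOENT: this prefix is absent, ignore it */
    }
    if (resp->v.v0.nkey > lm->nkey &&
        resp->v.v0.nkey <= sizeof lm->key &&
        resp->v.v0.nbytes <= sizeof lm->value) {
        lm->nkey = resp->v.v0.nkey;
        memcpy(lm->key, resp->v.v0.key, resp->v.v0.nkey);
        lm->nvalue = resp->v.v0.nbytes;
        memcpy(lm->value, resp->v.v0.bytes, resp->v.v0.nbytes);
    }
}

/* Get every prefix of full_key down to min_len in a single lcb_get() call.
 * All prefixes share the same byte buffer; only nkey differs. Assumes the
 * callback was installed via lcb_set_get_callback(instance, get_callback). */
static lcb_error_t longest_match_get(lcb_t instance, const char *full_key,
                                     lcb_size_t min_len,
                                     struct longest_match *out)
{
    lcb_size_t full = strlen(full_key);
    lcb_size_t n = full - min_len + 1;   /* e.g. lengths 3..10 -> 8 gets   */
    lcb_get_cmd_t cmds[16];              /* enough for keys up to 18 chars */
    const lcb_get_cmd_t *cmdlist[16];
    lcb_size_t i;
    lcb_error_t rc;

    memset(out, 0, sizeof *out);
    for (i = 0; i < n; i++) {
        memset(&cmds[i], 0, sizeof cmds[i]);
        cmds[i].v.v0.key = full_key;
        cmds[i].v.v0.nkey = full - i;    /* longest prefix first */
        cmdlist[i] = &cmds[i];
    }
    rc = lcb_get(instance, out, n, cmdlist);
    if (rc != LCB_SUCCESS) {
        return rc;
    }
    /* On return, out->nkey > 0 means a match was found. */
    return lcb_wait(instance);
}
```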

I have no experience yet with using views in production. How efficient is querying a view compared to retrieving documents directly?
Any advice on which method would perform best? Or any alternatives?

Thanks,
Robert

I would personally try the ‘brute force’ method, assuming a found:missing ratio of around 1:3. Performance characteristics may change depending on how many keys are actually found: if you end up in a needle-in-a-haystack scenario, it may be worthwhile to look into views, N1QL, etc.

For now however, the brute force method may be better.

Also, consider that the ‘brute force’ method truly divides the requests across servers (each key is hashed to a vBucket on some node), so each server sees only a fraction of them. If you use N1QL or views, the whole request is handled by a single server (though which server that is will change from request to request).

Thanks for your input Mark.

The ratio will be a bit worse. Assuming 10-digit numbers and a shortest possible match of 3 digits, the ratio will be 1:7 or 0:7.
So that may tip the scales in the other direction.

Even with 10 digits and a minimum of 3, it’s only 8 requests (not 7 as I first wrote: prefix lengths 3 through 10 give 10 - 3 + 1 = 8 candidates), which you could pipeline: issue them all, then look at what you get back after all have responded. And even if you have just 3 server nodes, that’s only 2-3 requests per node.

I expect with those numbers the brute force KV approach will be faster than views or N1QL.

One possible exception - if your values are very large - say many KB or even MB - then speculatively requesting all documents might be costly.

For example, if you actually have all intermediate combinations - {123, 1234, …, 1234567890} - and your values are on average 1MB, then if you speculatively issue 8 requests and all 8 succeed, you’ve just received 8MB over the network when you only needed 1MB. Depending on your network throughput, it might be more efficient to probe in order (10 digits, then 9 if that fails, etc.), or to binary-search over the prefix length (see the sketch below).
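
The binary search is only valid when existence is monotone in the prefix length, which holds in the “all intermediate combinations” case above: if a prefix of length k exists, every shorter prefix down to the minimum does too. A sketch in plain C, where `key_exists()` is a hypothetical probe (e.g. a get whose body you discard, or whatever cheap existence check your SDK version offers):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical probe: nonzero if the bucket contains the key made of the
 * first nkey bytes of the buffer. One network round trip per call. */
extern int key_exists(const char *key, size_t nkey);

/* Binary search over the prefix length. Returns the length of the longest
 * matching prefix, or 0 if even the shortest candidate is absent. */
static size_t longest_prefix_len(const char *full_key, size_t min_len)
{
    size_t lo = min_len, hi = strlen(full_key), best = 0;
    while (lo <= hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (key_exists(full_key, mid)) {
            best = mid;       /* this prefix exists; try a longer one */
            lo = mid + 1;
        } else {
            if (mid == 0) {
                break;        /* avoid size_t underflow */
            }
            hi = mid - 1;     /* this prefix is absent; try a shorter one */
        }
    }
    return best; /* issue one final lcb_get() for this prefix */
}
```

This trades round trips for bandwidth: the pipelined batch costs one round trip but transfers up to 8 bodies, while the binary search over 8 candidate lengths costs 3-4 sequential round trips and transfers at most that many bodies (or none, if the probe discards them).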

Thanks for correcting my counting :wink:

The values are actually rather small: mostly less than 256 bytes, so that is not even an issue.
Having looked a bit closer at the C API for querying views and processing the results, the brute-force approach is a lot simpler and also avoids the need to create the view altogether.
I’ll go ahead and implement it and see how it performs.

Thanks for your input!