Is it possible to select data by key pattern without adding extra index?

zarrin · November 8, 2017, 2:10am

hello guys
I have a bucket with billions of data, and for our case, it is somewhat impossible to add an extra index in the bucket. the pattern of all keys are like bellow

m_categoryId_id

and the only thing I need is to select m_categoryId_* from the bucket.
as I saw in Couchbase (v5) web console dashboard, it can found by bellow url very fast.

http://myDomain:8091/pools/default/buckets/MyBucket/docs?startkey="m_categoryId"&skip=0&include_docs=true&limit=11

please give me the best advice to access these data in my java code.
thank you.

vsr1 · November 8, 2017, 4:00am

CREATE PRIMARY INDEX ON default;
SELECT META().id FROM default WHERE META().id LIKE "m\\_categoryId\\_%" LIMIT 1000;

OR

$startid = "m_categoryId_"
SELECT META().id FROM default WHERE META().id >= $startid LIMIT 1000;
change $startid to last value and repeat the query.

jkurtz · November 8, 2017, 5:21pm

If the id portion of your m_categoryId_id key is not constrained in any way, you’ll have to go with the N1QL approach. But if the id were a sequence number or some other value known at lookup-time you could use the Java SDK to grab them by ID.

zarrin · November 9, 2017, 10:14am

thank you @vsr1
as you can see in below image, Couchbase found the results very fast from about a billion of data, without any index.

.

creating index needs huge resources and as I test before, I don’t want to effort that.
I just wondering how Couchbase found these results.

zarrin · November 9, 2017, 10:37am

thank you @jkurtz for your good point.

the id not generated by me, it is incremental (starts by 1) but it is somewhat sparse on deleted items (maybe the gap is huge ). with some consideration and store maximum of each categoryId, your solution is very nice and fast.
thanks

jkurtz · November 9, 2017, 2:20pm

Great! If you have to repeatedly fetch these documents, you could keep track of the gaps you encounter in another document and use that information to avoid unnecessary lookups.