I have a bucket with billions of data, and for our case, it is somewhat impossible to add an extra index in the bucket. the pattern of all keys are like bellow
and the only thing I need is to select m_categoryId_* from the bucket.
as I saw in Couchbase (v5) web console dashboard, it can found by bellow url very fast.
please give me the best advice to access these data in my java code.
CREATE PRIMARY INDEX ON default;
SELECT META().id FROM default WHERE META().id LIKE "m\\_categoryId\\_%" LIMIT 1000;
$startid = "m_categoryId_"
SELECT META().id FROM default WHERE META().id >= $startid LIMIT 1000;
change $startid to last value and repeat the query.
id portion of your
m_categoryId_id key is not constrained in any way, you’ll have to go with the N1QL approach. But if the
id were a sequence number or some other value known at lookup-time you could use the Java SDK to grab them by ID.
thank you @vsr1
as you can see in below image, Couchbase found the results very fast from about a billion of data, without any index.
creating index needs huge resources and as I test before, I don’t want to effort that.
I just wondering how Couchbase found these results.
thank you @jkurtz for your good point.
id not generated by me, it is incremental (starts by 1) but it is somewhat sparse on deleted items (maybe the gap is huge ). with some consideration and store maximum of each categoryId, your solution is very nice and fast.
Great! If you have to repeatedly fetch these documents, you could keep track of the gaps you encounter in another document and use that information to avoid unnecessary lookups.