Selecting by a modified-time field in a large bucket is very slow

Hi,
I have an indexed field, lastTimeSeenInbound, which holds a timestamp in milliseconds. The index is:

CREATE INDEX `index_last_timeseeninbound` ON `mybucket`(`lastTimeSeenInbound`)

The bucket contains 60 million docs, and a count query like this takes minutes:

SELECT count(*) FROM `mybucket`
WHERE 1628236287000 >= lastTimeSeenInbound

Why is it so slow? What can we do to fix it?

You are looking for lastTimeSeenInbound <= "2021-08-06T00:51:27-07:00".
It depends on how many index entries it needs to scan and compute over.

If you use EE, it can use index aggregation and be faster: Index Grouping and Aggregation Overview - The Couchbase Blog

Thanks! Correct, we would like to get a count of everything from before roughly half a year ago.
We are using Community Edition 6.5 - I assume it's scanning around 70% of the 60 million docs.
I thought there would be some way to improve it, since the field is indexed and it's a pretty simple count aggregation. Is there any way to make such an index operation faster?

I would also like to count how much I'm going to delete - how do you suggest deleting such a huge number of documents?

Say you want to delete the documents with lastTimeSeenInbound < 1628236287000:

CREATE INDEX ix1 ON mybucket(lastTimeSeenInbound) WHERE lastTimeSeenInbound < 1628236287000;

Check the UI and you will see the index item count, or use the REST API to see the index stats.
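
For example, a minimal sketch of reading that index's item count over the indexer's stats REST API with Python (the host, port 9102, credentials, and exact endpoint path here are assumptions - check the REST API docs for your server version):

import requests  # assumes the requests package is installed

# Hypothetical connection details - adjust for your cluster.
INDEXER = "http://index-node:9102"
AUTH = ("Administrator", "password")

# Per-index statistics; items_count is the number of entries in the
# index, i.e. the docs matching the partial index's WHERE clause.
resp = requests.get(f"{INDEXER}/api/v1/stats/mybucket/ix1", auth=AUTH)
resp.raise_for_status()
print(resp.json())  # look for the items_count field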

Now you want to delete.
Write a small SDK program that runs the following query.
As the results stream asynchronously, set the expiration of each document to 10s and let document expiry delete the documents.

SELECT RAW META(t).id 
FROM mybucket AS t 
WHERE  lastTimeSeenInbound < 1628236287000;

Thanks, I'm not sure I fully understood the expiry part. Since we are using Community Edition, there is no way to set a TTL on documents, is there?
How does selecting via the SDK set the expiry for docs? Can you refer me to some docs?

You can - try exploring the touch operation in the SDKs.
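
For example, a minimal sketch with the Python SDK 3.x (the connection details are placeholders; the same pattern works in the other SDKs):

from datetime import timedelta

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster, ClusterOptions

# Hypothetical connection details - adjust for your cluster.
cluster = Cluster(
    "couchbase://localhost",
    ClusterOptions(PasswordAuthenticator("Administrator", "password")),
)
collection = cluster.bucket("mybucket").default_collection()

# Stream the matching document IDs and touch each one with a 10s expiry;
# the expiry pager then removes the documents in the background.
# touch is a plain KV operation, so it works on Community Edition too.
query = """
    SELECT RAW META(t).id
    FROM mybucket AS t
    WHERE lastTimeSeenInbound < 1628236287000
"""
for doc_id in cluster.query(query):
    collection.touch(doc_id, timedelta(seconds=10))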

If you want to use N1QL:

DELETE FROM mybucket AS t 
WHERE  lastTimeSeenInbound  BETWEEN  1528236287000 AND 1628236287000;
  • If needed, use small ranges (run different ranges in separate requests, in parallel)
  • If required, add a LIMIT and repeat until the mutation count is 0
  • Try request_plus to let the indexer catch up

NOTE: mutations take time and might be slow.

Example:

  • Divide into 10 different ranges (depending on cluster resources)
  • Make 10 DELETE statements with different ranges, each with LIMIT 50000
  • Run all the queries in parallel
  • Wait a couple of minutes to let the indexer catch up (or use request_plus)
  • Loop again until each query returns a 0 mutation count or all the desired documents are deleted
  • You can ignore a timeout if one is raised
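
Putting those steps together, a minimal sketch of the delete loop with the Python SDK 3.x (the cutoff split, STEP size, worker count, and LIMIT are illustrative assumptions; connection details are placeholders):

from concurrent.futures import ThreadPoolExecutor

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import (Cluster, ClusterOptions, QueryOptions,
                               QueryScanConsistency)

# Hypothetical connection details - adjust for your cluster.
cluster = Cluster(
    "couchbase://localhost",
    ClusterOptions(PasswordAuthenticator("Administrator", "password")),
)

CUTOFF = 1628236287000
STEP = 10_000_000_000  # ~116 days in ms; pick slices that fit your data
RANGES = [(CUTOFF - (i + 1) * STEP, CUTOFF - i * STEP - 1) for i in range(10)]

def delete_range(lo, hi):
    # Repeat a bounded DELETE until the indexer reports no more matches.
    while True:
        result = cluster.query(
            "DELETE FROM mybucket AS t "
            "WHERE lastTimeSeenInbound BETWEEN $lo AND $hi LIMIT 50000",
            QueryOptions(
                named_parameters={"lo": lo, "hi": hi},
                scan_consistency=QueryScanConsistency.REQUEST_PLUS,
                metrics=True,
            ),
        )
        list(result)  # drain the stream so the metrics are available
        if result.metadata().metrics().mutation_count() == 0:
            break

# Run the ranges in parallel; shutdown waits for all of them to finish.
with ThreadPoolExecutor(max_workers=10) as pool:
    list(pool.map(lambda r: delete_range(*r), RANGES))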

Appreciate your fast and detailed responses!