We have a task of periodically retrieving and deleting all documents from a specific bucket.
The bucket contains over 100 million documents, and the documents are currently retrieved via the Python SDK by executing a looped N1QL query:
query = cb.n1ql_query(N1QLQuery( 'SELECT meta(alias).id, * FROM bucket_name AS alias LIMIT $batch_size', batch_size=batch_size))
and then we execute: remove_multi(keys)
where batch_size is usually 250k.
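To make the setup above concrete, here is a minimal sketch of the retrieve-process-delete loop, with the batching logic factored out from the (assumed) SDK 2.x Couchbase calls. The function names drain_in_batches, fetch_batch, process and delete_keys are illustrative placeholders, not part of the SDK; "process" stands in for writing the documents to the relational DB.

```python
def drain_in_batches(fetch_batch, process, delete_keys):
    """Repeatedly fetch a batch of (key, doc) pairs, process the docs,
    then bulk-delete the keys, until no documents remain.
    Returns the total number of documents handled."""
    total = 0
    while True:
        batch = fetch_batch()
        if not batch:
            break
        process([doc for _, doc in batch])        # e.g. write to relational DB
        delete_keys([key for key, _ in batch])    # e.g. cb.remove_multi(keys)
        total += len(batch)
    return total

# With the legacy Couchbase Python SDK 2.x (the API names used in the
# question; connection details are placeholders), the callbacks could be:
#
#   from couchbase.n1ql import N1QLQuery
#
#   def fetch_batch():
#       q = N1QLQuery(
#           'SELECT META(alias).id, * FROM bucket_name AS alias LIMIT $batch_size',
#           batch_size=250000)
#       return [(row['id'], row['alias']) for row in cb.n1ql_query(q)]
#
#   drain_in_batches(fetch_batch, write_to_relational_db,
#                    lambda keys: cb.remove_multi(keys))
```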
Since the keys are not known to the script in advance, it is not possible to use the (presumably faster) get_multi(keys) function.
In our processing this operation is the main bottleneck, and it takes surprisingly long compared to writing the documents into a relational DB (writing is currently about 5x faster).
We have a 3-node cluster on separate machines; the script runs on another machine, and the relational DB is on yet another.
Is there any better/faster way to retrieve a batch of documents without knowing their keys?
Thanks a lot for any ideas or hints.