We are building a long-running ETL process that works roughly like this:
CB Version: 5.0 Beta2
T1. Insert / update 300k documents (e.g. 100k updates, 200k inserts) using the async bucket API.
T2. Do some other data crunching
T3. Run a N1QL query that needs to return the documents from T1 (read your own writes)
As described in https://blog.couchbase.com/high-performance-consistency/, at_plus consistency allows us to pass the documents or mutations_sequence_ids of the documents in T1 to the query in T3.
But the example there mainly covers a use case with very few documents, such as updating a single user.
What would be a suitable approach in our case, where we are dealing with huge mass-import scenarios?
We need to guarantee that the query in T3 sees all documents of T1. At the moment we use REQUEST_PLUS, but we are worried that this could lead to slowdowns, whereas AT_PLUS could be more efficient in theory. We have lots of other processes writing and updating too, so the indexer could be quite busy permanently.
- Should we really pass 300k document IDs using N1qlParams.consistentWith(…)?
- Or should we try a compromise and store just the first and the 300,000th docId?
- Or is REQUEST_PLUS our best option here?
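To make the first option more concrete: as far as I understand, at_plus compares sequence numbers per vBucket rather than per document, so the state handed to the query might collapse to one entry per vBucket instead of 300k entries. Below is a minimal plain-Java sketch of that idea. The Token class and the collapse method are hypothetical stand-ins for illustration, not the SDK's own token or state classes, and the sketch ignores details such as the vBucket UUID.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ScanVectorSketch {

    /** Hypothetical stand-in for a mutation token: vBucket id plus sequence number. */
    public static final class Token {
        final int vbucketId;
        final long sequenceNumber;

        public Token(int vbucketId, long sequenceNumber) {
            this.vbucketId = vbucketId;
            this.sequenceNumber = sequenceNumber;
        }
    }

    /**
     * Keeps only the highest sequence number seen per vBucket.
     * Assumption: the index only has to catch up to that number,
     * so lower sequence numbers in the same vBucket add nothing.
     */
    public static Map<Integer, Long> collapse(Iterable<Token> tokens) {
        Map<Integer, Long> highest = new HashMap<>();
        for (Token t : tokens) {
            highest.merge(t.vbucketId, t.sequenceNumber, Long::max);
        }
        return highest;
    }

    public static void main(String[] args) {
        // Three writes, two of which landed in the same vBucket.
        List<Token> tokens = Arrays.asList(
                new Token(12, 40L),
                new Token(12, 57L),
                new Token(900, 3L));
        Map<Integer, Long> state = collapse(tokens);
        System.out.println(state.size() + " entries, vBucket 12 at seqno " + state.get(12));
    }
}
```

If that assumption holds, the size of the consistency state would be bounded by the number of vBuckets in the bucket rather than by the number of documents written in T1.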
Any advice would be appreciated.