Deleting ~50 million documents - will reindexing be a problem for Couchbase?

I’m trying to delete roughly 50 million documents using N1QL, splitting the work across multiple queries with batching. I tested with a batch size of 10k and it finished in about 5 hours.

My question: if we delete so many documents with such queries in a short period, how will it affect Couchbase’s re-indexing? Will it make Couchbase slow? If so, what can I do to avoid that?

Thanks in advance!

PS: To summarize: will firing so many deletes in quick succession cause any performance issues in Couchbase? Will it become slow to respond to other queries?

Our website uses Couchbase heavily, and these deletes will be performed while the site is up - so I’m worried about whether we will see a performance hit in Couchbase.

Hi @Nikhil,

May I know which version of Couchbase Server you are using? Is it Community Edition or Enterprise Edition?

All sorts of operations - creates, updates, and deletes of documents - lead to index updates. But with Couchbase, index updates happen asynchronously, meaning a delete operation returns before it is reflected in the index. After you delete 50 million documents, it will take some time for those deletions to be reflected in the index. Having said that, if you are using Enterprise Edition, those deletes will be reflected in the index fairly fast, because Enterprise Edition uses a much faster storage engine than Community Edition.

But there can still be some delay before deletions are reflected in the index. To ensure that a scan doesn’t return stale data, you can use a consistent scan, i.e. run the query with scan_consistency set to request_plus.
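For example, via the query service REST API - a minimal sketch, where the host, credentials, bucket name, and the WHERE predicate are all placeholders for your own values:

```python
# Minimal sketch: a N1QL scan with request_plus consistency via the query
# service REST API (port 8093). Host, credentials, bucket name, and the
# WHERE predicate are placeholders.
import requests

resp = requests.post(
    "http://localhost:8093/query/service",
    auth=("Administrator", "password"),
    data={
        "statement": 'SELECT COUNT(*) AS cnt FROM `my_bucket` WHERE type = "obsolete"',
        # request_plus makes the scan wait until the index has caught up with
        # all mutations made before the query was issued, so no stale rows.
        "scan_consistency": "request_plus",
    },
)
resp.raise_for_status()
print(resp.json()["results"])
```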

In any case, it is good to batch the deletions (in smaller sets) so that consistent scans don’t get slow (especially if you are using Community Edition). I suggest waiting for some time after processing one delete batch before triggering the next, so that the deletions don’t affect scan requests adversely - a sketch of such a paced loop is below.
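Something along these lines (again, host, credentials, bucket, and predicate are placeholders; tune the batch size and pause to your workload):

```python
# Minimal sketch: paced, batched deletion via N1QL over the query REST API.
# Bucket name, credentials, and the WHERE predicate are placeholders; tune
# BATCH_SIZE and PAUSE_SECS to your workload.
import time
import requests

QUERY_URL = "http://localhost:8093/query/service"
AUTH = ("Administrator", "password")
BATCH_SIZE = 5000
PAUSE_SECS = 2  # breathing room so index maintenance doesn't crowd out scans

stmt = 'DELETE FROM `my_bucket` WHERE type = "obsolete" LIMIT %d' % BATCH_SIZE

while True:
    resp = requests.post(QUERY_URL, auth=AUTH, data={"statement": stmt})
    resp.raise_for_status()
    # metrics.mutationCount reports how many documents this batch removed
    deleted = resp.json().get("metrics", {}).get("mutationCount", 0)
    if deleted == 0:
        break  # nothing left matching the predicate
    time.sleep(PAUSE_SECS)
```

The LIMIT clause caps each statement at one batch, and the pause between batches gives the indexer time to drain the resulting mutations before the next burst arrives.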

Off topic, but that sounds like some undocumented difference in features (https://www.couchbase.com/products/editions) - or is it the mysterious CBO?

@skaryshev,

It is documented in the “Performance and Scaling” table at https://www.couchbase.com/products/editions :slight_smile:. “Memory-Optimised Index” and “Plasma – high-performance storage engine for indexes” are supported only in Enterprise Edition.

Having said that, CE also works well with millions of documents being indexed. But EE gives better guarantees when it comes to consistent (sub-10ms) latency and consistent throughput.

Got it. Thank you :wink:


Many thanks, Amit, for the quick response!

We are using Enterprise Edition 6.0.2. I understand the re-indexing delay, and I’m fine with that.

My question is (perhaps I should have phrased it better): will firing so many deletes in quick succession cause any performance issues in Couchbase? Will it become slow to respond to other queries?

Our website uses Couchbase heavily, and these deletes will be performed while the site is up - so I’m worried about whether we will see a performance hit in Couchbase.

(Updated the original question as well)

Hi @Nikhil,

The CPU available to the indexer process is shared between (1) the task that indexes incoming deletions (or creations/updates) and (2) the task that serves scans.

A burst of incoming deletions will certainly need some CPU to process the deletes. But if there is enough CPU left for scans (i.e. your CPU is not saturated even without the deletes), the impact on scan latency/throughput will be minimal.

I suggest trying this out in a non-production setup first to understand the impact it may have on scans. Also, if possible, perform the deletes during off-peak hours.

By the way, what type of index are you using? Memory-Optimised or Standard GSI?
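If you are not sure, one way to check is the index settings endpoint on the cluster manager - a minimal sketch, with placeholder host and credentials:

```python
# Minimal sketch: read the cluster-wide GSI storage mode from the cluster
# manager REST API (port 8091). Host and credentials are placeholders.
import requests

resp = requests.get(
    "http://localhost:8091/settings/indexes",
    auth=("Administrator", "password"),
)
resp.raise_for_status()
# "memory_optimized" means Memory-Optimised Index; "plasma" (EE) or
# "forestdb" (CE) means Standard GSI.
print(resp.json()["storageMode"])
```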

Hello @amit.kulkarni ,

Thanks for your reply.

We have enough CPU capacity. It usually stays under 40% in production, and our deletion process, when run in a non-prod environment, used around 20% CPU. So both together in production should still be around 60%.

We are using Memory Optimized GSI.

So what I understand from your reply is that (given enough CPU and MO-GSI) re-indexing will not hamper scan latency beyond a negligible amount. Is that the right conclusion?

@Nikhil,

Yes, we can conclude that. It’s good to know that the experiment in the non-prod environment was successful. Also, having a memory-optimised index helps, as it does not require any disk I/O for index updates. That eliminates another dimension that can affect performance.

Having said that, I still suggest being “extra cautious” and deleting the documents in batches, so that there are no surprises. :slight_smile:

Yeah! :slight_smile:

Thanks for the advice, @amit.kulkarni, and for all the info - really appreciate it!

Just in case someone else is deleting a huge number of documents and wondering about indexing, fragmentation, and CPU: our deletion process completed successfully on our Enterprise Edition 6.0.2 Couchbase cluster, which has an MO-GSI index. The process took under 3 hours to delete 30 million records - we used N1QL to fire delete queries with a batch size of 5k.

I am glad to know that the document deletion went well. :slight_smile:
