Delete bulk documents using Java SDK

What is the best way to delete a large number of documents (up to 5 million) using the Java SDK? Is simply running a N1QL query to delete all 5 million records in one go a good idea? How can batching be achieved with the Java SDK? I also need to determine which doc IDs were deleted successfully and which were not.

Couchbase Server version: 4.1

Hi @manusinha2000,

If you delete that many documents at once using N1QL, you may encounter query timeouts. You also won’t get per-document information about what could not be deleted, and you only learn what was deleted if the statement includes a RETURNING clause.

I recommend using the asynchronous bucket API to delete by document key (AsyncBucket.remove(String id)). Create an RxJava Observable that emits the document keys you want to delete and use flatMap to emit the Observables that will delete each document.

This approach will allow you to:

  • manage the number of requests in flight,
  • selectively retry failures, and
  • track your progress.
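The tracking and selective-retry idea can be sketched without any SDK dependency. This is only an illustration using plain JDK classes, not the RxJava pipeline itself: DeleteTracker, Result, and the remover callback are hypothetical names, and the remover stands in for whatever call actually deletes one document by key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.function.Consumer;

final class DeleteTracker {

    /** Keys grouped by their final outcome. */
    static final class Result {
        final List<String> deleted = new ArrayList<>();
        final List<String> failed = new ArrayList<>();
    }

    /**
     * Deletes every key, retrying only the keys that failed.
     * `remover` is a hypothetical stand-in for the real delete call:
     * it returns normally on success and throws on failure.
     */
    static Result deleteAll(Collection<String> ids, Consumer<String> remover, int maxAttempts) {
        Result result = new Result();
        List<String> pending = new ArrayList<>(ids);
        for (int attempt = 1; attempt <= maxAttempts && !pending.isEmpty(); attempt++) {
            List<String> retry = new ArrayList<>();
            for (String id : pending) {
                try {
                    remover.accept(id);
                    result.deleted.add(id);   // progress: this key is done
                } catch (RuntimeException e) {
                    retry.add(id);            // only failed keys are attempted again
                }
            }
            pending = retry;
        }
        result.failed.addAll(pending);        // still failing after maxAttempts
        return result;
    }

    public static void main(String[] args) {
        // Fake remover for demonstration only: key "c" always fails.
        Result r = deleteAll(Arrays.asList("a", "b", "c"), id -> {
            if (id.equals("c")) {
                throw new IllegalStateException("simulated failure");
            }
        }, 3);
        System.out.println("deleted=" + r.deleted + " failed=" + r.failed);
        // prints: deleted=[a, b] failed=[c]
    }
}
```

In a real job the remover would wrap the SDK's remove call, and you would probably retry only errors that are likely to be transient rather than every exception.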

To control the number of requests in flight, you can use the backpressure API in RxJava. One way is described in “Writing Resilient Reactive Applications” (scroll to “Bulk Pattern, BackpressureException and Reactive Pull Backpressure”).

You can also limit in-flight delete requests by limiting how many inner Observables the flatMap operator subscribes to at once. Choose an Observable.flatMap() overload that takes a maxConcurrent parameter, for example flatMap(Func1 func, int maxConcurrent).
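To make the effect of a concurrency cap concrete, here is a JDK-only analogue that uses a Semaphore and CompletableFuture instead of RxJava. It is a sketch of the same idea, not the flatMap operator: BoundedBulkDelete and asyncRemove are hypothetical names, and asyncRemove stands in for an asynchronous delete whose future fails when the delete fails.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

final class BoundedBulkDelete {

    /**
     * Issues one asynchronous delete per key, never allowing more than
     * maxInFlight (must be >= 1) to be outstanding at once.
     * Returns key -> true if the delete succeeded, false if it failed.
     */
    static Map<String, Boolean> deleteAll(List<String> ids,
                                          Function<String, CompletableFuture<Void>> asyncRemove,
                                          int maxInFlight) {
        Semaphore permits = new Semaphore(maxInFlight);
        Map<String, Boolean> outcome = new ConcurrentHashMap<>();
        for (String id : ids) {
            permits.acquireUninterruptibly();      // blocks while maxInFlight are pending
            CompletableFuture<Void> pending;
            try {
                pending = asyncRemove.apply(id);
            } catch (RuntimeException e) {         // treat a synchronous throw as a failure
                pending = new CompletableFuture<>();
                pending.completeExceptionally(e);
            }
            pending.whenComplete((ignored, error) -> {
                outcome.put(id, error == null);    // record the result for this key
                permits.release();                 // free a slot for the next request
            });
        }
        permits.acquireUninterruptibly(maxInFlight); // wait for the remaining requests
        return outcome;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // Fake asynchronous remover for demonstration only: "doc-2" fails.
        Map<String, Boolean> outcome = deleteAll(
                Arrays.asList("doc-1", "doc-2", "doc-3"),
                id -> CompletableFuture.runAsync(() -> {
                    if (id.equals("doc-2")) {
                        throw new IllegalStateException("simulated failure");
                    }
                }, pool),
                2);
        pool.shutdown();
        System.out.println(new TreeMap<>(outcome));
        // prints: {doc-1=true, doc-2=false, doc-3=true}
    }
}
```

For millions of keys you would likely record only the failed keys plus a success count instead of a full map, and the right maxInFlight value is something to tune against your cluster.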

You should use Java SDK version 2.2 or higher, based on the Couchbase Version/SDK Version Matrix for Java.

Jeff

Thanks so much Jeff.