Slow write performance using Couchbase Spark Connector 'saveToCouchbase'

We observe very slow write performance using the Couchbase Spark Connector.

Connector 2.2.0 currently uses the async bucket but inserts documents one by one. We use the Connector with Spark Streaming, where 1,000 to 5,000 documents are supposed to be inserted per second. Documents go through expensive models before insertion, yet writing still dominates the run time of each micro-batch.
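To put the per-document round trip in perspective, a back-of-the-envelope model based on Little's law helps. If each write takes a fixed latency, the sustained rate is roughly the number of requests in flight divided by that latency. This is a rough model, not a measurement: the 1 ms latency is a hypothetical assumption, and server-side limits are ignored.

```python
import math


def max_throughput(latency_ms: float, in_flight: int) -> float:
    """Little's law: ops/s sustained when `in_flight` requests overlap, each taking `latency_ms`."""
    return in_flight * 1000.0 / latency_ms


def required_in_flight(target_ops_per_s: float, latency_ms: float) -> int:
    """Minimum number of concurrent requests needed to reach the target rate."""
    return math.ceil(target_ops_per_s * latency_ms / 1000.0)


# Hypothetical 1 ms round trip per write (an assumption, not a measured value).
print(max_throughput(1.0, 1))         # 1000.0
print(required_in_flight(5000, 1.0))  # 5
```

With strictly sequential writes, 5,000 docs/s would require every round trip to finish in 0.2 ms; with overlapping requests the budget per write grows proportionally. The in-flight count is summed across all partitions writing in parallel.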

We have very few indexes on the documents: practically just one, on the document type (cardinality 2).

Are there any tips for improving this? Should we rewrite the insertion to use bulk operations?
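The idea behind "bulk operations" is to keep many writes in flight at once, with a concurrency cap, instead of waiting for each write before issuing the next. The sketch below simulates both strategies with asyncio; `fake_upsert` and its 5 ms delay are hypothetical stand-ins for a network call to the cluster, not part of the connector's API. With the Couchbase Java SDK 2.x, the analogous pattern is an RxJava `Observable.from(docs).flatMap(...)` over the async bucket.

```python
import asyncio
import time


async def fake_upsert(store: dict, doc_id: str, body: dict) -> None:
    # Hypothetical stand-in for one network round trip to the cluster.
    await asyncio.sleep(0.005)
    store[doc_id] = body


async def write_one_by_one(store: dict, docs: list) -> None:
    for doc_id, body in docs:
        await fake_upsert(store, doc_id, body)  # next write starts only after this one completes


async def write_bulk(store: dict, docs: list, max_in_flight: int = 64) -> None:
    sem = asyncio.Semaphore(max_in_flight)  # cap concurrent requests so the cluster is not flooded

    async def guarded(doc_id: str, body: dict) -> None:
        async with sem:
            await fake_upsert(store, doc_id, body)

    await asyncio.gather(*(guarded(d, b) for d, b in docs))


def timed(write_fn, docs: list):
    """Run one write strategy on a fresh store and return (store, elapsed seconds)."""
    store: dict = {}
    start = time.perf_counter()
    asyncio.run(write_fn(store, docs))
    return store, time.perf_counter() - start


if __name__ == "__main__":
    docs = [(f"doc::{i}", {"type": "event", "n": i}) for i in range(200)]
    _, serial = timed(write_one_by_one, docs)
    _, bulk = timed(write_bulk, docs)
    print(f"one-by-one: {serial:.2f}s, bulk: {bulk:.2f}s")
```

The cap (`max_in_flight`, a hypothetical parameter) is the knob to tune: too low and the pipeline stays latency-bound, too high and requests may start timing out on the server side.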

Cheers,
Zoltán

Update:

We now use UPSERTs, which is better but still not acceptable.
What are the suggested values for --num-executors, --executor-cores, or the number of writer partitions, given the machine resources? I see that there is only one CouchbaseConnection per executor; is that right?
CPU and cluster I/O are not fully utilized.
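As a starting point for sizing, Spark runs at most num-executors × (executor-cores / spark.task.cpus) tasks at the same time, so a writer partition count equal to that number (or a small multiple of it) keeps every core writing. The sketch below only does that arithmetic; the 4 × 4 cluster is hypothetical, and it does not model how one shared CouchbaseConnection per executor behaves under more concurrent tasks.

```python
def writer_partitions(num_executors: int, executor_cores: int,
                      task_cpus: int = 1, waves: int = 1) -> int:
    """Partitions needed to keep every task slot busy for `waves` rounds of tasks."""
    if task_cpus < 1 or task_cpus > executor_cores:
        raise ValueError("task_cpus must be between 1 and executor_cores")
    slots_per_executor = executor_cores // task_cpus  # concurrent tasks per executor
    return num_executors * slots_per_executor * waves


# Hypothetical cluster: 4 executors with 4 cores each, default spark.task.cpus = 1.
n = writer_partitions(4, 4)
print(n)  # 16
# In the streaming job this would translate to something like
# rdd.repartition(n).saveToCouchbase() inside each micro-batch.
```

If CPU and I/O stay underutilized even with all slots busy, the bottleneck is more likely per-task write concurrency than the partition count.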

Thanks for any tips,
Cheers,
Zoltán