Projector question

nagarajan_subbaraj · September 23, 2022, 4:01pm

Hi team,
I need some help understanding the approach below. We had an issue with a spike in projector CPU usage, apparently caused by too many mutations on a specific document within a second or so. It may be due to that or to some other scenario, but we would like to see whether we can eliminate this.

Let's say we move this document to a separate bucket (with no primary index or any other type of index on it) and perform only KV operations on this document. Will the projector still process these mutations and publish DCP streams? Or is there another way to handle this situation?

Thanks.

varun.velamuri · September 23, 2022, 6:28pm

Hi @nagarajan_subbaraj ,

Let's say we move this document to a separate bucket (with no primary index or any other type of index on it) and perform only KV operations on this document. Will the projector still process these mutations and publish DCP streams?

If you do not have any indexes created on the new bucket, the projector will not process this document. The projector only processes documents from buckets that have indexes created on them.

I need some help understanding the approach below. We had an issue with a spike in projector CPU usage, apparently caused by too many mutations on a specific document within a second or so.

On this point: the projector is designed with a pipelined-parallelism architecture. At a high level, a single document goes through 4 workers. Two workers route the document from KV to a third worker, which evaluates the document against the index definitions and extracts the relevant data, i.e. the data of interest to each index definition. Once the evaluation is done, a fourth worker takes responsibility for flushing the data to the indexer nodes in the cluster.
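To make the four-stage flow concrete, here is a minimal toy sketch in Python. It is not the projector's actual code: the stage functions (`route`, `evaluate`, `flush`), the document shape, and the "index only the `name` field" definition are all hypothetical and serve only to illustrate one worker per stage connected by queues.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the input stream


def stage(fn, inbox, outbox):
    """Worker loop: apply fn to each item from inbox and pass the result on."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown to the next stage
            return
        outbox.put(fn(item))


def route(doc):
    # Routing stages only hand the document along (hypothetical).
    return doc


def evaluate(doc):
    # Hypothetical index definition: only the "name" field is of interest.
    return {"id": doc["id"], "key": doc.get("name")}


def flush(entry):
    # Stand-in for sending the extracted data to an indexer node.
    return ("sent-to-indexer", entry)


def run_pipeline(docs):
    """Push docs through route -> route -> evaluate -> flush, one thread each."""
    fns = [route, route, evaluate, flush]
    queues = [queue.Queue() for _ in range(len(fns) + 1)]
    threads = [
        threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(fns)
    ]
    for t in threads:
        t.start()
    for doc in docs:
        queues[0].put(doc)
    queues[0].put(SENTINEL)

    results = []
    while True:
        item = queues[-1].get()
        if item is SENTINEL:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

Calling `run_pipeline([{"id": "a", "name": "x"}])` passes the document through all four workers in turn; every additional mutation of the same document repeats the whole trip.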

Now, considering one specific document for simplicity, it will go through all 4 of these workers per indexer node. If, say, there are 5 indexer nodes in the cluster, then 5 * 4 = 20 workers will work on the same document. It was designed this way to make sure that the progress of one indexer node does not impact the other indexer nodes in the cluster (e.g., one indexer node can be slow in processing mutations due to heavy DGM while the others are not in the same state). Therefore, document evaluation workers, routing workers, and queues are maintained separately per indexer node.
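The arithmetic above can be written out as a rough back-of-the-envelope model. The function names are made up for illustration, and the "worker passes per second" figure is only an assumption about how the load scales with the mutation rate, not a measured projector metric.

```python
WORKERS_PER_INDEXER_NODE = 4  # two routing workers, one evaluator, one flusher


def workers_touching_document(indexer_nodes):
    """Number of workers that handle a single mutation of one document."""
    return indexer_nodes * WORKERS_PER_INDEXER_NODE


def worker_passes_per_second(indexer_nodes, mutations_per_second):
    """Rough load estimate: every mutation is handled by every worker once."""
    return workers_touching_document(indexer_nodes) * mutations_per_second
```

With 5 indexer nodes, `workers_touching_document(5)` gives the 20 workers mentioned above. Under this simplified model, 1,000 updates per second to that one document would mean roughly 20,000 worker passes per second.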

So, if you have multiple indexer nodes in the cluster, it is quite possible that a single document being updated frequently in a short span of time can cause high CPU usage. The approach you have suggested sounds good if you are not interested in indexing the data of the frequently updated document. Instead of moving it to a different bucket, you can also consider moving the document to a different collection in the same bucket (note: the Collections feature is only supported from Couchbase Server 7.0 onward). More information on collections can be found in the Couchbase Docs page "Buckets, Scopes and Collections".

Thanks,
Varun