Index building optimization


#1

I have a specific use case where I can create the index before inserting any documents that match it. I was wondering whether there is any way to hint to Couchbase that it may skip scanning the bucket on index creation, since no existing document will match this index. Consider that the bucket might already hold 1 million documents or more. Or perhaps Couchbase already optimizes for this; if someone can point me to any online documentation, I am more than happy to read up on it myself.


#2

I am guessing this question is about GSI indexes (and not View indexes). There are already built-in optimizations for the use case you mentioned. The 1 million documents in the bucket will be skipped quickly if none of them qualify to be indexed.


#3

I would think that Couchbase will need to at least scan the content before skipping it. The biggest concern here is that we are thinking of using Couchbase to store historical data, meaning the repository will keep growing in size. I hope to avoid a case where the time taken to create an index increases as the repository grows.


#4

Yes, the content is going to be scanned. But the bulk of the time spent in indexing goes to storage operations, which do not happen in this case. You can experiment by creating an index on a large dataset of non-qualifying documents and checking whether the build time is within your performance expectations.
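
A minimal sketch of that experiment in Python, under some assumptions that are not from this thread: the index is a partial GSI index restricted by a WHERE clause on a `type` field, and the bucket, field, and index names are made up for illustration. `run_query` is a hypothetical callable that you would wire to whatever executes N1QL in your setup (an SDK call, the REST query service, or cbq). As far as I know, a CREATE INDEX issued without deferring the build returns only once the index is online, so timing the statement should approximate the build time, but do verify that on your version.

```python
import time
from typing import Callable


def build_partial_index_statement(index_name: str, bucket: str,
                                  field: str, doc_type: str) -> str:
    """Return a N1QL CREATE INDEX statement limited to one document type.

    The `type` attribute used in the WHERE clause is an assumption about
    how documents are distinguished; adjust it to your own schema.
    """
    return (
        f"CREATE INDEX `{index_name}` ON `{bucket}`(`{field}`) "
        f'WHERE `type` = "{doc_type}"'
    )


def time_index_build(run_query: Callable[[str], object], statement: str) -> float:
    """Run the statement through run_query and return elapsed seconds.

    run_query is a hypothetical hook: it must block until the statement
    has finished for the measurement to be meaningful.
    """
    start = time.perf_counter()
    run_query(statement)
    return time.perf_counter() - start


if __name__ == "__main__":
    # All names below are illustrative placeholders.
    stmt = build_partial_index_statement(
        "idx_future_type", "history", "created_at", "not_yet_inserted"
    )
    print(stmt)
    # Stand-in executor that does nothing; replace it with a real one.
    elapsed = time_index_build(lambda s: None, stmt)
    print(f"elapsed: {elapsed:.6f}s")
```

Running this against buckets of increasing size (for example 1 million, then 10 million non-qualifying documents) would show how the scan-only build time scales as the repository grows.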


#5

Cool, I shall do that. Thank you.