I did some research on the forums and there are plenty of posts about the “Index scan time out” error message, but the suggestions mostly focus on changing the index so it returns in less than two minutes. We can definitely go that route, but I have some other questions before doing so:
Is there a way to simply increase the two-minute limit if we know we have some large indexes?
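For context on this question: my understanding (which may be out of date for newer versions) is that the GSI scan timeout is an indexer setting, `indexer.settings.scan_timeout`, expressed in milliseconds and defaulting to 120000. Something along these lines should raise it, assuming the indexer admin port and credentials below (host, port, and user are placeholders):

```shell
# Sketch only: bump the index scan timeout from 2 minutes to 5 minutes
# on the indexer's admin REST endpoint (placeholder host/credentials).
curl -u Administrator:password -X POST \
  http://indexer-host:9102/settings \
  -d '{"indexer.settings.scan_timeout": 300000}'
```

Raising the limit obviously treats the symptom rather than the scan cost itself, which is why I'm asking the sizing questions below as well.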
We assumed it was “cheaper” for Couchbase to maintain one large index instead of many smaller indexes. It’s definitely “cheaper” from a development perspective, in that ideally we don’t have to create new indexes when onboarding new clients.
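To make the trade-off concrete, the alternative we ruled out looks roughly like this (index and field names are made up for illustration; our real index definition differs):

```sql
-- Current approach: one large covering index across all clients
CREATE INDEX ix_all_clients ON cloud_med(client, field_a, field_b, field_c);

-- Alternative we avoided: a partial index per client,
-- smaller to scan but one more index to create at onboarding
CREATE INDEX ix_client_acme ON cloud_med(field_a, field_b, field_c)
  WHERE client = "acme";
```

The per-client partial indexes would keep each scan small, but at the operational cost of a new CREATE INDEX for every client we onboard.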
We have a covering index that includes some pretty small data fields, but the index is HUGE. Here are some details:
Total documents in bucket: 14M
Total bucket data size: 17GB
Size of index: 9GB (size on disk when viewing the index file)
Num of docs in index: ~5M
We’re only pulling three small properties, which average about 60 characters altogether. So the question is: how do 60 characters of data across 5M docs balloon into a 9GB index?
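Back of the envelope: 9GB across ~5M entries is roughly 1.8KB per index entry, versus ~60 bytes of actual field data. As I understand it, each GSI entry also stores the full document key (and ours are long, structured keys — see below), plus per-entry metadata, snapshots, and on-disk fragmentation, but even so the multiple seems high. For anyone wanting to check what the index actually covers, the definition is visible via the `system:indexes` keyspace:

```sql
-- Inspect index definitions for the bucket in question
SELECT name, keyspace_id, index_key, `condition`
FROM system:indexes
WHERE keyspace_id = "cloud_med";
```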
I also posted the following question, which we suspect is contributing to us hitting the two-minute limit.
To answer the question of “why” we have these large indexes: it’s mostly for aggregate queries. Here’s another example of a query that hits the two-minute limit, but that you would expect to be pretty quick over an indexed 14M-doc dataset:
SELECT SUBSTR(meta().id,0,POSITION(meta().id,"-")), COUNT(meta().id) FROM cloud_med GROUP BY SUBSTR(meta().id,0,POSITION(meta().id,"-"))
We format our document keys into collections of sorts like this: type-client_name-unique_id. So the query above is handy to report on what “type” of data is contributing to the total document count. But this query almost always hits the two-minute index scan limit against the primary index.
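One idea we’re considering (untested; index name is a placeholder) is an index on the computed key prefix itself, so the GROUP BY can be answered from a small expression index instead of a full primary index scan:

```sql
-- Hypothetical expression index on the "type" prefix of the document key
CREATE INDEX ix_doc_type
  ON cloud_med(SUBSTR(META().id, 0, POSITION(META().id, "-")));

-- The aggregate should then be able to use ix_doc_type
SELECT SUBSTR(META().id, 0, POSITION(META().id, "-")) AS type,
       COUNT(1) AS doc_count
FROM cloud_med
GROUP BY SUBSTR(META().id, 0, POSITION(META().id, "-"));
```

I’d want to verify with EXPLAIN that the planner actually picks this index for the grouped query before relying on it.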