We are facing a challenge with handling a huge amount of data in Couchbase.
Basically, we are going to record statistics every day in a dedicated bucket, at a rate of about 30 million documents per month. On top of these stats we are writing several views, queried with stale=ok so as not to put too much load on the Couchbase engines; these views summarize, count and group the data in very specific ways.
On the other hand, we are also going to need specific SQL-like searches on these documents. By that I mean being able to find a handful of documents among these millions or billions of documents based on some criteria, for example date range, type, name, etc.
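
To make the kind of search concrete, here is a rough Python sketch of the query we have in mind. The bucket name `stats`, the field names `type`, `name` and `date`, the helper `build_stats_query` and the index name are all made up for illustration, and it assumes dates are stored as ISO-8601 strings so that a range comparison works on the string values. It only builds the N1QL statement and its named parameters; it does not connect to a cluster.

```python
from datetime import date

# Hypothetical composite index that a query like the one below would rely on.
# The index name, bucket name and field names are assumptions, not our real schema.
EXAMPLE_INDEX = (
    "CREATE INDEX idx_stats_type_name_date "
    "ON `stats`(type, name, date)"
)


def build_stats_query(bucket, doc_type, name, start, end, limit=100):
    """Build a N1QL statement plus named parameters for a filtered search.

    Assumes each document has `type`, `name` and an ISO-8601 `date` string.
    """
    # The bucket name cannot be passed as a query parameter, so it is
    # interpolated and quoted with backticks; the filter values stay as
    # named parameters ($type, $name, $start, $end).
    statement = (
        f"SELECT s.* FROM `{bucket}` AS s "
        "WHERE s.type = $type AND s.name = $name "
        "AND s.date BETWEEN $start AND $end "
        f"LIMIT {int(limit)}"
    )
    params = {
        "type": doc_type,
        "name": name,
        "start": start.isoformat(),
        "end": end.isoformat(),
    }
    return statement, params


statement, params = build_stats_query(
    "stats", "click", "homepage", date(2016, 1, 1), date(2016, 1, 31)
)
print(statement)
print(params)
```

The statement and parameters would then be handed to whatever client runs the N1QL query; the point is only that the filters are on a small, fixed set of fields.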
We have been exploring some approaches, such as using Apache Spark to connect to Couchbase and process all this data, but we are not 100% sure this is the most appropriate way to solve the problem in Couchbase.
Is there another way to handle this kind of query on huge buckets like the one I am describing above?
Would it be possible to set up a couple of N1QL secondary indexes on a bucket with billions of documents without overloading the Couchbase engines, and with everything continuing to work well?