I would like to read a significant portion of the data in Couchbase from Spark. I want the read to be efficient: instead of issuing millions of random reads, I would like to control which vBucket/server/file the data is read from. I know there is a Spark connector, but there are many complaints that it lacks this kind of control, which leads to poor performance.
So my question is: does Couchbase allow operations like the following?
- Reading an entire vBucket (preferable, since I believe an entire vBucket is not only on the same server but also stored in a single place on disk).
- Any other way of using indexes that allows efficient batch reads (e.g. a clustered index in a relational database allows a range of data to be read efficiently).
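
To make the first option concrete, here is a minimal sketch of how I understand keys map to vBuckets, so that reads could at least be grouped per vBucket. It assumes the CRC32-based hashing used by the client SDKs and the default of 1024 vBuckets. Both are assumptions on my part, and the function names `vbucket_for_key` and `group_keys_by_vbucket` are hypothetical helpers, not part of any Couchbase API.

```python
import zlib
from collections import defaultdict

# Assumption: the bucket uses the default number of vBuckets.
NUM_VBUCKETS = 1024


def vbucket_for_key(key: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
    """Return the vBucket id for a document key.

    Assumption: the key is hashed with CRC32, bits 16..30 of the
    checksum are kept, and the result is reduced modulo the vBucket count.
    """
    crc = zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF
    return ((crc >> 16) & 0x7FFF) % num_vbuckets


def group_keys_by_vbucket(keys):
    """Group document keys by the vBucket they are expected to live in."""
    groups = defaultdict(list)
    for key in keys:
        groups[vbucket_for_key(key)].append(key)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical document keys, only for illustration.
    sample_keys = [f"user::{i}" for i in range(10)]
    for vb, ks in sorted(group_keys_by_vbucket(sample_keys).items()):
        print(vb, ks)
```

This only tells me where a known key lives. It does not enumerate the keys inside a vBucket, which is the part I am asking about.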