N1QL replica read for resiliency?

zarandras · June 13, 2016, 4:34pm

Hi all,
As far as I understand, N1QL queries run on the active data nodes and not replicas.
My question is: in the case of a data node failure (but before the failover and rebalance process), is it possible to run queries from the Java SDK so that the query reads from replica vBuckets in place of the unreachable active vBuckets? This way the application could still answer queries with full functionality (although some of the data may be stale). The replica read function works for explicit keys only.

simonbasle · June 13, 2016, 4:36pm

No, that’s not possible as the N1QL queries depend on the index service and do their own data fetching on the server side, contrary to the K/V operations that are directly targeted at a given node by the SDK.

zarandras · June 13, 2016, 4:45pm

Thanks for the quick answer.

zarandras · June 15, 2016, 12:01pm

Another, related question: what about view queries? Can they be set up to read from replicas? We would like our system to be highly available in terms of continuous operation (at least for reads) even if one of the nodes fails, as long as its data is replicated.

egrep · June 15, 2016, 12:19pm

@zarandras,
http://developer.couchbase.com/documentation/server/current/indexes/mapreduce-view-replication.html
There is also the “view index replica” option for the bucket, but of course be aware that replicas may be stale.

zarandras · June 17, 2016, 1:16pm

Thank you @egrep. What is not clear to me, either from the documentation or from trying it out, is how I can issue a view query that actually reads from the replicas. I have a bucket with index replicas enabled, but if I shut down one of the data nodes I just get partial answers, see here: View-based primary index and node failure. It seems CB does not actually read from the replica index parts in place of the unavailable active data nodes.

There is a sentence in the documentation which is not quite clear to me: “By providing replica indexes the server enables you to perform queries even in the event of node failure.” Does it mean that if a node is temporarily unavailable, I can read from the replicas? (And if so, how: automatically, or through a parameter such as stale?) Or does it simply mean that if I do a hard failover, the replica index becomes the active index and I can query immediately without waiting for a new index to be built?

egrep · June 17, 2016, 1:35pm

@zarandras,
I’m not sure (I’m just a user like you, and I really thought that view queries work out of the box with replicas when one of the nodes fails), but I suppose that for views you should use “stale” == true in the view query (IMHO, the part of the view index built from locally stored replicas should be considered “stale” by default).

But to be sure, it’s better to ask @simonbasle to clarify this.

P.S. For a “simple get” you can use the following (Couchbase SDKs):

// Try the active node first; on any error, fall back to the replicas.
bucket
  .async()
  .get("id")
  .onErrorResumeNext(bucket.async().getFromReplica("id", ReplicaMode.ALL))
  .subscribe();

Actually Bucket.getFromReplica() is all you need in this case.
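
The fallback idea can also be sketched without the SDK, which makes it easier to see that the caller should know when a value came from a replica and may therefore be stale. This is only an illustration under stated assumptions: `ReplicaFallbackRead`, `ReadResult`, `activeRead` and `replicaRead` are hypothetical names, and the two functions stand in for whatever active and replica reads your client offers.

```java
import java.util.function.Function;

// Hypothetical stand-in for a key/value client; this is NOT the Couchbase SDK API.
final class ReplicaFallbackRead {

    /** A read result that records whether it came from a replica (and so may be stale). */
    static final class ReadResult {
        final String value;
        final boolean fromReplica;

        ReadResult(String value, boolean fromReplica) {
            this.value = value;
            this.fromReplica = fromReplica;
        }
    }

    /** Try the active copy first; if that read throws, retry against the replica. */
    static ReadResult read(String id,
                           Function<String, String> activeRead,
                           Function<String, String> replicaRead) {
        try {
            return new ReadResult(activeRead.apply(id), false);
        } catch (RuntimeException activeFailure) {
            // Active node unreachable or timed out: serve the replica copy and flag it.
            return new ReadResult(replicaRead.apply(id), true);
        }
    }
}
```

In a real application, `activeRead` would wrap the normal get and `replicaRead` would wrap a replica read such as getFromReplica; the `fromReplica` flag lets the application decide whether possibly stale data is acceptable for that request.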

simonbasle · June 17, 2016, 2:42pm

@zarandras As far as I know, there is no replica-read support in N1QL at all, even when based on view indexes.

For pure map-reduce view usage, my understanding is that the documentation @egrep linked to means “each node also indexes the data it replicates, so that when you do a failover, the index is ready to go immediately instead of having to be rebuilt” (your second interpretation). That applies only when the corresponding bucket setting (index replicas) has been enabled.

egrep · June 17, 2016, 2:47pm

@simonbasle,
I think the question is: “would pure map-reduce views work correctly right after a node fails but before failover happens, if stale=true is used?”

geraldss · June 17, 2016, 2:51pm

If you use covering indexes with N1QL, you don’t have to worry about replica reads. All the data comes from the indexes, and the indexes can be duplicated for redundancy.
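
As a rough illustration of what “covering” means here: a query is covered when every field it references is one of the index keys, so no document fetch from the data nodes is needed. The sketch below is simplified and everything in it is an assumption made for the example: the `users` bucket, the index names, the fields, and the `isCovered` helper are all hypothetical, and the real planner's rules are more detailed than this check.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical bucket, index and field names throughout.
final class CoveringIndexExample {

    // Two identically defined indexes under different names, for redundancy.
    static final List<String> INDEX_KEYS = Arrays.asList("city", "name");
    static final String CREATE_INDEX_A =
        "CREATE INDEX idx_city_name_a ON `users`(city, name) USING GSI";
    static final String CREATE_INDEX_B =
        "CREATE INDEX idx_city_name_b ON `users`(city, name) USING GSI";

    // References only city and name, so the index alone can answer it.
    static final String COVERED_QUERY =
        "SELECT name FROM `users` WHERE city = $1";

    /** True when every field the query references is one of the index keys. */
    static boolean isCovered(List<String> indexKeys, List<String> referencedFields) {
        Set<String> keys = new HashSet<>(indexKeys);
        return keys.containsAll(referencedFields);
    }
}
```

A query that also selected, say, an `email` field would no longer be covered by these indexes and would go back to fetching documents from the active data nodes.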

zarandras · July 28, 2016, 1:26pm

For specific queries it works fine with covering indexes. Thx @geraldss.

@simonbasle We are now facing a related, more general question: how can we serve full-document reads after a node failure and before failover? [Even if the data being read is stale. This matters especially when an auto-failover has been processed and another node drops out before the rebalance: the operator must intervene, and with many nodes the probability of such an issue is higher and the time needed for manual correction is unpredictable. It seems we cannot build a “self-healing” cluster for such cases.]

Since the vBucket table records where the replicas are, it seems possible, at least in principle, to get data from a replica after the active vBucket times out, but the query/index service always wants to read from the active (failed) data node. The question is: is a feature like this, or a related one, planned? Or what do you suggest for such a case? Is the best (or only) solution to duplicate the cluster with XDCR and, if we get a timeout from the primary cluster, redirect the query to the other cluster?

simonbasle · July 28, 2016, 1:42pm

No, before a failover the SDK can’t do anything for the query and view services, since they don’t cover the case of going to unpromoted replicas for data.

Since the SDK directly accesses data nodes for key/value operations, it is able to offer a method that will instead ask a replica node and do a replica read for this category of operations.

zoltan.zvara · July 28, 2016, 3:27pm

Would it be possible to cover more than key-value operations by replicating to another cluster using XDCR and re-routing client queries to the other cluster if a request hangs for too long? Let’s say auto-failover has been set to 30 seconds, but the client would like to fall back to an alternative after 2-3 seconds.

simonbasle · July 28, 2016, 4:10pm

That could be an approach, but it implies double the resources (sockets, etc…) in the SDK and more book-keeping/error detection/management in your code. If you want to only do idempotent queries (views or N1QL selects) from the backup cluster it might fit your need though.

Writing to the backup cluster means bidirectional XDCR, which is probably much harder to get right.
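
For the read-only case, the re-routing discussed above could look roughly like the sketch below: run the query against the primary cluster with a short client-side timeout, and if it fails or times out, run the same idempotent query against the backup cluster. This is a minimal sketch, not SDK code: `ClusterFailoverQuery` is a hypothetical helper, and the two `Callable`s stand in for whatever view or N1QL select calls you would issue against each cluster.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: the Callables stand in for idempotent view/N1QL reads
// against the primary cluster and the XDCR backup cluster.
final class ClusterFailoverQuery {

    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final long primaryTimeoutMillis;

    ClusterFailoverQuery(long primaryTimeoutMillis) {
        this.primaryTimeoutMillis = primaryTimeoutMillis;
    }

    /** Run the primary query; if it fails or exceeds the timeout, run the backup query. */
    <T> T query(Callable<T> primary, Callable<T> backup) {
        Future<T> attempt = pool.submit(primary);
        try {
            return attempt.get(primaryTimeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            attempt.cancel(true);
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while querying", e);
        } catch (TimeoutException | ExecutionException e) {
            attempt.cancel(true); // stop waiting on the primary cluster
            try {
                return backup.call(); // read-only, so repeating it elsewhere is safe
            } catch (Exception backupFailure) {
                throw new IllegalStateException("both clusters failed", backupFailure);
            }
        }
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

With the numbers from this thread, `new ClusterFailoverQuery(2500)` would give up on the primary after 2.5 seconds, well before a 30-second auto-failover. The backup cluster still needs its own connection and error handling, which is the extra resource and book-keeping cost mentioned above.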