Is there a way to configure XDCR to be more fault tolerant?

Based on my understanding of the documentation, if a remote node that a cluster is replicating to goes down, XDCR will attempt to retry replication until the connection to the remote node has been re-established. This creates a single point of failure that I would like to avoid.

If I were to set up two remote cluster references pointing at two separate nodes in the same remote cluster, it would remove that single point of failure, but would it also result in double the replication traffic? I can’t find any reference to a scenario like that in the documentation, and I was wondering if you fine folks could help, correct me, or at least point me in the right direction.

If you read over the XDCR documentation again, I think your fears about a single point of failure will be quashed.

Yes, the initial IP/hostname you give XDCR needs to be a specific machine inside the target cluster, but once port 8091 on that machine has been reached, it reports back the full list of IPs/hostnames in the target cluster. The source cluster then spreads the XDCR traffic across the nodes of the target cluster.
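If you want to see that node list for yourself, here is a rough Python sketch of the idea. It assumes the target cluster answers `GET /pools/default` on port 8091 with a JSON document containing a `nodes` array whose entries have a `hostname` field; check that against the REST API of the version you run. The function names, the sample addresses, and the credentials handling are made up for illustration, and this is not what XDCR itself executes internally.

```python
import base64
import json
from urllib.request import Request, urlopen


def node_hostnames(pools_default):
    """Return the host:port strings listed in a /pools/default response.

    Assumes the response is a dict with a "nodes" list whose entries
    carry a "hostname" key; entries without one are skipped.
    """
    return [n["hostname"] for n in pools_default.get("nodes", []) if "hostname" in n]


def fetch_pools_default(host, user, password, port=8091, timeout=5.0):
    """Ask one known node for the cluster description (hypothetical helper)."""
    url = "http://{}:{}/pools/default".format(host, port)
    token = base64.b64encode("{}:{}".format(user, password).encode()).decode()
    req = Request(url, headers={"Authorization": "Basic " + token})
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Hand-written sample shaped like the assumed response; addresses are invented.
SAMPLE_RESPONSE = {
    "name": "default",
    "nodes": [
        {"hostname": "10.0.0.1:8091", "status": "healthy"},
        {"hostname": "10.0.0.2:8091", "status": "healthy"},
    ],
}

if __name__ == "__main__":
    # Works offline on the sample; swap in fetch_pools_default(...) for a real node.
    for hostname in node_hostnames(SAMPLE_RESPONSE):
        print(hostname)
```

Pointing `fetch_pools_default` at any single reachable node in the target cluster should list every node, which is why one cluster reference is enough to get started.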

http://www.couchbase.com/docs/couchbase-manual-2.1.0/couchbase-admin-tasks-xdcr-configuration.html