We are currently running a 3-node cluster using 4.0.0-4051 Community Edition on Amazon Web Services. Each node runs in a different availability zone, and the cluster has been relatively stable for some time.
Several days ago we began an effort to shift to a new instance type with more RAM. We started a new cluster with three additional instances and established a one-way XDCR replication between the existing cluster and the new one. Replication seems to have gone well, views were built on the target, etc.
The problem is that we later discovered that inter-zone network traffic for the original cluster (our replication source, NOT the replication target) had shot through the roof. By that I mean we racked up over 10TB (!) in inter-zone traffic charges in a day and a half. Again, this occurred only on the original cluster. The new cluster has a fraction of the traffic. The moment we killed XDCR everything went back to normal on the source cluster.
This is not a huge database. It is relatively stable, with just over 1.5M documents and a total size of less than 25GB. Also, we’re not turning the data over frequently, and material is primarily added in a gradual fashion (not a high write load on the primary cluster). The excessive traffic continued at a pretty constant rate even though the replication looked to have caught up long ago.
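To put those numbers in perspective, here is a rough back-of-the-envelope sketch. It assumes decimal units, treats "a day and a half" as 36 hours, and takes the 10TB and 25GB figures as round numbers rather than measured values. The function name `traffic_summary` is just something I made up for illustration.

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumptions (not measured): decimal units, "a day and a half" = 36 hours,
# and the 10 TB / 25 GB figures taken as round numbers.

TB = 10**12
GB = 10**9
MB = 10**6


def traffic_summary(total_bytes, hours, dataset_bytes):
    """Return average throughput and how many full dataset copies the traffic equals."""
    seconds = hours * 3600
    bytes_per_sec = total_bytes / seconds
    copies = total_bytes / dataset_bytes
    return {
        "mb_per_sec": bytes_per_sec / MB,        # average megabytes per second
        "mbit_per_sec": bytes_per_sec * 8 / MB,  # average megabits per second
        "dataset_copies": copies,                # traffic as multiples of the dataset
        "copies_per_hour": copies / hours,       # full dataset copies per hour
    }


summary = traffic_summary(total_bytes=10 * TB, hours=36, dataset_bytes=25 * GB)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

Under those assumptions the traffic works out to roughly 400 times the size of the whole database, or around 77 MB/s sustained, which is why I can't explain it as a one-time initial copy to the new cluster.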
Any ideas or guidance on troubleshooting or understanding this? I haven’t run into anything similar on the forums yet, and am about to start digging through the logs. Having said that, I can’t believe this is expected or normal behavior. Thanks in advance for any ideas.