Haven't heard from a higher priority node or a master, so I'm taking over


#1

I have a Couchbase Server Community Edition v3.0.1 cluster running. This error shows up in the logs very frequently, and view query performance on the cluster degrades: views take far too long to return data. What could be the reason for this? Is there a known workaround?


#2

It looks like there is something wrong with how your nodes communicate with each other. What does your cluster topology look like? Are all nodes green in the nodes list? Can they all talk to each other on all the ports that need to be open?
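
One rough way to answer the port question is to try a TCP connection from each node to every other node. This is only a minimal sketch: the port list below is an assumption (ports commonly documented for Couchbase Server 3.x) and should be verified against the documentation for your version; the 21100-21299 node data exchange range is left out for brevity. The function names are made up for illustration.

```python
import socket

# Assumed port list, not taken from this thread: epmd, admin/REST,
# views/XDCR, internal bucket port, data port.
ASSUMED_PORTS = [4369, 8091, 8092, 11209, 11210]


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_nodes(hosts, ports=ASSUMED_PORTS):
    """Map each (host, port) pair to whether it accepted a connection."""
    return {(h, p): port_open(h, p) for h in hosts for p in ports}
```

Run it on each node in turn, for example `check_nodes(["10.0.0.4", "10.0.0.5", "10.0.0.6"])` (hypothetical private addresses), and look for any pair that maps to `False`. A successful TCP connect only shows the port is reachable, not that the service behind it is healthy.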


#3

There are 3 nodes in total, hosted on Ubuntu 14.04 VMs on Azure. All are located in the same private network. All nodes are ‘green’ in the list.


#4

Do the messages in the logs correspond in any way with peak loads, when nodes could be too busy to respond? Which Azure SKUs are you running on? I wonder if the issue is related to sizing: the nodes may get so busy that the cluster manager is starved and can’t get CPU time to perform the heartbeat calls.
thanks
-cihan


#5

All nodes are on Azure with the DS4 (8 cores, 28 GB RAM) + 1 TB SSD configuration. Documents are being updated at an average of 50 ops, and there are 5 production views that keep reindexing because of those updates. Apart from that, there is no load on the cluster.


#6

I forgot to mention one thing: the cluster has XDCR configured for two other clusters. As soon as XDCR replication was paused, all views started working really fast and returned data instantaneously. Any suggestions for lowering the XDCR overhead?


#7

Thanks. Here are a few options to consider as solutions:
1- Do you have a lot of data/mutations that you are synchronizing through XDCR? At the beginning, XDCR works hard to bring the clusters in sync, so it may go through an intense period of replication. If the ongoing mutation rate is low, the issue may disappear afterwards.
2- I’d recommend using 4.0. We have done work in 4.0 that fundamentally changed how XDCR is done. Is that an option for you?
3- You can try to dial down XDCR resources by lowering the XDCR Max Replications per Bucket setting. Halve the value and see if you notice a difference.
thanks
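
For option 3, the setting can be changed in the web console. If you would rather script it, a hedged sketch follows. It assumes the 3.x REST endpoint `/settings/replications` on port 8091 and the parameter name `xdcrMaxConcurrentReps`; neither comes from this thread, so please confirm both against the REST API documentation for your version. The helper names, host, and credentials are placeholders. The function only builds the request, so nothing is sent until you pass it to `urllib.request.urlopen`.

```python
import base64
import urllib.parse
import urllib.request


def halved(value):
    """Halve the current setting, never going below 1."""
    return max(1, value // 2)


def build_xdcr_request(host, user, password, max_reps):
    """Build (but do not send) a POST that sets the max replications value.

    ASSUMPTION: endpoint path and parameter name follow the Couchbase 3.x
    REST API; verify them before use.
    """
    body = urllib.parse.urlencode({"xdcrMaxConcurrentReps": max_reps}).encode()
    req = urllib.request.Request(
        f"http://{host}:8091/settings/replications", data=body, method="POST"
    )
    # HTTP Basic authentication with the cluster administrator credentials.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req
```

For example, `build_xdcr_request("10.0.0.4", "Administrator", "secret", halved(16))` prepares a request that would set the value to 8. Send it with `urllib.request.urlopen(req)` once you have checked the endpoint, then watch view latency before lowering the value further.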


#8

Thanks for the reply. The clusters are synced continuously and some documents are updated frequently. I tried setting XDCR Max Replications per Bucket to 8 and it made a noticeable difference, but the views still take around 10 seconds to respond. I am waiting for the v4.0.1 release, since the current release also has some XDCR issues with password encoding.


#9

FYI, we have a preview of the 4.1.0 release on the downloads page:
http://www.couchbase.com/nosql-databases/downloads
thanks
-cihan