I’m trying to bring up a 3-node cluster on AWS. I am using the commercial CB v2.2.0 Enterprise Standard AMIs published by CB, instantiated on m1.xlarge instances with local ephemeral storage. The data and indices are on separate local spindles, and replicas are being maintained. In other words, this is a pretty vanilla cluster that I’m trying to bring into a production validation process.
There is a fourth node, running the Python client SDK on top of Ubuntu 13.10. I have top running on it, and it seems to maintain good connectivity back to my office. Disappointingly, the client process is using less than 50% of a single core. (On my local test infrastructure, on smaller machines, the client maxes out a single core.)
Everything appears to be operating correctly … except when it doesn’t.
The cluster successfully took a cbrestore of 33 million documents (30 GB). When I use the same script that created that database to add new documents, one of the three nodes flaps into and out of the cluster. While this is a great test of my idempotent document insertion strategy, it is an extreme PITA. My threaded client handles the failures pretty well, but the timeouts really hurt insertion performance.
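For context, the retry pattern I mean is roughly the following. This is a minimal sketch, not the actual script: insert_with_retry, store, and all of the parameters are hypothetical names, and store stands in for whatever SDK call writes a document. It assumes the write is idempotent, so repeating it after an ambiguous timeout is harmless.

```python
import random
import time


def insert_with_retry(store, key, doc, attempts=5, base_delay=0.1,
                      max_delay=5.0, retry_on=(Exception,), sleep=time.sleep):
    """Call store(key, doc) until it succeeds or the attempts run out.

    store is a hypothetical callable wrapping the real SDK write.
    Retrying is only safe because the write is assumed to be idempotent.
    """
    for attempt in range(attempts):
        try:
            return store(key, doc)
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of retries; let the caller see the error
            # Exponential backoff with jitter, so that many client threads
            # do not all hammer a recovering node at the same moment.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
```

Each worker thread would call insert_with_retry(store, key, doc) in place of a bare write. The backoff is what costs throughput while a node is flapping, which matches the slowdown I’m seeing.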
As an AWS newbie, my questions are:
How should I have launched these images to minimize cluster flap? Is there a command or zone I can specify in the launch process? (All three nodes are in the same availability zone.)
Is there a better image to use? I am following the advice from Amazon in their paper describing how to use CB on their cloud.
The cluster has more or less repaired itself, but I’m still seeing flap “glitches” in the console display. Where should I look in the logs for more detail about what is going on?
Thank you in advance for any insight you care to share.