The Sync Gateway documentation recommends a quad-core machine with 4GB of RAM to support up to 5K concurrent users. Our machine hosts both Sync Gateway and Couchbase on a quad-core CPU (each core about 2.0GHz) with 7GB of RAM. Our total user base is only ~5K, and they are definitely not all using our service at the same time, but we are having serious performance issues and I can’t tell whether the problem is with sync_gateway, Couchbase, or both. An all_docs request takes 30 seconds or more to respond for a user who only has access to ~170 documents. Running lsof in the terminal shows that sync_gateway has about 4000 connections, but most are not in the ESTABLISHED state:
Can’t Identify Protocol: 2602
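For anyone who wants to reproduce this kind of breakdown, here is a rough Python sketch that tallies lsof output lines by TCP state. The helper names (tally_states, tally_for_pid) and the sample lines are made up for illustration, and it assumes lsof prints the state in parentheses at the end of each TCP line and the literal text "can't identify protocol" for sockets it cannot classify.

```python
import re
import subprocess
from collections import Counter

# Assumption: lsof ends each TCP line with the state in parentheses,
# e.g. "(ESTABLISHED)" or "(CLOSE_WAIT)".
STATE_RE = re.compile(r"\(([A-Z_0-9]+)\)\s*$")


def tally_states(lines):
    """Count lsof output lines per TCP state (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = STATE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
        elif "can't identify protocol" in line:
            # Sockets that lsof could not map to a protocol.
            counts["can't identify protocol"] += 1
    return counts


def tally_for_pid(pid):
    """Run lsof for one process and tally its sockets (needs lsof installed)."""
    result = subprocess.run(
        ["lsof", "-n", "-P", "-p", str(pid)],
        capture_output=True, text=True, check=False,
    )
    return tally_states(result.stdout.splitlines())


# Made-up sample lines in the shape lsof typically prints.
SAMPLE = [
    "sync_gate 1234 sg 10u IPv4 111 0t0 TCP 127.0.0.1:4984->127.0.0.1:50001 (ESTABLISHED)",
    "sync_gate 1234 sg 11u IPv4 112 0t0 TCP 127.0.0.1:4984->127.0.0.1:50002 (CLOSE_WAIT)",
    "sync_gate 1234 sg 12u IPv4 113 0t0 TCP 127.0.0.1:4984->127.0.0.1:50003 (CLOSE_WAIT)",
    "sync_gate 1234 sg 13u sock 0,7 0t0 114 can't identify protocol",
]

if __name__ == "__main__":
    for state, count in tally_states(SAMPLE).most_common():
        print(f"{state}: {count}")
```

On a real server you would call tally_for_pid with the sync_gateway process ID instead of using the sample lines.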
We are using nginx as a reverse proxy, so these connections are between Sync Gateway and nginx. I’m very new to server administration, but my limited research suggests this means Sync Gateway is leaking sockets and not closing them correctly. Or is this just the way Sync Gateway handles connections, and by ‘concurrent’ they mean any user who has made a request within some time window? E.g. does it hold a connection open for a few hours before fully closing it? Is there any way I can fix this so that Sync Gateway closes connections sooner?
UPDATE: It turned out that our apps were using longpoll, so setting the keep-alive and proxy timeouts to 360s got rid of most of the CLOSE_WAIT and "can’t identify protocol" connections. But it also means that if your app uses continuous polling, the number of concurrent users is effectively the same as your total users, and your server will always carry a very heavy load in terms of open TCP connections.
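For reference, the timeout change looks roughly like the nginx fragment below. This is only a sketch: the upstream address assumes Sync Gateway is listening on its default public port 4984 on the same machine, and which of these directives you actually need depends on your setup.

```nginx
location / {
    # Assumed upstream: Sync Gateway on its default public port.
    proxy_pass http://127.0.0.1:4984;

    # Let idle longpoll requests stay open for up to 360s
    # before nginx gives up on the upstream.
    proxy_read_timeout 360s;
    proxy_send_timeout 360s;

    # Keep idle client connections open for the same period.
    keepalive_timeout 360s;
}
```

The idea is that nginx's timeouts should be longer than the longpoll interval, so that nginx does not drop the connection while Sync Gateway is still holding the request open.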