Admin console becomes extremely slow under heavy (?) load

Hi,

I just noticed that the admin UI becomes unusable when we have a lot of connections. I’m not sure whether 8000 connections counts as heavy load, though. I had no problems when we were inserting 150k / second.

Some of the symptoms:

  1. The browser displays “waiting for request…” and simply hangs.
  2. I keep getting “failed xhr requests” errors.

Is there a way to fix it?

What kind of “connections”? Are you referring to 8000 application servers talking to the cluster? Can you describe the environment where you’re observing this?

The “failed xhr requests” normally go to the cluster manager, which runs in a separate process: an Erlang VM, which shows up as a beam.smp process on Linux. You may want to double-check your sizing and look for the usual signs of resource exhaustion: Is there a run queue on the CPU? Is there memory pressure causing paging?
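The two checks mentioned above (CPU run queue and paging) can be roughed out on Linux by reading /proc directly. This is only a sketch, not a Couchbase tool: the function names are made up for illustration, and the "more runnable tasks than CPUs" rule is an assumed rule of thumb rather than an official threshold.

```python
import os

def runnable_tasks(loadavg_text):
    """Return the number of currently runnable tasks from /proc/loadavg text.

    The fourth field looks like "3/512": runnable entities / total entities.
    """
    fields = loadavg_text.split()
    return int(fields[3].split("/")[0])

def swap_counters(vmstat_text):
    """Return (pswpin, pswpout), the cumulative swap counters in /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters.get("pswpin", 0), counters.get("pswpout", 0)

def cpu_is_queued(loadavg_text, cpu_count):
    """Assumed rule of thumb: more runnable tasks than CPUs suggests a run queue."""
    return runnable_tasks(loadavg_text) > cpu_count

def report():
    """Read the live /proc files (Linux only) and print a short summary."""
    with open("/proc/loadavg") as f:
        loadavg = f.read()
    with open("/proc/vmstat") as f:
        vmstat = f.read()
    cpus = os.cpu_count() or 1
    print("runnable tasks:", runnable_tasks(loadavg), "CPUs:", cpus)
    print("CPU run queue suspected:", cpu_is_queued(loadavg, cpus))
    print("pages swapped in/out since boot:", swap_counters(vmstat))
```

Calling `report()` on a cluster node while the console is hanging, and again when it is responsive, gives two snapshots to compare. The swap counters are cumulative since boot, so it is the difference between two readings that indicates active paging.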

While it’s pretty important to understand and resolve this, since the root cause could ultimately have other impacts, the good news is that Couchbase is built such that the data services run independently of cluster management.

Hello,

8000 is the number of connections shown under the “SERVER RESOURCES” tab when I open a bucket. It almost looks like that number does not matter much. I now have 9500 connections and…it feels very fast.

I haven’t looked at CPU, but we have plenty of memory.

Yup, even when it felt slow, performance was not impacted. I’m not exactly sure what caused it at this point.

I will check on CPU when it happens again.

Thanks,
Moon

As the hover-over mostly indicates, that number is Data (KV) service connections. I don’t recall the limit, but it’s in the tens of thousands and scales well.

That said, if you see this number always growing, it may mean you’re leaking connections/objects somewhere. In a typical environment it usually stabilizes somewhere around the number of client processes accessing the cluster. This, of course, varies based on deployment platform. 8000 sounds a bit high for small deployments.
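To tell "always growing" apart from "stabilized", one rough option is to note the connection count at regular intervals and compare the later readings with the earlier ones. The sketch below is a crude heuristic, not anything Couchbase provides: the function name, the 5% tolerance, and the sample values other than 8000 and 9500 are made up for illustration.

```python
def looks_like_leak(samples, tolerance=0.05):
    """Return True if the later half of the window averages noticeably
    higher than the earlier half.

    samples:   connection counts read at regular intervals, oldest first.
    tolerance: assumed fraction of growth that is treated as noise.
    """
    if len(samples) < 4:
        raise ValueError("need at least 4 samples")
    half = len(samples) // 2
    first_avg = sum(samples[:half]) / half
    second_avg = sum(samples[half:]) / (len(samples) - half)
    return second_avg > first_avg * (1 + tolerance)


# Illustrative readings only: one series that keeps climbing, one that is flat.
growing = [8000, 8400, 8900, 9500]
stable = [8000, 8010, 7990, 8005]
```

A short burst of growth during something like a bulk load would also trip this check, so it only means much if the count keeps climbing after the workload has settled.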

I know you’ve posted about PHP before. If this is a PHP deployment, keep in mind there is one connection per process, so in most cases you’ll not want to run tens of thousands of processes. There are some other options if required, but it’s a bit of an anti-pattern to, for instance, set Apache to 5000 processes per app server node.

A high number is expected at this time, since we are running a migration script. We are using PHP-FPM. I’ve heard that PHP-FPM reuses an existing connection whenever possible, but I could not find a way to verify it.
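One rough way to check reuse from the app server side is to count established sockets to the data service port while requests are flowing. If the count stays roughly flat, instead of climbing with the number of requests served, the workers are most likely holding on to their connections. The sketch below parses the text of /proc/net/tcp on Linux. It assumes the default unencrypted data port 11210 and IPv4 only (IPv6 sockets are listed in /proc/net/tcp6), and the function name is made up for illustration.

```python
ESTABLISHED = "01"  # TCP state code used in /proc/net/tcp
DATA_PORT = 11210   # assumed: default unencrypted Couchbase data (KV) port

def count_established_to_port(proc_net_tcp_text, port=DATA_PORT):
    """Count ESTABLISHED IPv4 sockets whose remote port equals `port`."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        # fields[2] is the remote address as HEXIP:HEXPORT, fields[3] the state.
        remote_port = int(fields[2].rsplit(":", 1)[1], 16)
        if fields[3] == ESTABLISHED and remote_port == port:
            count += 1
    return count

def count_live():
    """Read the live table on this host (Linux only)."""
    with open("/proc/net/tcp") as f:
        return count_established_to_port(f.read())
```

Calling `count_live()` every few seconds and comparing the result with the number of running php-fpm workers gives a first impression. The expected ratio is itself an assumption, since a client may hold a connection to each data node rather than a single one.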