100% CPU Utilization rendering the whole server unresponsive

Dear all,

We’ve got a newly installed server on Google Cloud to serve APIs for a mobile app that has not been released yet, so there’s virtually no traffic.

But every once in a while, Couchbase crashes the server with 100% CPU utilization.

  • the Google Cloud console reports a constant 15% - 20% CPU utilization while the server sits idle
  • every few days, CPU utilization suddenly shoots up to 100% and the server becomes totally unresponsive; even SSH stops working (although nobody has started any intensive work on it)
  • we then have to reset the server from the Google Cloud console
  • after the restart, Couchbase may leave the buckets in a pending state, and it can take a long time (e.g. half an hour) for the buckets to go live again
  • sometimes we restart the server a second time, after which Couchbase may come up faster
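
Since the server cannot be reached once it hangs, one way to capture what happens just before the spike would be a small sampler that writes CPU figures to disk. The following is only a minimal sketch, not something we have run in production: the function names, the log path /var/log/cpu-sampler.log and the 10-second interval are all hypothetical choices, and it assumes a Linux host with /proc/stat and the procps version of ps.

```python
import os
import subprocess
import time


def parse_cpu_line(line):
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:]]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice
    idle = values[3] + (values[4] if len(values) > 4 else 0)
    # Guest time is already included in user/nice, so sum only the first eight.
    total = sum(values[:8])
    return idle, total


def cpu_utilization(before, after):
    """Percentage of non-idle CPU time between two 'cpu' lines."""
    idle_b, total_b = parse_cpu_line(before)
    idle_a, total_a = parse_cpu_line(after)
    total_delta = total_a - total_b
    if total_delta <= 0:
        return 0.0
    return 100.0 * (1.0 - (idle_a - idle_b) / float(total_delta))


def read_cpu_line():
    """Read the first (aggregate) line of /proc/stat."""
    with open("/proc/stat") as f:
        return f.readline()


def top_processes(count=5):
    """Ask ps for the processes with the highest %CPU.

    Note: ps reports %CPU averaged over each process's lifetime,
    so this is only a rough indicator of the current culprit.
    """
    out = subprocess.check_output(
        ["ps", "-eo", "pid,pcpu,pmem,comm", "--sort=-pcpu"],
        universal_newlines=True,
    )
    return out.splitlines()[: count + 1]  # header plus `count` rows


def log_samples(path="/var/log/cpu-sampler.log", interval=10):
    """Append one sample every `interval` seconds until interrupted."""
    previous = read_cpu_line()
    with open(path, "a") as log:
        while True:
            time.sleep(interval)
            current = read_cpu_line()
            log.write("{} cpu={:.1f}%\n".format(
                time.strftime("%Y-%m-%d %H:%M:%S"),
                cpu_utilization(previous, current)))
            log.write("\n".join(top_processes()) + "\n")
            log.flush()
            os.fsync(log.fileno())  # push to disk so it survives a hard reset
            previous = current
```

Calling log_samples() from a background process would leave a trail in the log file that can be read after the next reset; the last few entries should show whether the load really comes from a Couchbase process.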

Resolutions tried:

  • we originally had a server instance with 4 GB of RAM and got this problem, so we then started new server instances with 7.5 GB of RAM, but still no luck.
  • we tried creating new server instances and reinstalling the whole server stack, but the same problem still comes up periodically.
  • we’re already running the latest Debian 8 image and the latest Couchbase Server Community Edition.

We’d really appreciate it if somebody could advise.

Regards,
Eugene.


Technical info:

Google Cloud server:

  • Server Instance: n1-standard-2 (2 vCPUs, 7.5 GB memory)
  • OS: Debian GNU/Linux 8.9 (jessie)

Couchbase cluster console info:

Server Node:

  • RAM Usage: 34%
  • CPU Usage: 9% - 19% (sometimes up to 60%)
  • Data/Disk Usage: 1.96GB / 2.62GB
  • Items (Active / Replica): 696 K / 0

Dynamic RAM:

  • Couchbase Quota (5.35 GB)
  • Total (7.32 GB)
  • In Use by Buckets (673 MB)
  • Other Data (5.52 GB)
  • Free (1.14 GB)

Disk One:

  • Total (39.2 GB)
  • In Use by Buckets (2.62 GB)
  • Other Data (11.1 GB)
  • Free (25.5 GB)
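
As a quick sanity check on the Dynamic RAM and Disk One figures above, the arithmetic can be sketched as follows. The numbers are copied from the console; the helper name shares and the conversion of 673 MB to GB by dividing by 1024 are assumptions made for illustration (the console may use decimal units).

```python
def shares(total, parts):
    """Return each part as a percentage of total, plus the sum of the parts."""
    percentages = {name: 100.0 * value / total for name, value in parts.items()}
    return percentages, sum(parts.values())


# Dynamic RAM, in GB (673 MB converted with 1024 MB per GB: an assumption)
ram_total = 7.32
ram_quota = 5.35
ram_parts = {"buckets": 673 / 1024.0, "other": 5.52, "free": 1.14}

# Disk One, in GB
disk_total = 39.2
disk_parts = {"buckets": 2.62, "other": 11.1, "free": 25.5}

ram_pct, ram_sum = shares(ram_total, ram_parts)
disk_pct, disk_sum = shares(disk_total, disk_parts)

quota_pct = 100.0 * ram_quota / ram_total      # share of RAM given to Couchbase
outside_quota = ram_total - ram_quota          # GB left outside the quota

print("RAM accounted for: {:.2f} of {:.2f} GB".format(ram_sum, ram_total))
print("Couchbase quota: {:.1f}% of RAM, {:.2f} GB outside it".format(
    quota_pct, outside_quota))
for name in ("buckets", "other", "free"):
    print("RAM {:8s} {:5.1f}%   disk {:8s} {:5.1f}%".format(
        name, ram_pct[name], name, disk_pct[name]))
print("Disk accounted for: {:.2f} of {:.2f} GB".format(disk_sum, disk_total))
```

In both cases the three parts add up to roughly the reported total, and the "Other Data" entry is by far the largest share of RAM, while the buckets themselves take only a small part of it.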