- 4.1.0-EE GA (5005), 3 nodes (4 CPU / 4 GB RAM each)
- All nodes have all services enabled (data, index, query)
- Establish a cluster with a “default” bucket (256 MB, full eviction, no password, 1 replica, view index replicas enabled, flush enabled)
- Now establish an ssh connection to each node, run htop, and watch for a while to confirm that the total observed CPU utilization is ~[0…5]% on each node
- Run the following code (this will take a while) or write your own:
package highcpuafterload;

import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.document.json.JsonObject;
import com.couchbase.client.java.env.CouchbaseEnvironment;
import com.couchbase.client.java.env.DefaultCouchbaseEnvironment;
import com.couchbase.client.java.query.N1qlQuery;

import java.util.LinkedList;
import java.util.concurrent.Phaser;

public class BombardaMaxima extends Thread {

    private final int tid;

    // configure here
    private static final int threads = 20;
    private static final int docsPerThread = 50000;
    private static final int docTTLms = 86400 * 1000; // 1 day
    private static final int dumpToConsoleEachNDocs = 1000;

    // worker threads + main()
    private static final Phaser phaser = new Phaser(threads + 1);
    private static final CouchbaseEnvironment ce;
    private static final Cluster cluster;
    private static final String bucket = "default";

    static {
        ce = DefaultCouchbaseEnvironment.create();
        final LinkedList<String> nodes = new LinkedList<>();
        nodes.add("A.node");
        nodes.add("B.node");
        nodes.add("C.node");
        cluster = CouchbaseCluster.create(ce, nodes);
        final Bucket b = cluster.openBucket(bucket);
        // six GSI indexes: two groups of three identical definitions
        final String iQA = "CREATE INDEX iQA ON `default`(a, b) WHERE a IS VALUED USING GSI";
        final String iQB = "CREATE INDEX iQB ON `default`(a, b) WHERE a IS VALUED USING GSI";
        final String iQC = "CREATE INDEX iQC ON `default`(a, b) WHERE a IS VALUED USING GSI";
        final String iQX = "CREATE INDEX iQX ON `default`(a, c) WHERE a IS VALUED USING GSI";
        final String iQY = "CREATE INDEX iQY ON `default`(a, c) WHERE a IS VALUED USING GSI";
        final String iQZ = "CREATE INDEX iQZ ON `default`(a, c) WHERE a IS VALUED USING GSI";
        b.query(N1qlQuery.simple(iQA));
        b.query(N1qlQuery.simple(iQB));
        b.query(N1qlQuery.simple(iQC));
        b.query(N1qlQuery.simple(iQX));
        b.query(N1qlQuery.simple(iQY));
        b.query(N1qlQuery.simple(iQZ));
    }

    public BombardaMaxima(final int tid) {
        this.tid = tid;
    }

    @Override
    public final void run() {
        try {
            final Bucket b;
            synchronized (cluster) { b = cluster.openBucket(bucket); }
            final long stm = System.currentTimeMillis();
            final JsonObject jo = JsonObject
                    .empty()
                    .put("a", stm)
                    .put("b", stm)
                    .put("c", stm);
            for (int i = 0; i < docsPerThread; i++) {
                // note: keys collide within the same millisecond, so the actual
                // number of distinct documents is somewhat below docsPerThread
                b.upsert(JsonDocument.create(
                        tid + ":" + System.currentTimeMillis(),
                        // expiry > 30 days is interpreted by the server as an
                        // absolute Unix timestamp in seconds ("now + 1 day" here)
                        (int) ((System.currentTimeMillis() + docTTLms) / 1000),
                        jo));
                if (i % dumpToConsoleEachNDocs == 0) System.out.println("T[" + tid + "] = " + i);
            }
        } catch (final Exception e) {
            e.printStackTrace();
        } finally {
            phaser.arriveAndAwaitAdvance();
        }
    }

    public static void main(final String[] args) {
        for (int i = 0; i < threads; i++) new BombardaMaxima(i).start();
        phaser.arriveAndAwaitAdvance();
        System.out.println("DONE");
        cluster.disconnect(); // release SDK threads so the JVM can exit
    }
}
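A note on the expiry argument above: Couchbase treats expiry values greater than 30 days (2,592,000 seconds) as absolute Unix timestamps rather than relative TTLs, so the code passes "now + 1 day" as an absolute point in time. A minimal standalone sketch of that computation (the class and method names are illustrative, not part of the SDK):

```java
public class ExpiryDemo {
    // Couchbase interprets expiry values above this threshold (30 days,
    // in seconds) as absolute Unix timestamps, not relative durations.
    static final int RELATIVE_TTL_LIMIT_SECONDS = 30 * 24 * 60 * 60;

    // same arithmetic as the load generator: absolute expiry in seconds
    static int absoluteExpirySeconds(final long nowMillis, final int ttlMillis) {
        return (int) ((nowMillis + ttlMillis) / 1000);
    }

    public static void main(final String[] args) {
        final int docTTLms = 86400 * 1000; // 1 day, as in the repro
        final int expiry = absoluteExpirySeconds(System.currentTimeMillis(), docTTLms);
        // any current absolute timestamp is far above the 30-day threshold,
        // so the server reads it as a point in time, not a duration
        System.out.println(expiry > RELATIVE_TTL_LIMIT_SECONDS);
    }
}
```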
6 . Wait for the code run to complete (you can also wait 10, 15, or 20 minutes more after completion, if you wish)
7 . Now watch htop again: you’ll see that CPU utilization is ~[15…40]% on all nodes:
8 . Now restart one of the nodes, wait a while for node initialization to complete, and watch htop once again (this node’s load is [0…5]% now):
9 . Restart the second node, and …:
10 . Now restart the last one:
Seems like something is stuck within Couchbase Server.
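To put numbers on the htop observations above instead of eyeballing them, here is a small hypothetical sampler (Linux only) that reads the aggregate "cpu" line of /proc/stat — the same counters htop uses — twice over an interval and reports overall utilization:

```java
import java.nio.file.Files;
import java.nio.file.Paths;

public class CpuSampler {
    // returns {idle, total} jiffies from the aggregate "cpu" line of /proc/stat
    static long[] snapshot() throws Exception {
        final String cpu = Files.readAllLines(Paths.get("/proc/stat")).get(0);
        final String[] f = cpu.trim().split("\\s+"); // "cpu user nice system idle iowait ..."
        long total = 0;
        for (int i = 1; i < f.length; i++) total += Long.parseLong(f[i]);
        final long idle = Long.parseLong(f[4]); // 4th counter is idle time
        return new long[]{idle, total};
    }

    // overall CPU utilization in [0..1], measured over a short interval
    static double utilization(final long intervalMs) throws Exception {
        final long[] a = snapshot();
        Thread.sleep(intervalMs);
        final long[] b = snapshot();
        final double idle = b[0] - a[0], total = b[1] - a[1];
        return total == 0 ? 0.0 : 1.0 - idle / total;
    }

    public static void main(final String[] args) throws Exception {
        System.out.printf("CPU utilization: %.1f%%%n", utilization(1000) * 100);
    }
}
```

Running this on each node before and after the load makes the ~[0…5]% vs. ~[15…40]% difference reproducible as a logged number.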
UPDATE 0: between the runs below, the bucket is deleted and recreated, and all servers are restarted
UPDATE 1: No GSI = no problem. With the GSI creation removed from the code, there is no problem at all. So the problem is definitely with the GSIs.
UPDATE 2: There is no problem with only one index (iQA, for example)
UPDATE 3: There is a strange effect with 3 indexes (iQA, iQB, iQC):
- first run: only one node is affected by the post-load high CPU utilization
- second run: all three nodes are affected
- third run: no nodes are affected at all
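For anyone reproducing the update runs above, the indexes can also be dropped between runs instead of deleting the whole bucket. A sketch (the statement-builder class is a hypothetical helper; actually executing the statements requires a live cluster, e.g. via `b.query(N1qlQuery.simple(...))` from the repro code):

```java
public class IndexCleanup {
    // builds a N1QL DROP INDEX statement for one of the repro's GSI indexes
    static String dropStatement(final String bucket, final String index) {
        return "DROP INDEX `" + bucket + "`.`" + index + "` USING GSI";
    }

    public static void main(final String[] args) {
        for (final String name : new String[]{"iQA", "iQB", "iQC", "iQX", "iQY", "iQZ"}) {
            // against a live cluster: b.query(N1qlQuery.simple(dropStatement("default", name)));
            System.out.println(dropStatement("default", name));
        }
    }
}
```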