[issue,reproducible] GSI problem: high after-load CPU utilization

Hello!
4.1.2-6033 (undersized, 3 nodes, 2CPU x 2GB RAM).
Very strange situation: after load (my test suite) is gone CB-servers CPU usage (observing via htop) remains very high comparing to “default utilization after start”. This is seems to be an issue, because when i restart a particular server, this server’s CPU usage goes back to normal. See screenshot; on the left upper corner there is a server that was restarted two others were not). You can wait for 10,15,20 mins, but CPU load remains high for non-restarted nodes.


I’m working to reproduce this on 4.1.0-EE for 4x4 with some simple scenario.

  1. 4.1.0-EE GA (5005), 3 nodes (4CPU / 4G RAM)
  2. All nodes have all services enabled (data,index,query)
  3. Establish a cluster with “default” bucket (256Mb, full eviction, no password, 1 replica, view index replicas, flush=enable)
  4. Now, establish ssh connection to all nodes, run htop, and just watch for a while, to ensure, that total observed CPU utilization load is ~ [0…5]% for each node
  5. Run the following code (this will take a while, ) or write your own:
package highcpuafterload;
import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.document.json.JsonObject;
import com.couchbase.client.java.env.CouchbaseEnvironment;
import com.couchbase.client.java.env.DefaultCouchbaseEnvironment;
import com.couchbase.client.java.query.N1qlQuery;
import java.util.LinkedList;
import java.util.concurrent.Phaser;
public class BombardaMaxima extends Thread {
private final int tid;

// configure here
private static final int threads = 20;
private static final int docsPerThread = 50000;
private static final int docTTLms = 86400 * 1000;
private static final int dumpToConsoleEachNDocs = 1000;

private static final Phaser phaser = new Phaser(threads + 1);
private static final CouchbaseEnvironment ce;
private static final Cluster cluster;
private static final String bucket = "default";

static {
    ce = DefaultCouchbaseEnvironment.create();
    final LinkedList<String> nodes = new LinkedList();
    nodes.add("A.node");
    nodes.add("B.node");
    nodes.add("C.node");
    cluster = CouchbaseCluster.create(ce, nodes);
    final Bucket b = cluster.openBucket(bucket);
    // three indexes for query
    final String iQA = "CREATE INDEX iQA ON `default`(a, b) WHERE a is valued USING GSI";
    final String iQB = "CREATE INDEX iQB ON `default`(a, b) WHERE a is valued USING GSI";
    final String iQC = "CREATE INDEX iQC ON `default`(a, b) WHERE a is valued USING GSI";
    final String iQX = "CREATE INDEX iQX ON `default`(a, c) WHERE a is valued USING GSI";
    final String iQY = "CREATE INDEX iQY ON `default`(a, c) WHERE a is valued USING GSI";
    final String iQZ = "CREATE INDEX iQZ ON `default`(a, c) WHERE a is valued USING GSI";
    
    b.query(N1qlQuery.simple(iQA));        
    b.query(N1qlQuery.simple(iQB));        
    b.query(N1qlQuery.simple(iQC));        
    b.query(N1qlQuery.simple(iQX));        
    b.query(N1qlQuery.simple(iQY));        
    b.query(N1qlQuery.simple(iQZ));        
}
public BombardaMaxima(final int tid) {
    this.tid = tid;
}
public final void run() {
    try {
        Bucket b = null;
        synchronized(cluster) { b = cluster.openBucket(bucket); }
        final long stm = System.currentTimeMillis();
        final JsonObject jo = JsonObject
                .empty()
                .put("a", stm)
                .put("b", stm)
                .put("c", stm);
        for(int i = 0; i< docsPerThread; i++) { 
            b.upsert(JsonDocument.create(
                    tid + ":" + System.currentTimeMillis(), 
                    (int)((System.currentTimeMillis() + docTTLms) / 1000),
                    jo)
            );
            if (i % dumpToConsoleEachNDocs == 0) System.out.println("T[" + tid + "] = " + i);
        }
    } catch(final Exception e) {
        e.printStackTrace();
    } finally {
        phaser.arriveAndAwaitAdvance();
    }   
}
public static void main(String[] args) {
    for(int i = 0; i< threads; i++) new BombardaMaxima(i).start();
    phaser.arriveAndAwaitAdvance();
    System.out.println("DONE");
}  
}

6 . Wait code run to complete (you can also wait 10, 15, 20 minutes more after compete, if you wish)
7 . Now watch htop again: you’ll see, that CPU utilization is ~ [15…40]% for all nodes:


8 . now restart one of nodes, wait for a while for “node initialization to pass”, and, watch for htop once again (this node load is [0…5]% now):

9 . restart second node, and …:

10 . now, restart the last one:

Seems like something is stuck within couchbase server.

UPDATE 0: between runs below bucket is deleted and recreated + all servers are restarted
UPDATE 1: No GSI = No problem. With removed (from code) GSI creation, there is no problem at all. So, the problem is defenetely with GSI’s.
UPDATE 2: There is no problem with only one index (iQA, for example)
UPDATE 3: There is a strange effect with 3 indexes (iQA, iQB, iQC):

  1. first run: only one node is affected with “post-load-high-cpu-utilization”.
  2. second run: all three nodes are affected
  3. third run: no nodes affected at all

@keshav_m, @johan_larson, @geraldss,
for your attention: https://issues.couchbase.com/browse/MB-19714