4.1.0-EE vs 4.1.1-EE: indexer too slow

@vmx,
great job!
Thanks.

@vmx,
there is something strange with the view results (4.1.1-EE): even if I pause long enough to settle all indexing delays (50 seconds for testing purposes now), from time to time (it may be the 3rd, 4th, 5th … N-th test cycle) the view seems to return not all results. Deeper checks are needed (probably this is a mistake in my tests), but I would like to clarify: can https://issues.couchbase.com/browse/MB-19503, theoretically, cause a view to return an incomplete set of results (i.e. not everything that was upserted and should be indexed)?
[UPDATE] No, it looks more like a problem in my tests.
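A fixed 50-second pause either wastes time or is occasionally too short. A minimal sketch of the alternative, polling until the view reports the expected number of rows or a deadline passes, could look like this. `ViewPoller`, `awaitRowCount` and the `rowCount` supplier are hypothetical names; the supplier stands in for whatever view query the test already runs.

```java
import java.util.function.IntSupplier;

// Hypothetical test helper, not part of any Couchbase SDK.
final class ViewPoller {
    /**
     * Polls rowCount until it reports at least `expected` rows or the timeout expires.
     * Returns true if the expected count was seen in time, false otherwise.
     */
    static boolean awaitRowCount(IntSupplier rowCount, int expected,
                                 long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (rowCount.getAsInt() >= expected) {
                return true;
            }
            // Overflow-safe comparison against the deadline.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for a real view query: pretends one more row is indexed per poll.
        int[] indexed = {0};
        boolean complete = awaitRowCount(() -> ++indexed[0], 5, 1_000, 10);
        System.out.println("view complete: " + complete);
    }
}
```

A test that fails with `false` here at least says "indexing did not finish in N seconds" instead of silently asserting on a partial result set.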

@cihangirb, @vmx
could you please explain one thing to me: I see that https://issues.couchbase.com/browse/MB-19503 is marked “Fix Version/s: watson”. Does that mean there will be no fix for this bug in the 4.1.X releases (4.1.2 etc.)? And if so, is there a way to ask the respected developers who are (I hope) going to find a solution for this bug to also include the 4.1.X branch in “Fix Version/s”?

@grep, that’s right, “Watson” is the current target. We might backport a fix but there’s no guarantee (we can’t backport all fixes to all releases). Best is if you leave a comment on the issue (your forum credentials should work on Jira as well), so that everyone is aware that you’d like to have a fix for 4.1.x.

Hello, it’s @grep, but for now I am @egrep (temporarily, I hope, because of “Help needed: strange login behavoir / can't login to forum”)

So, what do I see for the issue?

  1. Closed
  2. Won’t Do

I would like to post the following to JIRA, but the @egrep credentials cannot log in there, so…

Jim Walker has explained the problem with “undersizing” (lack of CPU). Let’s take a look at the core of the explanation (from my point of view):

There are times where we begin a DCP task, get data ready to send, but many of the other background tasks run before the frontend thread gets its chance to run and send the data, hence why sometimes you get pauses on DCP.

I think there is a “logical flaw” in this statement:

  1. Nothing (I suppose) prevents the same situation on an N-CPU system. When I try to scale this situation (in my imagination) to N CPUs, I see no difference at all: the DCP/frontend task has the same chance to lose its fight for CPU against lots of background tasks on an N-CPU system (if the number of background tasks increases), because the OS task-scheduling algorithm is the same. (I should mention that I don’t know exactly how the OS task-scheduling algorithm works for N CPUs, and this is a “logical flaw” in my own position.)
  2. This problem, from my point of view, is a QoS problem: some tasks within CB must have high priority for execution. Building a comprehensive solution for prioritizing CB tasks is the real way to solve the described problem.
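To make the QoS idea in point 2 concrete, here is a minimal sketch of priority-ordered task dispatch: tasks are taken by priority first and by arrival order second, so a high-priority task never waits behind a backlog of background work. This is only an illustration in Java, not how Couchbase schedules its tasks; `PriorityTaskQueue` and the task names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration; names do not correspond to Couchbase internals.
final class PriorityTaskQueue {
    enum Priority { HIGH, LOW } // declaration order: HIGH sorts before LOW

    private static final class Entry implements Comparable<Entry> {
        final Priority priority;
        final long seq;      // arrival order, used as a tie-breaker
        final String name;

        Entry(Priority priority, long seq, String name) {
            this.priority = priority;
            this.seq = seq;
            this.name = name;
        }

        @Override
        public int compareTo(Entry other) {
            int byPriority = priority.compareTo(other.priority);
            return byPriority != 0 ? byPriority : Long.compare(seq, other.seq);
        }
    }

    private final PriorityQueue<Entry> queue = new PriorityQueue<>();
    private long nextSeq = 0;

    void submit(Priority priority, String name) {
        queue.add(new Entry(priority, nextSeq++, name));
    }

    /** Returns task names in the order a single worker would run them. */
    List<String> drain() {
        List<String> order = new ArrayList<>();
        Entry e;
        while ((e = queue.poll()) != null) {
            order.add(e.name);
        }
        return order;
    }

    public static void main(String[] args) {
        PriorityTaskQueue q = new PriorityTaskQueue();
        q.submit(Priority.LOW, "compaction");
        q.submit(Priority.LOW, "item-pager");
        q.submit(Priority.HIGH, "dcp-send");
        System.out.println(q.drain()); // prints [dcp-send, compaction, item-pager]
    }
}
```

The sequence number matters: without it, tasks of equal priority could be reordered arbitrarily, which is a fairness problem of its own.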

Little bit of emotion:

Hope is not a strategy.

  • Traditional SRE saying (quoted from SRE: HOW GOOGLE RUNS PRODUCTION SYSTEMS)

Should we really hope that “for N CPUs the problem is going to be resolved by the OS kernel scheduler”?

ok, no more emotions
:wink:

So, I think the problem exists, and all versions of Couchbase are affected under certain circumstances. A lack of CPUs just makes such circumstances easier to reproduce. But, imho, this is more of a complex architecture problem. I assume that there was no need to think about QoS for CB tasks before (am I right?). And probably there is no need now, because the minimum requirement of 4 CPUs hides the problem: QoS, by definition, is needed when unmanaged low-level concurrency prevents solving high-level problems (for example, the well-known jitter problems of voice traffic without QoS), and the same load causes much less of that concurrency on 4 CPUs than on 1 CPU (or 2 CPUs, because 2 is not enough either: the problem still shows itself).

I would like to thank @vmx, who revealed 2 things to me (with the caveat that @vmx is not a developer of that part of the server):

  1. The frontend threads get priority over DCP (still wondering: am I right in assuming a lack of QoS for CB tasks? Maybe there is something like a preliminary QoS implementation?)
  2. It is better to let DCP block than other operations

OK, I got it. But maybe now is the occasion to think about implementing backpressure (as one of the steps) towards a system without pauses caused by overloading? (Sorry, yes, I understand that it is much easier to give advice than to sit down and implement it.)
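For readers unfamiliar with the term, backpressure means the consumer can tell the producer to slow down instead of letting work pile up without limit. A minimal sketch using a bounded queue is shown below; it is a generic illustration, not a proposal for the actual DCP code, and `BoundedMailbox` is a hypothetical name.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration of backpressure with a bounded queue.
final class BoundedMailbox {
    private final BlockingQueue<String> pending;

    BoundedMailbox(int capacity) {
        this.pending = new ArrayBlockingQueue<>(capacity);
    }

    /** Returns false when the mailbox is full: the producer must back off and retry later. */
    boolean trySend(String item) {
        return pending.offer(item);
    }

    /** Returns the next item, or null if nothing is pending. */
    String receive() {
        return pending.poll();
    }

    public static void main(String[] args) {
        BoundedMailbox mailbox = new BoundedMailbox(2);
        System.out.println(mailbox.trySend("a")); // true
        System.out.println(mailbox.trySend("b")); // true
        System.out.println(mailbox.trySend("c")); // false: full, producer is pushed back
        mailbox.receive();                        // consumer frees one slot
        System.out.println(mailbox.trySend("c")); // true
    }
}
```

The design choice is that overload becomes visible to the producer immediately, rather than showing up later as an unexplained pause.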

And finally, about the “bad luck for 4.1.X users”. Jim made a good assumption (without proof, but my knowledge of how it works makes me agree) that the change of synchronization primitives caused less of the “forced rescheduling” that had “helped” versions prior to 4.1.1 to work fine. But those changes (especially using cmpxchg instead of mutexes) brought much better performance (google it), so it was the right way, I think.
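The difference between the two kinds of primitive can be sketched in Java, where `AtomicLong.compareAndSet` plays the role of cmpxchg and `synchronized` the role of a mutex. This is only an analogy under that assumption (the server code in question is C++), and `Counters` is a hypothetical class. A thread that loses the lock may be descheduled while it waits; a thread that loses a compare-and-swap simply retries without giving up the CPU.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration: the same counter guarded by a lock and by compare-and-swap.
final class Counters {
    private long locked = 0;
    private final AtomicLong lockFree = new AtomicLong();

    // Mutex style: a contended caller blocks until the lock is released.
    synchronized void incrementLocked() {
        locked++;
    }

    synchronized long locked() {
        return locked;
    }

    // CAS style: read, compute, and retry if another thread changed the value meanwhile.
    void incrementLockFree() {
        long current;
        do {
            current = lockFree.get();
        } while (!lockFree.compareAndSet(current, current + 1));
    }

    long lockFree() {
        return lockFree.get();
    }

    /** Runs both increments from several threads and returns the resulting counters. */
    static Counters hammer(int threads, int perThread) {
        Counters c = new Counters();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    c.incrementLocked();
                    c.incrementLockFree();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return c;
    }

    public static void main(String[] args) {
        Counters c = hammer(4, 10_000);
        System.out.println(c.locked() + " " + c.lockFree()); // prints 40000 40000
    }
}
```

Both variants are correct; they differ in how often a contended thread hands the CPU back to the scheduler, which is the side effect discussed above.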

So:

  1. Seems like there will be no fixes for 4.1.X that could help “undersizers”; the solution is to buy more CPUs or use 4.1.0 (with all its unresolved issues) :frowning:
  2. Problem still exists (?)

:pensive:

More tests once again
[4.1.2, source-based]:

  1. 2 CPU / 1GB RAM => my tests failed with the same problem ~ 2:3 (failed:ok)
  2. 3 CPU / 1GB RAM => my tests failed with the same problem ~ 1:3 (failed:ok)

4.1.1-EE with the same configurations: a little bit later

ok, probably, i’m still “undersized”, but 3 CPUS, Carl !
I really think that this problem should be revised and explored more thoroughly.

Hurray! Hurray!
https://issues.couchbase.com/browse/MB-19503

From time to time it’s good to be stubborn :slight_smile:

Happy to help :smile:

@jwalker,
thanks for adding 4.1.2 as “Fix version(s)”!
P.S. and thanks to management team too :wink:

@jwalker, @drigby, @vmx, @manu

http://review.couchbase.org/#/c/64025/

Guys, it was BAD for 4.1.1 and 4.1.2-6026 (tens of seconds;), but for 4.1.2-6027 it’s TERRIBLE (several minutes)!
“Sitting and watching” the views page in the UI: it takes about 10 minutes to “index” 316 items!
[UPDATE] The configuration is 4 vCPU + 4 GB RAM (it had been increased from the default 1+1 and, luckily, I forgot to reduce it)

Want me to upload a video, or is my word enough?

repo info:

Project: ep-engine
Mount path: /home/grep/couchbase-4.1.2-6027/src/ep-engine
Current revision: 0856e0b3d3fc62a50677a9be7963be3d5c04d041
Local Branches: 0

Can you paste the link to the exact build you’ve downloaded (6026 and 6027)? I’ll take a look soon.

Also can you summarise in simple steps, what you do to create the problem? Including how many cluster nodes you use.

@jwalker,
now it’s:

  1. 3 nodes (4G RAM x 4 CPU)
  2. Target bucket = 128 MB (x3 by number of nodes), auth = enabled, 1 replica, “view index replicas”, low priority, full eviction, default auto-compaction, flush = true
  3. updateInterval = 1000, updateMinChanges = 1, replicaUpdateMinChanges = 1
  4. I use 6027 (https://github.com/couchbase/build-team-manifests/blob/master/sherlock.xml)
  5. Create 1 DD with 1 view for bucket (actually, i have 2 DD’s, second one is with 2 views, but this one is mainly used [simplified]):

function (doc, meta) { if(meta.id.charAt(5) == 'x') emit(meta.id.substring(10)); }

  6. Insert docs (with any content, they can even be empty) and enjoy the “indexing” time

UPDATE 1: index time is independent of priority (low/high) and eviction mode (full/value)

@jwalker, @drigby, @manu, @vmx

ok, here is the simple way to reproduce:

  1. Take Version: 4.1.2-6027 Enterprise Edition (build-6027) [md5 for Ubuntu 14.04 .deb on amd64 = f9dcd03b68059edb6bbb97079cfc2777]
  2. Install on 3 nodes (4 vCPU, 4 GB RAM) and establish a cluster [or use 1 CPU + 1 GB if you wish, it doesn’t matter]
  3. Use default bucket (value ejection, 1 replica, view index replicas, priority = high, flush = enable)
  4. Change UpdateInterval, using the line below (change $HOST,$PORT, $PASSWORD as needed):
curl -X POST -u Administrator:$PASSWORD --data 'updateInterval=1000&updateMinChanges=1&replicaUpdateMinChanges=1' http://$HOST:$PORT/settings/viewUpdateDaemon

  5. Run the following code (Java) or write your own using your favorite SDK:

package indexertooslow;

import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.env.CouchbaseEnvironment;
import com.couchbase.client.java.env.DefaultCouchbaseEnvironment;
import com.couchbase.client.java.view.DefaultView;
import com.couchbase.client.java.view.DesignDocument;
import java.util.Arrays;
import java.util.LinkedList;

public class IndexerTooSlow {
    public static void main(String[] args) {
        CouchbaseEnvironment ce = DefaultCouchbaseEnvironment.create();
        // Replace with the hostnames of your three cluster nodes.
        final LinkedList<String> nodes = new LinkedList<>();
        nodes.add("A.node");
        nodes.add("B.node");
        nodes.add("C.node");
        Cluster cluster = CouchbaseCluster.create(ce, nodes);

        // One design document with one view: index ids that have 'X' at position 5.
        DesignDocument dd = DesignDocument.create(
            "dd",
            Arrays.asList(
                DefaultView.create(
                    "view",
                    "function (doc, meta) { if(meta.id.charAt(5) == 'X') emit(meta.id.substring(10)); }"
                )
            )
        );
        dd.options().put(DesignDocument.Option.UPDATE_MIN_CHANGES, 1L);
        dd.options().put(DesignDocument.Option.REPLICA_UPDATE_MIN_CHANGES, 1L);

        String bucketName = "default";
        Bucket b = cluster.openBucket(bucketName);
        b.bucketManager().upsertDesignDocument(dd);

        // Upsert 1000 empty documents, one at a time. The ids are timestamp-based,
        // so upserts within the same millisecond overwrite each other.
        for (int i = 0; i < 1000; i++) {
            System.out.println(i);
            b.async()
                .upsert(JsonDocument.create("XXXXXXXXXXXXX" + String.valueOf(System.currentTimeMillis())))
                .toBlocking()
                .subscribe();
        }

        // Release SDK resources once all upserts are done.
        cluster.disconnect();
    }
}

  6. It will take ~10 minutes to index (after the run ends); relax and enjoy :wink:
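For anyone who prefers to script step 4 from the same Java program instead of curl, a rough equivalent using only the JDK might look like the sketch below. The endpoint and parameter names are taken from the curl line above; `ViewUpdateDaemonSettings`, `formBody` and `post` are hypothetical helper names, and error handling is left out.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper mirroring the curl command from step 4.
final class ViewUpdateDaemonSettings {
    /** Builds the same form body that curl sends with --data. */
    static String formBody(int updateIntervalMs, int updateMinChanges, int replicaUpdateMinChanges) {
        return "updateInterval=" + updateIntervalMs
            + "&updateMinChanges=" + updateMinChanges
            + "&replicaUpdateMinChanges=" + replicaUpdateMinChanges;
    }

    /** POSTs the body with HTTP basic auth and returns the HTTP status code. */
    static int post(String host, int port, String user, String password, String body)
            throws IOException {
        URL url = new URL("http://" + host + ":" + port + "/settings/viewUpdateDaemon");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        String token = Base64.getEncoder()
            .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        conn.setRequestProperty("Authorization", "Basic " + token);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) {
        // Only prints the body; call post(...) with your own host, port and credentials.
        System.out.println(formBody(1000, 1, 1));
    }
}
```

Running the settings change from the test itself avoids the mistake of testing against the default update interval.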

@jwalker,

Confirmed: this one works fine (at least for my tests and a manually edited build of 4.1.2-6027):

while (!queue.empty()) {
    connection_t &conn = queue.front();
    Notifiable *tp = dynamic_cast<Notifiable*>(conn.get());
    tp->setNotificationScheduled(false);
    if (tp && tp->isPaused() && conn->isReserved()) {
        engine.notifyIOComplete(conn->getCookie(), ENGINE_SUCCESS);
        tp->setNotifySent(true);
    }
    queue.pop();
}

A second update (similar to the one I sent to you and you tried) has been submitted for inclusion in our upcoming releases.

@jwalker,
UPDATED: see post below, this one was a mistake.

@jwalker,
Sorry, sorry, my fault! I forgot (yes, night is a time to sleep, not to test) to set “updateInterval=1000”, which is why my tests failed the first time with v2 of the patch. Setting “updateInterval=1000” makes the tests pass.
After 10+ cycles of my app’s testing, I confirm that this code also works fine:

while (!queue.empty()) {
    connection_t &conn = queue.front();
    Notifiable *tp = dynamic_cast<Notifiable*>(conn.get());
    if (tp) {
        tp->setNotificationScheduled(false);
        if (tp->isPaused() && conn->isReserved()) {
            engine.notifyIOComplete(conn->getCookie(), ENGINE_SUCCESS);
            tp->setNotifySent(true);
        }
    }
    queue.pop();
}

But the strange thing is that with the default “updateInterval=5000”, indexing “visually” looks slower.

@jwalker,
some remarks: build 6028 works fine, but with an undersized configuration I can’t get more than 15-20 runs of my app’s cyclic test with “full ejection”. My top result is 138 runs [4.1.0-GA, undersized, value ejection (there is a bug blocking “full ejection” for 4.1.0 if counters are used)]; that run failed due to a network problem. But for 6028 most of the test-cycle failures are related to indexing (i.e. when I look for the failure reason inside a particular test, I see that it failed because of “delayed indexing”). I suspect there is something wrong with “full eviction”, but more tests are needed. And, of course, it is much harder to prove such errors with simple tests.
So, all this is just one “big remark”.

Builds >= 6028 work fine.
All problems were my test suite problems.