GSI: different sporadic bugs + fragmentation up to 98%

@egrep - compaction runs independently for data, indexes and views. If you have a topology where the indexes having the issue are on a separate node and drive, you may be suffering from a compaction error that isn't bubbling up on the indexes, leaving them in this state. grep -i for "compact" and see if there is a compaction completion msg in indexer.log. I can look up the exact text if you are having issues.
thanks
-cihan

Which log exactly should I grep?

/opt/couchbase/var/lib/couchbase/logs/indexer.log:2016-05-31T15:31:08.887+00:00 [Info] CompactionDaemon: Checking fragmentation, IndexInstId: 3680848504215299622 Data:3351772, Disk:28356608, ExtraSnapshotData:0, Fragmentation:88.17992617452694%
/opt/couchbase/var/lib/couchbase/logs/indexer.log:2016-05-31T15:31:38.888+00:00 [Info] CompactionDaemon: Checking fragmentation, IndexInstId: 3680848504215299622 Data:3351772, Disk:28356608, ExtraSnapshotData:0, Fragmentation:88.17992617452694%
/opt/couchbase/var/lib/couchbase/logs/indexer.log:2016-05-31T15:32:05.941+00:00 [Info] PeriodicStats = {"default:iQX:avg_scan_latency":0,"default:iQX:avg_scan_wait_latency":0,"default:iQX:avg_ts_interval":200714057,"default:iQX:avg_ts_items_count":21,"default:iQX:build_progress":100,"default:iQX:data_size":3351772,"default:iQX:delete_bytes":0,"default:iQX:disk_size":28356608,"default:iQX:flush_queue_size":0,"default:iQX:frag_percent":88,"default:iQX:get_bytes":0,"default:iQX:insert_bytes":10936,"default:iQX:items_count":9897,"default:iQX:num_commits":0,"default:iQX:num_compactions":0,"default:iQX:num_completed_requests":0,"default:iQX:num_docs_indexed":20000,"default:iQX:num_docs_pending":0,"default:iQX:num_docs_queued":0,"default:iQX:num_flush_queued":205,"default:iQX:num_last_snapshot_reply":0,"default:iQX:num_requests":0,"default:iQX:num_rows_returned":0,"default:iQX:num_snapshot_waiters":0,"default:iQX:num_snapshots":9,"default:iQX:scan_bytes_read":0,"default:iQX:scan_wait_duration":0,"default:iQX:since_last_snapshot":199612495,"default:iQX:timings/storage_clone_handle":"0 0 0","default:iQX:timings/storage_commit":"0 0 0","default:iQX:timings/storage_del":"0 0 0","default:iQX:timings/storage_get":"205 3273748 380521959320","default:iQX:timings/storage_handle_open":"1 21790 474804100","default:iQX:timings/storage_info":"19 163719 2898968793","default:iQX:timings/storage_iterator_next":"0 0 0","default:iQX:timings/storage_meta_get":"2 18225118 304963307518090","default:iQX:timings/storage_meta_set":"0 0 0","default:iQX:timings/storage_new_iterator":"0 0 0","default:iQX:timings/storage_set":"206 1992214 28906881938","default:iQX:timings/storage_snapshot_close":"8 304351 15061596145","default:iQX:timings/storage_snapshot_create":"9 1757314 383319197322","default:iQX:total_scan_duration":0,"default:mutation_queue_size":0,"default:num_mutations_queued":205,"default:num_nonalign_ts":0,"default:ts_queue_size":0,"memory_quota":268435456,"memory_used":25554944,"needs_restart":false,"num_connections":0,"timings/stats_response":"6950 2041447151 3277796782181851"}
/opt/couchbase/var/lib/couchbase/logs/indexer.log:2016-05-31T15:32:08.888+00:00 [Info] CompactionDaemon: Checking fragmentation, IndexInstId: 3680848504215299622 Data:3351772, Disk:28356608, ExtraSnapshotData:0, Fragmentation:88.17992617452694%
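For reference, the Fragmentation figure in these lines works out to exactly (Disk - Data) / Disk, i.e. the stale fraction of the file on disk. A quick check in Java against the numbers above (a standalone sketch; the class and method names are mine):

public class FragmentationCheck {
    // Fragmentation as CompactionDaemon apparently reports it: stale fraction of the file
    static double fragmentation(long dataBytes, long diskBytes) {
        return (diskBytes - dataBytes) * 100.0 / diskBytes;
    }
    public static void main(String[] args) {
        // Data:3351772, Disk:28356608, taken from the log lines above
        System.out.println(fragmentation(3351772L, 28356608L));
        // prints 88.17992617452694, matching "Fragmentation:88.17992617452694%"
    }
}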

And more:

/opt/couchbase/var/lib/couchbase/logs/stats.log:ep_compaction_exp_mem_threshold 85
/opt/couchbase/var/lib/couchbase/logs/stats.log:ep_compaction_write_queue_cap 10000
/opt/couchbase/var/lib/couchbase/logs/stats.log:ep_mlog_compactor_runs 0
/opt/couchbase/var/lib/couchbase/logs/stats.log:ep_pending_compactions 0

And more:

/opt/couchbase/var/lib/couchbase/logs/projector.log:2016-05-31T14:08:48.546+00:00 [Info] PROJ[:9999] settings indexer.settings.compaction.check_period will updated to 30
/opt/couchbase/var/lib/couchbase/logs/projector.log:2016-05-31T14:08:48.546+00:00 [Info] PROJ[:9999] settings indexer.settings.compaction.interval will updated to 00:00,00:00
/opt/couchbase/var/lib/couchbase/logs/projector.log:2016-05-31T14:08:48.546+00:00 [Info] PROJ[:9999] settings indexer.settings.compaction.min_size will updated to 524288000
/opt/couchbase/var/lib/couchbase/logs/projector.log:2016-05-31T14:08:48.546+00:00 [Info] PROJ[:9999] settings indexer.settings.compaction.min_frag will updated to 30
/opt/couchbase/var/lib/couchbase/logs/projector.log:{"indexer.settings.compaction.interval":"00:00,00:00","indexer.settings.log_level":"info","indexer.settings.persisted_snapshot.interval":5000,"indexer.settings.compaction.min_frag":30,"indexer.settings.inmemory_snapshot.interval":200,"indexer.settings.max_cpu_percent":400,"indexer.settings.recovery.max_rollbacks":5,"indexer.settings.memory_quota":268435456}
/opt/couchbase/var/lib/couchbase/logs/projector.log:2016-05-31T14:20:05.388+00:00 [Info] PROJ[:9999] settings indexer.settings.compaction.interval will updated to 00:00,00:00
/opt/couchbase/var/lib/couchbase/logs/projector.log:2016-05-31T14:20:05.388+00:00 [Info] PROJ[:9999] settings indexer.settings.compaction.min_size will updated to 524288000
/opt/couchbase/var/lib/couchbase/logs/projector.log:2016-05-31T14:20:05.388+00:00 [Info] PROJ[:9999] settings indexer.settings.compaction.check_period will updated to 30
/opt/couchbase/var/lib/couchbase/logs/projector.log:2016-05-31T14:20:05.389+00:00 [Info] PROJ[:9999] settings indexer.settings.compaction.min_frag will updated to 30
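Side note: read together, these settings may already explain the indexer.log excerpt above. compaction.min_frag is 30 (%), but compaction.min_size is 524288000 bytes = 500 * 1024 * 1024, i.e. 500 MB, while the index file in the log is only ~27 MB on disk. A sketch of the trigger the setting names imply (my inference, not the actual indexer code):

// Inferred from the setting names above; NOT taken from the indexer source.
static boolean shouldCompact(long diskBytes, double fragPercent) {
    final long minSize = 524288000L; // indexer.settings.compaction.min_size (500 MB)
    final double minFrag = 30.0;     // indexer.settings.compaction.min_frag (30%)
    return diskBytes >= minSize && fragPercent >= minFrag;
}

// With the logged values: shouldCompact(28356608L, 88.18) == false,
// i.e. 88% fragmentation alone never triggers compaction while the file stays under 500 MB.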

And more:

/opt/couchbase/var/lib/couchbase/logs/reports.log: {name,compaction_new_daemon},
/opt/couchbase/var/lib/couchbase/logs/reports.log: {mfargs,{compaction_new_daemon,start_link,}},
/opt/couchbase/var/lib/couchbase/logs/reports.log: {name,compaction_daemon},
/opt/couchbase/var/lib/couchbase/logs/reports.log: {mfargs,{compaction_daemon,start_link,}},

And more:

Start compaction of vbuckets for bucket default with config:
/opt/couchbase/var/lib/couchbase/logs/info.log:[ns_server:info,2016-05-31T14:58:05.198Z,ns_1@81.node:<0.28436.0>:compaction_new_daemon:spawn_scheduled_views_compactor:493]Start compaction of indexes for bucket default with config:
/opt/couchbase/var/lib/couchbase/logs/info.log:[ns_server:info,2016-05-31T14:58:34.991Z,ns_1@81.node:<0.28793.0>:compaction_new_daemon:spawn_scheduled_kv_compactor:467]Start compaction of vbuckets for bucket default with config:
/opt/couchbase/var/lib/couchbase/logs/info.log:[ns_server:info,2016-05-31T14:58:35.204Z,ns_1@81.node:<0.28797.0>:compaction_new_daemon:spawn_scheduled_views_compactor:493]Start compaction of indexes for bucket default with config:
/opt/couchbase/var/lib/couchbase/logs/info.log:[ns_server:info,2016-05-31T14:59:04.994Z,ns_1@81.node:<0.29166.0>:compaction_new_daemon:spawn_scheduled_kv_compactor:467]Start compaction of vbuckets for bucket default with config:

And more:

/opt/couchbase/var/lib/couchbase/logs/debug.log:[ns_server:debug,2016-05-31T15:17:04.148Z,ns_1@81.node:compaction_new_daemon<0.402.0>:compaction_scheduler:schedule_next:60]Finished compaction for compact_kv too soon. Next run will be in 30s
/opt/couchbase/var/lib/couchbase/logs/debug.log:[ns_server:debug,2016-05-31T15:17:05.330Z,ns_1@81.node:compaction_new_daemon<0.402.0>:compaction_new_daemon:process_scheduler_message:1251]Starting compaction (compact_views) for the following buckets:
/opt/couchbase/var/lib/couchbase/logs/debug.log:[ns_server:info,2016-05-31T15:17:05.332Z,ns_1@81.node:<0.9632.1>:compaction_new_daemon:spawn_scheduled_views_compactor:493]Start compaction of indexes for bucket default with config:
/opt/couchbase/var/lib/couchbase/logs/debug.log:[ns_server:debug,2016-05-31T15:17:05.332Z,ns_1@81.node:compaction_new_daemon<0.402.0>:compaction_new_daemon:process_compactors_exit:1292]Finished compaction iteration.
/opt/couchbase/var/lib/couchbase/logs/debug.log:[ns_server:debug,2016-05-31T15:17:05.332Z,ns_1@81.node:compaction_new_daemon<0.402.0>:compaction_scheduler:schedule_next:60]Finished compaction for compact_views too soon. Next run will be in 30s

What exactly should I look for?

2 hours after the "reference repro" of "Bumpy ride" (https://issues.couchbase.com/browse/MB-19792): compaction settings changed, compaction forced, but nothing happened:

@siri,
for my MB-questions:

/opt/couchbase/etc/couchdb/default.ini

[database_compaction]
; larger buffer sizes can originate smaller files
doc_buffer_size = 524288 ; value in bytes
checkpoint_after = 5242880 ; checkpoint after every N bytes were written

512K and 5M. 5 points * 5M = 25M << 512M.

Or maybe it's not "in bytes" but "in kilobytes"?

[compaction_daemon]
; The delay, in seconds, between each check for which database and view indexes
; need to be compacted.
check_interval = 60
; If a database or view index file is smaller then this value (in bytes),
; compaction will not happen. Very small files always have a very high
; fragmentation therefore it's not worth to compact them.
min_file_size = 131072

128 KB?
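Decomposing the numbers suggests the comments do mean bytes (the variable names below are just for illustration):

long docBufferSize   = 524288;  // = 512 * 1024       -> 512 KB
long checkpointAfter = 5242880; // = 5 * 1024 * 1024  -> 5 MB
long minFileSize     = 131072;  // = 128 * 1024       -> 128 KB

Also note that /opt/couchbase/etc/couchdb/default.ini appears to configure the couchdb view-engine compactor, not the GSI indexer; the GSI side is driven by the indexer.settings.compaction.* values quoted earlier.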

Mmmm… is there another configuration file I need to change for compaction? (Or maybe there is a different way to configure it?)
Or, the perfect case: there is a guide for this configuration somewhere. But where?

UPDATE: so strange: in the "Couchbase SDKs" docs for 4.1.1-EE, I see only:

[screenshot]

Nothing about "Circular compaction" or "Setting fragmentation level"

hehe, yeah, you wish!
I was asking about the possibility of doing some simple configuration through a config file and didn't get any response. There is /opt/couchbase/etc/couchbase/static_config, but even that one is not described properly in the documentation.
But I beg your pardon for interrupting… :wink:

@siri, @cihangirb,
is a little wonder in the form of a "minimal config file explanation" possible for
me and @dmitryb? (And I also think that @moon0326, with his 1:1168 [99% fragmentation on indexes] index, would not object.)

Hi folks - busy day, apologies for the delayed response.
Unfortunately we don't document these config files and I'd not recommend changing them. For the compaction issue, we don't kick compaction off until the index file gets to a certain size (I believe it is set to 500MB). The thread has been hard to follow in a few places, but I think what you are seeing is that compaction isn't kicking off or finishing.
grep for "CompactionDaemon: Compacting index instance:" and "CompactionDaemon: Finished compacting index instance:" - if you are not seeing these, we have not kicked off compaction yet.
Do you have these in the indexer.log?

@cihangirb

and I’d not recommend changing them.

Is something that depends on these params hardcoded?
Are unpredictable side effects possible?

grep -i "CompactionDaemon" /opt/couchbase/var/lib/couchbase/logs/*

shows nothing, so it seems you're right, but I'll generate 1 GB+ per index to make sure.

@siri, @cihangirb,
there definitely is a problem: the index just stops growing after 1.5 GB on 4.1.1-EE (5914):

[screenshot]

Do you need an MB / repro?

UPDATE: I changed the algorithm for keys and values to make them differ more. The result is the same: it gets stuck:

[screenshot]

@siri, @cihangirb,
is this enough to convince you that there is an error and that it should be investigated at the level of "GSI logs and code inspection"?

yes - pls open an issue and include the indexer.log so we can take a look. It may take a long time for the compaction to catch up. Can you also describe the HW and topology of the cluster for me, so we can choose something similar here?
thanks
-cihan

By the way, did the last repro have a compaction msg in the indexer.log?

grep -i "CompactionDaemon: Finished compacting index instance"

@cihangirb
nope, just this for "compaction":

0x4995c4 github.com/couchbase/indexing/secondary/indexer.(*compactionDaemon).loop+0x954 /home/couchbase/jenkins/workspace/sherlock-unix/goproj/src/github.com/couchbase/indexing/secondary/indexer/compaction_manager.go:98
0x4995c4 github.com/couchbase/indexing/secondary/indexer.(*compactionDaemon).loop+0x954 /home/couchbase/jenkins/workspace/sherlock-unix/goproj/src/github.com/couchbase/indexing/secondary/indexer/compaction_manager.go:98
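(Those two lines appear to come from a goroutine stack dump, i.e. the compaction daemon's idle polling loop at compaction_manager.go:98, rather than from an actual compaction run.)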

@cihangirb,
my last log (where "it gets stuck").
indexer.log.zip (947.2 KB)
MB with a repro in a couple of hours, a little busy now.
This is from one node with an index (3 nodes total, 2 have indexes)

@cihangirb,
you know, I was unable to reproduce it :open_mouth:
Even more: after a "clean reinstall" (hw remained the same: 3 nodes, 4 CPU x 4 GB RAM; by the way, I saved the whole logs folder from the "stuck installation", ~15M per node, .bz2), everything looks predictable:

[screenshot]

… ? hmm.

UPDATE: uh-oh:

[screenshot]

Is it possible at all?

UPDATE2: oops #2:

[screenshot]

… and my test script died around here.

@cihangirb, @siri
… and the beautiful message at the end:
2016-06-02T07:18:46.427+00:00 [Warn] Indexer::MutationQueue Waiting for Node Alloc for 165390 Milliseconds Vbucket 125
2016-06-02T07:18:46.457+00:00 [Warn] Indexer::MutationQueue Waiting for Node Alloc for 165420 Milliseconds Vbucket 125
2016-06-02T07:18:46.487+00:00 [Warn] Indexer::MutationQueue Waiting for Node Alloc for 165450 Milliseconds Vbucket 125
2016-06-02T07:18:46.517+00:00 [Warn] Indexer::MutationQueue Waiting for Node Alloc for 165480 Milliseconds Vbucket 125
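(These warnings suggest the indexer's mutation queue is starved for memory: note memory_quota:268435456 in the PeriodicStats excerpt earlier, i.e. 268435456 / 1024 / 1024 = 256 MB of indexer RAM quota, which is very small for this kind of sustained load.)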

Also:

[screenshots]

So:

  1. Unpredictable behavior for even 2 GSIs (what would happen if there were 6, for example?)
  2. This is one of the strangest and most unconfigurable things I've ever seen
  3. Too resource-intensive to reproduce (how many times do I need to run the code to catch it again?)
  4. There is a similar related bug, and it's CLOSED. And do you know the reason for closing? Right: "Closing this issue, could not reproduce with latest build":

    No analysis, no investigation, just "I don't see this anymore". OK, but did anyone try to ask "why"?

IMHO, the GSI implementation is buggy, at least for 4.0/4.1.x. The bugs are really unpredictable and sporadic, which makes it easy to say "oh, nothing happened, this is just your particular configuration".
Also IMHO: a deep implementation review (and probably an architectural one too) is needed for GSI.

Below is the code; anyone can reproduce it (I needed 2 runs):

// package highcpuafterload;

import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.document.JsonDocument;
import com.couchbase.client.java.document.json.JsonObject;
import com.couchbase.client.java.env.CouchbaseEnvironment;
import com.couchbase.client.java.env.DefaultCouchbaseEnvironment;
import com.couchbase.client.java.query.N1qlQuery;
import java.util.LinkedList;
import java.util.concurrent.Phaser;
import java.util.concurrent.ThreadLocalRandom;

public class BombardaMaxima extends Thread {

    private static final CouchbaseEnvironment ce;
    private static final Cluster cluster;
    private static final String bucket = "default";
    // configure here
    private static final int indexableInsertions = 1;
    private static final int nonIndexableInsertions = 1;
    private static final int totalRuns = 200000;
    private static final int threads = 20;
    private static final int threadPauseEachNRuns = 1000;

    // threads + 1 parties: main() arrives last, so it unblocks only when all workers are done
    private static final Phaser p = new Phaser(threads + 1);

    static {
        ce = DefaultCouchbaseEnvironment.create();
        final LinkedList<String> nodes = new LinkedList<String>();
        nodes.add("A.node");
        nodes.add("B.node");
        nodes.add("C.node");
        cluster = CouchbaseCluster.create(ce, nodes);
        final Bucket b = cluster.openBucket(bucket);
        // two partial GSI indexes over the same WHERE predicate
        final String iQA = "CREATE INDEX iQA ON `default`(a, b) WHERE a is valued USING GSI";
        final String iQX = "CREATE INDEX iQX ON `default`(a, c) WHERE a is valued USING GSI";

        b.query(N1qlQuery.simple(iQA));
        b.query(N1qlQuery.simple(iQX));
    }

    public final void run() {
        Bucket b = null;
        synchronized (cluster) { b = cluster.openBucket(bucket); }
        for (int k = 0; k < totalRuns; k++) {
            // optional throttle:
            // if (k % threadPauseEachNRuns == 0) try { Thread.sleep(10000); } catch (Exception e) { e.printStackTrace(); }
            // half the upserts miss the index predicate (no field "a")...
            for (int i = 0; i < nonIndexableInsertions; i++) b.upsert(makeDoc("x", "y", "z"));
            // ...and half hit it
            for (int i = 0; i < indexableInsertions; i++) b.upsert(makeDoc("a", "b", "c"));
        }
        p.arriveAndAwaitAdvance();
    }

    public final JsonDocument makeDoc(String a, String b, String c) {
        /*
        // earlier variant: timestamp-suffixed keys and values
        final long ctm = System.currentTimeMillis();
        return JsonDocument
                .create(
                    String.valueOf(ThreadLocalRandom.current().nextInt() + "" + ctm),
                    JsonObject.empty()
                        .put(a, ThreadLocalRandom.current().nextInt() + "" + ctm)
                        .put(b, ThreadLocalRandom.current().nextInt() + "" + ctm)
                        .put(c, ThreadLocalRandom.current().nextInt() + "" + ctm)
                );
        */
        // random int keys, so some upserts overwrite existing docs (churn for the index)
        return JsonDocument
                .create(
                    String.valueOf(ThreadLocalRandom.current().nextInt()),
                    JsonObject.empty()
                        .put(a, ThreadLocalRandom.current().nextInt())
                        .put(b, ThreadLocalRandom.current().nextInt())
                        .put(c, ThreadLocalRandom.current().nextInt())
                );
    }

    public static void main(String[] args) {
        for (int i = 0; i < threads; i++) new BombardaMaxima().start();
        p.arriveAndAwaitAdvance(); // wait for all workers to finish
        cluster.disconnect();      // added so the JVM can exit cleanly
    }
}
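For scale: with the settings above, the driver issues 20 threads * 200,000 iterations * 2 upserts = 8,000,000 upserts. Half of the documents carry the fields (a, b, c) and therefore match the "WHERE a is valued" predicate of both partial indexes, and because keys are random ints, some upserts overwrite earlier documents, which is exactly the churn that drives index fragmentation.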

And more (but this one with 6 indexes: 2 identical indexes with one copy per node, plus a small load):

[screenshots]

Hi all:

I'm facing the same issue, both with 4.1.x and 4.5.x.
The build process for GSI ForestDB indexes never ends, and indexer.log continuously shows:

2016-10-26T16:59:45.704+00:00 [Warn] Indexer::MutationQueue Waiting for Node Alloc for 49500 Milliseconds Vbucket 566
2016-10-26T16:59:45.734+00:00 [Warn] Indexer::MutationQueue Waiting for Node Alloc for 49530 Milliseconds Vbucket 566
2016-10-26T16:59:45.764+00:00 [Warn] Indexer::MutationQueue Waiting for Node Alloc for 49560 Milliseconds Vbucket 566
2016-10-26T16:59:45.794+00:00 [Warn] Indexer::MutationQueue Waiting for Node Alloc for 49590 Milliseconds Vbucket 566
2016-10-26T16:59:45.824+00:00 [Warn] Indexer::MutationQueue Waiting for Node Alloc for 49620 Milliseconds Vbucket 566
2016-10-26T16:59:45.854+00:00 [Warn] Indexer::MutationQueue Waiting for Node Alloc for 49650 Milliseconds Vbucket 566
2016-10-26T16:59:45.884+00:00 [Warn] Indexer::MutationQueue Waiting for Node Alloc for 49680 Milliseconds Vbucket 566
2016-10-26T16:59:45.914+00:00 [Warn] Indexer::MutationQueue Waiting for Node Alloc for 49710 Milliseconds Vbucket 566
2016-10-26T16:59:45.944+00:00 [Warn] Indexer::MutationQueue Waiting for Node Alloc for 49740 Milliseconds Vbucket 566
2016-10-26T16:59:45.974+00:00 [Warn] Indexer::MutationQueue Waiting for Node Alloc for 49770 Milliseconds Vbucket 566
2016-10-26T16:59:46.004+00:00 [Warn] Indexer::MutationQueue Waiting for Node Alloc for 49800 Milliseconds Vbucket 566
2016-10-26T16:59:46.034+00:00 [Warn] Indexer::MutationQueue Waiting for Node Alloc for 49830 Milliseconds Vbucket 566
2016-10-26T16:59:46.064+00:00 [Warn] Indexer::MutationQueue Waiting for Node Alloc for 49860 Milliseconds Vbucket 566

@cihangirb, any clue?

Thanks,

@jfrodriguez, please switch to circular write mode and to version 4.5.1, if you haven't already.
http://developer.couchbase.com/documentation/server/4.5/indexes/gsi-for-n1ql.html
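(For context: in circular write mode the index writer reuses stale blocks within the file instead of always appending, so fragmentation stabilizes without waiting for a full compaction; a full compaction can still be scheduled for specific days. See the linked page for details.)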

Hi!

Does anyone know whether the issues related to building standard GSIs over millions of documents are solved in 4.6.x? Maybe @keshav_m, @geraldss ?
Thanks

@jfrodriguez, with a reasonable memory quota and circular write mode in 4.5.1, you shouldn’t be facing the issues mentioned in this thread. Let us know if you are having problems and we can review your sizing.