FTS: Scorch. How to compact/compress Scorch index files? Too big files

#1

Hi

  • How can I force compaction / compression of FTS index files when using Scorch?

I have noticed that FTS occasionally generates index files of 20GB, about 1000 times larger than the others. They fill my disk to 100% :frowning:
couchbase/data/@fts/fts_search_2664cb0ebe4623c4_54820232.pindex/store/000000001234.zap

  • With Moss I never had these problems (I am testing Scorch to reduce the size on disk), and I know that Moss lets you define the fragmentation percentage at which compaction is triggered:

    {
    "mapping": {

    },
    "store": {
    "kvStoreName": "mossStore",
    "mossStoreOptions": {
    "CompactionPercentage": 0, <----------------------

https://github.com/couchbase/cbft/blob/master/DESIGN-compaction.md
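
For readers who want the fragment above as well-formed JSON, here is a minimal sketch that builds it in Python. It uses only the keys shown in the post; the empty "mapping" is a placeholder, the value 0 is simply the one quoted above, and where this object sits inside a full index definition is not shown here.

```python
import json

# Keys copied from the fragment in the post; nothing else is assumed.
# "mapping" is left empty as a placeholder for the real index mapping.
index_params = {
    "mapping": {},
    "store": {
        "kvStoreName": "mossStore",
        "mossStoreOptions": {
            "CompactionPercentage": 0,  # value as quoted in the post
        },
    },
}

# Serialize to well-formed JSON (straight quotes, balanced braces).
params_json = json.dumps(index_params, indent=2)
print(params_json)
```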

How could I do something similar in Scorch?

  • For Moss there is the mossScope tool, which reports the fragmentation of an index:
    /opt/couchbase/bin/mossScope stats fragmentation path/to/myStore
    https://github.com/couchbase/mossScope

Is there a similar command in Scorch?

  • I’ve reviewed /logs/fts.log and total_compactions is always 0:

    "BucketCache:fts_search:total_compaction_written_bytes": 54456530640,
    "BucketCache:fts_search:total_compactions": 0, <--------------------------------

  • Where can I see the fragmentation of my Scorch indexes?

  • I have noticed that when the FTS server is heavily loaded, creating several FTS indexes at the same time, large .zap files are generated and never shrink or get compacted :frowning:
    If I create the FTS indexes one at a time, it doesn’t usually happen (although a large file is sometimes generated, seemingly at random).
    If I delete and recreate the index, sometimes no large files are generated :roll_eyes: Did I find a bug?
    Example:

    couchbase@cb1:/opt/couchbase/var/lib/couchbase/data/@fts$ ls -la fts_search_2664cb0ebe4623c4_54820232.pindex/store/
    total 20114336
    drwx------ 2 couchbase couchbase 20480 Nov 7 20:58 .
    drwx------ 3 couchbase couchbase 4096 Nov 7 11:49 ..
    -rw------- 1 couchbase couchbase 11601527 Nov 7 12:07 000000007118.zap
    -rw------- 1 couchbase couchbase 232229 Nov 7 12:07 00000000713d.zap
    -rw------- 1 couchbase couchbase 16225914 Nov 7 12:07 000000007141.zap
    -rw------- 1 couchbase couchbase 26941072 Nov 7 12:07 000000007192.zap
    -rw------- 1 couchbase couchbase 20495543961 Nov 7 12:13 000000007205.zap <------------- 20GB !!!
    -rw------- 1 couchbase couchbase 1242199 Nov 7 12:07 00000000724c.zap
    -rw------- 1 couchbase couchbase 2347348 Nov 7 12:07 000000007448.zap
    -rw------- 1 couchbase couchbase 1645901 Nov 7 12:07 0000000074f5.zap
    -rw------- 1 couchbase couchbase 1086131 Nov 7 12:07 000000007684.zap
    -rw------- 1 couchbase couchbase 1809879 Nov 7 12:07 0000000078df.zap
    -rw------- 1 couchbase couchbase 1789122 Nov 7 12:07 0000000079b4.zap
    -rw------- 1 couchbase couchbase 1655227 Nov 7 12:07 000000007a6a.zap
    -rw------- 1 couchbase couchbase 2179977 Nov 7 12:07 000000007b32.zap
    -rw------- 1 couchbase couchbase 1819414 Nov 7 12:07 000000007bdd.zap
    -rw------- 1 couchbase couchbase 1848942 Nov 7 12:07 000000007c84.zap
    -rw------- 1 couchbase couchbase 1718391 Nov 7 12:07 000000007d33.zap
    -rw------- 1 couchbase couchbase 911983 Nov 7 12:13 000000007dd5.zap
    -rw------- 1 couchbase couchbase 846549 Nov 7 12:13 000000007dd6.zap
    -rw------- 1 couchbase couchbase 784477 Nov 7 12:13 000000007dd7.zap
    -rw------- 1 couchbase couchbase 19457133 Nov 7 12:13 000000007dd8.zap
    -rw------- 1 couchbase couchbase 665114 Nov 7 16:51 000000007f18.zap
    -rw------- 1 couchbase couchbase 66874 Nov 7 17:55 00000000886b.zap
    -rw------- 1 couchbase couchbase 809 Nov 7 17:56 00000000887e.zap
    -rw------- 1 couchbase couchbase 228 Nov 7 17:56 000000008880.zap
    -rw------- 1 couchbase couchbase 236 Nov 7 17:57 000000008882.zap
    -rw------- 1 couchbase couchbase 236 Nov 7 17:57 000000008884.zap
    -rw------- 1 couchbase couchbase 218 Nov 7 20:56 00000000895a.zap
    -rw------- 1 couchbase couchbase 218 Nov 7 20:58 00000000895c.zap
    -rw------- 1 couchbase couchbase 8388608 Nov 7 20:58 root.bolt
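
A listing like the one above can be scanned for outliers automatically. The following is a minimal sketch, not a Couchbase tool: the function names and the "100 times the median" threshold are arbitrary choices made for illustration, and the sample sizes are a handful copied from the listing.

```python
import statistics
from pathlib import Path


def zap_sizes(store_dir):
    """Map each .zap file name in store_dir to its size in bytes."""
    return {p.name: p.stat().st_size for p in Path(store_dir).glob("*.zap")}


def find_oversized_segments(sizes, factor=100):
    """Return {name: size} for files larger than factor x the median size."""
    if not sizes:
        return {}
    median = statistics.median(sizes.values())
    return {name: size for name, size in sizes.items() if size > factor * median}


# A few sizes copied from the listing above (bytes).
sample = {
    "000000007118.zap": 11601527,
    "000000007205.zap": 20495543961,
    "00000000724c.zap": 1242199,
    "000000007dd8.zap": 19457133,
    "00000000895c.zap": 218,
}
oversized = find_oversized_segments(sample)
print(oversized)
```

On a live node, `find_oversized_segments(zap_sizes("<path to the pindex store directory>"))` would apply the same check to the real files.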

This is not a production environment; it is a testing environment with no ops, running Couchbase 6.0.

Thanks


#2

Hey @treo. Thank you for reporting this.

Firstly, regarding total_compactions: it is a moss-only stat, which is why it shows up as 0 with scorch.

Looking at all the zap files within fts_search_2664cb0ebe4623c4_54820232.pindex/store, it seems like there are many files that are rather old compared to the latest:

   -rw------- 1 couchbase couchbase 218 Nov 7 20:56 00000000895a.zap
   -rw------- 1 couchbase couchbase 218 Nov 7 20:58 00000000895c.zap

This could mean one of the following things:

  1. Searchers are still holding on to older snapshots, but you mention that there aren’t any other ops - so I think it’s safe to rule this one out.
  2. We know of a bug where the scorch merger (which is essentially responsible for compactions) doesn’t run aggressively enough, especially when there’s a lot of data. We already have a fix for it; it couldn’t make it into 6.0.0 but will definitely be in 6.0.1. For your reference, the fix is: https://github.com/blevesearch/bleve/pull/1023

As for debugging this kind of problem, where one wants to track whether compaction (file merging) with scorch isn’t running often enough or is lagging, here are some stats that can assist you: check out CurRootEpoch and LastMergedEpoch. If LastMergedEpoch is far behind CurRootEpoch, that is a clear indication that compaction has fallen behind and is perhaps even struggling to keep up.
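
The check described above can be sketched in a few lines. This assumes the stats have already been loaded into a dict containing the CurRootEpoch and LastMergedEpoch keys; the function names, the `max_lag` threshold and the sample values are all made up for illustration.

```python
def merge_lag(index_stats):
    """Return how many epochs the merger trails the current root."""
    return index_stats["CurRootEpoch"] - index_stats["LastMergedEpoch"]


def merger_is_behind(index_stats, max_lag=100):
    """True when the lag exceeds max_lag (an arbitrary, illustrative threshold)."""
    return merge_lag(index_stats) > max_lag


# Hypothetical stats: one index that has caught up, one that is lagging.
caught_up = {"CurRootEpoch": 5000, "LastMergedEpoch": 5000}
lagging = {"CurRootEpoch": 5000, "LastMergedEpoch": 3200}

for name, stats in (("caught_up", caught_up), ("lagging", lagging)):
    print(name, merge_lag(stats), merger_is_behind(stats))
```

What counts as "far behind" depends on the workload, so the threshold should be tuned rather than taken from this sketch.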

We’ve noted an identical issue while testing in house and I can confirm that the above mentioned fix addresses the problem. Also pinging @steve and @sreeks if there’s anything else that they’d like to add.


#3

Adding to @abhinav’s hints,

  1. Unlike moss, scorch will have many index files (zap) on disk at any given moment, so looking at a handful of files won’t give us any clues about the progress or status of compactions. It’s also totally expected that the index files are of varying sizes.

  2. In any case (single or multiple indexes), if you end up observing a bigger overall index size on disk compared to moss, please do raise a bug report, as that is never supposed to happen.
    Scorch is supposed to bring you at least a 60% size reduction on disk compared to moss/upside down indexes.


#4

Hi @steve and @sreeks
Thank you very much for responding so quickly :slight_smile:
If it’s a bug, we’ll wait for 6.0.1. https://github.com/blevesearch/bleve/pull/1023

(It is a testing platform where we are evaluating the performance of Couchbase as a replacement for Solr. With ForestDB and Moss, disk space consumption and performance were worse than with Solr, but with Scorch they improve a lot, good job!)

I’d still love to help with whatever you need. Don’t hesitate to ask me to run any tests.

Indeed, I believe I have shown empirically that the anomalous growth of Scorch index files (due to missing merges/compaction) occurs under high load or with large amounts of data: for example, when creating many FTS indexes simultaneously (as when restoring all indexes) or when an index holds a lot of data (in my case between 2 and 10 million items). It does not occur in small indexes of fewer than 0.5 million items.
In our case, load or memory pressure should not be a problem, because we have 3 nodes dedicated exclusively to indexes, each with 20GB RAM and 8 idle CPUs, and no ops.
I suspect the bug is related to the number of items.

Attached is a graph of the size of the indexes:

  • CB 5.5.2 Moss = 400GB
  • CB 6.0 Scorch > 1000GB :frowning: (disk full while creating 12 FTS indexes concurrently)
  • CB 6.0 Scorch = 200GB :slight_smile: (creating FTS indexes one at a time)

Two questions:
1.- Some files were created with an anomalous size because their segments were not merged in memory before being persisted to disk. Would it be possible to manually compact those files on disk?

2.- Should each pindex and its replica be similar in size on each server when everything works correctly?
Example: the “fts_notfound” pindexes are about the same size as their replicas on each server, but for “fts_search” there is a large size difference between pindex replicas.

CB1$du -sh @fts/fts_*
6.0G    fts_search_35ed824958ee42b4_13aa53f3.pindex
12G     fts_search_35ed824958ee42b4_18572d87.pindex <----------- Very different :-(
6.2G    fts_search_35ed824958ee42b4_54820232.pindex
5.4G    fts_search_35ed824958ee42b4_6ddbfb54.pindex

355M    fts_notfound_125d33ec0db50b0d_13aa53f3.pindex  ---------> Similar :-)
357M    fts_notfound_125d33ec0db50b0d_18572d87.pindex
352M    fts_notfound_125d33ec0db50b0d_54820232.pindex
344M    fts_notfound_125d33ec0db50b0d_6ddbfb54.pindex


CB2$du -sh @fts/fts_*
8.2G    fts_search_35ed824958ee42b4_13aa53f3.pindex
6.3G    fts_search_35ed824958ee42b4_18572d87.pindex <---------- Very different :-(
6.2G    fts_search_35ed824958ee42b4_aa574717.pindex
5.8G    fts_search_35ed824958ee42b4_f4e0a48a.pindex

349M    fts_notfound_125d33ec0db50b0d_13aa53f3.pindex  ---------> Similar :-)
354M    fts_notfound_125d33ec0db50b0d_18572d87.pindex
353M    fts_notfound_125d33ec0db50b0d_aa574717.pindex
373M    fts_notfound_125d33ec0db50b0d_f4e0a48a.pindex


CB3$du -sh @fts/fts_*
6.4G    fts_search_35ed824958ee42b4_54820232.pindex
5.8G    fts_search_35ed824958ee42b4_6ddbfb54.pindex
6.2G    fts_search_35ed824958ee42b4_aa574717.pindex
7.3G    fts_search_35ed824958ee42b4_f4e0a48a.pindex

352M    fts_notfound_125d33ec0db50b0d_54820232.pindex
350M    fts_notfound_125d33ec0db50b0d_6ddbfb54.pindex
356M    fts_notfound_125d33ec0db50b0d_aa574717.pindex
355M    fts_notfound_125d33ec0db50b0d_f4e0a48a.pindex
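
To put a number on "very different" versus "similar", the `du -sh` output above can be parsed and the largest-to-smallest ratio computed per index. This is a rough sketch: the helper names are invented here, the sizes are the rounded human-readable values printed by `du`, and the regular expression assumes pindex names of the form `<index>_<hex>_<hex>.pindex` as seen in the listings.

```python
import re
from collections import defaultdict

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
PINDEX_LINE = re.compile(r"^\s*([\d.]+)([KMGT])\s+(\S+?)_[0-9a-f]+_[0-9a-f]+\.pindex")


def parse_du_line(line):
    """Parse one `du -sh` line into (index_name, approx_bytes), or None."""
    match = PINDEX_LINE.match(line)
    if not match:
        return None
    value, unit, index_name = match.groups()
    return index_name, float(value) * UNITS[unit]


def size_spread(du_output):
    """Return {index_name: largest / smallest pindex size} for a du listing."""
    sizes = defaultdict(list)
    for line in du_output.splitlines():
        parsed = parse_du_line(line)
        if parsed:
            sizes[parsed[0]].append(parsed[1])
    return {name: max(vals) / min(vals) for name, vals in sizes.items()}


# The CB1 listing from above, without the arrow annotations.
cb1 = """
6.0G    fts_search_35ed824958ee42b4_13aa53f3.pindex
12G     fts_search_35ed824958ee42b4_18572d87.pindex
6.2G    fts_search_35ed824958ee42b4_54820232.pindex
5.4G    fts_search_35ed824958ee42b4_6ddbfb54.pindex
355M    fts_notfound_125d33ec0db50b0d_13aa53f3.pindex
357M    fts_notfound_125d33ec0db50b0d_18572d87.pindex
352M    fts_notfound_125d33ec0db50b0d_54820232.pindex
344M    fts_notfound_125d33ec0db50b0d_6ddbfb54.pindex
"""
spread = size_spread(cb1)
for name, ratio in spread.items():
    print(f"{name}: largest pindex is {ratio:.2f}x the smallest")
```

On CB1 this gives a spread of roughly 2.2x for fts_search and under 1.05x for fts_notfound.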

1.- Correct. It is a test environment, so there are no write ops and no mutations, and so no new .zap files since the initial index creation.

    couchbase@cb1:/opt/couchbase/var/lib/couchbase/data/@fts/fts_search_35ed824958ee42b4_18572d87.pindex/store# ls -la
total 12232408
drwx------ 2 couchbase couchbase      20480 Nov  8 20:00 .
drwx------ 3 couchbase couchbase       4096 Nov  7 21:47 ..
-rw------- 1 couchbase couchbase     102170 Nov  7 22:03 000000007014.zap <----- start index creation at 22:03
-rw------- 1 couchbase couchbase      77341 Nov  7 22:03 000000007017.zap
-rw------- 1 couchbase couchbase     127089 Nov  7 22:03 000000007019.zap
-rw------- 1 couchbase couchbase      97045 Nov  7 22:03 00000000701a.zap
-rw------- 1 couchbase couchbase     144794 Nov  7 22:03 00000000702b.zap
-rw------- 1 couchbase couchbase   17827705 Nov  7 22:03 00000000702e.zap
-rw------- 1 couchbase couchbase   26142530 Nov  7 22:03 000000007074.zap
-rw------- 1 couchbase couchbase    9012372 Nov  7 22:03 0000000070e2.zap
-rw------- 1 couchbase couchbase 4523629914 Nov  7 22:06 000000007105.zap <------- Big file :-(
-rw------- 1 couchbase couchbase    1237923 Nov  7 22:03 0000000073ff.zap
-rw------- 1 couchbase couchbase    1122834 Nov  7 22:03 000000007800.zap
-rw------- 1 couchbase couchbase    1945165 Nov  7 22:03 0000000078cb.zap
-rw------- 1 couchbase couchbase     822760 Nov  7 22:03 00000000798f.zap
-rw------- 1 couchbase couchbase    3516821 Nov  7 22:04 000000007c2f.zap
-rw------- 1 couchbase couchbase    2598069 Nov  7 22:04 000000007d84.zap
-rw------- 1 couchbase couchbase 7897163056 Nov  7 22:09 000000007de2.zap <------ Big file :-(
-rw------- 1 couchbase couchbase     682439 Nov  7 22:09 000000007de7.zap
-rw------- 1 couchbase couchbase     926484 Nov  7 22:09 000000007de8.zap
-rw------- 1 couchbase couchbase    1037159 Nov  7 22:09 000000007de9.zap
-rw------- 1 couchbase couchbase   13386250 Nov  7 22:09 000000007dea.zap <------ Index creation ends at 22:09
-rw------- 1 couchbase couchbase        192 Nov  8 08:39 000000008405.zap <------ This is a Couchbase testing cluster, so...
-rw------- 1 couchbase couchbase        182 Nov  8 09:55 0000000084be.zap <------ ... no write ops. No index mutations. No new files.
-rw------- 1 couchbase couchbase   19967767 Nov  8 09:55 0000000084bf.zap
-rw------- 1 couchbase couchbase        182 Nov  8 11:29 0000000085a5.zap
-rw------- 1 couchbase couchbase        241 Nov  8 19:58 000000008b19.zap
-rw------- 1 couchbase couchbase        241 Nov  8 20:00 000000008b1d.zap
-rw------- 1 couchbase couchbase    8388608 Nov  8 20:00 root.bolt

CurRootEpoch = LastMergedEpoch :slight_smile: because there are no write ops and there is enough RAM, CPU and IOPS

"bleveIndexStats": {
  "index": {
    "CurOnDiskBytes": 5736605363,
    "CurOnDiskFiles": 32,
    "CurRootEpoch": 33648,  <-------------------   =
    "LastMergedEpoch": 33648, <-----------------   =
    "LastPersistedEpoch": 33648,
    "MaxBatchIntroTime": 887177117,
    "MaxFileMergeZapTime": 93267937840,
    "MaxMemMergeZapTime": 12703963708,
    "TotAnalysisTime": 1187593302386,
    "TotBatchIntroTime": 44022303466,
    "TotBatches": 31931,
    "TotBatchesEmpty": 541,
    "TotDeletes": 1430,
    "TotFileMergeIntroductions": 132,
    "TotFileMergeIntroductionsDone": 132,
    "TotFileMergeLoopBeg": 2185,
    "TotFileMergeLoopEnd": 2184,
    "TotFileMergeLoopErr": 0,
    "TotFileMergePlan": 1544,
    "TotFileMergePlanErr": 0,
    "TotFileMergePlanNone": 4,
    "TotFileMergePlanOk": 1540,
    "TotFileMergePlanTasks": 132,
    "TotFileMergePlanTasksDone": 132,
    "TotFileMergePlanTasksErr": 0,
    "TotFileMergePlanTasksSegments": 1187,
    "TotFileMergePlanTasksSegmentsEmpty": 0,
    "TotFileMergeSegments": 1187,
    "TotFileMergeSegmentsEmpty": 0,
    "TotFileMergeWrittenBytes": 7619974418,
    "TotFileMergeZapBeg": 132,
    "TotFileMergeZapEnd": 132,
    "TotFileMergeZapTime": 542234161909,
    "TotFileSegmentsAtRoot": 31,
    "TotIndexTime": 1164998106372,
    "TotIndexedPlainTextBytes": 724032987,
    "TotIntroduceLoop": 34868,
    "TotIntroduceMergeBeg": 1013,
    "TotIntroduceMergeEnd": 1013,
    "TotIntroducePersistBeg": 704,
    "TotIntroducePersistEnd": 704,
    "TotIntroduceRevertBeg": 0,
    "TotIntroduceRevertEnd": 0,
    "TotIntroduceSegmentBeg": 31931,
    "TotIntroduceSegmentEnd": 31931,
    "TotIntroducedItems": 6158652,
    "TotIntroducedSegmentsBatch": 31390,
    "TotIntroducedSegmentsMerge": 1013,
    "TotItemsToPersist": 0,
    "TotMemMergeBeg": 881,
    "TotMemMergeDone": 881,
    "TotMemMergeErr": 0,
    "TotMemMergeSegments": 30657,
    "TotMemMergeZapBeg": 881,
    "TotMemMergeZapEnd": 881,
    "TotMemMergeZapTime": 544562128737,
    "TotMemorySegmentsAtRoot": 0,
    "TotOnErrors": 0,
    "TotPersistLoopBeg": 2828,
    "TotPersistLoopEnd": 1218,
    "TotPersistLoopErr": 0,
    "TotPersistLoopProgress": 1609,
    "TotPersistLoopWait": 1219,
    "TotPersistLoopWaitNotified": 560,
    "TotPersistedItems": 6158652,
    "TotPersistedSegments": 1614,
    "TotPersisterMergerNapBreak": 1480,
    "TotPersisterNapPauseCompleted": 1342,
    "TotPersisterSlowMergerPause": 0,
    "TotPersisterSlowMergerResume": 0,
    "TotTermSearchersFinished": 117,
    "TotTermSearchersStarted": 117,
    "TotUpdates": 6158652,
    "analysis_time": 1187593302386,
    "batches": 31931,
    "deletes": 1430,
    "errors": 0,
    "index_time": 1164998106372,
    "num_bytes_used_disk": 5736605363,
    "num_files_on_disk": 32,
    "num_items_introduced": 6158652,
    "num_items_persisted": 6158652,
    "num_persister_nap_merger_break": 1480,
    "num_persister_nap_pause_completed": 1342,
    "num_plain_text_bytes_indexed": 724032987,
    "num_recs_to_persist": 0,
    "num_root_filesegments": 31,
    "num_root_memorysegments": 0,
    "term_searchers_finished": 117,
    "term_searchers_started": 117,
    "total_compaction_written_bytes": 7619974418,
    "updates": 6158652
  },
  "search_time": 10986920962,
  "searches": 3
},
"basic": {
  "DocCount": 6158001   <-------------------
},
"partitions": {
  "1011": {
    "seq": 36838,
    "seqReceived": 36838,
    "uuid": "219147537600963"
  },
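
Two rough ratios can be derived from the stats above: on-disk bytes per indexed plain-text byte, and bytes written by file merges per on-disk byte. This is only a sketch; reading these ratios as "expansion" and "merge rewrite" indicators is an interpretation made here, not an official metric, and the function name is invented.

```python
def size_ratios(index_stats):
    """Return (disk bytes per plain-text byte, merge-written bytes per disk byte)."""
    on_disk = index_stats["CurOnDiskBytes"]
    plain_text = index_stats["TotIndexedPlainTextBytes"]
    merge_written = index_stats["TotFileMergeWrittenBytes"]
    return on_disk / plain_text, merge_written / on_disk


# Values copied from the bleveIndexStats dump above.
stats = {
    "CurOnDiskBytes": 5736605363,
    "TotIndexedPlainTextBytes": 724032987,
    "TotFileMergeWrittenBytes": 7619974418,
}
expansion, rewrite = size_ratios(stats)
print(f"disk bytes per plain-text byte: {expansion:.2f}")
print(f"merge-written bytes per disk byte: {rewrite:.2f}")
```

For this pindex the index occupies roughly 7.9 bytes on disk per indexed plain-text byte, and file merges have written about 1.3 times the current on-disk size.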

Thank you very much


#5

Thanks @treo for the details.

A few points to keep in mind during these experiments:

  1. Your finding about a single scorch index size looks correct and as expected (200GB, compared to 450GB in upside down).

  2. I’m not clear about the worry over the >1000GB disk size with 12 scorch indexes. Indexes do not share any data on disk, even if they are created on the same bucket, so I don’t yet see why 1000GB is too high for 12 indexes. Can you please help me understand this?

  3. Your “CurRootEpoch = LastMergedEpoch” finding looks correct. Going roughly by the number of items (the “TotIntroducedItems” stat, 6158652 or ~6M), about 32 files on disk are expected, and the “num_files_on_disk”: 32 stat confirms that. There are no wrinkles seen here.

  4. The merge policy is expected to create a growing logarithmic staircase of segments, hence it’s perfectly valid to have a few really large segment files and a higher number of smaller segment files on disk. We can further optimize the merge policy for index creation/update operations, but will do that later, as we are not yet fully clear about the issue here. It’s also possible for an older, smaller segment file to have outlived its peers during those compaction cycles.

  5. The number of index / zap files is primarily controlled by the number of documents in the index; files are not, to any large extent, created, split or merged based on their size.

  6. Can you let us know what FTS memory quota was set in these trials? Under memory pressure relative to the quota, we would hit the above-mentioned issue of skipping the in-memory merge of segments, resulting in too many files on disk. That could be a contributing factor to the “very different size” observed between a primary and the corresponding replica partition across nodes. Please share the stats for both of those partitions; they may give some clues.
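
To see whether a store directory looks like the "logarithmic staircase" described in point 4, segment files can be grouped by order of magnitude. This sketch only buckets file sizes by power of ten; it does not implement or reproduce the scorch merge policy, and the function name is made up. The sample sizes are a few taken from the second `ls -la` listing earlier in the thread.

```python
from collections import Counter


def size_tiers(sizes_bytes):
    """Count files per power-of-ten tier k, where 10**k <= size < 10**(k+1)."""
    # For a positive integer, (number of decimal digits - 1) is floor(log10(size)).
    tiers = Counter(len(str(int(size))) - 1 for size in sizes_bytes if size > 0)
    return dict(sorted(tiers.items()))


# A few .zap sizes (bytes) from the listing shared earlier in the thread.
sample_sizes = [102170, 17827705, 4523629914, 7897163056, 1237923, 192, 241]
tiers = size_tiers(sample_sizes)
for exponent, count in tiers.items():
    print(f"10^{exponent} bytes: {count} file(s)")
```

A healthy staircase would typically show a few files in the top tiers and more in the lower ones; many files crowded into a single tier would be worth a closer look.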

Trying to answer the original concerns raised above.

1.- Some files were created with an anomalous size because their segments were not merged in memory before being persisted to disk. Would it be possible to manually compact those files on disk?

Nope, manual disk compaction isn’t possible. If there aren’t many updates or deletes to the data, compactions may not give you much size savings. Of course, it’s not a desirable situation, as we will have several overheads from too many segment files on disk.

2.- Should each pindex and its replica be similar in size on each server when everything works correctly?
Example: the “fts_notfound” pindexes are about the same size as their replicas on each server, but for “fts_search” there is a large size difference between pindex replicas.

With scorch index partitions (primary & replica), once the merger has caught up with the root epoch, the sizes are expected to be very close across primary and replica partitions.
Can you please cross-check and share the stats for those two pindexes, similar to the ones you shared already? There might be some clues there.

Sreekanth


#6

:ok_hand: Yes. Scorch is a big improvement over Moss :slight_smile: Great job

The size is relative :wink:
I currently use Solr/Lucene and the size of the indexes is 10 times smaller.
It is not an economic concern but a technical one.
We are a small company for which quality disk storage is the most expensive part of our hosting bill.

:ok_hand:

3 nodes exclusively dedicated to FTS, with 20GB RAM each (13GB per node for the FTS service).
I would think that RAM is enough (it’s a test/development/trial environment).

stats: {
"Test:fts_search:avg_queries_latency": 57794.682334,
"Test:fts_search:batch_merge_count": 0,
"Test:fts_search:doc_count": 24848866,
"Test:fts_search:iterator_next_count": 0,
"Test:fts_search:iterator_seek_count": 0,
"Test:fts_search:num_bytes_live_data": 0,
"Test:fts_search:num_bytes_used_disk": 31040749377,
"Test:fts_search:num_files_on_disk": 128,
"Test:fts_search:num_mutations_to_index": 0,
"Test:fts_search:num_persister_nap_merger_break": 7778,
"Test:fts_search:num_persister_nap_pause_completed": 9540,
"Test:fts_search:num_pindexes": 4,
"Test:fts_search:num_pindexes_actual": 4,
"Test:fts_search:num_pindexes_target": 4,
"Test:fts_search:num_recs_to_persist": 0,
"Test:fts_search:num_root_filesegments": 121,
"Test:fts_search:num_root_memorysegments": 0,
"Test:fts_search:reader_get_count": 0,
"Test:fts_search:reader_multi_get_count": 0,
"Test:fts_search:reader_prefix_iterator_count": 0,
"Test:fts_search:reader_range_iterator_count": 0,
"Test:fts_search:timer_batch_store_count": 0,
"Test:fts_search:timer_data_delete_count": 8088,
"Test:fts_search:timer_data_update_count": 24857201,
"Test:fts_search:timer_opaque_get_count": 10389,
"Test:fts_search:timer_opaque_set_count": 9707,
"Test:fts_search:timer_rollback_count": 0,
"Test:fts_search:timer_snapshot_start_count": 9025,
"Test:fts_search:total_bytes_indexed": 2921738398,
"Test:fts_search:total_bytes_query_results": 44160,
"Test:fts_search:total_compaction_written_bytes": 45896386420,
"Test:fts_search:total_compactions": 0,
"Test:fts_search:total_queries": 10,
"Test:fts_search:total_queries_error": 2,
"Test:fts_search:total_queries_slow": 3,
"Test:fts_search:total_queries_timeout": 0,
"Test:fts_search:total_request_time": 578241593679,
"Test:fts_search:total_term_searchers": 234,
"Test:fts_search:writer_execute_batch_count": 0,

"Test:fts_editorial:avg_queries_latency": 146.733344,
"Test:fts_editorial:batch_merge_count": 0,
"Test:fts_editorial:doc_count": 24848866,
"Test:fts_editorial:iterator_next_count": 0,
"Test:fts_editorial:iterator_seek_count": 0,
"Test:fts_editorial:num_bytes_live_data": 0,
"Test:fts_editorial:num_bytes_used_disk": 14719749049,
"Test:fts_editorial:num_files_on_disk": 131,
"Test:fts_editorial:num_mutations_to_index": 0,
"Test:fts_editorial:num_persister_nap_merger_break": 14592,
"Test:fts_editorial:num_persister_nap_pause_completed": 13404,
"Test:fts_editorial:num_pindexes": 4,
"Test:fts_editorial:num_pindexes_actual": 4,
"Test:fts_editorial:num_pindexes_target": 4,
"Test:fts_editorial:num_recs_to_persist": 0,
"Test:fts_editorial:num_root_filesegments": 124,
"Test:fts_editorial:num_root_memorysegments": 0,
"Test:fts_editorial:reader_get_count": 0,
"Test:fts_editorial:reader_multi_get_count": 0,
"Test:fts_editorial:reader_prefix_iterator_count": 0,
"Test:fts_editorial:reader_range_iterator_count": 0,
"Test:fts_editorial:timer_batch_store_count": 0,
"Test:fts_editorial:timer_data_delete_count": 8090,
"Test:fts_editorial:timer_data_update_count": 24860228,
"Test:fts_editorial:timer_opaque_get_count": 17181,
"Test:fts_editorial:timer_opaque_set_count": 16499,
"Test:fts_editorial:timer_rollback_count": 0,
"Test:fts_editorial:timer_snapshot_start_count": 15817,
"Test:fts_editorial:total_bytes_indexed": 824292143,
"Test:fts_editorial:total_bytes_query_results": 27791348,
"Test:fts_editorial:total_compaction_written_bytes": 17785515666,
"Test:fts_editorial:total_compactions": 0,
"Test:fts_editorial:total_queries": 2917,
"Test:fts_editorial:total_queries_error": 2,
"Test:fts_editorial:total_queries_slow": 4,
"Test:fts_editorial:total_queries_timeout": 0,
"Test:fts_editorial:total_request_time": 428025274906,
"Test:fts_editorial:total_term_searchers": 761532,
"Test:fts_editorial:writer_execute_batch_count": 0,

"Test:fts_notfound:avg_queries_latency": 0,
"Test:fts_notfound:batch_merge_count": 0,
"Test:fts_notfound:doc_count": 24848866,
"Test:fts_notfound:iterator_next_count": 0,
"Test:fts_notfound:iterator_seek_count": 0,
"Test:fts_notfound:num_bytes_live_data": 0,
"Test:fts_notfound:num_bytes_used_disk": 1490519065,
"Test:fts_notfound:num_files_on_disk": 127,
"Test:fts_notfound:num_mutations_to_index": 0,
"Test:fts_notfound:num_persister_nap_merger_break": 14341,
"Test:fts_notfound:num_persister_nap_pause_completed": 13224,
"Test:fts_notfound:num_pindexes": 4,
"Test:fts_notfound:num_pindexes_actual": 4,
"Test:fts_notfound:num_pindexes_target": 4,
"Test:fts_notfound:num_recs_to_persist": 0,
"Test:fts_notfound:num_root_filesegments": 120,
"Test:fts_notfound:num_root_memorysegments": 0,
"Test:fts_notfound:reader_get_count": 0,
"Test:fts_notfound:reader_multi_get_count": 0,
"Test:fts_notfound:reader_prefix_iterator_count": 0,
"Test:fts_notfound:reader_range_iterator_count": 0,
"Test:fts_notfound:timer_batch_store_count": 0,
"Test:fts_notfound:timer_data_delete_count": 8090,
"Test:fts_notfound:timer_data_update_count": 24860152,
"Test:fts_notfound:timer_opaque_get_count": 17105,
"Test:fts_notfound:timer_opaque_set_count": 16423,
"Test:fts_notfound:timer_rollback_count": 0,
"Test:fts_notfound:timer_snapshot_start_count": 15741,
"Test:fts_notfound:total_bytes_indexed": 757255247,
"Test:fts_notfound:total_bytes_query_results": 0,
"Test:fts_notfound:total_compaction_written_bytes": 2755785815,
"Test:fts_notfound:total_compactions": 0,
"Test:fts_notfound:total_queries": 0,
"Test:fts_notfound:total_queries_error": 0,
"Test:fts_notfound:total_queries_slow": 0,
"Test:fts_notfound:total_queries_timeout": 0,
"Test:fts_notfound:total_request_time": 0,
"Test:fts_notfound:total_term_searchers": 0,
"Test:fts_notfound:writer_execute_batch_count": 0,
"Test:fts_photos:avg_queries_latency": 117.918537,
"Test:fts_photos:batch_merge_count": 0,
"Test:fts_photos:doc_count": 24848866,
"Test:fts_photos:iterator_next_count": 0,
"Test:fts_photos:iterator_seek_count": 0,
"Test:fts_photos:num_bytes_live_data": 0,
"Test:fts_photos:num_bytes_used_disk": 19052483137,
"Test:fts_photos:num_files_on_disk": 125,
"Test:fts_photos:num_mutations_to_index": 0,
"Test:fts_photos:num_persister_nap_merger_break": 15366,
"Test:fts_photos:num_persister_nap_pause_completed": 14142,
"Test:fts_photos:num_pindexes": 4,
"Test:fts_photos:num_pindexes_actual": 4,
"Test:fts_photos:num_pindexes_target": 4,
"Test:fts_photos:num_recs_to_persist": 0,
"Test:fts_photos:num_root_filesegments": 118,
"Test:fts_photos:num_root_memorysegments": 0,
"Test:fts_photos:reader_get_count": 0,
"Test:fts_photos:reader_multi_get_count": 0,
"Test:fts_photos:reader_prefix_iterator_count": 0,
"Test:fts_photos:reader_range_iterator_count": 0,
"Test:fts_photos:timer_batch_store_count": 0,
"Test:fts_photos:timer_data_delete_count": 8090,
"Test:fts_photos:timer_data_update_count": 24860360,
"Test:fts_photos:timer_opaque_get_count": 17314,
"Test:fts_photos:timer_opaque_set_count": 16632,
"Test:fts_photos:timer_rollback_count": 0,
"Test:fts_photos:timer_snapshot_start_count": 15950,
"Test:fts_photos:total_bytes_indexed": 6392906351,
"Test:fts_photos:total_bytes_query_results": 57718,
"Test:fts_photos:total_compaction_written_bytes": 29177065557,
"Test:fts_photos:total_compactions": 0,
"Test:fts_photos:total_queries": 125,
"Test:fts_photos:total_queries_error": 0,
"Test:fts_photos:total_queries_slow": 0,
"Test:fts_photos:total_queries_timeout": 0,
"Test:fts_photos:total_request_time": 14739954017,
"Test:fts_photos:total_term_searchers": 838,
"Test:fts_photos:writer_execute_batch_count": 0,

"Test:fts_places:avg_queries_latency": 0,
"Test:fts_places:batch_merge_count": 0,
"Test:fts_places:doc_count": 24848866,
"Test:fts_places:iterator_next_count": 0,
"Test:fts_places:iterator_seek_count": 0,
"Test:fts_places:num_bytes_live_data": 0,
"Test:fts_places:num_bytes_used_disk": 1164159601,
"Test:fts_places:num_files_on_disk": 131,
"Test:fts_places:num_mutations_to_index": 0,
"Test:fts_places:num_persister_nap_merger_break": 15405,
"Test:fts_places:num_persister_nap_pause_completed": 13163,
"Test:fts_places:num_pindexes": 4,
"Test:fts_places:num_pindexes_actual": 4,
"Test:fts_places:num_pindexes_target": 4,
"Test:fts_places:num_recs_to_persist": 0,
"Test:fts_places:num_root_filesegments": 124,
"Test:fts_places:num_root_memorysegments": 0,
"Test:fts_places:reader_get_count": 0,
"Test:fts_places:reader_multi_get_count": 0,
"Test:fts_places:reader_prefix_iterator_count": 0,
"Test:fts_places:reader_range_iterator_count": 0,
"Test:fts_places:timer_batch_store_count": 0,
"Test:fts_places:timer_data_delete_count": 8090,
"Test:fts_places:timer_data_update_count": 24860126,
"Test:fts_places:timer_opaque_get_count": 17079,
"Test:fts_places:timer_opaque_set_count": 16397,
"Test:fts_places:timer_rollback_count": 0,
"Test:fts_places:timer_snapshot_start_count": 15715,
"Test:fts_places:total_bytes_indexed": 542168282,
"Test:fts_places:total_bytes_query_results": 0,
"Test:fts_places:total_compaction_written_bytes": 2257383449,
"Test:fts_places:total_compactions": 0,
"Test:fts_places:total_queries": 0,
"Test:fts_places:total_queries_error": 0,
"Test:fts_places:total_queries_slow": 0,
"Test:fts_places:total_queries_timeout": 0,
"Test:fts_places:total_request_time": 0,
"Test:fts_places:total_term_searchers": 0,
"Test:fts_places:writer_execute_batch_count": 0,

"Test:fts_redirection:avg_queries_latency": 0,
"Test:fts_redirection:batch_merge_count": 0,
"Test:fts_redirection:doc_count": 24848866,
"Test:fts_redirection:iterator_next_count": 0,
"Test:fts_redirection:iterator_seek_count": 0,
"Test:fts_redirection:num_bytes_live_data": 0,
"Test:fts_redirection:num_bytes_used_disk": 1258686527,
"Test:fts_redirection:num_files_on_disk": 127,
"Test:fts_redirection:num_mutations_to_index": 0,
"Test:fts_redirection:num_persister_nap_merger_break": 15077,
"Test:fts_redirection:num_persister_nap_pause_completed": 13032,
"Test:fts_redirection:num_pindexes": 4,
"Test:fts_redirection:num_pindexes_actual": 4,
"Test:fts_redirection:num_pindexes_target": 4,
"Test:fts_redirection:num_recs_to_persist": 0,
"Test:fts_redirection:num_root_filesegments": 120,
"Test:fts_redirection:num_root_memorysegments": 0,
"Test:fts_redirection:reader_get_count": 0,
"Test:fts_redirection:reader_multi_get_count": 0,
"Test:fts_redirection:reader_prefix_iterator_count": 0,
"Test:fts_redirection:reader_range_iterator_count": 0,
"Test:fts_redirection:timer_batch_store_count": 0,
"Test:fts_redirection:timer_data_delete_count": 8090,
"Test:fts_redirection:timer_data_update_count": 24860106,
"Test:fts_redirection:timer_opaque_get_count": 17059,
"Test:fts_redirection:timer_opaque_set_count": 16377,
"Test:fts_redirection:timer_rollback_count": 0,
"Test:fts_redirection:timer_snapshot_start_count": 15695,
"Test:fts_redirection:total_bytes_indexed": 674388645,
"Test:fts_redirection:total_bytes_query_results": 0,
"Test:fts_redirection:total_compaction_written_bytes": 2371532265,
"Test:fts_redirection:total_compactions": 0,
"Test:fts_redirection:total_queries": 0,
"Test:fts_redirection:total_queries_error": 0,
"Test:fts_redirection:total_queries_slow": 0,
"Test:fts_redirection:total_queries_timeout": 0,
"Test:fts_redirection:total_request_time": 0,
"Test:fts_redirection:total_term_searchers": 0,
"Test:fts_redirection:writer_execute_batch_count": 0,

"Test:fts_result:avg_queries_latency": 0,
"Test:fts_result:batch_merge_count": 0,
"Test:fts_result:doc_count": 24849653,
"Test:fts_result:iterator_next_count": 0,
"Test:fts_result:iterator_seek_count": 0,
"Test:fts_result:num_bytes_live_data": 0,
"Test:fts_result:num_bytes_used_disk": 1296288886,
"Test:fts_result:num_files_on_disk": 125,
"Test:fts_result:num_mutations_to_index": 0,
"Test:fts_result:num_persister_nap_merger_break": 14945,
"Test:fts_result:num_persister_nap_pause_completed": 12958,
"Test:fts_result:num_pindexes": 4,
"Test:fts_result:num_pindexes_actual": 4,
"Test:fts_result:num_pindexes_target": 4,
"Test:fts_result:num_recs_to_persist": 0,
"Test:fts_result:num_root_filesegments": 118,
"Test:fts_result:num_root_memorysegments": 0,
"Test:fts_result:reader_get_count": 0,
"Test:fts_result:reader_multi_get_count": 0,
"Test:fts_result:reader_prefix_iterator_count": 0,
"Test:fts_result:reader_range_iterator_count": 0,
"Test:fts_result:timer_batch_store_count": 0,
"Test:fts_result:timer_data_delete_count": 8109,
"Test:fts_result:timer_data_update_count": 24859193,
"Test:fts_result:timer_opaque_get_count": 15280,
"Test:fts_result:timer_opaque_set_count": 14598,
"Test:fts_result:timer_rollback_count": 0,
"Test:fts_result:timer_snapshot_start_count": 13916,
"Test:fts_result:total_bytes_indexed": 546988241,
"Test:fts_result:total_bytes_query_results": 0,
"Test:fts_result:total_compaction_written_bytes": 2289212141,
"Test:fts_result:total_compactions": 0,
"Test:fts_result:total_queries": 0,
"Test:fts_result:total_queries_error": 0,
"Test:fts_result:total_queries_slow": 0,
"Test:fts_result:total_queries_timeout": 0,
"Test:fts_result:total_request_time": 0,
"Test:fts_result:total_term_searchers": 0,
"Test:fts_result:writer_execute_batch_count": 0,
 
"Test:fts_videos:avg_queries_latency": 0,
"Test:fts_videos:batch_merge_count": 0,
"Test:fts_videos:doc_count": 24848866,
"Test:fts_videos:iterator_next_count": 0,
"Test:fts_videos:iterator_seek_count": 0,
"Test:fts_videos:num_bytes_live_data": 0,
"Test:fts_videos:num_bytes_used_disk": 2031904546,
"Test:fts_videos:num_files_on_disk": 125,
"Test:fts_videos:num_mutations_to_index": 0,
"Test:fts_videos:num_persister_nap_merger_break": 14400,
"Test:fts_videos:num_persister_nap_pause_completed": 12965,
"Test:fts_videos:num_pindexes": 4,
"Test:fts_videos:num_pindexes_actual": 4,
"Test:fts_videos:num_pindexes_target": 4,
"Test:fts_videos:num_recs_to_persist": 0,
"Test:fts_videos:num_root_filesegments": 118,
"Test:fts_videos:num_root_memorysegments": 0,
"Test:fts_videos:reader_get_count": 0,
"Test:fts_videos:reader_multi_get_count": 0,
"Test:fts_videos:reader_prefix_iterator_count": 0,
"Test:fts_videos:reader_range_iterator_count": 0,
"Test:fts_videos:timer_batch_store_count": 0,
"Test:fts_videos:timer_data_delete_count": 8090,
"Test:fts_videos:timer_data_update_count": 24859968,
"Test:fts_videos:timer_opaque_get_count": 16922,
"Test:fts_videos:timer_opaque_set_count": 16240,
"Test:fts_videos:timer_rollback_count": 0,
"Test:fts_videos:timer_snapshot_start_count": 15558,
"Test:fts_videos:total_bytes_indexed": 613489004,
"Test:fts_videos:total_bytes_query_results": 0,
"Test:fts_videos:total_compaction_written_bytes": 3213136128,
"Test:fts_videos:total_compactions": 0,
"Test:fts_videos:total_queries": 0,
"Test:fts_videos:total_queries_error": 0,
"Test:fts_videos:total_queries_slow": 0,
"Test:fts_videos:total_queries_timeout": 0,
"Test:fts_videos:total_request_time": 0,
"Test:fts_videos:total_term_searchers": 0,
"Test:fts_videos:writer_execute_batch_count": 0,
"batch_bytes_added": 112336897772,
"batch_bytes_removed": 112336897772,
"num_bytes_used_ram": 2003650568,
"pct_cpu_gc": 0.007450463197113657,
"tot_batches_flushed_on_maxops": 2699756,
"tot_batches_flushed_on_timer": 204525,
"tot_http_limitlisteners_closed": 0,
"tot_http_limitlisteners_opened": 1,
"tot_https_limitlisteners_closed": 0,
"tot_https_limitlisteners_opened": 1,
"tot_queryreject_on_memquota": 0,
"tot_remote_http": 0,
"tot_remote_http2": 1326863,
"total_gc": 64324
}
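
The flat "bucket:index:stat" keys in the dump above are easier to compare once they are grouped per index. Below is a minimal sketch of that grouping, using a few values copied from the dump. The helper names (`group_stats_by_index`, `summarize`) are made up for illustration, and the disk-per-indexed-byte ratio is only a rough indicator, since exactly what `total_bytes_indexed` counts is an assumption here.

```python
from collections import defaultdict


def group_stats_by_index(stats):
    """Group flat "bucket:index:stat" keys into {(bucket, index): {stat: value}}."""
    grouped = defaultdict(dict)
    for key, value in stats.items():
        parts = key.split(":")
        if len(parts) != 3:
            continue  # node-level counters such as "total_gc" carry no index prefix
        bucket, index, stat = parts
        grouped[(bucket, index)][stat] = value
    return dict(grouped)


def summarize(stats):
    """Build one summary row per index: disk usage, file count, rough amplification."""
    rows = []
    for (bucket, index), s in sorted(group_stats_by_index(stats).items()):
        disk = s.get("num_bytes_used_disk", 0)
        indexed = s.get("total_bytes_indexed", 0)
        rows.append({
            "index": f"{bucket}:{index}",
            "disk_gib": round(disk / 2**30, 2),
            "files": s.get("num_files_on_disk"),
            "disk_per_indexed_byte": round(disk / indexed, 2) if indexed else None,
        })
    return rows


# A few values copied from the stats dump above.
sample = {
    "Test:fts_result:num_bytes_used_disk": 1296288886,
    "Test:fts_result:num_files_on_disk": 125,
    "Test:fts_result:total_bytes_indexed": 546988241,
    "Test:fts_videos:num_bytes_used_disk": 2031904546,
    "Test:fts_videos:num_files_on_disk": 125,
    "Test:fts_videos:total_bytes_indexed": 613489004,
    "total_gc": 64324,
}

for row in summarize(sample):
    print(row)
```

Running the same function on two consecutive snapshots would show whether the disk size or file count of one index keeps growing while the others stay flat.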

:ok_hand:
In my test environment there are few index mutations because there are few write operations.
I’m worried that in a real environment with many mutations (like those that occur when an FTS index is first built), if compactions do not run, large orphaned files could fill my disk again.

:ok_hand:
Which statistics exactly do you need? The “stats:”, “managerStats:” or “bleveIndexStats” entries?
Should I take them from fts.log, or is there a better method?

2018-11-09T21:43:01.870+01:00 [INFO] managerStats: {"feeds":{"fts_search_35ed824958ee42b4_13aa53f3":{"bucketDataSourceStats":{"TotStart":1,"TotKick":1,"TotKickDeduped":0,"TotKickOk":1,"TotRefreshCluster":1,"TotRefreshClusterConnectBucket":1,"TotRefreshClusterConnectBucketErr":0,"TotRefreshClusterConnectBucketOk":1,"TotRefreshClusterBucketUUIDErr":0,"TotRefreshClusterVBMNilErr":0,"TotRefreshClusterKickWorkers":2,"TotRefreshClusterKickWorkersClosed":0,"TotRefreshClusterKickWorkersStopped":0,"TotRefreshClusterKickWorkersOk":2,"TotRefreshClusterStopped":0,"TotRefreshClusterAwokenClosed":0,"TotRefreshClusterAwokenStopped":0,"TotRefreshClusterAwokenRestart":0,"TotRefreshClusterAwoken":1,"TotRefreshClusterAllServerURLsConnectBucketErr":0,"TotRefreshClusterDone":0,"TotRefreshWorkers":2,"TotRefreshWorkersVBMNilErr":0,"TotRefreshWorkersVBucketIDErr":0,"TotRefreshWorkersServerIdxsErr":0,"TotRefreshWorkersMasterIdxErr":0,"TotRefreshWorkersMasterServerErr":0,"TotRefreshWorkersRemoveWorker":0,"TotRefreshWorkersAddWorker":1,"TotRefreshWorkersKickWorker":2,"TotRefreshWorkersCloseWorker":0,"TotRefreshWorkersLoop":2,"TotRefreshWorkersLoopDone":0,"TotRefreshWorkersDone":0,"TotWorkerStart":1,"TotWorkerDone":0,"TotWorkerBody":1,"TotWorkerBodyKick":1,"TotWorkerConnect":1,"TotWorkerConnectErr":0,"TotWorkerConnectOk":1,"TotWorkerAuth":0,"TotWorkerAuthErr":0,"TotWorkerAuthFail":0,"TotWorkerAuthOk":0,"TotWorkerUPROpenErr":0,"TotWorkerUPROpenOk":1,"TotWorkerAuthenticateMemcachedConn":1,"TotWorkerAuthenticateMemcachedConnErr":0,"TotWorkerAuthenticateMemcachedConnOk":1,"TotWorkerClientClose":0,"TotWorkerClientCloseDone":0,"TotWorkerTransmitStart":1,"TotWorkerTransmit":4769,"TotWorkerTransmitErr":0,"TotWorkerTransmitOk":4769,"TotWorkerTransmitDone":0,"TotWorkerReceiveStart":1,"TotWorkerReceive":6240094,"TotWorkerReceiveErr":0,"TotWorkerReceiveOk":6240093,"TotWorkerReceiveDone":0,"TotWorkerSendEndCh":0,"TotWorkerRecvEndCh":0,"TotWorkerHandleRecv":5328,"TotWorkerHandleRecvErr":0,"TotWorkerHandleRe
cvOk":5328,"TotWorkerCleanup":0,"TotWorkerCleanupDone":0,"TotRefreshWorker":2,"TotRefreshWorkerDone":0,"TotRefreshWorkerOk":2,"TotUPRDataChange":6234765,"TotUPRDataChangeStateErr":0,"TotUPRDataChangeMutation":6232627,"TotUPRDataChangeDeletion":2138,"TotUPRDataChangeExpiration":0,"TotUPRDataChangeErr":0,"TotUPRDataChangeOk":6234765,"TotUPRCloseStream":0,"TotUPRCloseStreamRes":0,"TotUPRCloseStreamResStateErr":0,"TotUPRCloseStreamResErr":0,"TotUPRCloseStreamResOk":0,"TotUPRStreamReq":171,"TotUPRStreamReqWant":171,"TotUPRStreamReqRes":171,"TotUPRStreamReqResStateErr":0,"TotUPRStreamReqResFail":0,"TotUPRStreamReqResFailNotMyVBucket":0,"TotUPRStreamReqResFailERange":0,"TotUPRStreamReqResFailENoMem":0,"TotUPRStreamReqResRollback":0,"TotUPRStreamReqResRollbackStart":0,"TotUPRStreamReqResRollbackErr":0,"TotUPRStreamReqResWantAfterRollbackErr":0,"TotUPRStreamReqResKick":0,"TotUPRStreamReqResSuccess":171,"TotUPRStreamReqResSuccessOk":171,"TotUPRStreamReqResFLogErr":0,"TotUPRStreamEnd":0,"TotUPRStreamEndStateErr":0,"TotUPRStreamEndKick":0,"TotUPRSnapshot":982,"TotUPRSnapshotStateErr":0,"TotUPRSnapshotStart":982,"TotUPRSnapshotStartErr":0,"TotUPRSnapshotOk":982,"TotUPRNoop":1437,"TotUPRControl":3,"TotUPRControlErr":0,"TotUPRBufferAck":426,"TotWantCloseRequestedVBucketErr":0,"TotWantClosingVBucketErr":0,"TotSelectBucketErr":0,"TotHandShakeErr":0,"TotGetVBucketMetaData":1324,"TotGetVBucketMetaDataUnmarshalErr":0,"TotGetVBucketMetaDataErr":0,"TotGetVBucketMetaDataOk":1324,"TotSetVBucketMetaData":1153,"TotSetVBucketMetaDataMarshalErr":0,"TotSetVBucketMetaDataErr":0,"TotSetVBucketMetaDataOk":1153,"TotPingTimeout":5751,"TotPingReq":2735,"TotPingReqDone":2735}
2018-11-09T21:48:01.870+01:00 [INFO] managerStats: {"feeds":{"fts_search_35ed824958ee42b4_13aa53f3":{"bucketDataSourceStats":{"TotStart":1,"TotKick":1,"TotKickDeduped":0,"TotKickOk":1,"TotRefreshCluster":1,"TotRefreshClusterConnectBucket":1,"TotRefreshClusterConnectBucketErr":0,"TotRefreshClusterConnectBucketOk":1,"TotRefreshClusterBucketUUIDErr":0,"TotRefreshClusterVBMNilErr":0,"TotRefreshClusterKickWorkers":2,"TotRefreshClusterKickWorkersClosed":0,"TotRefreshClusterKickWorkersStopped":0,"TotRefreshClusterKickWorkersOk":2,"TotRefreshClusterStopped":0,"TotRefreshClusterAwokenClosed":0,"TotRefreshClusterAwokenStopped":0,"TotRefreshClusterAwokenRestart":0,"TotRefreshClusterAwoken":1,"TotRefreshClusterAllServerURLsConnectBucketErr":0,"TotRefreshClusterDone":0,"TotRefreshWorkers":2,"TotRefreshWorkersVBMNilErr":0,"TotRefreshWorkersVBucketIDErr":0,"TotRefreshWorkersServerIdxsErr":0,"TotRefreshWorkersMasterIdxErr":0,"TotRefreshWorkersMasterServerErr":0,"TotRefreshWorkersRemoveWorker":0,"TotRefreshWorkersAddWorker":1,"TotRefreshWorkersKickWorker":2,"TotRefreshWorkersCloseWorker":0,"TotRefreshWorkersLoop":2,"TotRefreshWorkersLoopDone":0,"TotRefreshWorkersDone":0,"TotWorkerStart":1,"TotWorkerDone":0,"TotWorkerBody":1,"TotWorkerBodyKick":1,"TotWorkerConnect":1,"TotWorkerConnectErr":0,"TotWorkerConnectOk":1,"TotWorkerAuth":0,"TotWorkerAuthErr":0,"TotWorkerAuthFail":0,"TotWorkerAuthOk":0,"TotWorkerUPROpenErr":0,"TotWorkerUPROpenOk":1,"TotWorkerAuthenticateMemcachedConn":1,"TotWorkerAuthenticateMemcachedConnErr":0,"TotWorkerAuthenticateMemcachedConnOk":1,"TotWorkerClientClose":0,"TotWorkerClientCloseDone":0,"TotWorkerTransmitStart":1,"TotWorkerTransmit":4776,"TotWorkerTransmitErr":0,"TotWorkerTransmitOk":4776,"TotWorkerTransmitDone":0,"TotWorkerReceiveStart":1,"TotWorkerReceive":6240101,"TotWorkerReceiveErr":0,"TotWorkerReceiveOk":6240100,"TotWorkerReceiveDone":0,"TotWorkerSendEndCh":0,"TotWorkerRecvEndCh":0,"TotWorkerHandleRecv":5335,"TotWorkerHandleRecvErr":0,"TotWorkerHandleRe
cvOk":5335,"TotWorkerCleanup":0,"TotWorkerCleanupDone":0,"TotRefreshWorker":2,"TotRefreshWorkerDone":0,"TotRefreshWorkerOk":2,"TotUPRDataChange":6234765,"TotUPRDataChangeStateErr":0,"TotUPRDataChangeMutation":6232627,"TotUPRDataChangeDeletion":2138,"TotUPRDataChangeExpiration":0,"TotUPRDataChangeErr":0,"TotUPRDataChangeOk":6234765,"TotUPRCloseStream":0,"TotUPRCloseStreamRes":0,"TotUPRCloseStreamResStateErr":0,"TotUPRCloseStreamResErr":0,"TotUPRCloseStreamResOk":0,"TotUPRStreamReq":171,"TotUPRStreamReqWant":171,"TotUPRStreamReqRes":171,"TotUPRStreamReqResStateErr":0,"TotUPRStreamReqResFail":0,"TotUPRStreamReqResFailNotMyVBucket":0,"TotUPRStreamReqResFailERange":0,"TotUPRStreamReqResFailENoMem":0,"TotUPRStreamReqResRollback":0,"TotUPRStreamReqResRollbackStart":0,"TotUPRStreamReqResRollbackErr":0,"TotUPRStreamReqResWantAfterRollbackErr":0,"TotUPRStreamReqResKick":0,"TotUPRStreamReqResSuccess":171,"TotUPRStreamReqResSuccessOk":171,"TotUPRStreamReqResFLogErr":0,"TotUPRStreamEnd":0,"TotUPRStreamEndStateErr":0,"TotUPRStreamEndKick":0,"TotUPRSnapshot":982,"TotUPRSnapshotStateErr":0,"TotUPRSnapshotStart":982,"TotUPRSnapshotStartErr":0,"TotUPRSnapshotOk":982,"TotUPRNoop":1440,"TotUPRControl":3,"TotUPRControlErr":0,"TotUPRBufferAck":426,"TotWantCloseRequestedVBucketErr":0,"TotWantClosingVBucketErr":0,"TotSelectBucketErr":0,"TotHandShakeErr":0,"TotGetVBucketMetaData":1324,"TotGetVBucketMetaDataUnmarshalErr":0,"TotGetVBucketMetaDataErr":0,"TotGetVBucketMetaDataOk":1324,"TotSetVBucketMetaData":1153,"TotSetVBucketMetaDataMarshalErr":0,"TotSetVBucketMetaDataErr":0,"TotSetVBucketMetaDataOk":1153,"TotPingTimeout":5761,"TotPingReq":2739,"TotPingReqDone":2739}
2018-11-09T21:53:01.869+01:00 [INFO] managerStats: {"feeds":{"fts_search_35ed824958ee42b4_13aa53f3":{"bucketDataSourceStats":{"TotStart":1,"TotKick":1,"TotKickDeduped":0,"TotKickOk":1,"TotRefreshCluster":1,"TotRefreshClusterConnectBucket":1,"TotRefreshClusterConnectBucketErr":0,"TotRefreshClusterConnectBucketOk":1,"TotRefreshClusterBucketUUIDErr":0,"TotRefreshClusterVBMNilErr":0,"TotRefreshClusterKickWorkers":2,"TotRefreshClusterKickWorkersClosed":0,"TotRefreshClusterKickWorkersStopped":0,"TotRefreshClusterKickWorkersOk":2,"TotRefreshClusterStopped":0,"TotRefreshClusterAwokenClosed":0,"TotRefreshClusterAwokenStopped":0,"TotRefreshClusterAwokenRestart":0,"TotRefreshClusterAwoken":1,"TotRefreshClusterAllServerURLsConnectBucketErr":0,"TotRefreshClusterDone":0,"TotRefreshWorkers":2,"TotRefreshWorkersVBMNilErr":0,"TotRefreshWorkersVBucketIDErr":0,"TotRefreshWorkersServerIdxsErr":0,"TotRefreshWorkersMasterIdxErr":0,"TotRefreshWorkersMasterServerErr":0,"TotRefreshWorkersRemoveWorker":0,"TotRefreshWorkersAddWorker":1,"TotRefreshWorkersKickWorker":2,"TotRefreshWorkersCloseWorker":0,"TotRefreshWorkersLoop":2,"TotRefreshWorkersLoopDone":0,"TotRefreshWorkersDone":0,"TotWorkerStart":1,"TotWorkerDone":0,"TotWorkerBody":1,"TotWorkerBodyKick":1,"TotWorkerConnect":1,"TotWorkerConnectErr":0,"TotWorkerConnectOk":1,"TotWorkerAuth":0,"TotWorkerAuthErr":0,"TotWorkerAuthFail":0,"TotWorkerAuthOk":0,"TotWorkerUPROpenErr":0,"TotWorkerUPROpenOk":1,"TotWorkerAuthenticateMemcachedConn":1,"TotWorkerAuthenticateMemcachedConnErr":0,"TotWorkerAuthenticateMemcachedConnOk":1,"TotWorkerClientClose":0,"TotWorkerClientCloseDone":0,"TotWorkerTransmitStart":1,"TotWorkerTransmit":4783,"TotWorkerTransmitErr":0,"TotWorkerTransmitOk":4783,"TotWorkerTransmitDone":0,"TotWorkerReceiveStart":1,"TotWorkerReceive":6240108,"TotWorkerReceiveErr":0,"TotWorkerReceiveOk":6240107,"TotWorkerReceiveDone":0,"TotWorkerSendEndCh":0,"TotWorkerRecvEndCh":0,"TotWorkerHandleRecv":5342,"TotWorkerHandleRecvErr":0,"TotWorkerHandleRe
cvOk":5342,"TotWorkerCleanup":0,"TotWorkerCleanupDone":0,"TotRefreshWorker":2,"TotRefreshWorkerDone":0,"TotRefreshWorkerOk":2,"TotUPRDataChange":6234765,"TotUPRDataChangeStateErr":0,"TotUPRDataChangeMutation":6232627,"TotUPRDataChangeDeletion":2138,"TotUPRDataChangeExpiration":0,"TotUPRDataChangeErr":0,"TotUPRDataChangeOk":6234765,"TotUPRCloseStream":0,"TotUPRCloseStreamRes":0,"TotUPRCloseStreamResStateErr":0,"TotUPRCloseStreamResErr":0,"TotUPRCloseStreamResOk":0,"TotUPRStreamReq":171,"TotUPRStreamReqWant":171,"TotUPRStreamReqRes":171,"TotUPRStreamReqResStateErr":0,"TotUPRStreamReqResFail":0,"TotUPRStreamReqResFailNotMyVBucket":0,"TotUPRStreamReqResFailERange":0,"TotUPRStreamReqResFailENoMem":0,"TotUPRStreamReqResRollback":0,"TotUPRStreamReqResRollbackStart":0,"TotUPRStreamReqResRollbackErr":0,"TotUPRStreamReqResWantAfterRollbackErr":0,"TotUPRStreamReqResKick":0,"TotUPRStreamReqResSuccess":171,"TotUPRStreamReqResSuccessOk":171,"TotUPRStreamReqResFLogErr":0,"TotUPRStreamEnd":0,"TotUPRStreamEndStateErr":0,"TotUPRStreamEndKick":0,"TotUPRSnapshot":982,"TotUPRSnapshotStateErr":0,"TotUPRSnapshotStart":982,"TotUPRSnapshotStartErr":0,"TotUPRSnapshotOk":982,"TotUPRNoop":1442,"TotUPRControl":3,"TotUPRControlErr":0,"TotUPRBufferAck":426,"TotWantCloseRequestedVBucketErr":0,"TotWantClosingVBucketErr":0,"TotSelectBucketErr":0,"TotHandShakeErr":0,"TotGetVBucketMetaData":1324,"TotGetVBucketMetaDataUnmarshalErr":0,"TotGetVBucketMetaDataErr":0,"TotGetVBucketMetaDataOk":1324,"TotSetVBucketMetaData":1153,"TotSetVBucketMetaDataMarshalErr":0,"TotSetVBucketMetaDataErr":0,"TotSetVBucketMetaDataOk":1153,"TotPingTimeout":5771,"TotPingReq":2744,"TotPingReqDone":2744}

#7

@treo, to answer your two questions first:

  • We do not offer a way to manually compact files on disk.
  • The sizes of a pindex and its replica are expected to be roughly similar, but that depends on if and when the mergers run on the zap files in each of the pindexes.

The bug that I mentioned earlier, where the merger lags, isn’t seen consistently. I believe you’d see more uniform sizes with the fix that is going into 6.0.1.

All the pindex stats are dumped into the log files periodically, so I have a few questions about the stats snapshot you highlighted above:

  • Does it belong to the index whose zap files show the huge difference in the sizes?
  • Are you looking at the latest snapshot of the stats?

If you’re willing, we could take a look at the logs for you. You’ll need to run cbcollect_info, either from the log collection tab in the UI or from the CLI, and share the resulting zip file here.


#8

I forgot to hit send on my reply above, and it looks like @sreeks had already answered your previous questions :slight_smile:

@treo We’ll be able to assist you better if we can look at the logs as well. I’ve shared instructions for collecting them in my previous message.


#9

Agreed!
Before looking into any of your logs, another sizing-related aspect that comes to mind is the index type mappings you might have used. Please index (provide a mapping for) only the fields that you need to be searchable, and make sure to turn off the default mapping in the index definition, as it would otherwise have an undesirable effect on index size.

Sreekanth,


#10

Thank you so much for the advice, @sreeks.
I have reviewed it: all our FTS indexes map only the fields that we need to be searchable or stored.
We always disable “include_term_vectors” in all text fields.
I always turn off the default mapping and any dynamic fields.

Example:

 meta().id: "photos::10000001"
{
  "pho_id": 10000001,
  "pho_file": "80a4e55bf8798647e26234ab2e8da8af1.jpg",
  "pho_sou_id": 5,
  "pho_title": "Nec pretium nunc lorem. Elit non, diam massa, neque consectetuer.",
  "pho_description": "FILE PHOTO: TLorem ipsum dolor sit amet, et vulputate sapien libero leo quis, metus ipsum ut nunc, tellus amet at aenean maecenas tempor, venenatis lorem, phasellus semper lacinia nunc eleifend ultrices. Mi consectetuer, tempor cursus, lacinia elit ultricies. Proin sit amet nonummy vestibulum vehicula. Nec pretium nunc lorem. Elit non, diam massa, neque consectetuer. Switzerland May 18, 2018. Picture taken May 18, 2018. AGENT Name Surname File Photo",
  "pho_keywords": "",
  "pho_creation_date": "2018-05-24 14:40:48",
  "pho_author": "Name Surname",
  "pho_original_name": "2018-05-24T124048Z_562818715_RC15C86AD870_RTRMADP_3_REATSE_PHOTO.JPG",
  "pho_insert_date": "2018-05-24 14:41:13",
  "entitytype": "photos"
}

Index definition:

 {
    "name": "fts_photos",    
    "params": {
      "doc_config": {
        "docid_prefix_delim": "::", <----  Only index meta().id = "photos::xxxxxxxxxxx"
        "docid_regexp": "",
        "mode": "docid_prefix",
        "type_field": "type"
      },
      "mapping": {
        "analysis": {},
        "default_analyzer": "es",
        "default_datetime_parser": "dateTimeOptional",
        "default_field": "_all",
        "default_mapping": { <------
          "dynamic": true,
          "enabled": false  <-------   OFF DEFAULT DYNAMIC MAPPING
        },
        "default_type": "_default",
        "docvalues_dynamic": true,     
        "index_dynamic": true,
        "store_dynamic": false,
        "type_field": "_type",
        "types": {
          "photos": {     <----- Only index meta().id = "photos::xxxxxxxxxxx"
            "dynamic": false,  <----- DYNAMIC OFF
            "enabled": true,
            "properties": {
              "pho_description": {
                "dynamic": false,    <-------
                "enabled": true,
                "fields": [
                  {
                    "include_in_all": true,
                    "index": true,
                    "name": "pho_description",
                    "type": "text"
                  }
                ]
              },
              "pho_file": {
                "dynamic": false, <------
                "enabled": true,
                "fields": [
                  {
                    "analyzer": "keyword",
                    "index": true,
                    "name": "pho_file",
                    "type": "text"
                  }
                ]
              },
              "pho_id": {
                "dynamic": false,  <--------
                "enabled": true,
                "fields": [
                  {
                    "include_term_vectors": true,
                    "index": true,
                    "name": "pho_id",
                    "store": true,
                    "type": "number"
                  }
                ]
              },
              "pho_keywords": {
                "dynamic": false, <-----------
                "enabled": true,
                "fields": [
                  {
                    "include_in_all": true,
                    "index": true,
                    "name": "pho_keywords",
                    "type": "text"
                  }
                ]
              },
              "pho_sou_id": {
                "dynamic": false, <---------------
                "enabled": true,
                "fields": [
                  {
                    "include_term_vectors": true,
                    "index": true,
                    "name": "pho_sou_id",
                    "store": true,
                    "type": "number"
                  }
                ]
              },
              "pho_title": {
                "dynamic": false, <----------
                "enabled": true,
                "fields": [
                  {
                    "include_in_all": true,
                    "index": true,
                    "name": "pho_title",
                    "type": "text"
                  }
                ]
              }
            }
          }
        }
      },
      "store": {
        "indexType": "scorch",  <-------------
        "kvStoreName": ""
      }
    },
    "planParams": {
      "maxPartitionsPerPIndex": 171,
      "numReplicas": 1
    },
    "sourceName": "Test",
    "sourceParams": {},
    "sourceType": "couchbase",
    "type": "fulltext-index"
  }
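
A definition like the one above can also be checked mechanically instead of by eye. The sketch below walks the `params.mapping` section and lists options that tend to add data to an index (stored fields, term vectors, `include_in_all`, dynamic flags). The function name `audit_mapping` is made up for illustration, the list of options is an assumption rather than an official checklist, and whether a given flag matters depends on whether dynamic mapping is actually in effect. The sample is a trimmed copy of the definition above; note that it still has `include_term_vectors` enabled on the number field.

```python
def audit_mapping(index_def):
    """Return findings about mapping options that commonly increase index size."""
    mapping = index_def["params"]["mapping"]
    findings = []

    if mapping.get("default_mapping", {}).get("enabled"):
        findings.append("default_mapping is enabled")
    for flag in ("index_dynamic", "store_dynamic", "docvalues_dynamic"):
        if mapping.get(flag):
            findings.append(f"{flag} is true")

    def walk(node, path):
        # Report per-node dynamic mapping and per-field storage options.
        if node.get("dynamic"):
            findings.append(f"{path}: dynamic is true")
        for field in node.get("fields", []):
            for option in ("store", "include_term_vectors", "include_in_all"):
                if field.get(option):
                    findings.append(f"{path}: {option} is true")
        for name, child in node.get("properties", {}).items():
            walk(child, f"{path}.{name}")

    for type_name, type_mapping in mapping.get("types", {}).items():
        if type_mapping.get("enabled", True):
            walk(type_mapping, type_name)
    return findings


# Trimmed copy of the fts_photos definition above (two fields only).
sample_def = {
    "params": {
        "mapping": {
            "default_mapping": {"dynamic": True, "enabled": False},
            "docvalues_dynamic": True,
            "index_dynamic": True,
            "store_dynamic": False,
            "types": {
                "photos": {
                    "dynamic": False,
                    "enabled": True,
                    "properties": {
                        "pho_id": {
                            "dynamic": False,
                            "enabled": True,
                            "fields": [{
                                "include_term_vectors": True,
                                "index": True,
                                "name": "pho_id",
                                "store": True,
                                "type": "number",
                            }],
                        },
                        "pho_title": {
                            "dynamic": False,
                            "enabled": True,
                            "fields": [{
                                "include_in_all": True,
                                "index": True,
                                "name": "pho_title",
                                "type": "text",
                            }],
                        },
                    },
                }
            },
        }
    }
}

for finding in audit_mapping(sample_def):
    print(finding)
```

This only inspects the definition; it does not show what actually ended up inside the zap files.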

One last question:
How can I check that the index contains no fields that I don’t want indexed?
I’m trying the “cbft-bleve” CLI command, but it doesn’t return any information :frowning:

data/@fts$ /opt/couchbase/bin/cbft-bleve scorch info fts_places_27dc801239cb56d7_13aa53f3.pindex/store/

Thanks


#11

Give “/opt/couchbase/bin/cbft-bleve fields ns_server/data/n_0/data/@fts/FTS_PARTITION_PATH.pindex/” a try; it should help with listing the indexed fields.


#12

$ /opt/couchbase/bin/cbft-bleve fields "/opt/couchbase/var/lib/couchbase/data/@fts/fts_places_27dc801239cb56d7_13aa53f3.pindex/"

or:

/opt/couchbase/bin/cbft-bleve scorch info "/opt/couchbase/var/lib/couchbase/data/@fts/fts_places_27dc801239cb56d7_13aa53f3.pindex/store"

Neither command ever finishes, and no data is returned :frowning:

“strace -f -s 1024” shows it stuck in an infinite loop:

[pid 21333] pselect6(0, NULL, NULL, NULL, {tv_sec=0, tv_nsec=3000}, NULL <unfinished ...>
[pid 21336] futex(0x1368080, FUTEX_WAIT, 0, {tv_sec=3, tv_nsec=733190846} <unfinished ...>
[pid 21335] <... futex resumed> )       = 0
[pid 21334] <... epoll_wait resumed> [], 128, 0) = 0
[pid 21335] flock(3, LOCK_SH|LOCK_NB <unfinished ...>
[pid 21333] <... pselect6 resumed> )    = 0 (Timeout)
[pid 21335] <... flock resumed> )       = -1 EAGAIN (Resource temporarily unavailable)
[pid 21333] futex(0x1368ef8, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 21335] futex(0x1368080, FUTEX_WAKE, 1 <unfinished ...>
[pid 21334] pselect6(0, NULL, NULL, NULL, {tv_sec=0, tv_nsec=20000}, NULL <unfinished ...>
[pid 21336] <... futex resumed> )       = 0
[pid 21335] <... futex resumed> )       = 1
[pid 21336] sched_yield( <unfinished ...>
[pid 21335] epoll_wait(4,  <unfinished ...>
[pid 21336] <... sched_yield resumed> ) = 0
[pid 21335] <... epoll_wait resumed> [], 128, 0) = 0
[pid 21336] futex(0x1368060, FUTEX_WAKE, 1 <unfinished ...>
[pid 21335] futex(0xc420032938, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 21336] <... futex resumed> )       = 0
[pid 21336] futex(0xc420032938, FUTEX_WAKE, 1 <unfinished ...>
[pid 21334] <... pselect6 resumed> )    = 0 (Timeout)
[pid 21334] pselect6(0, NULL, NULL, NULL, {tv_sec=0, tv_nsec=20000}, NULL <unfinished ...>
[pid 21336] <... futex resumed> )       = 1
[pid 21335] <... futex resumed> )       = 0
[pid 21335] epoll_wait(4,  <unfinished ...>
[pid 21336] futex(0x1368080, FUTEX_WAIT, 0, {tv_sec=0, tv_nsec=49852945} <unfinished ...>
[pid 21335] <... epoll_wait resumed> [], 128, 0) = 0
[pid 21335] futex(0xc420032938, FUTEX_WAIT, 0, NULL <unfinished ...>
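
In the trace above, `flock(3, LOCK_SH|LOCK_NB)` keeps returning EAGAIN, which is what a non-blocking shared lock returns while another open file description holds an exclusive lock on the same file. One possible reading, which is an assumption and not confirmed anywhere in this thread, is that the running FTS process still holds a lock on a file in the store directory and the CLI keeps retrying. The trace does not show which file descriptor 3 refers to. The sketch below (Linux/Unix only, with a made-up helper name `shared_lock_available`) performs the same kind of lock attempt on any file path and demonstrates the behaviour on a throwaway file.

```python
import fcntl
import os
import tempfile


def shared_lock_available(path):
    """Try a non-blocking shared flock on path; return False if it would block."""
    fd = os.open(path, os.O_RDONLY)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_SH | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # the condition strace reports as EAGAIN
        fcntl.flock(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)


# Demonstration on a temporary file: the shared lock is refused
# while a second descriptor holds an exclusive lock.
with tempfile.NamedTemporaryFile() as tmp:
    print("before exclusive lock:", shared_lock_available(tmp.name))
    holder = os.open(tmp.name, os.O_RDWR)
    fcntl.flock(holder, fcntl.LOCK_EX)
    print("while exclusive lock is held:", shared_lock_available(tmp.name))
    os.close(holder)
```

If this reading is right, running the CLI against a copy of the pindex directory, or against the original while the FTS service is stopped, would be one way to test it.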

#13

Not sure of the issue here, the first command works for me. Will check that. @abhinav any clues here?

Another factor that could make scorch indexes larger than solr/lucene ones is docValues. Scorch always stores docValues, and this is currently not configurable.

We have noted this and will make it configurable in coming releases.
Can you confirm whether your solr indexes have docValues enabled or disabled?

thanks,


#14

@sreeks, I confirm that in SolR/Lucene we are not using docValues (we do not use function queries or faceting). Could this be the reason why the Lucene indexes are 10 times smaller than the Scorch ones?


#15

@treo, this could be one contributing factor, but it can’t account for a 10X amplification. Lucene also has further storage optimisations in many areas, e.g. numeric fields.

Just to be clear on the exact size difference you observe: the 200GB scorch index takes around 20GB in SolR? :slight_smile:


#16

The 12 indexes (cores) in SolR occupy 20GB on disk in total (on a single server).
The same 12 indexes occupy 200GB in Scorch (and double that, 400GB, in Moss). Naturally, the Scorch size doubles again if we add 1 replica (400GB).

We use numeric fields in SolR (and no DocValues). This could be one reason for the space savings in SolR, but I don’t think it is the main reason for the space amplification, because there are indexes/cores where we do not use numerics.

2 examples comparing size and index definition, SolR vs Scorch (same indexed fields, no dynamic fields, no default_mapping):

Example 1.- Photo index. 4,6GB in SolR vs 25GB in Scorch :frowning:

SolR Sizes:

ls -lh photo/data/index/
total 4,6G    <---------------------------------------
 2,1G nov  6 03:12 _4h1k2.fdt
 1,2M nov  6 03:12 _4h1k2.fdx
  843 nov  6 03:15 _4h1k2.fnm
 866M nov  6 03:15 _4h1k2_Lucene41_0.doc
 448M nov  6 03:15 _4h1k2_Lucene41_0.pos
 1,1G nov  6 03:15 _4h1k2_Lucene41_0.tim
  13M nov  6 03:15 _4h1k2_Lucene41_0.tip
  62M nov  6 03:15 _4h1k2.nvd
  101 nov  6 03:15 _4h1k2.nvm
  447 nov  6 03:15 _4h1k2.si
  102 nov 11 03:10 segments_44umt
   20 nov 11 03:10 segments.gen
    0 jul 19 02:38 write.lock
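
To see which parts dominate the SolR index, the sizes in the listing above can be totalled and expressed as shares. The sketch below parses the human-readable `ls -lh` sizes, including the comma decimal separator of this locale, so results are approximate because the listing is already rounded. The helper `parse_size` is made up for illustration. As far as I know, in this Lucene 4.x layout `.fdt` holds stored field values, `.tim` the term dictionary, and `.doc` and `.pos` the postings and positions.

```python
import re

UNITS = {"": 1, "K": 2**10, "M": 2**20, "G": 2**30}


def parse_size(text):
    """Convert an `ls -lh` size such as "2,1G" or "843" to bytes (approximate)."""
    match = re.fullmatch(r"(\d+(?:[.,]\d+)?)([KMG]?)", text)
    if not match:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return float(number.replace(",", ".")) * UNITS[unit]


# Sizes copied from the photo core listing above.
listing = {
    "_4h1k2.fdt": "2,1G",
    "_4h1k2.fdx": "1,2M",
    "_4h1k2.fnm": "843",
    "_4h1k2_Lucene41_0.doc": "866M",
    "_4h1k2_Lucene41_0.pos": "448M",
    "_4h1k2_Lucene41_0.tim": "1,1G",
    "_4h1k2_Lucene41_0.tip": "13M",
    "_4h1k2.nvd": "62M",
    "_4h1k2.nvm": "101",
    "_4h1k2.si": "447",
    "segments_44umt": "102",
    "segments.gen": "20",
    "write.lock": "0",
}

total = sum(parse_size(size) for size in listing.values())
print(f"total: {total / 2**30:.2f} GiB")
for name, size in sorted(listing.items(), key=lambda kv: -parse_size(kv[1])):
    print(f"{name:28s} {100 * parse_size(size) / total:5.1f}%")
```

A similar breakdown for the Scorch side is not possible from a directory listing alone, since each zap file combines these sections in one segment file.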

Index definition:

$ cat photo/conf/schema.xml 
<?xml version="1.0" ?>

<schema name="core photo" version="1.1">
        <types>
                <fieldType name="text_es" class="solr.TextField" positionIncrementGap="100">
                        <analyzer>
                                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                                <filter class="solr.LowerCaseFilterFactory"/>
                                <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
                        </analyzer>
                </fieldType>
                <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
        </types>
        <fields>   
                <field name="pho_description" type="text_es" indexed="true" stored="true" multiValued="false" />
                <field name="pho_file" type="string" indexed="true" stored="true" multiValued="false" /> 
                <field name="pho_id" type="int" indexed="true" stored="true" multiValued="false" />
                <field name="pho_keywords" type="text_es" indexed="true" stored="true" multiValued="false" />
                <field name="pho_sou_id" type="int" indexed="true" stored="true" multiValued="false" /> 
                <field name="pho_title" type="text_es" indexed="true" stored="true" multiValued="false" />
        </fields>
        <uniqueKey>pho_id</uniqueKey>
        <defaultSearchField>pho_id</defaultSearchField>
        <solrQueryParser defaultOperator="AND"/>
</schema>

Scorch definition

{
  "type": "fulltext-index",
  "name": "fts_photos",
  "sourceType": "couchbase",
  "sourceName": "ECCache",
  "planParams": {
    "maxPartitionsPerPIndex": 171,
    "numReplicas": 0
  },
  "params": {
    "doc_config": {
      "docid_prefix_delim": "::",
      "docid_regexp": "",
      "mode": "docid_prefix",
      "type_field": "type"
    },
    "mapping": {
      "analysis": {},
      "default_analyzer": "es",
      "default_datetime_parser": "dateTimeOptional",
      "default_field": "_all",
      "default_mapping": {
        "dynamic": false,
        "enabled": false
      },
      "default_type": "_default",
      "docvalues_dynamic": true,
      "index_dynamic": false,
      "store_dynamic": false,
      "type_field": "_type",
      "types": {
        "photos": {
          "dynamic": false,
          "enabled": true,
          "properties": {
            "pho_description": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "include_in_all": true,
                  "index": true,
                  "name": "pho_description",
                  "type": "text"
                }
              ]
            },
            "pho_file": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "analyzer": "keyword",
                  "index": true,
                  "name": "pho_file",
                  "type": "text"
                }
              ]
            },
            "pho_id": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "index": true,
                  "name": "pho_id",
                  "store": true,
                  "type": "number"
                }
              ]
            },
            "pho_keywords": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "include_in_all": true,
                  "index": true,
                  "name": "pho_keywords",
                  "type": "text"
                }
              ]
            },
            "pho_sou_id": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "index": true,
                  "name": "pho_sou_id",
                  "store": true,
                  "type": "number"
                }
              ]
            },
            "pho_title": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "include_in_all": true,
                  "index": true,
                  "name": "pho_title",
                  "type": "text"
                }
              ]
            }
          }
        }
      }
    },
    "store": {
      "indexType": "scorch",
      "kvStoreName": ""
    }
  },
  "sourceParams": {}
}

Example 2.- Comments index. 0,711GB in SolR vs 35GB in Scorch :frowning:
A lot of numeric fields. But only 1 field stored: “com_id”.

SolR sizes:

$ ls -lh comments/data/index/
total 711M     <---------------------- 0,711GB
  15M oct 18 03:10 _o78j5.fdt
 2,5K oct 18 03:10 _o78j5.fdx
 1,1K oct 18 03:10 _o78j5.fnm
 291M oct 18 03:10 _o78j5_Lucene41_0.doc
 161M oct 18 03:10 _o78j5_Lucene41_0.pos
 237M oct 18 03:10 _o78j5_Lucene41_0.tim
 5,0M oct 18 03:10 _o78j5_Lucene41_0.tip
 3,6M oct 18 03:10 _o78j5.nvd
   46 oct 18 03:10 _o78j5.nvm
  447 oct 18 03:10 _o78j5.si
   20 nov 11 03:10 segments.gen
  102 nov 11 03:10 segments_mzb3j
    0 jul 19 02:38 write.lock

Index definition:

$ cat comments/schema.xml
<?xml version="1.0" ?>
<schema name="core comments" version="1.1">
        <types>
                <fieldType name="text_es" class="solr.TextField" positionIncrementGap="100">
                        <analyzer>
                                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                                <filter class="solr.LowerCaseFilterFactory"/>
                                <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
                        </analyzer>
                </fieldType>
                <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
        </types>
        <fields>   
            <field name="com_allow" type="boolean" indexed="true" stored="false" multiValued="false" />
            <field name="com_date" type="tdate" indexed="true" stored="false" multiValued="false" /> 
            <field name="com_edi_id" type="int" indexed="true" stored="false" multiValued="false" /> 
            <field name="com_id" type="int" indexed="true" stored="true" multiValued="false" />
            <field name="com_moderator" type="int" indexed="true" stored="false" multiValued="false" />
            <field name="com_offensive" type="int" indexed="true" stored="false" multiValued="false" />
            <field name="com_parent_id" type="int" indexed="true" stored="false" multiValued="false" />
            <field name="com_sit_id" type="int" indexed="true" stored="false" multiValued="false" />
            <field name="com_text" type="text_es" indexed="true" stored="false" multiValued="false" />
            <field name="com_use_id" type="int" indexed="true" stored="false" multiValued="false" />
        </fields>
        <uniqueKey>com_id</uniqueKey>
        <defaultSearchField>com_id</defaultSearchField>
        <solrQueryParser defaultOperator="AND"/>
</schema>

Scorch index definition:

{
  "type": "fulltext-index",
  "name": "fts_comments",
  "uuid": "204c9c7ffce339bf",
  "sourceType": "couchbase",
  "sourceName": "ECCache",
  "sourceUUID": "64b9ddf252efd18db70fcd107641a505",
  "planParams": {
    "maxPartitionsPerPIndex": 171,
    "numReplicas": 0
  },
  "params": {
    "doc_config": {
      "docid_prefix_delim": "::",
      "docid_regexp": "",
      "mode": "docid_prefix",
      "type_field": "type"
    },
    "mapping": {
      "analysis": {},
      "default_analyzer": "es",
      "default_datetime_parser": "dateTimeOptional",
      "default_field": "_all",
      "default_mapping": {
        "dynamic": false,
        "enabled": false
      },
      "default_type": "_default",
      "docvalues_dynamic": true,
      "index_dynamic": false,
      "store_dynamic": false,
      "type_field": "_type",
      "types": {
        "comments": {
          "dynamic": false,
          "enabled": true,
          "properties": {
            "com_allow": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "index": true,
                  "name": "com_allow",
                  "type": "number"
                }
              ]
            },
            "com_date": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "index": true,
                  "name": "com_date",
                  "type": "datetime"
                }
              ]
            },
            "com_edi_id": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "index": true,
                  "name": "com_edi_id",
                  "type": "number"
                }
              ]
            },
            "com_id": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "name": "com_id",
                  "store": true,
                  "type": "number"
                }
              ]
            },
            "com_moderator": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "index": true,
                  "name": "com_moderator",
                  "type": "number"
                }
              ]
            },
            "com_offensive": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "index": true,
                  "name": "com_offensive",
                  "type": "number"
                }
              ]
            },
            "com_parent_id": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "index": true,
                  "name": "com_parent_id",
                  "type": "number"
                }
              ]
            },
            "com_sit_id": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "index": true,
                  "name": "com_sit_id",
                  "type": "number"
                }
              ]
            },
            "com_text": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "include_in_all": true,
                  "index": true,
                  "name": "com_text",
                  "type": "text"
                }
              ]
            },
            "com_use_id": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "index": true,
                  "name": "com_use_id",
                  "type": "number"
                }
              ]
            }
          }
        }
      }
    },
    "store": {
      "indexType": "scorch",
      "kvStoreName": ""
    }
  },
  "sourceParams": {}
}

#17

Thanks @treo for the details.

If your requirement is limited to field-scoped queries, you can untick the “include in _all field” option for the indexed fields. This could save some space as well.
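For reference, in the index definition JSON this option appears as the "include_in_all" property of a field. A minimal sketch for the com_text field from the definition above, where only the "include_in_all" value differs from what was posted (the other fields in that definition simply omit the property, which appears to have the same effect):

```json
{
  "com_text": {
    "dynamic": false,
    "enabled": true,
    "fields": [
      {
        "include_in_all": false,
        "index": true,
        "name": "com_text",
        "type": "text"
      }
    ]
  }
}
```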

We haven’t yet benchmarked Scorch index sizes against Lucene’s, so your findings may point to further improvements. This certainly helps us prioritise tasks at our end.

thanks,


#18

My queries are all field-scoped :slight_smile: I don’t run any default queries that rely on “include_in_all”, so I have unticked this option.

To benchmark the effect of “include_in_all”, I recreated the same index with three configurations:

  • 0 fields with “include_in_all” ticked: 75GB
  • 1 field with “include_in_all” ticked: 76GB
  • 8 fields with “include_in_all” ticked: 120GB (some .zap files were not compacted)

#19

Hi – still scanning/scrolling through this long thread, and one thing popped out (not sure if it was already mentioned)…

A lot of numeric fields.

I think I saw a lot of id fields being indexed as type “number” in your sample index definition. Indexing as a number is useful if you have real numeric use cases, such as range searches like “search for all the product docs that are about ‘iphone’ and whose price < 1000”.

But many apps don’t actually perform range searches on id fields; they only do exact keyword equality searches on them. For these kinds of id fields I often find it better to use the text type with the keyword analyzer instead of “number”, which can save on index size. (Again, if this was already mentioned somewhere on the thread, apologies!)
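As a rough sketch of that suggestion, an id field such as com_edi_id from the sample definition could be mapped like this, with "type" changed to "text" and "analyzer" set to "keyword". This assumes the id values are present as strings in the documents; it is worth verifying on your own data that numeric JSON values are still indexed under a text mapping before relying on it.

```json
{
  "com_edi_id": {
    "dynamic": false,
    "enabled": true,
    "fields": [
      {
        "analyzer": "keyword",
        "index": true,
        "name": "com_edi_id",
        "type": "text"
      }
    ]
  }
}
```

The keyword analyzer indexes the whole value as a single term, so it suits exact equality lookups but not range searches.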


#20

Thank you so much for the advice @steve

You are absolutely right! Many of our fields are “number” because our searches look like this:
“search for all the product docs that are about ‘iphone’ and active=1 and category=25 and enable=1 and type=33”

Indeed, we don’t usually do numeric range searches like category<25.
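For illustration, a search of this style can be written as a conjunction of a match query on the text and exact term queries on the equality fields, which fits fields mapped as text with the keyword analyzer. A sketch only: the field names (description, active, category) are hypothetical placeholders taken from the example sentence, not from a real index, and the remaining conditions would be added as further term queries in the same way.

```json
{
  "query": {
    "conjuncts": [
      { "match": "iphone", "field": "description" },
      { "term": "1", "field": "active" },
      { "term": "25", "field": "category" }
    ]
  }
}
```

Term queries are not analyzed, so the term has to match the indexed value exactly.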

We chose numeric fields because that is how they are defined in our DB model and because we thought they would be more efficient in FTS (possibly I was wrong).

Now I have changed some numeric fields to text fields and … the index size on disk dropped from 76GB to 40GB :slight_smile: :slight_smile: :slight_smile:

Do you think it will also improve search performance? :slight_smile:

Tomorrow I will test the performance and analyze my applications’ queries to see if I can switch from numeric to text without affecting them.

Do you have any documents or blog posts on good practices for creating FTS indexes in Couchbase?
Thanks!!