Slow disk drain and create rate

Hello everyone,

I’m currently testing out Couchbase Enterprise Edition using a single node on my local machine running Ubuntu 12.04 LTS.

The machine specs are:
64-bit
32 GB RAM
Intel i7, 8 cores
750 GB ATA WDC hard drive

The cluster quota is 8 GB and I have one bucket, ‘users’, with 4 GB assigned to it. My documents are on average 250 bytes. I have a Java application deployed on Tomcat on the same machine, and no matter how many inserts I do, the drain rate to disk never exceeds 100 and averages around 60. For example, the disk write queue will be 25k with a drain rate of 60 to 100.
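
For reference, the write path is essentially just a loop like the following (a simplified sketch, not my exact code; keys, document contents and credentials are placeholders):

import com.couchbase.client.CouchbaseClient;
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class InsertLoad {
    public static void main(String[] args) throws Exception {
        // Connect to the single local node and the 'users' bucket
        List<URI> nodes = Arrays.asList(URI.create("http://127.0.0.1:8091/pools"));
        CouchbaseClient client = new CouchbaseClient(nodes, "users", "");

        // Fire a large number of small JSON documents (~250 bytes each in the real workload)
        String doc = "{\"id\":1,\"name\":\"user\",\"active\":true}";
        for (int i = 0; i < 100000; i++) {
            client.set("user::" + i, 0, doc); // asynchronous set; futures are not awaited here
        }
        client.shutdown(10, TimeUnit.SECONDS);
    }
}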

I have the standard disk write concurrency of 3. Any ideas or info about my problem would be greatly appreciated!

Thanks,

Owen

Hello,

Even though these numbers do not look good, I would like you to gather more data; it may simply be the maximum your hardware can do.

One thing you can do is look at iostat to see what the utilization of the disk is. If it’s being heavily used, then that’s probably just as fast as it can go. If it is not heavily used, we need to continue the investigation.
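
For example, while the inserts are running:

iostat -x 1 10

If %util on the data drive is close to 100% while the drain rate sits at 60 to 100, the disk itself is most likely the limit; if the disk is mostly idle, we need to look elsewhere.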

I would recommend first increasing the incoming write rate to confirm whether batching is the main culprit. Then increasing the workers to 6 should roughly double that rate. I wouldn’t recommend going to 8 on this system, since you’ve only got 8 cores and should leave ~4 of them for erlang/memcached/etc.

Regards
Tug
@tgrall

Hi @tgrall ,

I have the same issue:
I have 2G total mutations, and the drain rate is only around 1k per second.
It takes a very long time to process one index.
I have 32 GB RAM and 10 CPU cores.
Is there any way to speed up or increase my drain rate?

Thanks

@Han_Chris1,

The drain rate of an index depends on many factors like resident_ratio, size of the indexed key, IO bandwidth, type of index (e.g., array index vs. non-array index), whether you are streaming initial or incremental traffic, RSS of the indexer process, etc. Can you please share the following details:

a. Which version of couchbase-server are you on and what is your cluster setup
b. What is the storage mode
c. What is the resident ratio of the index for which you are seeing only 1k drain rate
d. What is the CPU and RSS of indexer process during this time
e. What is the peak IO bandwidth of the storage device in this case
f. Is the index in the build phase, or is the index already built and catching up on incremental traffic
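
If it is easier, you can usually pull these numbers straight from the indexer stats on the index node (assuming the default indexer HTTP port 9102), for example:

curl -u Administrator:<password> http://<index-node>:9102/stats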

Thanks,
Varun

Hi @varun.velamuri,

Thanks for your response.
Please find the details below:
a. Which version of couchbase-server are you on and what is your cluster setup
[H.C]: Couchbase 6.0.1, one cluster containing 3 data, 3 query, 1 eventing, and 4 index nodes
b. What is the storage mode
[H.C]: Standard Global Secondary
c. What is the resident ratio of the index for which you are seeing only 1k drain rate
[H.C]: How do I check this resident ratio? I can see: 100 cache resident percent, 12.5 cache miss ratio, and 6.49 active docs resident %
d. What is the CPU and RSS of indexer process during this time
[H.C]: I have dedicated 1 service per node, so all the resources should be for indexing only
e. What is the peak IO bandwidth of the storage device in this case
[H.C]: How do I check this?
f. Is the index in the build phase, or is the index already built and catching up on incremental traffic
[H.C]: Actually I’m doing an upgrade using swap rebalance. How long does it normally take to process 2G mutations?

@Han_Chris1,

Couchbase 6.0.1, one cluster containing 3 data

Is this Community Edition or Enterprise Edition?

Percentage of index data resident in memory.

Cache resident percent is the stat I am looking for. It describes the percent of index that is resident in memory. 100% means that all the index is in memory

What is the CPU and RSS of indexer process during this time

I mean, the CPU and RSS while the drain rate is only 1k mutations/sec. This will help to understand whether all the resources are being fully utilised or not.
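
For example, on the index node something like ps -o pid,pcpu,rss,comm -C indexer (or top filtered to the indexer process) should show the CPU and resident memory of the indexer while the build is running.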

What is the peak IO bandwidth of the storage device in this case

You can use the iostat utility (e.g., iostat -x -p ALL 1 10 or iostat -x 1 10). This will run iostat 10 times with an interval of 1 sec between runs. As all the index is resident in memory, I think this is not required now.

How long does it normally take to process 2G mutations?

Do you mean 2 billion mutations, or that the total size of all the mutations is 2 gigabytes? Can you share the num_docs_pending and num_docs_indexed stats for this index? You can get these stats in /opt/couchbase/var/lib/couchbase/indexer.log when the drain rate is 1k mutations/sec.
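
For example, something like grep -E "num_docs_pending|num_docs_indexed" /opt/couchbase/var/lib/couchbase/indexer.log | tail should show the most recent values of these stats.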

Is this Community Edition or Enterprise Edition?

Enterprise Edition, but I haven’t bought the license yet; I need to do an evaluation first.

Cache resident percent is the stat I am looking for. It describes the percent of index that is resident in memory. 100% means that all the index is in memory

0.0613 for now, since the previous index is done and it has now moved on to the next indexes.

I mean, the CPU and RSS while the drain rate is only 1k mutations/sec. This will help to understand whether all the resources are being fully utilised or not.

How do I check this? My server is Red Hat Linux, and my Couchbase instance runs in Docker, deployed to OCP.

You can use the iostat utility (e.g., iostat -x -p ALL 1 10 or iostat -x 1 10). This will run iostat 10 times with an interval of 1 sec between runs. As all the index is resident in memory, I think this is not required now.

avg-cpu:  %user   %nice  %system  %iowait  %steal   %idle
          18.26    0.00     3.89     0.00    0.00   77.85

This is the current condition; it is also processing another index right now.

Do you mean 2 billion mutations, or that the total size of all the mutations is 2 gigabytes? Can you share the num_docs_pending and num_docs_indexed stats for this index? You can get these stats in /opt/couchbase/var/lib/couchbase/indexer.log when the drain rate is 1k mutations/sec.

Yes, 2G means 2 billion?
num_docs_pending : 1221611732
num_docs_indexed : 0

This is a sample of my current index stats:

"xxx:avg_disk_bps":1675, "xxx:avg_drain_rate":0, "xxx:avg_item_size":0, "xxx:avg_mutation_rate":0, "xxx:avg_scan_latency":0, "xxx:avg_scan_rate":0, "xxx:avg_scan_request_latency":0, "xxx:avg_scan_wait_latency":0, "xxx:avg_ts_interval":600013718783, "xxx:avg_ts_items_count":0,
"xxx:backstore_raw_data_size":0, "xxx:build_progress":17, "xxx:cache_hit_percent":0, "xxx:cache_hits":0, "xxx:cache_misses":0, "xxx:client_cancel_errcount":0, "xxx:data_size":124534, "xxx:data_size_on_disk":155668, "xxx:delete_bytes":0, "xxx:disk_load_duration":0, "xxx:disk_size":5349714, "xxx:disk_store_duration":299,
"xxx:flush_queue_size":0, "xxx:frag_percent":97, "xxx:get_bytes":0, "xxx:insert_bytes":5349376, "xxx:items_count":0, "xxx:key_size_distribution":{"(0-64)":0, "(102401-max)":0, "(1025-4096)":0, "(257-1024)":0, "(4097-102400)":0, "(65-256)":0}, "xxx:key_size_stats_since":0, "xxx:last_known_scan_time":0, "xxx:last_rollback_time":"1592622862898722075", "xxx:log_space_on_disk":5349376, "xxx:memory_used":16,
"xxx:not_ready_errcount":0, "xxx:num_commits":8, "xxx:num_compactions":0, "xxx:num_completed_requests":0, "xxx:num_docs_indexed":0, "xxx:num_docs_pending":1221611732, "xxx:num_docs_processed":258211218, "xxx:num_docs_queued":0, "xxx:num_flush_queued":0, "xxx:num_items_flushed":0, "xxx:num_items_restored":0, "xxx:num_last_snapshot_reply":0, "xxx:num_open_snapshots":1,
"xxx:num_requests":0, "xxx:num_rows_returned":0, "xxx:num_rows_scanned":0, "xxx:num_scan_errors":0, "xxx:num_scan_timeouts":0, "xxx:num_snapshot_waiters":0, "xxx:num_snapshots":8, "xxx:progress_stat_time":"1592885655267708829", "xxx:raw_data_size":0, "xxx:recs_in_mem":0, "xxx:recs_on_disk":0, "xxx:resident_percent":0,
"xxx:scan_bytes_read":0, "xxx:scan_wait_duration":0, "xxx:since_last_snapshot":600018554387, "xxx:total_scan_duration":0, "xxx:total_scan_request_duration":0

0.0613 for now

This seems to be very low. Only 6% of the index is resident in memory. So, the index build might be bottlenecked by disk IO

num_docs_indexed : 0

This means that none of the documents are getting indexed. I am not sure how the drain rate is going up to 1k/sec. Would it be possible to share the cbcollect logs from the indexer node for further analysis?

You can collect the logs by going to
“logs” tab > click on “collect information” > select nodes - include indexer node > click on “start collecting”.
Wait for some time. This will create an archive with all the debug information needed. Please attach that archive.
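
(Alternatively, running /opt/couchbase/bin/cbcollect_info <output-file.zip> directly on the index node should produce an equivalent archive.)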

Thanks,
Varun

@Han_Chris1, what is the memory quota assigned to the index service out of the 32 GB available? By default, the index service will only use 512 MB. You can change the quota from UI->Settings.
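
If you prefer the command line, something along the lines of couchbase-cli setting-cluster -c localhost:8091 -u Administrator -p <password> --cluster-index-ramsize <quota-in-MB> should also work; the exact option names can vary by version.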

Hi @deepkaran.salooja,
I’ve set the index memory quota to 29696 MB, since I have 32 GB available on each index node.

This seems to be very low. Only 6% of the index is resident in memory. So, the index build might be bottlenecked by disk IO

I can only find the ‘resident’ keyword for this: 100 active docs resident %

This means that none of the documents are getting indexed. I am not sure how the drain rate is going up to 1k/sec. Would it be possible to share the cbcollect logs from the indexer node for further analysis?

Actually, how does index mutation work?
Let’s say I have 1 bucket containing 1 billion docs, and then I build 2 indexes using GSI.
I saw that the total mutations for those 2 indexes have the same value but different drain rates.
Is it OK if I build 20 indexes at one time, since it will generate the same total mutations at the same time?

@Han_Chris1,

At a high level, a mutation from the data service is evaluated against all indexes and encoded for each index. This evaluation happens on the data node. The encoded data for all the indexes is sent to the index nodes, where it is then put into storage.

Depending on the storage mode, the mutation is either persisted to disk, kept only in memory, or both. Memory-optimized storage mode requires all the indexed data to be in memory. Standard global secondary will persist the data to disk. If there is enough memory to keep all the indexed data in memory, then it will keep it in memory (while also persisting it to disk); otherwise, it will evict some of the data from memory to disk to accommodate new data. With memory-optimized storage, if the indexer node restarts, the indexer has to rebuild some of the more recent data (not from scratch). In your case, you are using standard global secondary.

Coming to your question about building 20 indexes at a time, the advantage of doing this is that you don’t have to stream the same documents again and again, compared to building 2 indexes per batch over 10 batches. However, if your memory is not sufficient to hold all the data, then data from the 20 indexes has to be evicted and pushed to disk. The disk IO bandwidth, the average item size of the index, the number of indexes, etc. determine how fast or slow this will be. If this persist-to-disk phase is a bottleneck, then the index build eventually slows down, as it blocks the pipeline.
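
If it helps, one common way to do this is to create the indexes with a deferred build and then build them together in a single BUILD INDEX statement, so the documents only have to be streamed once for the whole batch. A rough sketch with the Java SDK (index, field and bucket names are just placeholders):

import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.CouchbaseCluster;
import com.couchbase.client.java.query.N1qlQuery;

public class DeferredBuild {
    public static void main(String[] args) {
        Bucket bucket = CouchbaseCluster.create("127.0.0.1")
                .authenticate("Administrator", "password")
                .openBucket("users");

        // Create the indexes without building them (defer_build: true)
        bucket.query(N1qlQuery.simple("CREATE INDEX idx_a ON `users`(fieldA) WITH {\"defer_build\": true}"));
        bucket.query(N1qlQuery.simple("CREATE INDEX idx_b ON `users`(fieldB) WITH {\"defer_build\": true}"));
        // ...repeat for the remaining indexes, then build them all in one batch:
        bucket.query(N1qlQuery.simple("BUILD INDEX ON `users`(idx_a, idx_b)"));
    }
}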

Thanks,
Varun

Hi @varun.velamuri,

Thanks for your clear explanation.
One thing I’m still curious about regarding index mutations:
I saw that if we create 1 document and then update it many times, the metadata will show how many revisions there are, right?
Then, when I try to create a new index on that bucket, I see that the total mutations include all of those revisions.
That’s why my 1 billion docs can create around 7G mutations.

Is there any way to clean up those revisions / to decrease the total mutations?

Thanks

@Han_Chris1,

The indexing service will index whatever mutations the data service sends. So, if you have updated documents multiple times and the data service holds the multiple versions, then the indexer will index all of those updates as well.

Regarding cleaning up those revisions, auto-compaction would typically clean up these updates and keep only the latest version of each document. You can probably raise another, more specific post on the forums asking this question.

Thanks,
Varun

Thanks @varun.velamuri ,

I’ve created a separate topic here:

Need your advise

Thanks,