Restore, indexes and "mutations remaining" statistics

LeonidGvirtz · February 26, 2019, 3:23pm

Hi

I have a number of indexes on a bucket. When I restore the bucket using cbbackupmgr, I see that at the end of the restore all documents are restored, and the value of the “mutations remaining” statistic for the indexes is nearly equal to the number of documents in the bucket. Then the value of “mutations remaining” steadily decreases to zero while the indexers on the cluster nodes work hard. I suppose that I am just observing Couchbase updating the indexes after the bucket restore. If this is the case, does it mean that the indexes are nearly useless right after the bucket restore finishes, and only become useful when “mutations remaining” reaches zero? If that is correct, is there an easy way to monitor the index update and to know when it is completely over? I understand that it is possible using cbstats, but is there a more convenient way?

Thanks in advance

amit.kulkarni · February 28, 2019, 10:43am

Hi @LeonidGvirtz,

Which version of Couchbase Server are you using? Also, could you describe the exact steps you took? Are you restoring the indexes to the same cluster or to a different cluster?

Index backup/restore covers only index metadata. In other words, during backup only the index definitions are captured. During restore, the indexes have to be re-created and re-built from the definitions captured during backup. An index is unusable until it is completely built. The build progress of the indexes can be seen in the Couchbase Server GUI (Indexes tab).

For more information on index restore using cbbackupmgr, please visit

https://docs.couchbase.com/server/6.0/backup-restore/cbbackupmgr-restore.html

An alternative way of getting the index build progress is the /getIndexStatus REST endpoint. The value of “progress” in the response denotes the build progress. For details, please visit:

https://docs.couchbase.com/server/6.0/rest-api/get-status-indexes.html
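A minimal polling sketch for the getIndexStatus approach. It assumes the endpoint is served by the indexer's HTTP port (9102 by default) at `http://localhost:9102/getIndexStatus`, that basic auth with cluster credentials is accepted, and that the response looks like `{"status": [{"name": ..., "bucket": ..., "progress": ...}, ...]}`. Verify the host, port and response shape against the documentation linked above for your version before using it.

```python
import base64
import json
import time
import urllib.request

# Assumption: indexer HTTP port 9102 on localhost; adjust for your cluster.
STATUS_URL = "http://localhost:9102/getIndexStatus"


def fetch_index_status(url: str, user: str, password: str) -> dict:
    """Fetch and decode the getIndexStatus JSON using HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def pending_builds(status: dict, bucket: str) -> dict:
    """Map index name -> progress for indexes on `bucket` below 100%.

    Assumes the response shape {"status": [{"name", "bucket", "progress", ...}]}.
    """
    pending = {}
    for entry in status.get("status", []):
        progress = entry.get("progress", 0)
        if entry.get("bucket") == bucket and progress < 100:
            pending[entry["name"]] = progress
    return pending


def wait_until_built(bucket: str, user: str, password: str, interval: float = 10.0) -> None:
    """Poll until every index on `bucket` reports 100% build progress."""
    while True:
        status = fetch_index_status(STATUS_URL, user, password)
        pending = pending_builds(status, bucket)
        if not pending:
            return
        print("still building:", pending)
        time.sleep(interval)
```

This only tracks index builds. For an index that already existed before the restore, progress stays at 100% even while restored mutations are still being indexed, so this check cannot tell you when that catch-up is finished.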

LeonidGvirtz · February 28, 2019, 11:50am

My Couchbase version is 5.5.2 and I meant the situation when indexes are already in place before the restore. Thank you for the link to getIndexStatus, I understand that if indexes are not dropped before the restore then it is the way to go.

amit.kulkarni · March 4, 2019, 12:45pm

Hi @LeonidGvirtz,

If the restore is attempted on the same cluster, then there are two possibilities:

1. If an index with the same name and the same definition already exists on the bucket, index creation should be skipped during the restore (there is no need to create and build an index that already exists). In this case, I don’t see any way for the index to be unavailable for non-consistent scans. So, if your index is unavailable for non-consistent scans in this scenario, there may be some other issue. Please collect the logs and attach them here so that I can debug it further.
   For consistent scans, the situation is different. Restoring all the Key-Value data creates new mutations in the Key-Value store. These new mutations have to be indexed before the index service can serve consistent scans. So, if you are seeing the index being unavailable for consistent scans, that is expected behaviour.
   Note: In the consistent-scan case, getIndexStatus won’t help you, because the build of the existing index is already done. You will always see progress as 100%.
2. If an index with the same name but a different definition exists on the bucket, a new index with a different name will be created during the restore. This new index has to be built explicitly; until then, it remains unavailable.
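The two cases above can be summarized as a small decision function. This is a hypothetical sketch of the behavior described in this reply, not cbbackupmgr's actual code: definitions are compared as plain strings (a real comparison would need normalized definitions), and the new name given to a conflicting index is chosen by the restore, so it is not modeled here. The generated statement uses the N1QL `BUILD INDEX ON keyspace(index, ...)` form; the bucket and index names are placeholders.

```python
SKIP = "skip: identical index already exists"
CREATE_AND_BUILD = "create and build"
CREATE_RENAMED = "create under a new name, then build explicitly"


def restore_action(existing: dict[str, str], name: str, definition: str) -> str:
    """Decide what happens to one backed-up index during a restore.

    `existing` maps index names already on the bucket to their definitions
    (hypothetical representation; plain-string comparison only).
    """
    current = existing.get(name)
    if current is None:
        return CREATE_AND_BUILD  # index is re-created from its saved definition
    if current == definition:
        return SKIP  # same name, same definition: nothing to create or build
    return CREATE_RENAMED  # same name, different definition


def build_statement(bucket: str, index_names: list[str]) -> str:
    """N1QL statement that builds deferred indexes on a bucket."""
    names = ", ".join(f"`{n}`" for n in index_names)
    return f"BUILD INDEX ON `{bucket}`({names})"
```

For the second case, something like `build_statement("my_bucket", ["my_index_renamed"])` yields the statement to run once you know the name the restore actually chose.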

Note: To collect logs from the Couchbase Server UI, click “Logs” => “Collect Information” => “Start Collection”.

Thanks.

LeonidGvirtz · March 7, 2019, 1:41pm

Hi

I collected the logs as you advised, but they are quite heavy, about 1GB compressed. Is it possible to collect just a useful subset of that data somehow?

In the meantime, I think I have at least partly figured out the cause of the problem. I see that a partitioned index on an attribute of a bucket with 32M documents is distributed across the nodes in the ratio 20M:8M:4M. The replica of the same index is distributed in the ratio 16M:12M:4M. I am using the “Indexed items” statistic from the UI. After rebuilding the index, I see that the ratio became 12M:12M:8M, while the index build was significantly slower on one of the nodes. Note that we use UUIDs as document keys, so I would have expected the index to be distributed uniformly across the nodes. Am I missing something?

amit.kulkarni · March 12, 2019, 6:13am

Hi @LeonidGvirtz,

Can you please upload indexer.log files from all the nodes? After looking at the logs, I can better explain the root cause of the problems, if any.

From the other topics you have created, what I can see is that the partition distribution (3, 3, 2) looks fine to me (as mentioned by @prathibha). With a (3, 3, 2) partition distribution, the 12M:12M:8M ratio corresponds to a uniform distribution of items across the partitions (about 4M items per partition).
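The ratios in this thread can be checked with simple arithmetic. Assuming the keys (UUIDs here) hash uniformly across partitions, each partition of the 8-partition index on 32M documents holds about 32M / 8 = 4M items, so a node's expected item count is its partition count times 4M. The helper names below are illustrative only.

```python
def items_per_node(total_items: int, partitions_per_node: list[int]) -> list[int]:
    """Expected indexed items per node if keys hash uniformly across partitions."""
    per_partition = total_items / sum(partitions_per_node)
    return [round(p * per_partition) for p in partitions_per_node]


def even_split(num_partitions: int, num_nodes: int) -> list[int]:
    """Most even placement of partitions on nodes (counts differ by at most 1)."""
    base, extra = divmod(num_partitions, num_nodes)
    return [base + 1 if i < extra else base for i in range(num_nodes)]
```

`items_per_node(32_000_000, [3, 3, 2])` gives `[12000000, 12000000, 8000000]`. By the same arithmetic, the 20M:8M:4M and 16M:12M:4M figures reported earlier would correspond to 5:2:1 and 4:3:1 placements of the same 8 partitions, and `even_split(9, 3)` gives `[3, 3, 3]`, the placement hoped for with 9 partitions.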

LeonidGvirtz · March 13, 2019, 2:53pm

Hi Amit

I investigated the problem further and discovered that the distribution of index partitions across the nodes of the cluster is very uneven and unpredictable. This is the primary cause of the problem, though there are probably other factors too. Even when I increased the number of partitions to 9 in the hope of getting an even distribution on all nodes (3:3:3), the actual distribution fluctuates and is quite far from even (see the attached fragment of the getIndexStatus output).

I am also attaching indexer.log from all three nodes. These logs cover the index creation. Note that the behavior is not the same for all indexes: another index was also distributed unevenly, but rebuilding it with 9 partitions resulted in an even distribution. So, what is the expected behavior?

In addition, I see some error messages in indexer.log and I’m not sure how to interpret them. Could you please take a look at them?
indexer_logs.zip (3.1 MB)

amit.kulkarni · March 19, 2019, 11:42am

Hi @LeonidGvirtz,

From the attached logs, I can see some indexes with non-uniform partition counts. But across all the indexes, the partition counts are fairly uniform (node1: 35 partitions, node2: 45 partitions, node3: 41 partitions). So, the overall resource consumption on these nodes will be fairly uniform.
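Per-node totals like these can be reproduced from a saved getIndexStatus response by summing, for each host, the partitions it holds across all indexes. This sketch assumes each entry in the response's `"status"` list may carry a `"partitionMap"` mapping `"host:port"` to a list of partition ids; check that against your own output before relying on the numbers.

```python
from collections import Counter


def partitions_per_node(status: dict) -> Counter:
    """Count index partitions hosted on each node, summed over all indexes.

    Entries without a "partitionMap" key contribute nothing.
    """
    totals = Counter()
    for entry in status.get("status", []):
        for host, partition_ids in entry.get("partitionMap", {}).items():
            totals[host] += len(partition_ids)
    return totals
```

For example, after `import json`, calling `partitions_per_node(json.load(open("index_status.json")))` on a saved response (hypothetical file name) returns one count per host, which can be compared the same way as the 35/45/41 totals above.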

The partition distribution algorithm has to consider a lot of factors before it decides the final partition distribution. Some of these factors are the index memory footprint, the RAM installed on the indexer nodes, the available memory on the indexer nodes, the presence of non-partitioned indexes (and the resources required by them), index replicas (for HA), etc.

The index partition distribution is decided at the time of index creation, and the related messages are written to the query.log files. Can you please attach the query.log files from all nodes? Make sure that the logs from the time of index creation are captured. query.log will shed more light on the reason behind the non-uniform distribution.

Thanks.