CB 5.5 - What happens when index partition hashes are updated?

Eugaia · March 31, 2018, 12:22pm

Hi,

I’m in the process of planning a database setup where I would like to partition the indexes, since it will be available in CB 5.5. There are many complexities to the setup, and rather than just doing index partitioning based on the type of the document, I would instead prefer to put a partitioning ID on each document. In general, the intention is that these partition IDs do not change, and if they do it would be rare.

Would I be correct in assuming that if the values used to partition the indexes change, the indexes are redistributed, and queries continue to be processed normally?

Thanks.

vsr1 · March 31, 2018, 1:31pm

At present the partition keys should be immutable.

Choosing Partition Keys
Partition keys can be one or more fields or an expression of one or more fields representing the partition key for the partitioning, for example:

The document key META().id
Any single or multiple immutable field names defined in the index keys (field names in the document)
A function on the index key fields, such as LOWER(), LEAST(), GREATEST(), SUBSTR(), etc.
A complex formula on the index key fields combining functions and operators.

The indexer will not enforce immutability of the index key. If you DELETE and re-insert the document, that may take care of a changed key; @jliang will be able to answer that.

Eugaia · March 31, 2018, 2:07pm

@vsr1

Thanks for replying.

I understand that they ‘should’ be immutable, and that is indeed the plan (such keys would not change more than once per year, say).

I’m just trying to work out whether, if the keys changed, I’d need to rebuild the index (or delete and re-insert the data), or whether it would be re-hashed automatically.

Thanks!

vsr1 · March 31, 2018, 8:22pm

If only a few documents' hash keys change, you can delete and re-insert those documents.
If a lot of hash keys change, you can drop the index, change the hash keys, and re-create the index.
Currently there is no automatic re-hash mechanism.
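
To illustrate why delete and re-insert is the suggested workaround, here is a toy Python model. It is only a sketch and not Couchbase's indexer code. It assumes that an update is routed solely to the partition chosen by the document's current key value, while a delete is applied to every partition. The CRC32 hash, the partition count of 8, and the `tenant-N` key values are all made up for the example.

```python
import zlib

NUM_PARTITIONS = 8  # assumed; the thread does not state a partition count


def partition_for(key: str) -> int:
    """Map a partition-key value to a partition (CRC32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


class ToyPartitionedIndex:
    """Toy model: each partition maps doc_id -> indexed partition-key value."""

    def __init__(self) -> None:
        self.partitions = [dict() for _ in range(NUM_PARTITIONS)]

    def upsert(self, doc_id: str, part_key: str) -> None:
        # Assumption: a mutation only reaches the partition of the current key.
        self.partitions[partition_for(part_key)][doc_id] = part_key

    def delete(self, doc_id: str) -> None:
        # Assumption: a delete carries no key value, so all partitions drop it.
        for partition in self.partitions:
            partition.pop(doc_id, None)

    def entries(self, doc_id: str) -> list:
        """All key values currently indexed for doc_id, across partitions."""
        return sorted(p[doc_id] for p in self.partitions if doc_id in p)


# Pick two hypothetical key values that land in different partitions.
old_key = "tenant-0"
new_key = next(
    f"tenant-{i}"
    for i in range(1, 100)
    if partition_for(f"tenant-{i}") != partition_for(old_key)
)

# Case 1: the partition key is changed in place (plain update).
updated = ToyPartitionedIndex()
updated.upsert("doc1", old_key)
updated.upsert("doc1", new_key)
print("in-place update:", updated.entries("doc1"))  # old and new value remain

# Case 2: the document is deleted, then re-inserted with the new key.
reinserted = ToyPartitionedIndex()
reinserted.upsert("doc1", old_key)
reinserted.delete("doc1")
reinserted.upsert("doc1", new_key)
print("delete + insert:", reinserted.entries("doc1"))  # only the new value
```

In this model the in-place update leaves a stale entry in the old partition, whereas the delete clears every partition before the new entry is written.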

Eugaia · March 31, 2018, 8:24pm

@vsr1

Many thanks for clearing that one up.

ks900 · October 10, 2019, 5:53pm

@vsr1 Was just reading this thread. If the partitioning element changes frequently and we perform deletes and then inserts, should there only be the latest copy of the data in the index? We are noticing that even on delete then insert, the index is still hanging on to an older set of data.

vsr1 · October 10, 2019, 5:59pm

PARTITION keys must be immutable. If they are changing, use some other immutable keys or use META().id.

ks900 · October 10, 2019, 6:02pm

We are using the partitioned element as a main component in our query. Is it okay if our leading index field is not the field we are partitioning on?

vsr1 · October 10, 2019, 6:04pm

The partition key can be any key as long as it is present in the index. If the query has no predicate on it, the query will use scatter-gather and may not use some optimizations.
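
A rough Python sketch of the difference, assuming a simple hash-modulo layout: with an equality predicate on the partition key, the scan can be limited to one partition, and without it every partition has to be visited (scatter-gather). The function names, the CRC32 hash, the partition count, and the `tenant`/`status` fields are hypothetical and are not taken from Couchbase.

```python
import zlib

NUM_PARTITIONS = 8  # assumed partition count for the illustration


def partition_for(value: str) -> int:
    """Stand-in hash: map a partition-key value to a partition number."""
    return zlib.crc32(value.encode("utf-8")) % NUM_PARTITIONS


def partitions_to_scan(equality_predicates: dict, partition_key: str) -> list:
    """Return the partitions a scan would visit in this toy model.

    equality_predicates maps field name -> value from the WHERE clause.
    """
    if partition_key in equality_predicates:
        # Predicate on the partition key: only one partition can hold matches.
        return [partition_for(equality_predicates[partition_key])]
    # No predicate on the partition key: scatter to all partitions and gather.
    return list(range(NUM_PARTITIONS))


# Query filters on the partition key: a single partition is scanned.
print(len(partitions_to_scan({"tenant": "acme", "status": "open"}, "tenant")))

# Query filters only on another indexed field: all partitions are scanned.
print(len(partitions_to_scan({"status": "open"}, "tenant")))
```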

ks900 · October 10, 2019, 10:34pm

So, a question. I’ve spoken to some solutions engineers at Couchbase. They’ve said that if I delete the record and then insert it again, that should remove the old data from the index, keeping only the latest version. My question is about how data is handled across data centers in Couchbase clusters. We have multiple data centers with the same cluster setup. Both have access to the same records. If I delete data via the Java SDK, is it deleted from both clusters in both data centers, on both the data and index nodes?

vsr1 · October 11, 2019, 12:53am

If it is the same cluster with replica nodes, the delete should be applied on all replicas irrespective of the data centers. The only restriction is: don’t partition an index on mutable keys and then update them (delete and insert is OK). cc @deepkaran.salooja

deepkaran.salooja · October 11, 2019, 4:25pm

Delete and insert of the record should remove the old data from the index. For reference, this is explained in the documentation:
https://docs.couchbase.com/server/6.0/n1ql/n1ql-language-reference/index-partitioning.html#partition-keys

I assume the Couchbase clusters are connected via XDCR. Once the delete/insert propagates to the other data centre, the indexes there would also get updated. Indexes stay in sync with the data present in the cluster.