Index Normalization


#1

I know this is a bit of a stretch, but I was wondering if I could get a link to the code that handles index normalization? I’m working on this project to help manage indexes in our CI/CD pipeline:

When we create an index, CB is normalizing the form for the “index_key” and “condition” attributes returned when you query system:indexes. This means that if we don’t make our index definition exactly match the normalized form then the index appears different. I’d really like to be able to duplicate the normalization logic so I can more accurately compare to see if the index definition is different from the actual index.

Thanks,
Brant


#2

Never mind on this one, I found the solution. Running an EXPLAIN CREATE INDEX ... query returns a query plan in which the keys and condition are already normalized. This is a great solution since it guarantees consistent normalization even across different versions of Couchbase. The only minor difference is that the keys array in the plan is an array of strings before 5.0 and an array of objects with an expr attribute from 5.0 onwards.
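
For anyone reading along, here is a minimal sketch of how the two plan shapes could be reduced to one comparable form. The field names `keys` and `where`, the function name `normalized_definition`, and the sample expression strings are assumptions for illustration only; check them against the plan your server version actually returns.

```python
from typing import Any, Dict, List, Optional, Tuple


def normalized_definition(plan: Dict[str, Any]) -> Tuple[List[str], Optional[str]]:
    """Return (key expressions, condition) from an EXPLAIN CREATE INDEX plan.

    Assumes the plan is a dict with a "keys" list and an optional "where"
    string (hypothetical field names, not confirmed in this thread).
    """
    expressions: List[str] = []
    for key in plan.get("keys", []):
        if isinstance(key, str):
            # Older shape: each key is already the expression string.
            expressions.append(key)
        else:
            # Newer shape: each key is an object carrying an "expr" attribute.
            expressions.append(key["expr"])
    return expressions, plan.get("where")


# Made-up sample plans; real normalized strings may look different.
old_plan = {"keys": ["k1", "k2"], "where": "type = 'user'"}
new_plan = {"keys": [{"expr": "k1"}, {"expr": "k2"}], "where": "type = 'user'"}

# Both shapes reduce to the same comparable tuple.
same = normalized_definition(old_plan) == normalized_definition(new_plan)
```

The result can then be compared against the index_key and condition values read from system:indexes.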


#3

Hi @btburnett3,

From 5.0 onwards, index keys can have DESC collation. The format changed to an array of objects to accommodate this.
Example: CREATE INDEX ix1 ON default(k1 DESC, k2, k3 DESC);
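
A rough sketch of why the sort direction has to be part of any comparison: two key lists with the same expressions but different collation describe different indexes. The `desc` attribute name and the helper names `key_signature` and `same_keys` are assumptions, not something confirmed in this thread.

```python
from typing import Any, Dict, List, Tuple, Union

# A key is either a plain expression string or an object with "expr".
KeySpec = Union[str, Dict[str, Any]]


def key_signature(key: KeySpec) -> Tuple[str, bool]:
    """Reduce a key to (expression, is_descending)."""
    if isinstance(key, str):
        # Plain strings carry no collation, so treat them as ascending.
        return key, False
    # "desc" is an assumed attribute name; a missing value means ascending.
    return key["expr"], bool(key.get("desc", False))


def same_keys(a: List[KeySpec], b: List[KeySpec]) -> bool:
    """Compare two key lists by expression, direction, and order."""
    return [key_signature(k) for k in a] == [key_signature(k) for k in b]


# Keys for: CREATE INDEX ix1 ON default(k1 DESC, k2, k3 DESC)
wanted = [{"expr": "k1", "desc": True}, {"expr": "k2"}, {"expr": "k3", "desc": True}]
# Same expressions, but every key ascending.
ascending = ["k1", "k2", "k3"]

differs = not same_keys(wanted, ascending)
```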


#4

Ahhh, I’ll need to make sure I account for that as well, then. Thanks for the tip @vsr1.


#5

Hi @btburnett3,

Also, 5.5.0 adds PARTITION BY HASH(…):
https://blog.couchbase.com/couchbase-gsi-index-partitioning/

Also check this out, since it will play a key role in index advisory:
https://blog.couchbase.com/understanding-index-grouping-aggregation-couchbase-n1ql-query/


#6

Yeah, I’m working on that one right now! https://github.com/brantburnett/couchbase-index-manager/issues/25


#7

@vsr1

Actually, I do have one question about partitioned indexes and replicas. I can’t seem to create replicas that are effective in terms of their node assignments. If I assign nodes, both replicas use all of the nodes listed. If I don’t assign nodes, both replicas use all nodes in the cluster. This doesn’t seem to provide redundancy against node failure.

I’ve experimented with alternative syntax to try to control replica node assignment with more granularity, but haven’t had any luck. Any pointers? Or is this something that isn’t expected until 5.5 GA?

Thanks,
Brant


#8

cc @deepkaran.salooja


#9

With partitioning, redundancy is provided by placing the replicas of a partition on different nodes. This doesn’t exclude other partitions/replicas from being co-located. At an index level, you’ll see the index being placed on all assigned nodes.

If a node goes down, an equivalent replica copy of that partition will be chosen to answer queries.
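
To illustrate the idea (this is only a toy model, not how the indexer actually plans placement), here is a sketch in which every node hosts part of every replica, yet each partition still survives the loss of any single node. The placement map, node names, and function names are all hypothetical.

```python
from typing import Dict, Optional, Tuple

# (partition id, replica id) -> node name
Placement = Dict[Tuple[int, int], str]


def surviving_copy(placement: Placement, partition: int, failed_node: str) -> Optional[int]:
    """Return the replica id of a copy of `partition` that is not on the failed node."""
    for (part, replica), node in sorted(placement.items()):
        if part == partition and node != failed_node:
            return replica
    return None


def tolerates_single_failure(placement: Placement) -> bool:
    """True if every partition keeps at least one copy when any one node fails."""
    partitions = {part for part, _ in placement}
    nodes = set(placement.values())
    return all(
        surviving_copy(placement, part, node) is not None
        for part in partitions
        for node in nodes
    )


# Both replicas span both nodes, but the copies of each partition are split.
placement: Placement = {
    (0, 0): "node1", (0, 1): "node2",
    (1, 0): "node2", (1, 1): "node1",
}

ok = tolerates_single_failure(placement)
fallback = surviving_copy(placement, 0, "node1")
```

At the index level both replicas appear on node1 and node2, which matches what was observed above, while redundancy holds per partition.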


#10

@deepkaran.salooja Okay, I think that makes sense to me. Basically, replicas are provided using a slightly different methodology for partitioned indexes: redundancy is per partition rather than per index, so num_replica is effectively decoupled from node assignment when creating the index.