Interaction between Prepared Statement and Duplicate Indexes


#1

Hi,
We have the following setup, and I would like to understand how all of it interacts.

To keep our indexes available, we duplicate them across different nodes. The only difference between the duplicates is that we append the IP address of the node to the end of the index name.
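A minimal sketch of the setup described above. It assumes a bucket named `default`, a hypothetical index on fields `a` and `b`, and hypothetical index nodes at `10.0.0.1` and `10.0.0.2`. The `WITH {"nodes": [...]}` option pins each copy to one node. The names follow the convention of appending the node IP, with underscores in place of dots so the names stay valid unescaped identifiers.

```sql
-- Hypothetical duplicates of one index, one copy per index node.
-- Both copies have the same keys and no WHERE clause, so their
-- definitions are identical; only the name and placement differ.
CREATE INDEX idx_ab_10_0_0_1 ON `default`(a, b)
  WITH {"nodes": ["10.0.0.1:8091"]};

CREATE INDEX idx_ab_10_0_0_2 ON `default`(a, b)
  WITH {"nodes": ["10.0.0.2:8091"]};
```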

We are seeing that planning a query takes longer and longer, and the time seems to depend on the number of indexes on the bucket the query targets. Is this expected, and are there ways to improve it? Using USE INDEX in the query is not an option, as this would mean the client has to know about the cluster and where each index resides.

Most of the time we use prepared statements, so my guess is that we pay the cost of creating the plan only once. But how does this behave with duplicate indexes for HA? Does the prepared statement's plan refer to a specific index? If so, what happens if the node holding that index disappears? Will the Java SDK recreate the plan against one of the remaining indexes that can satisfy the query? So, basically, by using prepared statements, do we gain on plan-creation time but lose load balancing of index usage across nodes?
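For reference, this is roughly what the prepared-statement flow looks like at the N1QL level, independent of any SDK. The statement name `find_ab` and the query itself are hypothetical, and the sketch assumes an index on `(a, b)` exists on the `default` bucket. Running EXPLAIN on the same query is a way to see which index the planner picked, since the index scan operator in the plan names it.

```sql
-- Build the plan once and store it under a name.
PREPARE find_ab FROM
SELECT a, b FROM `default` WHERE a > 10;

-- Later executions reuse the stored plan instead of re-planning.
EXECUTE find_ab;

-- Show the plan for the same query; the index scan operator
-- names the index the planner chose.
EXPLAIN SELECT a, b FROM `default` WHERE a > 10;
```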

Can someone explain how all of this interacts, and what the recommended way is to use multiple duplicate indexes for HA?

Many thanks.


#2

Having multiple truly identical indexes is not a good idea. You’re just increasing the load on the system to keep it updated.

If you are concerned that a GSI index will overload a node because that index is very busy, you can split it into partial indexes by adding a WHERE clause to the CREATE INDEX statement, like this. If the query contains the criterion from the index's WHERE clause, the query engine should use the partial index rather than the more general one.

CREATE INDEX defABfilterC ON default(a, b) WHERE c=5
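To check that the query engine actually picks the partial index, the query has to repeat the index's filter (here `c = 5`) and should also constrain the leading index key `a`, which an index on `(a, b)` generally needs before the planner will consider it. A sketch, assuming the `defABfilterC` index above exists:

```sql
-- The query repeats the index filter c = 5 and constrains the leading key a,
-- so defABfilterC is eligible; EXPLAIN should show it in the index scan.
EXPLAIN SELECT a, b FROM `default` WHERE c = 5 AND a > 10;

-- Without c = 5 the partial index cannot answer the query,
-- so the planner has to use a more general index, if one exists.
EXPLAIN SELECT a, b FROM `default` WHERE a > 10;
```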


#3

Hi,

I understand that duplicating indexes increases the load, but are you recommending that only one node hold the index, and therefore accepting query downtime whenever that node is down?

So even if we shard indexes, I still think we would want to duplicate each one at least once to ensure HA.


#4

Hi @lbertrand and @johan_larson,

Duplicate indexes are fine, and are in fact our HA solution. We recommend you put duplicate indexes on separate nodes. Regardless of what is in the query plan or the prepared statement, the indexer runtime is aware of duplicate indexes and will use a duplicate index for HA / failover (and I believe for load balancing as well).
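One way to see which copies exist is the `system:indexes` catalog. The query below is a sketch assuming the bucket is named `default`. It lists each GSI index with its keys and filter; duplicates in the sense used here are rows whose `index_key` and `condition` match while only the name differs.

```sql
-- List the GSI indexes on the bucket with their definitions.
-- Rows with the same index_key and condition are duplicates;
-- condition is absent for indexes without a WHERE clause.
SELECT name, index_key, `condition`, state
FROM system:indexes
WHERE keyspace_id = "default" AND `using` = "gsi"
ORDER BY index_key, name;
```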

Sharding is orthogonal to this. You can also duplicate sharded indexes.
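Combining the two ideas, a partial (sharded) index can itself be duplicated. A sketch reusing the `defABfilterC` definition, with hypothetical index names and node addresses:

```sql
-- Two copies of the same partial index, each pinned to a different node.
-- WHERE comes before WITH in the CREATE INDEX grammar.
CREATE INDEX defABfilterC_n1 ON `default`(a, b) WHERE c = 5
  WITH {"nodes": ["10.0.0.1:8091"]};

CREATE INDEX defABfilterC_n2 ON `default`(a, b) WHERE c = 5
  WITH {"nodes": ["10.0.0.2:8091"]};
```

Each shard would get its own pair of copies in the same way.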

Gerald