Max Parallelism

naftali · March 12, 2020, 6:59pm

Looking at the docs:

Along with aggregate pushdown optimization, an application can further enhance the aggregate query performance by computing aggregation in parallel for each partition in the index service. This can be achieved by specifying the parameter max_parallelism when issuing a query. The value for max_parallelism should match the number of partitions of the index Note than when this is enabled, the index service uses more CPU and memory since the query traffic is increased according to the value set in the parameter max_parallelism

Does "when this is enabled […] since the query traffic is increased according to the value set in the parameter max_parallelism " imply that without that setting, the query engine will query the index partitions sequentially?

Meaning that if I have n partitions (on say Meta().id) on n index nodes, the query engine will get results from [1,n] one after the other?

vsr1 · March 12, 2020, 7:06pm

If Partition resides on different indexer node those are scanned in parallel irrespective of max_parallelism.
If more than one partition of index on same node then only it comes into affect. cc @deepkaran.salooja

deepkaran.salooja · March 12, 2020, 7:22pm

In addition to what @vsr1 mentioned, 6.5.0 enables parallel scans across partitions by default. The parameter doesn’t need to be set.