I have a large(ish) database, say 10M+, where I want to find documents in which some field is equal (ie find all documents in which field X has arbitrary value Y)
I have created indexes on the relevant fields by
CREATE INDEX `idx_covering_breed` ON `bucket`((meta().`id`),`breed`) WHERE ((meta().`id`) like "animal:%")
and are attempting to get the result by
SELECT d.breed, ARRAY_AGG(META(d).id) AS targets
FROM `bucket` AS d
WHERE META().id LIKE 'animal:%'
GROUP BY d.breed
HAVING COUNT(1) > 1
You might try (in addition to your idx_covering_breed index):
CREATE INDEX idx_breed ON `bucket`(`breed`);
And run:
SELECT meta(d2).id
FROM ( SELECT d1.breed
FROM `bucket` AS d1
WHERE d1.breed IS VALUED
AND meta(d1).id LIKE 'animal:%'
GROUP BY d1.breed
HAVING COUNT(1) > 1
) AS breeds
JOIN `bucket` d2 ON d2.breed = breeds.breed ;
The aim is to be able to stream the results rather than have to hold all keys in memory (your ARRAY_AGG).
CREATE INDEX `idx_covering_breed` ON `default`(`breed`) WHERE meta().`id` like "animal:%";
SELECT RAW META(d2).id
FROM ( SELECT d1.breed
FROM default AS d1
WHERE d1.breed IS VALUED AND meta(d1).id LIKE 'animal:%'
GROUP BY d1.breed
HAVING COUNT(1) > 1
) AS breeds
JOIN default d2 USE HASH (PROBE) ON d2.breed = breeds.breed AND meta(d2).id LIKE 'animal:%';