Low document read performance when the number of filters in the query is ~500


We work in a multinational company that produces diesel engines and is working on an IoT platform to analyze engine performance based on sensor data. We are starting to perform tests with Couchbase to evaluate performance of queries required in our platform. We have initially inserted 300 thousand documents that look like this in a bucket.

  "timestamp": 1590045757860,
  "uuid1": "991aadac-2486-474b-8818-0dc337763e5a",
  "filterL1Uuid": "6c444c52-f1cf-4f80-b10e-6d8a385acd84",
  "filterL2Uuid": "15b6061d-96de-4345-b367-1e77155547ae",
  "name": "Sensor 1",
  "state": 1,
  "severity": 1,
  "category": 1

Approximately 3% of documents have state = 1.

Bucket name: my-bucket

Index: CREATE INDEX my-bucket-index ON my-bucket(filterL1Uuid) WHERE state = 1;

Couchbase Cluster
Two nodes (4 cpu, 16 GB, Amazon Linux 2, 25 GB Disk, Community Edition 6.5.1 build 6299)

We performed testing of below N1QL Query using Couchbase Java SDK (3.0.10) via a REST endpoint.

SELECT * FROM my-bucket
WHERE state = 1 
AND filterL1Uuid IN [ <filterL1Uuid1>, <filterL1Uuid2>, ... ]
AND filterL2Uuid IN [ <filterL2Uuid1>, <filterL2Uuid2>, ... ]

Average number of filterL1Uuid’s in above query is 50.
Average number of filterL2Uuid’s in above query is 500.

Test results are summarized below.

Test 1
#Thread: 25
Test Duration (min): 5 min
#Requests: 27286
CPU (avg | max): 68.84 | 70.55
Memory (avg | max): 3.16 | 3.18
Response Time (avg | max): 0.26 | 0.81 second
Throughput (rps): 90.95 requests per second

Test 2
#Thread: 100
Test Duration (min): 5 min
#Requests: 27678
CPU (avg | max): 65.47 | 68.15
Memory (avg | max): 3.17 | 3.20
Response Time (avg | max): 1.09 | 2.30 second
Throughput (rps): 91.21 requests per second

In writing above mentioned documents to Couchbase we achieved an excellent throughput of ~2800 requests per second.

However, in reading from Couchbase for above mentioned query the throughput is very low at ~90 requests per second though above mentioned index is created on the bucket.

Please help in understanding the low document read performance when the number of filters in the query is ~500. Do you have any suggestions to improve the read performance.

Thank you.

CE cluster with 2 nodes , 4 CPU which runs all the services, each one of service competes resources each other.
In Addition in CE query services limits 4 cores.

In addition your query using 3 filters out of that index has only 2 of them 3rd one need to applied after fetching the document,
IN clause of 50 elements means indexScan must be done 50 distinct ranges. and apply 500 in clause post fetch of the document.
If index uses 3 filters on two IN clauses it becomes 50*500*.5 = 62500 combinations. IndexScan on that many distinct values?

Request per sec depends on how many keys indexscan gave, how many documents fetched, how many further filtered , size of results. Each stage if you can minimize the unnecessary work it will be better. The following links details those.
Try with EE and links provided in Identify top N queries in couchbase

Also checkout FTS Service/search API from JAVA and get document keys and Fetch the documents