Rather fast subqueries make original query really slow even with empty data sets


#1

We’ve been using N1QL for a new game we’ve been developing. However when the application went to load testing. N1QL blew up.

We generated 100,000 users, which totals 900,000 user recored. And then there is additional data which is stored in different buckets as the load testing occurs. But we have barely broke the 20,000 entries for these buckets.

The cluster setup is a 3 node cluster on AWS with 30GB RAM each. (90GB total in cluster).
With couchbase 4.0 community
With 4GB allocated to indexing (12GB total)
And 20GB(60GB total) allocated to the buckets which are split up as follows:
Feed 15GB, replica 1, 10119 items, 94.5MB / 15GB used
Performance 30GB, replica 1, 15205 items, 102MB / 30GB used
User 12GB, replica 1, 900001 items, 505MB / 12GB used

When we look at the system stats we hover around 60-80% memory and never swap.
However the indexer and cbq-engine are consuming most of the memory.

The query we are trying to run is the following:
SELECT DISTINCT p.pid FROM performance p JOIN performance ps ON KEYS “performanceStats/pid:” || p.pid JOIN user u ON KEYS “user/userId:” || p.userId WHERE meta§.id LIKE “performance/%” AND p.pid IS NOT MISSING AND p.isEnabled = FALSE AND p.isDeleted = FALSE AND p.expiredAt > 1461345193451 AND p.createdAt >= 1460913193450 AND (p.userId = “04a43a75-65c3-4840-b250-d0da7bc13ff7” OR p.shareTarget = “public” OR (p.shareTarget = “userIds” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN p.shareUserIds) OR (p.shareTarget = “friend” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN (SELECT friendIds FROM user USE KEYS “friend/userId:” || p.userId)) OR (p.shareTarget = “follower” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN (SELECT followerIds FROM user USE KEYS “follow/userId:” || p.userId))) ORDER BY p.createdAt DESC LIMIT 50;

Building on top of the findings inside: Simple user search takes 25+ seconds to execute on large cluster with fairly small data set

We sped things up a bit by avoiding composite indexes, however any string comparison slows things down to 1-3 seconds. But what we found to be an additional problem was rather fast subqueries (2ms) would slow down the original query quite a fair bit(7s) even with empty data sets.


#2

Original Query
SELECT DISTINCT p.pid FROM performance p JOIN performance ps ON KEYS “performanceStats/pid:” || p.pid JOIN user u ON KEYS “user/userId:” || p.userId WHERE meta§.id LIKE “performance/%” AND p.pid IS NOT MISSING AND p.isEnabled = FALSE AND p.isDeleted = FALSE AND p.expiredAt > 1461345193451 AND p.createdAt >= 1460913193450 AND (p.userId = “04a43a75-65c3-4840-b250-d0da7bc13ff7” OR p.shareTarget = “public” OR (p.shareTarget = “userIds” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN p.shareUserIds) OR (p.shareTarget = “friend” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN (SELECT friendIds FROM user USE KEYS “friend/userId:” || p.userId)) OR (p.shareTarget = “follower” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN (SELECT followerIds FROM user USE KEYS “follow/userId:” || p.userId))) ORDER BY p.createdAt DESC LIMIT 50;
11s

Without Joins
SELECT DISTINCT p.pid FROM performance p WHERE meta§.id LIKE “performance/%” AND p.pid IS NOT MISSING AND p.isEnabled = FALSE AND p.isDeleted = FALSE AND p.expiredAt > 1461345193451 AND p.createdAt >= 1460913193450 AND (p.userId = “04a43a75-65c3-4840-b250-d0da7bc13ff7” OR p.shareTarget = “public” OR (p.shareTarget = “userIds” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN p.shareUserIds) OR (p.shareTarget = “friend” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN (SELECT friendIds FROM user USE KEYS “friend/userId:” || p.userId)) OR (p.shareTarget = “follower” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN (SELECT followerIds FROM user USE KEYS “follow/userId:” || p.userId))) LIMIT 50;
11s

Separate each conditional property into an INDEX
CREATE INDEX index_performance_pid ON performance(pid) WHERE meta().id LIKE “performance/%”;
CREATE INDEX index_performance_userId ON performance(userId) WHERE meta().id LIKE “performance/%”;
CREATE INDEX index_performance_isEnabled ON performance(isEnabled) WHERE meta().id LIKE “performance/%”;
CREATE INDEX index_performance_isDeleted ON performance(isDeleted) WHERE meta().id LIKE “performance/%”;
CREATE INDEX index_performance_expiredAt ON performance(expiredAt) WHERE meta().id LIKE “performance/%”;
CREATE INDEX index_performance_createdAt ON performance(createdAt) WHERE meta().id LIKE “performance/%”;
CREATE INDEX index_performance_shareTarget ON performance(shareTarget) WHERE meta().id LIKE “performance/%”;
CREATE INDEX index_performance_shareUserIds ON performance(shareUserIds) WHERE meta().id LIKE “performance/%”;

Without Share Targets
SELECT DISTINCT p.pid FROM performance p WHERE meta§.id LIKE “performance/%” AND p.pid IS NOT MISSING AND p.isEnabled = FALSE AND p.isDeleted = FALSE AND p.expiredAt > 1461345193451 AND p.createdAt >= 1460913193450 LIMIT 50;
1.5s

With userId check
SELECT DISTINCT p.pid FROM performance p WHERE meta§.id LIKE “performance/%” AND p.pid IS NOT MISSING AND p.isEnabled = FALSE AND p.isDeleted = FALSE AND p.expiredAt > 1461345193451 AND p.createdAt >= 1460913193450 AND (p.userId = “04a43a75-65c3-4840-b250-d0da7bc13ff7”) LIMIT 50;
1.4s

With public share target
SELECT DISTINCT p.pid FROM performance p WHERE meta§.id LIKE “performance/%” AND p.pid IS NOT MISSING AND p.isEnabled = FALSE AND p.isDeleted = FALSE AND p.expiredAt > 1461345193451 AND p.createdAt >= 1460913193450 AND (p.userId = “04a43a75-65c3-4840-b250-d0da7bc13ff7” OR p.shareTarget = “public”) LIMIT 50;
1.8s

With shared userIds share target
SELECT DISTINCT p.pid FROM performance p WHERE meta§.id LIKE “performance/%” AND p.pid IS NOT MISSING AND p.isEnabled = FALSE AND p.isDeleted = FALSE AND p.expiredAt > 1461345193451 AND p.createdAt >= 1460913193450 AND (p.userId = “04a43a75-65c3-4840-b250-d0da7bc13ff7” OR p.shareTarget = “public” OR (p.shareTarget = “userIds” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN p.shareUserIds)) LIMIT 50;
1.9s

With friendIds share target
SELECT DISTINCT p.pid FROM performance p WHERE meta§.id LIKE “performance/%” AND p.pid IS NOT MISSING AND p.isEnabled = FALSE AND p.isDeleted = FALSE AND p.expiredAt > 1461345193451 AND p.createdAt >= 1460913193450 AND (p.userId = “04a43a75-65c3-4840-b250-d0da7bc13ff7” OR p.shareTarget = “public” OR (p.shareTarget = “userIds” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN p.shareUserIds) OR (p.shareTarget = “friend” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN (SELECT friendIds FROM user USE KEYS “friend/userId:” || p.userId))) LIMIT 50;
7s

With All Share Targets (followerIds)
SELECT DISTINCT p.pid FROM performance p WHERE meta§.id LIKE “performance/%” AND p.pid IS NOT MISSING AND p.isEnabled = FALSE AND p.isDeleted = FALSE AND p.expiredAt > 1461345193451 AND p.createdAt >= 1460913193450 AND (p.userId = “04a43a75-65c3-4840-b250-d0da7bc13ff7” OR p.shareTarget = “public” OR (p.shareTarget = “userIds” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN p.shareUserIds) OR (p.shareTarget = “friend” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN (SELECT friendIds FROM user USE KEYS “friend/userId:” || p.userId)) OR (p.shareTarget = “follower” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN (SELECT followerIds FROM user USE KEYS “follow/userId:” || p.userId))) LIMIT 50;
13s

With Joins
SELECT DISTINCT p.pid FROM performance p JOIN performance ps ON KEYS “performanceStats/pid:” || p.pid JOIN user u ON KEYS “user/userId:” || p.userId WHERE meta§.id LIKE “performance/%” AND p.pid IS NOT MISSING AND p.isEnabled = FALSE AND p.isDeleted = FALSE AND p.expiredAt > 1461345193451 AND p.createdAt >= 1460913193450 AND (p.userId = “04a43a75-65c3-4840-b250-d0da7bc13ff7” OR p.shareTarget = “public” OR (p.shareTarget = “userIds” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN p.shareUserIds) OR (p.shareTarget = “friend” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN (SELECT friendIds FROM user USE KEYS “friend/userId:” || p.userId)) OR (p.shareTarget = “follower” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN (SELECT followerIds FROM user USE KEYS “follow/userId:” || p.userId))) LIMIT 50;
14s

Share Target Sub Queries Only
SELECT followerIds FROM user USE KEYS “follow/userId:04a43a75-65c3-4840-b250-d0da7bc13ff7”;
2.7ms

SELECT friendIds FROM user USE KEYS “friend/userId:04a43a75-65c3-4840-b250-d0da7bc13ff7”;
2.7ms


#3

There is a performance bug in 4.0 with subqueries. Even though the subquery plan is cached, the subquery is being planned again due to the bug. Was fixed in 4.1 and beyond. Apologies again.


#4

Hi @geraldss,

Thanks for your responses and apologies on the late reply on my part. I was discussing a few things with @ingenthr and then Golden Week hit (week long public holiday over here in Tokyo).

To give you an update on this, we tested this out with version “4.5.0-2203” and here is what we got:

  • query is significantly faster without any modifications
  • not much of an impact with separate indexes on this one
  • each string comparison adds quite a performance hit
  • though significantly faster, sub-queries still add quite a performance hit,
    roughly 500ms for each sub-query even though the sub-queries yield empty results
  • joins have no impact on the performance

I will put the full analysis below. Also I do use explain and my indexes are being used accordingly. For now I wont paste them here. However if you need them, please let me know and I’ll put them up.


#5

ORIGINAL QUERY 4.5 Performance
SELECT DISTINCT p.pid FROM performance p JOIN performance ps ON KEYS “performanceStats/pid:” || p.pid JOIN user u ON KEYS “user/userId:” || p.userId WHERE meta§.id LIKE “performance/%” AND p.pid IS NOT MISSING AND p.isEnabled = FALSE AND p.isDeleted = FALSE AND p.expiredAt > 1461345193451 AND p.createdAt >= 1460913193450 AND (p.userId = “04a43a75-65c3-4840-b250-d0da7bc13ff7” OR p.shareTarget = “public” OR (p.shareTarget = “userIds” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN p.shareUserIds) OR (p.shareTarget = “friend” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN (SELECT friendIds FROM user USE KEYS “friend/userId:” || p.userId)) OR (p.shareTarget = “follower” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN (SELECT followerIds FROM user USE KEYS “follow/userId:” || p.userId))) ORDER BY p.createdAt DESC LIMIT 50;
1.04ms (50 results)

Notes:

  • query is significantly faster without any modifications

Separate Indexes

CREATE INDEX index_performance_pid ON performance(pid) WHERE meta().id LIKE “performance/%”;
CREATE INDEX index_performance_userId ON performance(userId) WHERE meta().id LIKE “performance/%”;
CREATE INDEX index_performance_isEnabled ON performance(isEnabled) WHERE meta().id LIKE “performance/%”;
CREATE INDEX index_performance_isDeleted ON performance(isDeleted) WHERE meta().id LIKE “performance/%”;
CREATE INDEX index_performance_expiredAt ON performance(expiredAt) WHERE meta().id LIKE “performance/%”;
CREATE INDEX index_performance_createdAt ON performance(createdAt) WHERE meta().id LIKE “performance/%”;
CREATE INDEX index_performance_shareTarget ON performance(shareTarget) WHERE meta().id LIKE “performance/%”;
CREATE INDEX index_performance_shareUserIds ON performance(shareUserIds) WHERE meta().id LIKE “performance/%”;

SELECT DISTINCT p.pid FROM performance p JOIN performance ps ON KEYS “performanceStats/pid:” || p.pid JOIN user u ON KEYS “user/userId:” || p.userId WHERE meta§.id LIKE “performance/%” AND p.pid IS NOT MISSING AND p.isEnabled = FALSE AND p.isDeleted = FALSE AND p.expiredAt > 1461345193451 AND p.createdAt >= 1460913193450 AND (p.userId = “04a43a75-65c3-4840-b250-d0da7bc13ff7” OR p.shareTarget = “public” OR (p.shareTarget = “userIds” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN p.shareUserIds) OR (p.shareTarget = “friend” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN (SELECT friendIds FROM user USE KEYS “friend/userId:” || p.userId)) OR (p.shareTarget = “follower” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN (SELECT followerIds FROM user USE KEYS “follow/userId:” || p.userId))) ORDER BY p.createdAt DESC LIMIT 50;
1.07s (50 results)

Notes:

  • not much of an impact with separate indexes on this one

Without Share Targets
SELECT DISTINCT p.pid FROM performance p WHERE meta§.id LIKE “performance/%” AND p.pid IS NOT MISSING AND p.isEnabled = FALSE AND p.isDeleted = FALSE AND p.expiredAt > 1461345193451 AND p.createdAt >= 1460913193450 LIMIT 50;
135ms (50 results)

With userId check
SELECT DISTINCT p.pid FROM performance p WHERE meta§.id LIKE “performance/%” AND p.pid IS NOT MISSING AND p.isEnabled = FALSE AND p.isDeleted = FALSE AND p.expiredAt > 1461345193451 AND p.createdAt >= 1460913193450 AND (p.userId = “04a43a75-65c3-4840-b250-d0da7bc13ff7”) LIMIT 50;
130ms (50 results)

With public share target
SELECT DISTINCT p.pid FROM performance p WHERE meta§.id LIKE “performance/%” AND p.pid IS NOT MISSING AND p.isEnabled = FALSE AND p.isDeleted = FALSE AND p.expiredAt > 1461345193451 AND p.createdAt >= 1460913193450 AND (p.userId = “04a43a75-65c3-4840-b250-d0da7bc13ff7” OR p.shareTarget = “public”) LIMIT 50;
173ms (50 results)

Notes:

  • each string comparison adds quite a performance hit

With shared userIds share target
SELECT DISTINCT p.pid FROM performance p WHERE meta§.id LIKE “performance/%” AND p.pid IS NOT MISSING AND p.isEnabled = FALSE AND p.isDeleted = FALSE AND p.expiredAt > 1461345193451 AND p.createdAt >= 1460913193450 AND (p.userId = “04a43a75-65c3-4840-b250-d0da7bc13ff7” OR p.shareTarget = “public” OR (p.shareTarget = “userIds” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN p.shareUserIds)) LIMIT 50;
180ms (50 results)

With friendIds share target
SELECT DISTINCT p.pid FROM performance p WHERE meta§.id LIKE “performance/%” AND p.pid IS NOT MISSING AND p.isEnabled = FALSE AND p.isDeleted = FALSE AND p.expiredAt > 1461345193451 AND p.createdAt >= 1460913193450 AND (p.userId = “04a43a75-65c3-4840-b250-d0da7bc13ff7” OR p.shareTarget = “public” OR (p.shareTarget = “userIds” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN p.shareUserIds) OR (p.shareTarget = “friend” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN (SELECT friendIds FROM user USE KEYS “friend/userId:” || p.userId))) LIMIT 50;
600ms (50 results)

Notes:

  • though significantly faster, sub-queries still add quite a performance hit, roughly 500ms even though they yield empty results

With All Share Targets (followerIds)
SELECT DISTINCT p.pid FROM performance p WHERE meta§.id LIKE “performance/%” AND p.pid IS NOT MISSING AND p.isEnabled = FALSE AND p.isDeleted = FALSE AND p.expiredAt > 1461345193451 AND p.createdAt >= 1460913193450 AND (p.userId = “04a43a75-65c3-4840-b250-d0da7bc13ff7” OR p.shareTarget = “public” OR (p.shareTarget = “userIds” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN p.shareUserIds) OR (p.shareTarget = “friend” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN (SELECT friendIds FROM user USE KEYS “friend/userId:” || p.userId)) OR (p.shareTarget = “follower” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN (SELECT followerIds FROM user USE KEYS “follow/userId:” || p.userId))) LIMIT 50;
1.06s (50 results)

Notes:

  • though significantly faster, sub-queries still add quite a performance hit, roughly 500ms even though they yield empty results

With Joins
SELECT DISTINCT p.pid FROM performance p JOIN performance ps ON KEYS “performanceStats/pid:” || p.pid JOIN user u ON KEYS “user/userId:” || p.userId WHERE meta§.id LIKE “performance/%” AND p.pid IS NOT MISSING AND p.isEnabled = FALSE AND p.isDeleted = FALSE AND p.expiredAt > 1461345193451 AND p.createdAt >= 1460913193450 AND (p.userId = “04a43a75-65c3-4840-b250-d0da7bc13ff7” OR p.shareTarget = “public” OR (p.shareTarget = “userIds” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN p.shareUserIds) OR (p.shareTarget = “friend” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN (SELECT friendIds FROM user USE KEYS “friend/userId:” || p.userId)) OR (p.shareTarget = “follower” AND “04a43a75-65c3-4840-b250-d0da7bc13ff7” WITHIN (SELECT followerIds FROM user USE KEYS “follow/userId:” || p.userId))) LIMIT 50;
1.04s (50 results)

Notes:

  • joins have no impact on the performance

Share Target Sub Queries Only
SELECT followerIds FROM user USE KEYS “follow/userId:04a43a75-65c3-4840-b250-d0da7bc13ff7”;
1ms (0 results)

SELECT friendIds FROM user USE KEYS “friend/userId:04a43a75-65c3-4840-b250-d0da7bc13ff7”;
1ms (0 results)


#6

The sub queries uses field form the main query (i.e correlated sub queries). So each document we need to execute sub query even though sub query returns 0 results.

Could you please drop all the indexes on performance and try with following indexes in 4.5 Beta and see which index performs better(this depends on data)

CREATE INDEX index_performance_1 ON performance(isEnabled, isDeleted,createdAt, expiredAt,pid) WHERE meta().id LIKE “performance/%”;

CREATE INDEX index_performance_2 ON performance(createdAt, expiredAt,isEnabled, isDeleted,pid) WHERE meta().id LIKE “performance/%”;

You can check query using right index and spans by doing explain. It should show the following spans for index index_performance_1.

      {
                            "#operator": "IndexScan",
                            "index": "index_performance_1",
                            "index_id": "37288c4ff6117bfa",
                            "keyspace": "default",
                            "namespace": "default",
                            "spans": [
                                {
                                    "Range": {
                                        "High": [
                                            "false",
                                            "successor(false)"
                                        ],
                                        "Inclusion": 0,
                                        "Low": [
                                            "false",
                                            "false",
                                            "1460913193450",
                                            "1461345193451",
                                            "null"
                                        ]
                                    }
                                }
                            ],
                            "using": "gsi"

#7

I hope to god you’re not executing the subquery for each document. If you are then I see why there is quite a performance hit. You should only be executing the sub-query once and then referencing the results as the sub-query results do not change for each document.

Even though I have shown that composite indexes do not work properly in my previous thread: simple-user-search-takes-25-seconds-to-execute-on-large-cluster-with-fairly-small-data-set, I will give your suggestions a go tomorrow to see if there are any impacts on performance with explains attached.


#8

Hi @AlmirKadric,

Your subqueries reference the parent record, i.e. they are correlated subqueries. Therefore the results do change.

If possible, I’d like to get you to try a pre-GA build, and then we can continue to work through your queries together.


#9

Hi @geraldss @vsr1,

Seems I miss read the query and made some assumptions there. You’re absolutely right!

I have just spoken to my engineers to get this corrected. Truth is we have the opposite map as well so I will have them update the query to ensure we only do a single lookup and will let you know if the performance improves and by how much. I will also hold off on the index/explain test until then.


#10

@prasad @geraldss @vsr1 just thought I would update this thread with the latest info. Thanks for the latest beta builds, we were able to execute a successful load test run. Though we believe the performance could be further optimised the queries are performing much better now.

Thanks a lot for the help, we can continue the discussion of general performance through our other channels, but for now this thread as it is, is now resolved :+1: