Simple user search takes 25+ seconds to execute on large cluster with fairly small data set

We’ve been using N1QL for a new game we’ve been developing. However when the application went to load testing. N1QL blew up.

We generated 100,000 users, which totals 900,000 user recored. And then there is additional data which is stored in different buckets as the load testing occurs. But we have barely broke the 20,000 entries for these buckets.

The cluster setup is a 3 node cluster on AWS with 30GB RAM each. (90GB total in cluster).
With couchbase 4.0 community
With 4GB allocated to indexing (12GB total)
And 20GB(60GB total) allocated to the buckets which are split up as follows:
Feed 15GB, replica 1, 10119 items, 94.5MB / 15GB used
Performance 30GB, replica 1, 15205 items, 102MB / 30GB used
User 12GB, replica 1, 900001 items, 505MB / 12GB used

When we look at the system stats we hover around 60-80% memory and never swap.
However the indexer and cbq-engine are consuming most of the memory.

The query we are trying to run is the following:
SELECT userId, avatarId, name FROM user WHERE name LIKE “%” AND userId <> “” AND createdAt <= 99999999999999999 AND meta().id LIKE “user/%” ORDER BY name ASC LIMIT 100 OFFSET 0;

What we found was composite indexes we’re pointless. Only the first entry in the composite index was actually being used. And if we tried to use the 2nd or 3rd entry we would get a Need a global index error. Also performance was super shotty.

So we tried to index each conditional parameter separately and this sped things up a bit. However using LIKE for string comparison even on null values slowed it down to 2-3 seconds. And ORDER BY will always cause it to take 25+ seconds. In fact if any one of the optimisations wasn’t done, it would take 25+ seconds.

Here’s more data as I was trying to optimise:

ORIGINAL QUERY
CREATE INDEX index_user ON user(userId , name, consoleUserId) WHERE meta().id LIKE “user/%”;

SELECT userId, avatarId, name FROM user WHERE name LIKE “%” AND userId <> “” AND createdAt <= 99999999999999999 AND meta().id LIKE “user/%” ORDER BY name ASC LIMIT 100 OFFSET 0;
28s

Minimum Condition Query
SELECT userId, avatarId, name FROM user WHERE userId <> “” AND meta().id LIKE “user/%” LIMIT 100;
45ms

With OFFSET
SELECT userId, avatarId, name FROM user WHERE userId <> “” AND meta().id LIKE “user/%” LIMIT 100 OFFSET 0;
43ms

With name condition (name indexed)
SELECT userId, avatarId, name FROM user WHERE userId <> “” AND name LIKE “%” AND meta().id LIKE “user/%” LIMIT 100 OFFSET 0;
28s

With createdAt condition (createdAt not indexed)
SELECT userId, avatarId, name FROM user WHERE userId <> “” AND createdAt <= 99999999999999999 AND meta().id LIKE “user/%” LIMIT 100 OFFSET 0;
26s

With ORDER BY
SELECT userId, avatarId, name FROM user WHERE userId <> “” AND meta().id LIKE “user/%” ORDER BY name ASC LIMIT 100 OFFSET 0;
27s

Create separate name INDEX
CREATE INDEX index_user_name ON user(name) WHERE meta().id LIKE “user/%”;

SELECT userId, avatarId, name FROM user WHERE name IS NULL AND meta().id LIKE “user/%” LIMIT 100 OFFSET 0;
37ms

SELECT userId, avatarId, name FROM user WHERE name IS NULL AND userId <> “” AND meta().id LIKE “user/%” LIMIT 100 OFFSET 0;
122ms

SELECT userId, avatarId, name FROM user WHERE name LIKE “%” AND userId <> “” AND meta().id LIKE “user/%” LIMIT 100 OFFSET 0;
5s

SELECT userId, avatarId, name FROM user WHERE name IS NULL AND meta().id LIKE “user/%” ORDER BY name ASC LIMIT 100 OFFSET 0;
28s

Create separate createdAt INDEX
CREATE INDEX index_user_createdAt ON user(createdAt) WHERE meta().id LIKE “user/%”;

SELECT userId, avatarId, name FROM user WHERE userId <> “” AND createdAt <= 99999999999999999 AND meta().id LIKE “user/%” LIMIT 100 OFFSET 0;
422ms

SELECT userId, avatarId, name FROM user WHERE userId <> “” AND createdAt <= 99999999999999999 AND name LIKE “%” AND meta().id LIKE “user/%” LIMIT 100 OFFSET 0;
1.3s

Make all properties separate indexes
CREATE INDEX index_user_userId ON user(userId) WHERE meta().id LIKE “user/%”;
CREATE INDEX index_user_name ON user(name) WHERE meta().id LIKE “user/%”;
CREATE INDEX index_user_createdAt ON user(createdAt) WHERE meta().id LIKE “user/%”;
CREATE INDEX index_user_consoleUserId ON user(consoleUserId) WHERE meta().id LIKE “user/%”;

SELECT userId, avatarId, name FROM user WHERE userId <> “” AND createdAt <= 99999999999999999 AND name IS NULL AND meta().id LIKE “user/%” LIMIT 100 OFFSET 0;
692ms

SELECT userId, avatarId, name FROM user WHERE userId <> “” AND createdAt <= 99999999999999999 AND name LIKE “%” AND meta().id LIKE “user/%” LIMIT 100 OFFSET 0;
1.5s

SELECT userId, avatarId, name FROM user WHERE userId <> “” AND createdAt <= 99999999999999999 AND name IS NULL AND meta().id LIKE “user/%” ORDER BY name ASC LIMIT 100 OFFSET 0;
28s

Create specific composite index for this query
DROP INDEX user.index_user_userId;
DROP INDEX user.index_user_name;
DROP INDEX user.index_user_createdAt;
CREATE INDEX index_user_searchName ON user(userId, name, createdAt) WHERE meta().id LIKE “user/%”;

SELECT userId, avatarId, name FROM user WHERE userId <> “” AND name LIKE “%” AND createdAt <= 99999999999999999 AND meta().id LIKE “user/%” LIMIT 100 OFFSET 0;
28s

Hi Almir,

My apologies for the issues you are seeing with N1QL. Many performance improvements have gone into 4.1 and 4.5. I realize you want community edition, but I would strongly recommend that you first design and test with 4.5. That will give you a much more realistic of where N1QL performance is today. After that, you can decide how to proceed (e.g. try to tune 4.0 or 4.1, use 4.5 EE, or build 4.5 from source).

You should also be doing EXPLAIN for each of your queries, to verify that the indexes are being used in the expected manner.

Hi @geraldss,

Thanks for your responses and apologies on the late reply on my part. I was discussing a few things with @ingenthr and then Golden Week hit (week long public holiday over here in Tokyo).

To give you an update on this, we tested this out with version “4.5.0-2203” and here is what we got:

  • composite indexes still do not work. It seems that only the first property is being searched. To avoid this we have created separate indexes per property conditional
  • Significant performance different between checking data type (LIKE % is actually faster than a NULL check)
  • ORDER BY is significantly faster
  • Performance impacts change depending on the data type we try and compare (NOT empty string check is slower than % wildcard)
  • As expected there was no performance impact with ORDER BY when there is nothing to order, e.g. values are NULL
  • Still have performance impacts when comparison data type is changed

I will put the full analysis below. Also I do use explain and my indexes are being used accordingly. For now I wont paste them here. However if you need them, please let me know and I’ll put them up.

ORIGINAL QUERY 4.5 User
100,000 Users
SELECT userId, avatarId, name FROM user WHERE name LIKE “%” AND userId <> “” AND createdAt <= 99999999999999999 AND meta().id LIKE “user/%” ORDER BY name ASC LIMIT 100 OFFSET 0;

Search specific index
CREATE INDEX index_user_searchName ON user(userId, name, createdAt) WHERE meta().id LIKE “user/%”;

SELECT userId, avatarId, name FROM user WHERE name LIKE “%” AND userId <> “” AND createdAt <= 99999999999999999 AND meta().id LIKE “user/%” ORDER BY name ASC LIMIT 10 OFFSET 0;
13s

NOTES:

  • composite indexes still do not work. It seems that only the first property is being searched. To avoid this we have created separate indexes per property conditional

Individual indexes for each condition
CREATE INDEX index_user_userId ON user(userId) WHERE meta().id LIKE “user/%”;
CREATE INDEX index_user_name ON user(name) WHERE meta().id LIKE “user/%”;
CREATE INDEX index_user_createdAt ON user(createdAt) WHERE meta().id LIKE “user/%”;
CREATE INDEX index_user_consoleUserId ON user(consoleUserId) WHERE meta().id LIKE “user/%”;

SELECT userId, avatarId, name FROM user WHERE name LIKE “%” AND userId <> “” AND createdAt <= 99999999999999999 AND meta().id LIKE “user/%” ORDER BY name ASC LIMIT 100 OFFSET 0;
1.7s (0 results)

SELECT userId, avatarId, name FROM user WHERE name LIKE “%” AND userId <> “” AND createdAt <= 99999999999999999 AND meta().id LIKE “user/%” ORDER BY name ASC LIMIT 1000 OFFSET 0;
1.7s (0 results)

SELECT userId, avatarId, name FROM user WHERE name LIKE “%” AND userId <> “” AND createdAt <= 99999999999999999 AND meta().id LIKE “user/%” ORDER BY name ASC LIMIT 10000 OFFSET 0;
1.7s (0 results)

SELECT userId, avatarId, name FROM user WHERE name IS NULL AND userId <> “” AND createdAt <= 99999999999999999 AND meta().id LIKE “user/%” ORDER BY name ASC LIMIT 100 OFFSET 0;
18s (100 results)

SELECT userId, avatarId, name FROM user WHERE name IS NULL AND userId <> “” AND createdAt <= 99999999999999999 AND meta().id LIKE “user/%” ORDER BY name ASC LIMIT 1000 OFFSET 0;
18s (1000 results)

SELECT userId, avatarId, name FROM user WHERE name IS NULL AND userId <> “” AND createdAt <= 99999999999999999 AND meta().id LIKE “user/%” ORDER BY name ASC LIMIT 10000 OFFSET 0;
18s (10000 results)

NOTES:

  • Significant performance different between checking data type (LIKE % is actually faster than a NULL check)

userId
SELECT userId FROM user WHERE userId IS NULL AND meta().id LIKE “user/%” LIMIT 100;
9ms (0 results)

SELECT userId FROM user WHERE userId IS NULL AND meta().id LIKE “user/%” LIMIT 1000;
9ms (0 results)

SELECT userId FROM user WHERE userId IS NULL AND meta().id LIKE “user/%” ORDER BY userId ASC LIMIT 100;
9ms (0 results)

SELECT userId FROM user WHERE userId IS NULL AND meta().id LIKE “user/%” ORDER BY userId ASC LIMIT 1000;
9ms (0 results)

SELECT userId FROM user WHERE userId <> “” AND meta().id LIKE “user/%” LIMIT 100;
13ms (100 results)

SELECT userId FROM user WHERE userId <> “” AND meta().id LIKE “user/%” LIMIT 1000;
48ms (1000 results)

SELECT userId FROM user WHERE userId <> “” AND meta().id LIKE “user/%” LIMIT 10000;
490ms (10000 results)

SELECT userId FROM user WHERE userId <> “” AND meta().id LIKE “user/%” ORDER BY userId ASC LIMIT 100;
1.5s (100 results)

SELECT userId FROM user WHERE userId <> “” AND meta().id LIKE “user/%” ORDER BY userId ASC LIMIT 1000;
1.6s (1000 results)

SELECT userId FROM user WHERE userId <> “” AND meta().id LIKE “user/%” ORDER BY userId ASC LIMIT 10000;
2.2s (10000 results)

SELECT userId, avatarId , name FROM user WHERE userId LIKE “%” AND meta().id LIKE “user/%” LIMIT 100;
45ms (100 results)

SELECT userId, avatarId , name FROM user WHERE userId LIKE “%” AND meta().id LIKE “user/%” LIMIT 1000;
250ms (1000 results)

SELECT userId, avatarId , name FROM user WHERE userId LIKE “%” AND meta().id LIKE “user/%” LIMIT 10000;
1.9s (10000 results)

SELECT userId, avatarId , name FROM user WHERE userId LIKE “%” AND meta().id LIKE “user/%” ORDER BY userId ASC LIMIT 100;
45ms (100 results)

SELECT userId, avatarId , name FROM user WHERE userId LIKE “%” AND meta().id LIKE “user/%” ORDER BY userId ASC LIMIT 1000;
300ms (1000 results)

SELECT userId, avatarId , name FROM user WHERE userId LIKE “%” AND meta().id LIKE “user/%” ORDER BY userId ASC LIMIT 10000;
1.8s (10000 results)

NOTES:

  • ORDER BY is significantly faster
  • Performance impacts change depending on the data type we try and compare (NOT empty string check is slower than % wildcard)

name
SELECT name FROM user WHERE name IS NULL AND meta().id LIKE “user/%” LIMIT 100;
12ms (100 results)

SELECT name FROM user WHERE name IS NULL AND meta().id LIKE “user/%” LIMIT 1000;
45ms (1000 results)

SELECT name FROM user WHERE name IS NULL AND meta().id LIKE “user/%” LIMIT 10000;
450ms (10000 results)

SELECT name FROM user WHERE name IS NULL AND meta().id LIKE “user/%” ORDER BY name ASC LIMIT 100;
12ms (100 results)

SELECT name FROM user WHERE name IS NULL AND meta().id LIKE “user/%” ORDER BY name ASC LIMIT 1000;
45ms (1000 results)

SELECT name FROM user WHERE name IS NULL AND meta().id LIKE “user/%” ORDER BY name ASC LIMIT 10000;
450ms (10000 results)

SELECT userId, avatarId , name FROM user WHERE name IS NULL AND meta().id LIKE “user/%” LIMIT 100;
30ms (100 results)

SELECT userId, avatarId , name FROM user WHERE name IS NULL AND meta().id LIKE “user/%” LIMIT 1000;
180ms (1000 results)

SELECT userId, avatarId , name FROM user WHERE name IS NULL AND meta().id LIKE “user/%” LIMIT 10000;
1.5s (10000 results)

SELECT userId, avatarId , name FROM user WHERE name IS NULL AND meta().id LIKE “user/%” ORDER BY name ASC LIMIT 100;
30ms (100 results)

SELECT userId, avatarId , name FROM user WHERE name IS NULL AND meta().id LIKE “user/%” ORDER BY name ASC LIMIT 1000;
180ms (1000 results)

SELECT userId, avatarId , name FROM user WHERE name IS NULL AND meta().id LIKE “user/%” ORDER BY name ASC LIMIT 10000;
1.5s (10000 results)

SELECT userId, avatarId , name FROM user WHERE name LIKE “%” AND meta().id LIKE “user/%” LIMIT 100;
100ms (0 results)

SELECT userId, avatarId , name FROM user WHERE name LIKE “%” AND meta().id LIKE "user/%"LIMIT 1000;
100ms (0 results)

SELECT userId, avatarId , name FROM user WHERE name LIKE “%” AND meta().id LIKE “user/%” LIMIT 10000;
100ms (0 results)

SELECT userId, avatarId , name FROM user WHERE name LIKE “%” AND meta().id LIKE “user/%” ORDER BY name ASC LIMIT 100;
100ms (0 results)

SELECT userId, avatarId , name FROM user WHERE name LIKE “%” AND meta().id LIKE “user/%” ORDER BY name ASC LIMIT 1000;
100ms (0 results)

SELECT userId, avatarId , name FROM user WHERE name LIKE “%” AND meta().id LIKE “user/%” ORDER BY name ASC LIMIT 10000;
100ms (0 results)

NOTES:

  • As expected there was no performance impact with ORDER BY when there is nothing to order, e.g. values are NULL
  • Still have performance impacts when comparison data type is changed

name + userId
SELECT userId, avatarId , name FROM user WHERE name LIKE “%” AND userId <> “” AND meta().id LIKE “user/%” ORDER BY name ASC LIMIT 10;
1.6s

name + userId + createdAt
SELECT userId, avatarId, name FROM user WHERE name IS NULL AND userId <> “” AND createdAt <= 99999999999999999 AND meta().id LIKE “user/%” ORDER BY name ASC LIMIT 10;
18s

SELECT userId, avatarId, name FROM user WHERE name IS NULL AND userId <> “” AND createdAt <= 99999999999999999 AND meta().id LIKE “user/%” ORDER BY name ASC LIMIT 10 OFFSET 0;
18s

SELECT userId, avatarId, name FROM user WHERE name LIKE “%” AND userId <> “” AND createdAt <= 99999999999999999 AND meta().id LIKE “user/%” ORDER BY name ASC LIMIT 10;
1.7s

SELECT userId, avatarId, name FROM user WHERE name LIKE “%” AND userId <> “” AND createdAt <= 99999999999999999 AND meta().id LIKE “user/%” ORDER BY name ASC LIMIT 10 OFFSET 0;
1.4s

The following query is trying to get almost all of the data from the bucket that is why it taking long time. As @geraldss suggested we have done quite performance improvements in 4.5 Beta.

Please try the following in 4.5 Beta.

CREATE INDEX index_user ON user(name,userId,createdAt,avatarId) WHERE meta().id LIKE “user/%”;
SELECT userId, avatarId, name FROM user WHERE name LIKE “%” AND userId <> “” AND createdAt <= 99999999999999999 AND meta().id LIKE “user/%” ORDER BY name ASC LIMIT 100 OFFSET 0;

Above query uses ORDER BY, LIMIT, OFFSET. If query has these it needs to process all the qualified results, order before applying limit, offset. In 4.5 Beta if the query order matches the index leading orders we eliminate sort (explain will show no order operator), In that case query can stop producing the results when the limit and offset satisfied (In 4.5 Beta some cases we pushed limit/offset to indexer).

If the fields are strings, try the following variation. Also check your results are right.

SELECT userId, avatarId, name FROM user WHERE name >= “” AND userId > “” AND createdAt <= 99999999999999999 AND meta().id LIKE “user/%” ORDER BY name ASC LIMIT 100 OFFSET 0;

FYI: name LIKE “%” is not same as name IS NULL. So the timings of tests will change based on predicates.

@vsr1 i think you missed the first follow up post which states that we are now using the latest version of 4.5 build available thanks to one of your colleagues over there.

Also the fact that it “apparently” gets all the data in the bucket is irrelevant, especially when there are only 100,000 entries in that specific index and no more than 900,000 entries in the entire bucket (800,000 should be ignored because of the index)

I think you’ve missed the part where I mentioned that composite indexes are not working even in 4.5. If you look at the analysis of 4.5.0-2203, the composite index does not provide any performance benefits on indexed properties after the first one specified, and it is more performance to create a separate index per property.

Name and userId are actually wildcard searches which could return everything or a subset. I expect subsets to be faster, but still the everything with limits should still be decently performant.

I assumed as such after the analysis, however NULL checks being slower than LIKE “%” was quite a shock indeed. I think the NULL check algorithm can be improved since it should be the same as an INT check which does perform quite well.

Hi @AlmirKadric,

Can we send you a pre-GA build to try out? If yes, please indicate OS / platform and version.

@AlmirKadric’s platform is CentOS 6.

@geraldss sure we’d be happy to try out a pre-GA build.
@ingenthr would this be different to the build you sent me?
P.S. our platform is technically CentOS 7.2 not 6 ;p

Ah, my apologies. The major version of CentOS does matter in this case.

Yes, I believe @geraldss is proposing that we supply you a later build with some changes. The code is, of course, changing all the time and build 2203 is one that received a bit of extra scrutiny before labeling it beta.

Hi Almir, sent you one-off build. Let us know how that goes… thanks.

1 Like

@prasad @geraldss @vsr1 just thought I would update this thread with the latest info. Thanks for the latest beta builds, we were able to execute a successful load test run. Though we believe the performance could be further optimised the queries are performing much better now.

Thanks a lot for the help, we can continue the discussion of general performance through our other channels, but for now this thread as it is, is now resolved :+1:

1 Like