[4.5 beta] n1ql performance with where in [array] is slow


#1

Hi,

I’m trying to implement a user list with followers in n1ql. I already have it working with views, but trying to see if I can get better performance with n1ql.

Desired output:

user A: 3
user B: 5
user C: 10

Current document structure

{
     "entityType": "activity",
     "target": "user",
     "target_id": "a user id here"
     "action": "follow"
}

Index

create index idx_activity_follow on mybucket (entityType, target, target_id, action);

Getting # of followers for a single user works fine. It takes under 10ms.

select count(target_id) from mybucket where entityType='activity' and target='user' and action='follow' and target_id='user_id'

But getting # of followers for multiple users takes long time. It takes over a minute and eventually displays index scan time out.

select count(target_id) from mybucket where entityType='activity' and target='user' and action='follow' and target_id in ['user_id', 'user_id2']

Explain

{
    "requestID": "c802177d-235b-475e-b8bf-27280a64a56c",
    "signature": "json",
    "results": [
        {
            "plan": {
                "#operator": "Sequence",
                "~children": [
                    {
                        "#operator": "IndexScan",
                        "covers": [
                            "cover ((`mybucket`.`entityType`))",
                            "cover ((`mybucket`.`target`))",
                            "cover ((`mybucket`.`target_id`))",
                            "cover ((`mybucket`.`action`))",
                            "cover ((meta(`mybucket`).`id`))"
                        ],
                        "index": "idx_activity_follow",
                        "index_id": "e88a694859a8a70a",
                        "keyspace": "mybucket",
                        "namespace": "default",
                        "spans": [
                            {
                                "Range": {
                                    "High": [
                                        "successor(\"activity\")"
                                    ],
                                    "Inclusion": 1,
                                    "Low": [
                                        "\"activity\""
                                    ]
                                }
                            }
                        ],
                        "using": "gsi"
                    },
                    {
                        "#operator": "Parallel",
                        "~child": {
                            "#operator": "Sequence",
                            "~children": [
                                {
                                    "#operator": "Filter",
                                    "condition": "((((cover ((`mybucket`.`entityType`)) = \"activity\") and (cover ((`mybucket`.`target`)) = \"user\")) and (cover ((`mybucket`.`action`)) = \"follow\")) and (cover ((`mybucket`.`target_id`)) in [\"kakin\"]))"
                                },
                                {
                                    "#operator": "InitialGroup",
                                    "aggregates": [
                                        "count(cover ((`mybucket`.`target_id`)))"
                                    ],
                                    "group_keys": []
                                }
                            ]
                        }
                    },
                    {
                        "#operator": "IntermediateGroup",
                        "aggregates": [
                            "count(cover ((`mybucket`.`target_id`)))"
                        ],
                        "group_keys": []
                    },
                    {
                        "#operator": "FinalGroup",
                        "aggregates": [
                            "count(cover ((`mybucket`.`target_id`)))"
                        ],
                        "group_keys": []
                    },
                    {
                        "#operator": "Parallel",
                        "~child": {
                            "#operator": "Sequence",
                            "~children": [
                                {
                                    "#operator": "InitialProject",
                                    "result_terms": [
                                        {
                                            "expr": "count(cover ((`mybucket`.`target_id`)))"
                                        }
                                    ]
                                },
                                {
                                    "#operator": "FinalProject"
                                }
                            ]
                        }
                    }
                ]
            },
            "text": "select count(target_id) from mybucket where entityType='activity' and target='user' and action='follow' and target_id in ['kakin']"
        }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "4.180504ms",
        "executionTime": "4.143132ms",
        "resultCount": 1,
        "resultSize": 3935
    }
}

Is there a way to improve it?


#2

For starters, target_id should be the first key in your index. For the other index keys, put the more selective keys first, and the least selective keys last. Your index should be:

create index idx_activity_follow on mybucket (target_id, ( other keys in decreasing order of selectivity ) );

This query will be very fast in 4.5 GA, because it will use an IndexCountScan. Not sure what version you are running on. But please try the above index first.

Thanks.


#3

Hi,

I’m on 4.5 beta.

I’ve created a new index as you suggested. Some of the query works just fine, but some get index timeout. I’m trying to find out what’s causing it. I will report back.

I also totally forgot that I need to group by target_id in order get # of followers for each user.

Is there a way to download 4.5 GA? I only see 4.5.0-BETA on http://www.couchbase.com/nosql-databases/downloads

Thanks,
Moon


#4

Hi Moon, what is the OS/version? I will send you the GA release candidate bits. Btw, how many documents do you have? Can you also check indexer/query logs for any error messages (for the index timeout issue)…
-Prasad


#5

Hi @prasad,

We will be running production version on linux, but I’m testing 4.5 on my mac os for now.

I don’t see any errors at the moment.

Thank you for the reply :slight_smile:


#6

I’m still waiting for a link to GA from prasad, but I found something very strange.

The following query returns super fast.

select count(target_id) from mybucket where target_id in (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16)

The following query hang.
select count(target_id) from mybucket where target_id in (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16, 17)

Whenever I pass 16+ ids, it simply hangs. Both of them have the same explain output.


#7

Hi @moon0326,

That is fixed in GA. And I think you meant square brackets instead of parentheses.


#8

Good to know!

Oh yeah :slight_smile: brackets.

I will try GA once I get a link to it.

Thanks!
Moon