Index scan timeout on query

query

#1

Hello,

I am having an issue trying to figure out why my query is timing out. Here is the query:

SELECT COUNT(*)
FROM `sync_gateway` AS docs
WHERE owner IN (
    SELECT RAW owner FROM `sync_gateway` as owners WHERE type="userprofile_private" and emailConfirmed=true and lastSignInDate <= "2018-10-05T00:00:00Z"
)

I have an index on each of the fields in the where clause as well as a composite index on all of the fields in the where clause (added whilst trying to figure out why it was timing it).

The Explain looks like this:

{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "IndexScan3",
        "as": "docs",
        "covers": [
          "cover ((`docs`.`owner`))",
          "cover ((meta(`docs`).`id`))"
        ],
        "index": "owner-id",
        "index_id": "a387bda3ca329527",
        "index_projection": {
          "entry_keys": [
            0
          ]
        },
        "keyspace": "sync_gateway",
        "namespace": "default",
        "spans": [
          {
            "range": [
              {
                "inclusion": 0,
                "low": "null"
              }
            ]
          }
        ],
        "using": "gsi"
      },
      {
        "#operator": "Parallel",
        "~child": {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "Filter",
              "condition": "(cover ((`docs`.`owner`)) in (select raw (`owners`.`owner`) from `sync_gateway` as `owners` where ((((`owners`.`type`) = \"userprofile_private\") and ((`owners`.`emailConfirmed`) = true)) and ((`owners`.`lastSignInDate`) <= \"2018-10-05T00:00:00Z\"))))"
            },
            {
              "#operator": "InitialGroup",
              "aggregates": [
                "count(*)"
              ],
              "group_keys": []
            }
          ]
        }
      },
      {
        "#operator": "IntermediateGroup",
        "aggregates": [
          "count(*)"
        ],
        "group_keys": []
      },
      {
        "#operator": "FinalGroup",
        "aggregates": [
          "count(*)"
        ],
        "group_keys": []
      },
      {
        "#operator": "Parallel",
        "~child": {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "InitialProject",
              "result_terms": [
                {
                  "expr": "count(*)"
                }
              ]
            },
            {
              "#operator": "FinalProject"
            }
          ]
        }
      }
    ]
  },
  "text": "SELECT COUNT(*)\nFROM `sync_gateway` AS docs\nWHERE owner IN (\n    SELECT RAW owner FROM `sync_gateway` as owners WHERE type=\"userprofile_private\" and emailConfirmed=true and lastSignInDate <= \"2018-10-05T00:00:00Z\"\n)"
}

I have also tried to do this query as a self join rather than a WHERE IN and I get the same result, the error is as follows:

[
  {
    "code": 5000,
    "msg": " Index scan timed out - cause:  Index scan timed out",
    "query_from_user": "SELECT COUNT(*)\nFROM `sync_gateway` AS docs\nWHERE owner IN (\n    SELECT RAW owner FROM `sync_gateway` as owners WHERE type=\"userprofile_private\" and emailConfirmed=true and lastSignInDate <= \"2018-10-05T00:00:00Z\"\n)"
  }
]

Our CB has approx. 28m documents, 5 nodes, 50GB RAM dedicated to this bucket (6GB dedicated to the index service) run CB 6.0.0 CE.

Any ideas or pointers?


#2

You are doing whole scan of the bucket. What exactly you are trying to do.


#3

My goal is to get a count of the documents where the owner attribute of the document matches those in the where clause, i.e. users who have not signed in within the last 6 months but have a confirmed email address


#4

SELECT RAW owner FROM sync_gateway as owners WHERE type=“userprofile_private” and emailConfirmed=true and lastSignInDate <= “2018-10-05T00:00:00Z”
If this is small number
do 2 queries and pass the subquery output as query parameter and run the parent query has adhoc=true
Otherwise it needs to scan whole index.


#5

It is not a small number unfortunately. Is there another way to achieve this?


#6

In that case plan is optimal only option you have increase index timeout. which is 2min at present. cc @deepkaran.salooja