Query taking time even after using covering index (Couchbase 4.6.4 EE)

Hi,
This query is taking 4-5 seconds for offset 55541.
When offset is 0, then it takes 15 ms.

Select propertyId,startDate,endDate,status,statusChangedDate,portCode,locationId,alerts,firstName,lastName,birthDate,
citizenshipCountryCode,genderCode,identifications,reservationNumber,phones,phone,stateroom,teamMemberNumber,photoMediaItemId,
securityPhotoMediaItemId,profilePhotoMediaItemId,personTypeCode
From reporting_v1
Where type = ‘PersonMovements’
And meta().id not like ‘_sync%’
And propertyId = ‘VC’
And startDate between ‘2018-03-01’ and ‘2018-03-25’
order by propertyId
limit 10 offset 55541
;

Index:
CREATE INDEX IX_Date_cover2_PersonMovements ON reporting_v1(propertyId,startDate,endDate,status,statusChangedDate,portCode,locationId,alerts,firstName,lastName,birthDate,citizenshipCountryCode,genderCode,identifications,reservationNumber,phones,phone,stateroom,teamMemberNumber,photoMediaItemId,securityPhotoMediaItemId,profilePhotoMediaItemId,personTypeCode) WHERE ((type = "PersonMovements") and (not ((meta().id) like "_sync%")))

Explain :

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "IndexScan",
              "covers": [
                "cover ((`reporting_v1`.`propertyId`))",
                "cover ((`reporting_v1`.`startDate`))",
                "cover ((`reporting_v1`.`endDate`))",
                "cover ((`reporting_v1`.`status`))",
                "cover ((`reporting_v1`.`statusChangedDate`))",
                "cover ((`reporting_v1`.`portCode`))",
                "cover ((`reporting_v1`.`locationId`))",
                "cover ((`reporting_v1`.`alerts`))",
                "cover ((`reporting_v1`.`firstName`))",
                "cover ((`reporting_v1`.`lastName`))",
                "cover ((`reporting_v1`.`birthDate`))",
                "cover ((`reporting_v1`.`citizenshipCountryCode`))",
                "cover ((`reporting_v1`.`genderCode`))",
                "cover ((`reporting_v1`.`identifications`))",
                "cover ((`reporting_v1`.`reservationNumber`))",
                "cover ((`reporting_v1`.`phones`))",
                "cover ((`reporting_v1`.`phone`))",
                "cover ((`reporting_v1`.`stateroom`))",
                "cover ((`reporting_v1`.`teamMemberNumber`))",
                "cover ((`reporting_v1`.`photoMediaItemId`))",
                "cover ((`reporting_v1`.`securityPhotoMediaItemId`))",
                "cover ((`reporting_v1`.`profilePhotoMediaItemId`))",
                "cover ((`reporting_v1`.`personTypeCode`))",
                "cover ((meta(`reporting_v1`).`id`))"
              ],
              "filter_covers": {
                "cover ((`reporting_v1`.`type`))": "PersonMovements",
                "cover ((not ((meta(`reporting_v1`).`id`) like \"_sync%\")))": true
              },
              "index": "IX_Date_cover2_PersonMovements",
              "index_id": "5e26c91480916c13",
              "keyspace": "reporting_v1",
              "limit": "(55541 + 10)",
              "namespace": "default",
              "spans": [
                {
                  "Exact": true,
                  "Range": {
                    "High": [
                      "\"VC\"",
                      "successor(\"2018-03-25\")"
                    ],
                    "Inclusion": 1,
                    "Low": [
                      "\"VC\"",
                      "\"2018-03-01\""
                    ]
                  }
                }
              ],
              "using": "gsi"
            },
            {
              "#operator": "Parallel",
              "~child": {
                "#operator": "Sequence",
                "~children": [
                  {
                    "#operator": "Filter",
                    "condition": "((((cover ((`reporting_v1`.`type`)) = \"PersonMovements\") and cover ((not ((meta(`reporting_v1`).`id`) like \"_sync%\")))) and (cover ((`reporting_v1`.`propertyId`)) = \"VC\")) and (cover ((`reporting_v1`.`startDate`)) between \"2018-03-01\" and \"2018-03-25\"))"
                  },
                  {
                    "#operator": "InitialProject",
                    "result_terms": [
                      {
                        "expr": "cover ((`reporting_v1`.`propertyId`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`startDate`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`endDate`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`status`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`statusChangedDate`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`portCode`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`locationId`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`alerts`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`firstName`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`lastName`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`birthDate`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`citizenshipCountryCode`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`genderCode`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`identifications`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`reservationNumber`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`phones`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`phone`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`stateroom`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`teamMemberNumber`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`photoMediaItemId`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`securityPhotoMediaItemId`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`profilePhotoMediaItemId`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`personTypeCode`))"
                      }
                    ]
                  }
                ]
              }
            }
          ]
        },
        {
          "#operator": "Offset",
          "expr": "55541"
        },
        {
          "#operator": "Limit",
          "expr": "10"
        },
        {
          "#operator": "FinalProject"
        }
      ]
    },
    "text": "Select propertyId,startDate,endDate,status,statusChangedDate,portCode,locationId,alerts,firstName,lastName,birthDate,\ncitizenshipCountryCode,genderCode,identifications,reservationNumber,phones,phone,stateroom,teamMemberNumber,photoMediaItemId,\nsecurityPhotoMediaItemId,profilePhotoMediaItemId,personTypeCode\nFrom reporting_v1 \nWhere type = 'PersonMovements'\nAnd meta().id not like '_sync%'\nAnd propertyId = 'VC' \nAnd startDate between '2018-03-01' and '2018-03-25' \norder by propertyId\nlimit 10 offset 55541\n;"
  }
]

The version of couchbase that I am using is 4.6.4-4590-enterprise.
It is a single node cluster running on a machine having 4 core CPU and 16 GB RAM.
The bucket is allocated with 1.4 GB RAM. The total number of documents in the bucket are approx. 0.2 Million.

Pre 5.0.0 Offset passing to indexer is not supported i.e when possible it passed as limit as limit+offset and later query applies offset, limit separately.
This has been changed in 5.0.0 Checkout Pagination Pushdown section
https://developer.couchbase.com/documentation/server/current/performance/index_pushdowns.html

Also check Keyset Pagination article by @keshav_m

@vsr1, Its not possible for me to upgrade couchbase server as of now. It would be great if any other optimization is possible in the current version.
Also, is this kind of performance expected? I mean 17K documents doesn’t seem too much data when couchbase is designed to handle millions of records.
I have tried a two node cluster as well but still there is no improvement in the performance.

Your query is covered number of nodes in the cluster will not be any affect.

As offset is huge try this if this helps.

CREATE INDEX ix1 ON reporting_v1(propertyId,startDate ) WHERE ((type = "PersonMovements") and (not ((meta().id) like "_sync%")));

SELECT r.propertyId,r.startDate,r.endDate,r.status,r.statusChangedDate,r.portCode,
       r.locationId,r.alerts,r.firstName,r.lastName,r.birthDate, r.citizenshipCountryCode,
       r.genderCode,r.identifications,r.reservationNumber,r.phones,r.phone,r.stateroom,
       r.teamMemberNumber,r.photoMediaItemId, r.securityPhotoMediaItemId,r.profilePhotoMediaItemId,r.personTypeCode
FROM (SELECT RAW META().id
      FROM reporting_v1
      WHERE type = "PersonMovements" AND  META().id NOT LIKE "_sync%" AND
            startDate BETWEEN "2018-03-01" AND "2018-03-25"
      ORDER BY propertyId
      OFFSET 55541
      LIMIT 10) AS q
JOIN reporting_v1 AS r ON KEYS q ;

Also set pretty=false

@vsr1, this isn’t working either. still taking almost same time. (~ 5 seconds)

The query plan is correct and optimized. Increase indexer quota and try it.

How many documents in total in your bucket? When you say 17k documents is that only of type “PersonMovements”?

I created a new bucket with 35k documents in it, ran the suggested query+index on it and got results in 40ms.

@clinton1ql, There are 240K documents of type “PersonMovements” only in this bucket. No other documents except these.

great, will create a test bucket with similar amount and test.

In the meantime, what happens when you change the startdate between X and Y to startdate = X
i.e. find records for a specific day only. Is that the same speed or faster?

@clinton1ql, I mentioned 17K documents for another query.

Select propertyId,startDate,endDate,status,statusChangedDate,portCode,locationId,alerts,firstName,lastName,birthDate,
citizenshipCountryCode,genderCode,identifications,reservationNumber,phones,phone,stateroom,teamMemberNumber,photoMediaItemId,
securityPhotoMediaItemId,profilePhotoMediaItemId,personTypeCode
From reporting_v1 
Where type = 'PersonMovements'
And meta().id not like '_sync%'
And propertyId = 'VC' 
And startDate  <= '2018-05-25' 
and endDate >= '2018-05-20'
order by propertyId
limit 10 offset 17000
;

This query is taking approx. 12-13 seconds to execute. I thought it might be due to this condition:
And startDate <= ‘2018-05-25’
and endDate >= ‘2018-05-20’

@vsr1, I have tried these queries with both 512 MB Index RAM and 3 GB Index RAM. Still not any improvement.

Explain plan for above query:

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "IndexScan",
              "covers": [
                "cover ((`reporting_v1`.`propertyId`))",
                "cover ((`reporting_v1`.`startDate`))",
                "cover ((`reporting_v1`.`endDate`))",
                "cover ((`reporting_v1`.`status`))",
                "cover ((`reporting_v1`.`statusChangedDate`))",
                "cover ((`reporting_v1`.`portCode`))",
                "cover ((`reporting_v1`.`locationId`))",
                "cover ((`reporting_v1`.`alerts`))",
                "cover ((`reporting_v1`.`firstName`))",
                "cover ((`reporting_v1`.`lastName`))",
                "cover ((`reporting_v1`.`birthDate`))",
                "cover ((`reporting_v1`.`citizenshipCountryCode`))",
                "cover ((`reporting_v1`.`genderCode`))",
                "cover ((`reporting_v1`.`identifications`))",
                "cover ((`reporting_v1`.`reservationNumber`))",
                "cover ((`reporting_v1`.`phones`))",
                "cover ((`reporting_v1`.`phone`))",
                "cover ((`reporting_v1`.`stateroom`))",
                "cover ((`reporting_v1`.`teamMemberNumber`))",
                "cover ((`reporting_v1`.`photoMediaItemId`))",
                "cover ((`reporting_v1`.`securityPhotoMediaItemId`))",
                "cover ((`reporting_v1`.`profilePhotoMediaItemId`))",
                "cover ((`reporting_v1`.`personTypeCode`))",
                "cover ((meta(`reporting_v1`).`id`))"
              ],
              "filter_covers": {
                "cover ((`reporting_v1`.`type`))": "PersonMovements",
                "cover ((not ((meta(`reporting_v1`).`id`) like \"_sync%\")))": true
              },
              "index": "IX_Date_cover2_PersonMovements",
              "index_id": "5e26c91480916c13",
              "keyspace": "reporting_v1",
              "namespace": "default",
              "spans": [
                {
                  "Range": {
                    "High": [
                      "\"VC\"",
                      "successor(\"2018-05-25\")"
                    ],
                    "Inclusion": 0,
                    "Low": [
                      "\"VC\"",
                      "null",
                      "\"2018-05-20\""
                    ]
                  }
                }
              ],
              "using": "gsi"
            },
            {
              "#operator": "Parallel",
              "~child": {
                "#operator": "Sequence",
                "~children": [
                  {
                    "#operator": "Filter",
                    "condition": "(((((cover ((`reporting_v1`.`type`)) = \"PersonMovements\") and cover ((not ((meta(`reporting_v1`).`id`) like \"_sync%\")))) and (cover ((`reporting_v1`.`propertyId`)) = \"VC\")) and (cover ((`reporting_v1`.`startDate`)) <= \"2018-05-25\")) and (\"2018-05-20\" <= cover ((`reporting_v1`.`endDate`))))"
                  },
                  {
                    "#operator": "InitialProject",
                    "result_terms": [
                      {
                        "expr": "cover ((`reporting_v1`.`propertyId`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`startDate`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`endDate`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`status`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`statusChangedDate`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`portCode`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`locationId`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`alerts`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`firstName`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`lastName`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`birthDate`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`citizenshipCountryCode`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`genderCode`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`identifications`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`reservationNumber`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`phones`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`phone`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`stateroom`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`teamMemberNumber`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`photoMediaItemId`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`securityPhotoMediaItemId`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`profilePhotoMediaItemId`))"
                      },
                      {
                        "expr": "cover ((`reporting_v1`.`personTypeCode`))"
                      }
                    ]
                  }
                ]
              }
            }
          ]
        },
        {
          "#operator": "Offset",
          "expr": "17000"
        },
        {
          "#operator": "Limit",
          "expr": "10"
        },
        {
          "#operator": "FinalProject"
        }
      ]
    },
    "text": "Select propertyId,startDate,endDate,status,statusChangedDate,portCode,locationId,alerts,firstName,lastName,birthDate,\ncitizenshipCountryCode,genderCode,identifications,reservationNumber,phones,phone,stateroom,teamMemberNumber,photoMediaItemId,\nsecurityPhotoMediaItemId,profilePhotoMediaItemId,personTypeCode\nFrom reporting_v1 \nWhere type = 'PersonMovements'\nAnd meta().id not like '_sync%'\nAnd propertyId = 'VC' \nAnd startDate  <= '2018-05-25' \nand endDate >= '2018-05-20'\norder by propertyId\nlimit 10 offset 17000\n;"
  }
]

@clinton1ql,

When the condition is changed to And startDate = “2018-03-05” , then the query executes within 1 second (750-800 ms) for an offset of 18000. (I chose this date deliberately to get maximum number of qualifying records to test for a large offset)

Ah ok. The query you posted now is slightly different as you have endDate in the predicate, where the original examples above only had startDate

And startDate <= ‘2018-05-25’
and endDate >= ‘2018-05-20’

Can you just run the query for a specific startDate and see if it’s more performant?

And startDate = ‘2018-05-25’

@clinton1ql, actually this is the predicate that I need to optimize. I started off by querying on initial fields of the index and got stuck at ‘startDate’ in the very beginning (the first query taking 5-6 seconds).

querying on a specific date is certainly fast. already shared the stats in previous comment.

@krishan.jangid
We also found a big performance difference when doing range vs exact searches. It was also incredibly hard to index as the various columns as the predicates were completely dependant on the report that was done.

So we ended up using views to get the data, with the view getting updated every 5 minutes. Our mutation rate is just too high to wait for view consistency, and we could get away with data being that “old” for that specific case.

Other things we tried were adding “helper” columns to the documents to make the searches more deterministic and reduce the amount of scans as much as possible.

Out of curiosity will your startDate always be <= and your endDate >= or can operators also be < or > etc.
And lastly, how many days are you keeping in your database - a year or two years or is it smaller like only for the last few months? It’s still possible to get N1QL to do this better but a better understanding of the data would help.

@clinton1ql, the >= operator can easily be converted to > by modifying the value to its right side. I am writing this type of queries to show detailed report pages which show records over a flexible date range. Since searching on history data is also an option, the dates can possibly be anything (the window being 30 days).
Also we are planning to store 7 years data in the bucket and then provide reports based on certain complex filters. For Each day there will be around 30-40K documents.
In current scenario, using map reduce is not possible for me as the current filters provide features such as searching people for multiple departments (checkbox is available).
This is the default query that will be executed when the page is loaded (without any filters).

What’s the purpose of this document? Well, this document will store the movement related detail of Guests, visitors and crew (such as time, location of entry/exit etc.) along with their personal details.
The available filters (all of these are optional) in the report are:

  • Age
  • Person type
  • startDate and endDate of Reservation of a person.
  • firstName and lastName
  • Gender
  • department (multiple departments can be selected)
  • Entries/Exit location and time

On the top of these dynamic filters, a dynamic sorting option is also available which contains at least 4 parameters (only one is selectable at a time).

I know what you are thinking right now (Couchbase Analytics!). :wink:
Since it is still in developer preview, and few of the required functionalities are not available in it, we are avoiding it.
Please do let me know if N1QL can handle the requirement.

Thank you, if the other guys haven’t come up with a solution I’ll try play around with some options in a day or two.

What is calling the N1QL by the way? I assume it will be from an API or service? One thing I forgot to mention that we did was instead of using a range queries we broke it up into multiple queries.

Instead of doing startDate between ‘1 March 2017’ and ‘13 March 2017’ we would create 13 queries, starting with where startDate = “1 March 2017” until we had where startDate = “13 March 2017”. Then in our app we just combined the resultsets as needed.

In our one bucket with 500 mill documents it took around 4.5 minutes to find data with a range, but doing it with multiple parallel queries we were able to bring it down to 8 seconds for the 4 million resultset. But it also depends on your use case and if that is possible, you don’t want to be creating hundreds of queries if you’re spanning a lot of dates.

Thanks a lot for your time. Althoug, the workaround you suggested won’t work for the predicate: And startDate <= ‘2018-05-25’
and endDate >= ‘2018-05-20’ I guess.
FYI, The N1QL is being called from an API. One more thing… What tool did you use to create those 500 million documents? Would be very helpful if it saves my time. :slight_smile:

It’s an actor framework based tool that we created for load testing, and we can do as many documents per second as what the Couchbase cluster will allow us to (a few hundred thousand per second depending on the hardware we configured).

Not sure what language you’re using, but in C# you can just write a basic console app using TPL library to create the docs for you.

1 Like

Are you using forestdb or MOI storage. Index has 23 keys and you have only 4 core CPU. That might be contributing.

These are composite range predicates. In 4.x these results in lot of false positives from indexer and query need to eliminate them that is why it taking time. This has been taken care in 5.0.0

Composite Predicate Pushdown explains this in more details.