N1QL queries very slow

Hello,

We use Couchbase Server Version: 4.1.1-5914 Enterprise Edition (build-5914)
Cluster with 4 nodes in production.
We use last version of Couchbase java client to make N1QL queries on this cluster : https://github.com/couchbase/couchbase-java-client

Part of the queries create timeouts on Couchbase side. They are visible in the administration panel, in analytics section (query_requests_5000ms). Only some of them are slow, when you run it multiple times, sometimes it’s very fast, sometimes it timeouts…

We use simple N1QL query, example : SELECT META(default).id AS couchbaseId, * FROM default WHERE type=‘alerte’ AND internauteId='74282378’
We have 18Gb RAM, only 136Mb in use, 225Mb of data… So very limited bunch of data…

I created primary index on the default bucket, and secondary index matching our queries, example : CREATE INDEX idx_alerte_internauteId_2 ON default(internauteId,type) USING GSI

If you can help me to figure out what’s going on…
Thank you so much !

For info, the call to the REST service http://:8091/pools/default gives the following output :

{"storageTotals":{"ram":{"total":33577746432,"quotaTotal":20145242112,"quotaUsed":20145242112,"used":16398053376,"usedByData":141650024,"quotaUsedPerNode":5036310528,"quotaTotalPerNode":5036310528},"hdd":{"total":261505925120,"quotaTotal":261505925120,"used":5230118500,"usedByData":238088753,"free":256275806620}},"serverGroupsUri":"/pools/default/serverGroups?v=114216841","indexMemoryQuota":256,"memoryQuota":4803,"name":"default","alerts":[],"alertsSilenceURL":"/controller/resetAlerts?token=0&uuid=23a412d6355941cf13e52286d4b92fa7","nodes":[{"systemStats":{"cpu_utilization_rate":18.9873417721519,"swap_total":8588881920,"swap_used":0,"mem_total":8394452992,"mem_free":6616002560},"interestingStats":{"cmd_get":0,"couch_docs_actual_disk_size":56784471,"couch_docs_data_size":50419200,"couch_spatial_data_size":0,"couch_spatial_disk_size":0,"couch_views_actual_disk_size":0,"couch_views_data_size":0,"curr_items":4512,"curr_items_tot":9032,"ep_bg_fetched":0,"get_hits":0,"mem_used":35533360,"ops":0,"vb_replica_curr_items":4520},"uptime":"5515407","memoryTotal":8394452992,"memoryFree":6616002560,"mcdMemoryReserved":6404,"mcdMemoryAllocated":6404,"couchApiBase":"http://vcouch01.back.adencf.local:8092/","couchApiBaseHTTPS":"https://vcouch01.back.adencf.local:18092/","otpCookie":"acxcrxolkwpmybtu","clusterMembership":"active","recoveryType":"none","status":"healthy","otpNode":"ns_1@vcouch01.back.adencf.local","thisNode":true,"hostname":"vcouch01.back.adencf.local:8091","clusterCompatibility":262145,"version":"4.1.1-5914-enterprise","os":"x86_64-unknown-linux-gnu","ports":{"sslProxy":11214,"httpsMgmt":18091,"httpsCAPI":18092,"proxy":11211,"direct":11210},"services":["index","kv","n1ql"]},{"systemStats":{"cpu_utilization_rate":20.66326530612245,"swap_total":8588881920,"swap_used":0,"mem_total":8394452992,"mem_free":7568412672},"interestingStats":{"cmd_get":0,"couch_docs_actual_disk_size":59374785,"couch_docs_data_size":52020736,"couch_spatial_data_size":0,"couch_spatial_disk_size":0,"couch_views_actual_disk_size":0,"couch_views_data_size":0,"curr_items":4549,"curr_items_tot":9025,"ep_bg_fetched":0,"get_hits":0,"mem_used":35605840,"ops":0,"vb_replica_curr_items":4476},"uptime":"5515376","memoryTotal":8394452992,"memoryFree":7568412672,"mcdMemoryReserved":6404,"mcdMemoryAllocated":6404,"couchApiBase":"http://vcouch02.back.adencf.local:8092/","couchApiBaseHTTPS":"https://vcouch02.back.adencf.local:18092/","otpCookie":"acxcrxolkwpmybtu","clusterMembership":"active","recoveryType":"none","status":"unhealthy","otpNode":"ns_1@vcouch02.back.adencf.local","hostname":"vcouch02.back.adencf.local:8091","clusterCompatibility":262145,"version":"4.1.1-5914-enterprise","os":"x86_64-unknown-linux-gnu","ports":{"sslProxy":11214,"httpsMgmt":18091,"httpsCAPI":18092,"proxy":11211,"direct":11210},"services":["index","kv","n1ql"]},{"systemStats":{"cpu_utilization_rate":28.02056555269923,"swap_total":8588881920,"swap_used":0,"mem_total":8394387456,"mem_free":6747701248},"interestingStats":{"cmd_get":0,"couch_docs_actual_disk_size":60910170,"couch_docs_data_size":53544448,"couch_spatial_data_size":0,"couch_spatial_disk_size":0,"couch_views_actual_disk_size":0,"couch_views_data_size":0,"curr_items":4539,"curr_items_tot":9084,"ep_bg_fetched":0,"get_hits":0,"mem_used":35480056,"ops":0,"vb_replica_curr_items":4545},"uptime":"5515404","memoryTotal":8394387456,"memoryFree":6747701248,"mcdMemoryReserved":6404,"mcdMemoryAllocated":6404,"couchApiBase":"http://vcouch03.back.adencf.local:8092/","couchApiBaseHTTPS":"https://vcouch03.back.adencf.local:18092/","otpCookie":"acxcrxolkwpmybtu","clusterMembership":"active","recoveryType":"none","status":"healthy","otpNode":"ns_1@vcouch03.back.adencf.local","hostname":"vcouch03.back.adencf.local:8091","clusterCompatibility":262145,"version":"4.1.1-5914-enterprise","os":"x86_64-unknown-linux-gnu","ports":{"sslProxy":11214,"httpsMgmt":18091,"httpsCAPI":18092,"proxy":11211,"direct":11210},"services":["index","kv","n1ql"]},{"systemStats":{"cpu_utilization_rate":18.34625322997416,"swap_total":8588881920,"swap_used":0,"mem_total":8394452992,"mem_free":6313938944},"interestingStats":{"cmd_get":0,"couch_docs_actual_disk_size":61019327,"couch_docs_data_size":53630464,"couch_spatial_data_size":0,"couch_spatial_disk_size":0,"couch_views_actual_disk_size":0,"couch_views_data_size":0,"curr_items":4497,"curr_items_tot":9052,"ep_bg_fetched":0,"get_hits":0,"mem_used":35030768,"ops":0,"vb_replica_curr_items":4555},"uptime":"5515054","memoryTotal":8394452992,"memoryFree":6313938944,"mcdMemoryReserved":6404,"mcdMemoryAllocated":6404,"couchApiBase":"http://vcouch04.back.adencf.local:8092/","couchApiBaseHTTPS":"https://vcouch04.back.adencf.local:18092/","otpCookie":"acxcrxolkwpmybtu","clusterMembership":"active","recoveryType":"none","status":"healthy","otpNode":"ns_1@vcouch04.back.adencf.local","hostname":"vcouch04.back.adencf.local:8091","clusterCompatibility":262145,"version":"4.1.1-5914-enterprise","os":"x86_64-unknown-linux-gnu","ports":{"sslProxy":11214,"httpsMgmt":18091,"httpsCAPI":18092,"proxy":11211,"direct":11210},"services":["index","kv","n1ql"]}],"buckets":{"uri":"/pools/default/buckets?v=23761273&uuid=23a412d6355941cf13e52286d4b92fa7","terseBucketsBase":"/pools/default/b/","terseStreamingBucketsBase":"/pools/default/bs/"},"remoteClusters":{"uri":"/pools/default/remoteClusters?uuid=23a412d6355941cf13e52286d4b92fa7","validateURI":"/pools/default/remoteClusters?just_validate=1"},"controllers":{"addNode":{"uri":"/controller/addNodeV2?uuid=23a412d6355941cf13e52286d4b92fa7"},"rebalance":{"uri":"/controller/rebalance?uuid=23a412d6355941cf13e52286d4b92fa7"},"failOver":{"uri":"/controller/failOver?uuid=23a412d6355941cf13e52286d4b92fa7"},"startGracefulFailover":{"uri":"/controller/startGracefulFailover?uuid=23a412d6355941cf13e52286d4b92fa7"},"reAddNode":{"uri":"/controller/reAddNode?uuid=23a412d6355941cf13e52286d4b92fa7"},"reFailOver":{"uri":"/controller/reFailOver?uuid=23a412d6355941cf13e52286d4b92fa7"},"ejectNode":{"uri":"/controller/ejectNode?uuid=23a412d6355941cf13e52286d4b92fa7"},"setRecoveryType":{"uri":"/controller/setRecoveryType?uuid=23a412d6355941cf13e52286d4b92fa7"},"setAutoCompaction":{"uri":"/controller/setAutoCompaction?uuid=23a412d6355941cf13e52286d4b92fa7","validateURI":"/controller/setAutoCompaction?just_validate=1"},"clusterLogsCollection":{"startURI":"/controller/startLogsCollection?uuid=23a412d6355941cf13e52286d4b92fa7","cancelURI":"/controller/cancelLogsCollection?uuid=23a412d6355941cf13e52286d4b92fa7"},"replication":{"createURI":"/controller/createReplication?uuid=23a412d6355941cf13e52286d4b92fa7","validateURI":"/controller/createReplication?just_validate=1"}},"rebalanceStatus":"none","rebalanceProgressUri":"/pools/default/rebalanceProgress","stopRebalanceUri":"/controller/stopRebalance?uuid=23a412d6355941cf13e52286d4b92fa7","nodeStatusesUri":"/nodeStatuses","maxBucketCount":10,"autoCompactionSettings":{"parallelDBAndViewCompaction":false,"databaseFragmentationThreshold":{"percentage":30,"size":"undefined"},"viewFragmentationThreshold":{"percentage":30,"size":"undefined"},"indexFragmentationThreshold":{"percentage":30}},"tasks":{"uri":"/pools/default/tasks?v=61309602"},"counters":{"rebalance_success":1,"rebalance_start":1},"indexStatusURI":"/indexStatus?v=72081389"}

Could you try to EXPLAIN one of the slow queries, and post the output of that EXPLAIN command here?

Hi, thank you for your reply. You’ll find the explain command result below :

[
{
"#operator": “Sequence”,
"~children": [
{
"#operator": “IndexScan”,
“index”: “idx_alerte_internauteId_1”,
“keyspace”: “default”,
“namespace”: “default”,
“spans”: [
{
“Range”: {
“High”: [
"“74266020"”,
"“alerte”"
],
“Inclusion”: 3,
“Low”: [
"“74266020"”,
"“alerte”"
]
}
}
],
“using”: “gsi”
},
{
"#operator": “Parallel”,
"~child": {
"#operator": “Sequence”,
"~children": [
{
"#operator": “Fetch”,
“keyspace”: “default”,
“namespace”: “default”
},
{
"#operator": “Filter”,
“condition”: “(((default.type) = “alerte”) and ((default.internauteId) = “74266020”))”
},
{
"#operator": “InitialProject”,
“result_terms”: [
{
“as”: “couchbaseId”,
“expr”: “(meta(default).id)”
},
{
“expr”: “self”,
“star”: true
}
]
},
{
"#operator": “FinalProject”
}
]
}
}
]
}
]

Thx

The index definition is correct. You can get slightly improvement by doing this:
CREATE INDEX idx_alerte_internauteId_2 ON default(internauteId) where type = ‘alerte’ USING GSI; This will only handle the type: alerte

How many document do you expect in the resultset for this query and what’s the average size of the document?
Index scan has a default limit ot 2 minutes. If you have very large number of documents with the same internauteId, the index scan could time out…

Hi,

Thank you for your optimization for the index query I will use it.
We have only 22.921 documents by now, even queries returning no document are timing out…

Documents are quite small and there are not so many docs per internauteId…
An example of JSON document stored in CB :

{
“dateCreation”: “2016-08-02T09:55:15.274Z”,
“provenance”: “test”,
“titre”: “mon alerte”,
“dateModification”: “2016-08-02T09:55:15.274Z”,
“externalId”: “alerte::100”,
“internauteId”: “123456789”,
“criteres”: {
“salaireMin”: 33000,
“localisations”: [
{
“departements”: [],
“region”: {
“code”: 16,
“label”: “Midi-Pyrénées”
}
}
],
“salaireMax”: 40000,
“motsCles”: “”,
“contrats”: [
{
“code”: 2,
“label”: “CDD / Intérim / VIE”
},
{
“code”: 3,
“label”: “Intérim”
},
{
“code”: 4,
“label”: “Contractuel”
},
{
“code”: 13,
“label”: “Alternance”
},
{
“code”: 1,
“label”: “CDI”
}
],
“fonctions”: [
{
“code”: 30500,
“label”: “Marketing”
},
{
“code”: 30103,
“label”: " Administration des ventes"
}
],
“secteurs”: [
{
“code”: 60000,
“label”: “Industries”
}
]
},
“type”: “alerte”,
“email”: "toto@toto.fr"
}

Thank you for your help !

If you run the same queries without the Java part (with cbq for example), does it behave the same?

As a side-note: Couchbase 4.5 brought a lot of enhancements to (among other things) N1QL. Any chance you could try this against a 4.5 cluster?