Encoded array key too long (> 30978). Skipped

I’m seeing an error in indexer.log that says the array I’m indexing is too long, but there are only 31 items (with a total char count of 1078).

This is the array:

"members": [
  "IDaaS_ESTAB-392DEMO_PL-8936",
  "IDaaS_ESTAB-392DEMO_PL-8998",
  "IDaaS_ESTAB-392DEMO_PL-9007",
  "IDaaS_ESTAB-392DEMO_PL-9046",
  "IDaaS_ESTAB-392DEMO_PL-9103",
  "IDaaS_ESTAB-392DEMO_PL-9109",
  "IDaaS_ESTAB-392DEMO_PL-9124",
  "IDaaS_ESTAB-392DEMO_PL-9133",
  "IDaaS_ESTAB-392DEMO_PL-9151",
  "IDaaS_ESTAB-392DEMO_PL-9169",
  "IDaaS_ESTAB-392DEMO_PL-9193",
  "IDaaS_ESTAB-392DEMO_PL-9202",
  "IDaaS_ESTAB-392DEMO_PL-9205",
  "IDaaS_ESTAB-392DEMO_PL-9208",
  "IDaaS_ESTAB-392DEMO_PL-9214",
  "IDaaS_ESTAB-392DEMO_PL-9217",
  "IDaaS_ESTAB-392DEMO_PL-9228",
  "IDaaS_ESTAB-392DEMO_PL-9231",
  "IDaaS_ESTAB-392DEMO_PL-9249",
  "IDaaS_ESTAB-392DEMO_PL-9257",
  "IDaaS_ESTAB-392DEMO_PL-9276",
  "IDaaS_ESTAB-392DEMO_PL-9279",
  "IDaaS_ESTAB-392DEMO_PL-9281",
  "IDaaS_ESTAB-392DEMO_PL-9284",
  "IDaaS_ESTAB-392DEMO_PL-9287",
  "IDaaS_ESTAB-392DEMO_PL-9290",
  "IDaaS_ESTAB-392DEMO_PL-9291",
  "IDaaS_ESTAB-392DEMO_PL-9301",
  "IDaaS_ESTAB-392DEMO_PL-9304",
  "IDaaS_ESTAB-392DEMO_PL-9310",
  "IDaaS_ESTAB-392DEMO_PWT-2"
],

…and in the logs I’m seeing:

2016-08-17T12:19:02.731+01:00 [Error] ForestDBSlice::insert Error indexing docid: st4C_E31nkCC8LCRFybpZw_IDaaS_ESTAB-392DEMO_GC-8889_g in Slice: 0. Error: Encoded array key too long (> 30978). Skipped.

Anything I can do about this other than to reduce the size of the array (or presumably the length of each item in the array)?

4.5.1-2806 Enterprise Edition (build-2806) on Windows Server 2012.

Hi frasdav,
Can you try increasing this setting:
curl -X POST http://localhost:9102/settings -u Administrator:password -d '{"indexer.settings.max_array_seckey_size": 51200}'

Btw, is it a composite index? Can you provide the CREATE INDEX statement used (and the sizes of the other keys)?

Regards.

Hi @prasad,

Thanks - that’s allowed the example document I posted to be indexed, but we’ve got larger ones than that (with 90+ items in the array) that are still being skipped.

2016-08-18T10:27:04.478+01:00 [Error] ForestDBSlice::insert Error indexing docid: jvZT32_JbUaFEULmpaA7nQ_IDaaS_ESTAB-3924027_GY-31_g in Slice: 0. Error: Encoded array key too long (> 153858). Skipped.

Yep, compound index, created with:

CREATE INDEX `Idx_Core_Group_n`
ON `core_north-europe`(`docType`,`groupProviderId`,`members`,(distinct (array `m` for `m` in `members` end)),`name`)
WHERE (`docType` = "Group") USING GSI WITH {"nodes":["IP_ADDRESS:8091"]};

docType always 5 chars, groupProviderId always 22 chars, members is variable and name is max 250 chars. Here’s the full doc from the above skipped error:

{
  "description": "Year Groups > Year  7",
  "groupProviderId": "jvZT32_JbUaFEULmpaA7nQ",
  "isMissingFromSource": false,
  "members": [
    "IDaaS_ESTAB-3924027_PL-6600",
    "IDaaS_ESTAB-3924027_PL-6601",
    "IDaaS_ESTAB-3924027_PL-6602",
    "IDaaS_ESTAB-3924027_PL-6604",
    "IDaaS_ESTAB-3924027_PL-6605",
    "IDaaS_ESTAB-3924027_PL-6606",
    "IDaaS_ESTAB-3924027_PL-6609",
    "IDaaS_ESTAB-3924027_PL-6610",
    "IDaaS_ESTAB-3924027_PL-6612",
    "IDaaS_ESTAB-3924027_PL-6617",
    "IDaaS_ESTAB-3924027_PL-6618",
    "IDaaS_ESTAB-3924027_PL-6619",
    "IDaaS_ESTAB-3924027_PL-6622",
    "IDaaS_ESTAB-3924027_PL-6625",
    "IDaaS_ESTAB-3924027_PL-6627",
    "IDaaS_ESTAB-3924027_PL-6629",
    "IDaaS_ESTAB-3924027_PL-6631",
    "IDaaS_ESTAB-3924027_PL-6634",
    "IDaaS_ESTAB-3924027_PL-6641",
    "IDaaS_ESTAB-3924027_PL-6643",
    "IDaaS_ESTAB-3924027_PL-6645",
    "IDaaS_ESTAB-3924027_PL-6647",
    "IDaaS_ESTAB-3924027_PL-6651",
    "IDaaS_ESTAB-3924027_PL-6653",
    "IDaaS_ESTAB-3924027_PL-6657",
    "IDaaS_ESTAB-3924027_PL-6660",
    "IDaaS_ESTAB-3924027_PL-6663",
    "IDaaS_ESTAB-3924027_PL-6665",
    "IDaaS_ESTAB-3924027_PL-6667",
    "IDaaS_ESTAB-3924027_PL-6666",
    "IDaaS_ESTAB-3924027_PL-6664",
    "IDaaS_ESTAB-3924027_PL-6662",
    "IDaaS_ESTAB-3924027_PL-6658",
    "IDaaS_ESTAB-3924027_PL-6655",
    "IDaaS_ESTAB-3924027_PL-6652",
    "IDaaS_ESTAB-3924027_PL-6649",
    "IDaaS_ESTAB-3924027_PL-6646",
    "IDaaS_ESTAB-3924027_PL-6644",
    "IDaaS_ESTAB-3924027_PL-6642",
    "IDaaS_ESTAB-3924027_PL-6636",
    "IDaaS_ESTAB-3924027_PL-6632",
    "IDaaS_ESTAB-3924027_PL-6630",
    "IDaaS_ESTAB-3924027_PL-6628",
    "IDaaS_ESTAB-3924027_PL-6626",
    "IDaaS_ESTAB-3924027_PL-7755",
    "IDaaS_ESTAB-3924027_PL-7834",
    "IDaaS_ESTAB-3924027_PL-7869",
    "IDaaS_ESTAB-3924027_PL-7895",
    "IDaaS_ESTAB-3924027_PL-7838",
    "IDaaS_ESTAB-3924027_PL-7813",
    "IDaaS_ESTAB-3924027_PL-6624",
    "IDaaS_ESTAB-3924027_PL-6670",
    "IDaaS_ESTAB-3924027_PL-6674",
    "IDaaS_ESTAB-3924027_PL-6677",
    "IDaaS_ESTAB-3924027_PL-6679",
    "IDaaS_ESTAB-3924027_PL-6756",
    "IDaaS_ESTAB-3924027_PL-6758",
    "IDaaS_ESTAB-3924027_PL-6761",
    "IDaaS_ESTAB-3924027_PL-6920",
    "IDaaS_ESTAB-3924027_PL-6958",
    "IDaaS_ESTAB-3924027_PL-7003",
    "IDaaS_ESTAB-3924027_PL-7011",
    "IDaaS_ESTAB-3924027_PL-7049",
    "IDaaS_ESTAB-3924027_PL-7224",
    "IDaaS_ESTAB-3924027_PL-7247",
    "IDaaS_ESTAB-3924027_PL-7458",
    "IDaaS_ESTAB-3924027_PL-7460",
    "IDaaS_ESTAB-3924027_PL-7459",
    "IDaaS_ESTAB-3924027_PL-7277",
    "IDaaS_ESTAB-3924027_PL-7227",
    "IDaaS_ESTAB-3924027_PL-7220",
    "IDaaS_ESTAB-3924027_PL-7038",
    "IDaaS_ESTAB-3924027_PL-7007",
    "IDaaS_ESTAB-3924027_PL-6964",
    "IDaaS_ESTAB-3924027_PL-6927",
    "IDaaS_ESTAB-3924027_PL-6762",
    "IDaaS_ESTAB-3924027_PL-6760",
    "IDaaS_ESTAB-3924027_PL-6757",
    "IDaaS_ESTAB-3924027_PL-6680",
    "IDaaS_ESTAB-3924027_PL-6678",
    "IDaaS_ESTAB-3924027_PL-6676",
    "IDaaS_ESTAB-3924027_PL-6673",
    "IDaaS_ESTAB-3924027_PL-6668",
    "IDaaS_ESTAB-3924027_PL-6620",
    "IDaaS_ESTAB-3924027_PL-6615",
    "IDaaS_ESTAB-3924027_PL-6608",
    "IDaaS_ESTAB-3924027_PL-6603",
    "IDaaS_ESTAB-3924027_PL-6598",
    "IDaaS_ESTAB-3924027_PL-6586",
    "IDaaS_ESTAB-3924027_PL-6594",
    "IDaaS_ESTAB-3924027_PL-6596",
    "IDaaS_ESTAB-3924027_PL-6585"
  ],
  "name": "Year  7",
  "parentGroupExternalId": null,
  "created": "2016-02-19T16:01:09.7533675Z",
  "docType": "Group",
  "modified": null
}

Thanks,
Fraser

Hi @frasdav,

Provide a big enough value for the max_array_seckey_size setting. The value should roughly be the sum of all index-key sizes multiplied by the number of elements in the array.

In this case, each array element produces an entry of about 5 + 22 + sizeof(members array) + sizeof(array element) + 250 bytes.

Roughly, the array has 92 elements at 28 bytes each, so total-array-size = 2576 bytes. Each entry is then about 5 + 22 + 2576 + 28 + 250 = 2881 bytes, and 2881 * 92 ≈ 265,052 bytes, so you may need to set the value to around 265K.
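That estimate can be sketched in Python. The per-key sizes (5, 22, 250 bytes), the 92-element count and the 28-byte element size are the figures quoted in this thread; the actual on-disk encoding adds overhead, so treat this as a rough lower bound rather than an exact formula:

```python
# Rough estimate for indexer.settings.max_array_seckey_size.
def estimate_seckey_size(fixed_key_bytes, n_elements, element_bytes):
    """Each array element yields one index entry containing the fixed
    keys, the whole array, and that element, so multiply the per-entry
    size by the number of elements."""
    array_bytes = n_elements * element_bytes
    per_entry = fixed_key_bytes + array_bytes + element_bytes
    return per_entry * n_elements

# docType (5) + groupProviderId (22) + name (250), 92 elements of 28 bytes
print(estimate_seckey_size(5 + 22 + 250, 92, 28))  # prints 265052
```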

This should unblock you. I will get back and confirm the exact math.
-Prasad

Thanks @prasad.

Is there a practical limit to what I should set for max_array_seckey_size?

Indirectly, the max document size (which is 20MB) imposes a limit.
Btw, if your document size is growing too big, it’s an indication that you should rework the data model. What’s the max doc size you plan to use?

Largest array is going to be 2000 entries max. I’ve done some work to shorten the length of array entries, so rather than:

"members": [
  "IDaaS_ESTAB-3924027_PL-6600",
  "IDaaS_ESTAB-3924027_PL-6601",
  "IDaaS_ESTAB-3924027_PL-6602",
  ...
]

…I’ve got:

"members": [
  "PL-6600",
  "PL-6601",
  "PL-6602",
  ...
]

At 2000 entries in the new format, my maths then is:

(5 + 22 + (2000 * 7) + 7 + 250) * 2000 = 28.5M
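As a quick Python sanity check of that figure, using the same per-key sizes stated above:

```python
# Fixed keys: docType (5) + groupProviderId (22) + name (250) bytes;
# 2000 array elements of 7 bytes each (the shortened "PL-6600" format).
per_entry = 5 + 22 + (2000 * 7) + 7 + 250  # fixed keys + whole array + one element
total = per_entry * 2000
print(total)  # prints 28568000, i.e. about 28.5M
```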

…so I’m bust?

Hi Fraser,

At 2000 entries in the new format, my maths then is:
(5 + 22 + (2000 * 7) + 7 + 250) * 2000 = 28.5M

This math is correct for max_array_seckey_size, and there is no hard limit on it. The 20MB limit I mentioned earlier is on the total document size.

However, with a covering index on large arrays you will hit another limitation: a 4K limit on index-key size. So, for now, you may have to use non-covering indexes or rework the data model. We will look into addressing this issue.

Btw, can you clarify what the maximum array size you may have is, and the use case? Are the elements of members references to other documents (for JOINs etc.)?

thanks,
-Prasad

Thanks Prasad.

These are group documents, and at login we perform a query to determine group membership based on an alias from an external provider (e.g. PL-6600) being present in the members array. So a JOIN, yes, but not on document IDs.

I’ve realised though that I can drop indexing the members array with no performance impact from what I can see (and in fact in some tests a performance improvement), because my query also filters on the groupProviderId property.

SELECT META().id
FROM `core_north-europe`
WHERE docType="Group" AND groupProviderId IN ["jvZT32_JbUaFEULmpaA7nQ","LaZ6QC2S1EmohJeADuTQdQ"] AND ANY m IN members SATISFIES m IN ["PL-6600","$teachingstaff@Rff0jDX4c0KfPa-YbfzV-w"] END;

In this scenario, I assume what’s happening is that the query engine is satisfying the first condition (groupProviderId=x) from the cover, and then loading those documents to scan the arrays in order to satisfy the second condition?

I was half expecting the EXPLAIN output to show that, maybe as a filter_cover, but this is what I get:

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "IndexScan",
          "index": "Idx_Core_Group_Test2",
          "index_id": "b874997b90d7d63b",
          "keyspace": "core_north-europe",
          "namespace": "default",
          "spans": [
            {
              "Range": {
                "High": [
                  "\"Group\"",
                  "successor(\"LaZ6QC2S1EmohJeADuTQdQ\")"
                ],
                "Inclusion": 1,
                "Low": [
                  "\"Group\"",
                  "\"LaZ6QC2S1EmohJeADuTQdQ\""
                ]
              }
            },
            {
              "Range": {
                "High": [
                  "\"Group\"",
                  "successor(\"jvZT32_JbUaFEULmpaA7nQ\")"
                ],
                "Inclusion": 1,
                "Low": [
                  "\"Group\"",
                  "\"jvZT32_JbUaFEULmpaA7nQ\""
                ]
              }
            }
          ],
          "using": "gsi"
        },
        {
          "#operator": "Fetch",
          "keyspace": "core_north-europe",
          "namespace": "default"
        },
        {
          "#operator": "Parallel",
          "~child": {
            "#operator": "Sequence",
            "~children": [
              {
                "#operator": "Filter",
                "condition": "((((`core_north-europe`.`docType`) = \"Group\") and ((`core_north-europe`.`groupProviderId`) in [\"jvZT32_JbUaFEULmpaA7nQ\", \"LaZ6QC2S1EmohJeADuTQdQ\"])) and any `m` in (`core_north-europe`.`members`) satisfies (`m` in [\"PL-6600\", \"$teachingstaff@Rff0jDX4c0KfPa-YbfzV-w\"]) end)"
              },
              {
                "#operator": "InitialProject",
                "result_terms": [
                  {
                    "expr": "(meta(`core_north-europe`).`id`)"
                  }
                ]
              },
              {
                "#operator": "FinalProject"
              }
            ]
          }
        }
      ]
    },
    "text": "SELECT META().id FROM `core_north-europe` USE INDEX(`Idx_Core_Group_Test2` USING GSI) WHERE docType=\"Group\" AND groupProviderId IN [\"jvZT32_JbUaFEULmpaA7nQ\",\"LaZ6QC2S1EmohJeADuTQdQ\"] AND ANY m IN members SATISFIES m IN [\"PL-6600\",\"$teachingstaff@Rff0jDX4c0KfPa-YbfzV-w\"] END;"
  }
]

This is just for my curiosity now; excluding the array from the cover like this solves my issue with no apparent detrimental effect.

Thanks,
Fraser

Fix checked in. See https://issues.couchbase.com/browse/MB-21770

Hi @geraldss, I’m facing a similar issue on Couchbase 4.6.4-4590-enterprise.
I checked the maximum size of the documents of that type using the ENCODED_SIZE() function, which returned 21007. I changed the setting to 90240 and restarted all services.
I’m still facing this issue.
In case the way I calculated the total size is incorrect, please let me know.
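For reference, applying the setting can also be done programmatically. A minimal Python sketch equivalent to the curl command earlier in the thread; the host, the admin port 9102 and the credentials are assumptions to adjust for your cluster:

```python
import base64
import json
import urllib.request

def build_settings_request(host, port, user, password, max_size):
    """Build an authenticated POST to the indexer /settings endpoint."""
    payload = json.dumps({"indexer.settings.max_array_seckey_size": max_size})
    req = urllib.request.Request(
        "http://%s:%d/settings" % (host, port),
        data=payload.encode("utf-8"),
        method="POST",
    )
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    return req

req = build_settings_request("localhost", 9102, "Administrator", "password", 90240)
# urllib.request.urlopen(req)  # uncomment to actually send the request
print(req.get_method(), req.full_url)
```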

@krishan.jangid, what is your index definition?

CREATE INDEX IX_teamMember_Cover_reporting ON reporting(propertyId, date_format_str(debarkDate, "1111-11-11"), date_format_str(embarkDate, "1111-11-11"), (distinct (array s.statusChangedDate for s in teamMemberBoardingStatuses end)), teamMemberBoardingStatuses, firstName, middleName, lastName, birthDate, citizenshipCountryCode, genderCode, phones, photoMediaItemId, identifications, stateroom, teamMemberNumber, departmentCode) WHERE ((type = "TeamMember") and (not ((meta().id) like "_sync%")))

Looking at your index, it seems teamMemberBoardingStatuses is your array object, and you are including it in your index statement after your array index on the statusChangedDate field.

Just curious whether you actually need to include the array in the index, since it’s not known what you are selecting.

With earlier versions of Couchbase you had to include the array for the covering index to work, but in >= 4.6.2 you don’t need to do that; only index the fields you need. That should cut down the size of your index quite a lot if you can drop the full teamMemberBoardingStatuses object from the definition.

@clinton1ql, I am using the UNNEST operator on that collection, and the query can filter on any of its fields. That’s why I am including the entire array object; it boosted performance considerably.
In the SELECT statement, all of the fields available in the index are used (including all of the inner fields of the teamMemberBoardingStatuses collection).
The WHERE clause uses the first three leading keys for basic filtering.

Collection:

"teamMemberBoardingStatuses": [
  {
    "lastBoardingStatusUpdateTime": "2018-03-21T22:46:34.034",
    "status": "OUT",
    "statusChangedDate": "2018-03-21T22:00:44",
    "locationId": "3debfe42-2f5d-4adf-bf70-14d6cb1f8636",
    "locationCode": "GA"
  },
  {
    "lastBoardingStatusUpdateTime": "2018-03-21T22:46:34.034",
    "status": "IN",
    "statusChangedDate": "2018-03-21T22:00:44",
    "locationId": "3debfe42-2f5d-4adf-bf70-14d6cb1f8636",
    "locationCode": "GA"
  },
  ...
]

Thanks for posting the example.

Do you have a limit on how many entries can be in that collection? If it can grow unbounded, you will have a problem setting the max key size beforehand; otherwise, just calculate the maximum size the index key could reach. I would also suggest looking at the document model: instead of putting the statuses into a collection, maybe store them as separate documents.

Is there any possibility of moving to Couchbase 5.x if you want to keep it in the collection? The index size limitations were removed there, if I remember correctly.

My assumption is that the number can go as high as 1000 for one person.
Currently, in the data, the maximum number is 20. The change you are suggesting would require a lot of effort for us now, so I would prefer resolving it with the current document structure.
Unfortunately, we are not planning to move to Couchbase 5.x. :disappointed:

PS. Please do let me know what I did wrong in the first comment I posted.

Please post the query. UNNEST will only take advantage of an array index when it is the leading key for index scans.
If you are including the array for covering, the best position for it is at the end of the index keys.

SELECT meta(tm).id, tm.firstName, tm.middleName, tm.lastName, tm.birthDate, tm.citizenshipCountryCode, tm.genderCode, 
tm.phones, tm.photoMediaItemId, tm.identifications, tm.stateroom, tm.teamMemberNumber, tm.departmentCode,
boardingStatus.status as boardingStatus, boardingStatus.statusChangedDate as boardingStatusChangedDate, 
boardingStatus.locationId, boardingStatus.locationCode 
FROM reporting tm UNNEST tm.teamMemberBoardingStatuses AS boardingStatus 
WHERE tm.type = "TeamMember" AND meta(tm).id NOT LIKE "_sync%" 
AND tm.propertyId = "AL" 
    AND (DATE_FORMAT_STR(tm.debarkDate,"1111-11-11") >= "2018-03-19") 
    AND (DATE_FORMAT_STR(tm.embarkDate,"1111-11-11") <= "2018-03-22")
    AND tm.teamMemberBoardingStatuses IS NOT MISSING
    Order by meta(tm).id 
    Limit 10 Offset 0
    ;