Encoded array key too long (> 30978). Skipped

@vsr1, I have changed the index definition to:

CREATE INDEX IX_teamMember_Cover_reporting ON reporting_test ((all (array boardingStatus for boardingStatus in teamMemberBoardingStatuses end)), propertyId, date_format_str(debarkDate, "1111-11-11"), date_format_str(embarkDate, "1111-11-11"), firstName, middleName, lastName, birthDate, citizenshipCountryCode, genderCode, phones, photoMediaItemId, identifications, stateroom, teamMemberNumber, departmentCode) WHERE ((type = "TeamMember") and (not ((meta().id) like "_sync%")))

The UNNEST operator is using an index scan now.

I have changed the order of the index keys and the structure:
embarkDate uses <=, so it has both a low and a high bound; it is moved to the leading position.
debarkDate uses >= with no high bound, so it is moved to a trailing position. Pre-5.0.0, if a key's high bound is missing, the next key's high bound will not be passed to the indexer.
You don't really need an array index; you only need to check that the array has a qualifying element, so the FIRST expression is used. This will not hit the key-size limit.
The array is moved to the end of the index keys.

Change the query projection:
UNNEST repeats the parent document; instead, get the boarding info as an array and do the looping in the application.
Since you are getting an array, the LIMIT value can be much smaller.

CREATE INDEX IX_teamMember_Cover_reporting ON default (propertyId,
  (FIRST 1 FOR s IN teamMemberBoardingStatuses WHEN s.statusChangedDate IS NOT MISSING END),
DATE_FORMAT_STR(embarkDate, "1111-11-11"),
DATE_FORMAT_STR(debarkDate, "1111-11-11"),
firstName, middleName, lastName, birthDate, citizenshipCountryCode, genderCode, phones, photoMediaItemId,
identifications, stateroom, teamMemberNumber, departmentCode, teamMemberBoardingStatuses)
WHERE ((type = "TeamMember") and (not ((meta().id) like "_sync%")));


SELECT meta(tm).id, tm.firstName, tm.middleName, tm.lastName, tm.birthDate, tm.citizenshipCountryCode, tm.genderCode,
tm.phones, tm.photoMediaItemId, tm.identifications, tm.stateroom, tm.teamMemberNumber, tm.departmentCode,
ARRAY b FOR b IN tm.teamMemberBoardingStatuses WHEN b.statusChangedDate IS NOT MISSING END AS boardingStatus
FROM default tm
WHERE tm.type = "TeamMember" AND meta(tm).id NOT LIKE "_sync%"
AND tm.propertyId = "AL"
    AND (DATE_FORMAT_STR(tm.debarkDate,"1111-11-11") >= "2018-03-19")
    AND (DATE_FORMAT_STR(tm.embarkDate,"1111-11-11") <= "2018-03-22")
    AND (FIRST 1 FOR s IN tm.teamMemberBoardingStatuses WHEN s.statusChangedDate IS NOT MISSING END)  = 1
    ORDER BY meta(tm).id
    LIMIT 10 OFFSET 0
    ;

If you still prefer UNNEST, you can add it with the same index and query.

@vsr1, since the number of elements inside the array is not fixed, the entire data set would be required by the application to provide proper pagination, which seems like an overhead. I will have to go for the UNNEST operator. I will try the above index and let you know the result.

In the index definition that you provided, the entire collection is also added as the last key. Will it not cause the same issue (array exceeding the key size)?

That index is not an array index. It is a normal index with the array as a scalar, which is less likely to cause an array key-size issue. It can still cause one if the array itself exceeds the secondary index key limit. cc @deepkaran.salooja

@vsr1, is the size you are talking about max_array_seckey_size?
This is a bit confusing, as the problem I am facing occurs even when I don't use any array index and include the entire array in the covering index (as a scalar).
Also, as I asked earlier, can the ENCODED_SIZE() function be used to find the size of the document, so that I can set the maximum value accordingly and be sure whether it is working?

CREATE INDEX IX_teamMember_Cover_reporting ON reporting(propertyId, date_format_str(debarkDate, "1111-11-11"), date_format_str(embarkDate, "1111-11-11"), teamMemberBoardingStatuses, firstName, middleName, lastName, birthDate, citizenshipCountryCode, genderCode, phones, photoMediaItemId, identifications, stateroom, teamMemberNumber, departmentCode) WHERE ((type = "TeamMember") and (not ((meta().id) like "_sync%")))

That roughly gives the size. You may need more because of index overhead.
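A minimal sketch of such a check, assuming the reporting bucket and the field names from the index above (the aggregate and the alias name are illustrative, not from the thread):

```sql
-- Roughly estimate the largest encoded array among the qualifying documents.
-- ENCODED_SIZE() returns the byte length of the uncompressed JSON encoding.
SELECT MAX(ENCODED_SIZE(teamMemberBoardingStatuses)) AS max_array_bytes
FROM reporting
WHERE type = "TeamMember" AND NOT (meta().id LIKE "_sync%");
```

The result can then be compared against the configured max_array_seckey_size, leaving headroom for index overhead.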
Why don't you use a non-covering index?

Already tried; it takes more than 1.5 seconds. An intersect scan is done with the primary and type indexes (including the non-covering index).

So, as I said in the previous comment, the size returned was approx. 21k.
The value I used for that setting was 90k, and it still didn't work. How large can the overhead be?

@deepkaran.salooja, could you answer this?

Try this (change the bucket names). Make sure the inner subquery is covered by ix1 by running EXPLAIN on the subquery:

   CREATE INDEX ix1 ON default (propertyId,
        (FIRST 1 FOR s IN teamMemberBoardingStatuses WHEN s.statusChangedDate IS NOT MISSING END),
        DATE_FORMAT_STR(embarkDate, "1111-11-11"),
        DATE_FORMAT_STR(debarkDate, "1111-11-11"))
        WHERE ((type = "TeamMember") and (not ((meta().id) like "_sync%")));


    SELECT META(tm).id, tm.firstName, tm.middleName, tm.lastName,
           tm.birthDate, tm.citizenshipCountryCode, tm.genderCode,
           tm.phones, tm.photoMediaItemId, tm.identifications, tm.stateroom,
           tm.teamMemberNumber, tm.departmentCode, boardingStatus.status as boardingStatus,
           boardingStatus.statusChangedDate as boardingStatusChangedDate,
           boardingStatus.locationId, boardingStatus.locationCode
    FROM (SELECT RAW META(tm).id
          FROM default tm
          WHERE tm.type = "TeamMember" AND meta(tm).id NOT LIKE "_sync%"
                         AND tm.propertyId = "AL"
                         AND (DATE_FORMAT_STR(tm.debarkDate,"1111-11-11") >= "2018-03-19")
                         AND (DATE_FORMAT_STR(tm.embarkDate,"1111-11-11") <= "2018-03-22")
                         AND (FIRST 1 FOR s IN tm.teamMemberBoardingStatuses WHEN s.statusChangedDate IS NOT MISSING END) = 1
                         ORDER BY META(tm).id OFFSET 0 LIMIT 10 ) AS q1
    JOIN default AS tm ON KEYS q1
    UNNEST tm.teamMemberBoardingStatuses AS boardingStatus
    ORDER BY META(tm).id
    OFFSET 0
    LIMIT 10

If you need pagination, remove the outer OFFSET and LIMIT so that you get 10 document results; each time, your result count will vary depending on how many elements in the array qualify.

I didn't realize you have tm.teamMemberBoardingStatuses IS NOT MISSING. In that case, you can move this into the index WHERE clause and remove the FIRST… key from the index and query.
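A hedged sketch of that simplification, using the bucket and field names from this thread: the array-presence predicate moves into the index WHERE clause, and the FIRST key is dropped from the index keys.

```sql
CREATE INDEX ix1 ON default (propertyId,
    DATE_FORMAT_STR(embarkDate, "1111-11-11"),
    DATE_FORMAT_STR(debarkDate, "1111-11-11"))
WHERE type = "TeamMember" AND NOT (meta().id LIKE "_sync%")
      AND teamMemberBoardingStatuses IS NOT MISSING;
```

The query's WHERE clause would then carry the matching teamMemberBoardingStatuses IS NOT MISSING predicate in place of the FIRST expression.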

The page on the UI can't change dynamically based on how many documents each person has after the UNNEST.
It shows page numbers based on the total records returned by the overall query. Now, if the total number of docs is 5640 and the subquery returns 220 docs, how do you suggest jumping directly to page 5630?

You can remove the inner ORDER BY, OFFSET, and LIMIT.
Note: if you use OFFSET/LIMIT, your results may miss data during pagination, because you are ordering on a non-unique key due to the UNNEST. (Example: "doc1" unnests to 5 documents and "doc2" to 20; OFFSET 10 LIMIT 10 may give the first 5 records that were already returned by OFFSET 0 LIMIT 10.)

Basically, when the ORDER BY key has duplicates and the number of qualifying duplicate records exceeds the limit, the query can return any of the duplicates.

I guess we need to find some other solution. :smile:

@krishan.jangid, the formula to calculate max_array_seckey_size is as follows:

If the index definition is - create index idx on bucket( a, distinct (array (x.y) for x in b end) , c )
then max_array_seckey_size = ( size ( a ) + size ( y ) + size ( c ) ) * (num of elements in b array)
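A worked example with hypothetical sizes: if size(a) = 100 bytes, size(y) = 50 bytes, size(c) = 30 bytes, and the b array has 20 elements, then

```
max_array_seckey_size = (100 + 50 + 30) * 20 = 3600 bytes
```

so the setting would need to be at least 3600 (plus overhead) for such a document's array keys to be indexed.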

@deepkaran.salooja, why do we have to multiply the number of elements in the b array by the total size of all the fields in the index definition, and not just by the collection field size?

@deepkaran.salooja, I am not able to reproduce this scenario on another environment (4.6.4-4590-enterprise). I added a document with a collection size of 32K while max_array_seckey_size is set to its default of 10240, and the document is still accessible through an index scan. How is this happening?

@krishan.jangid, if the array has [y1, y2, y3], the sec keys would be [[a, y1, c], [a, y2, c], [a, y3, c]]. So the element count needs to be multiplied by the size of all the fields.

In 4.6.4 we had to provide some extra padding in the buffer to fix a bug. I think your key size is just small enough to fit into that extra space.