Couhbase Search Query Filtering

kaintharinder · January 8, 2022, 7:17am

I have a funcationality where I need to filter results from a document based on full text search.
Below is the document JSON which includes the fields that are in use for FTS.

JSON DOC
{
“clientId”:“6b643e3ff209”,
“groupId”:“DP-1”,
“id”:“dc26445b-e1d0-42be-b9a3-d0e18aa63ddf”,
“items”:[
“a1”,
“a2”
],
“priority”:999,
“tenantId”:“ecommerce”,
“type”:“group”,
“useCase”:[
{
“disabled”:true,
“externalIds”:[
“wb-98bded7a-95a0-4120-8f1d-007db7c30728”
],
“label”:“Use Case 1”,
“name”:“useCase1”
},
{
“disabled”:true,
“externalIds”:[
“wb-c335aa93-c8b4-40d9-9f50-ba01041d5dcd”
],
“label”:“Use Case 2”,
“name”:“useCase2”
}
]
}

FTS Index Configuration

Query
WITH results AS (
SELECT SEARCH_META(filteredUseCase) AS entry
FROM bucket-a AS doc
WHERE SEARCH(doc, { “size” : 1,
“fields” : ["*"],
“sort”: [ { “by” : “field”, “field” : “priority”, “mode” : “max”, “missing” : “last”, “type”: “number” },
{ “by” : “field”, “field” : “modifiedOn”, “mode” : “max”, “missing” : “last”, “type”: “number”, “desc”:TRUE } ],
“query”: {“conjuncts”:
[ {“field”:“items”, “match”: “a2”},
{“field”:“tenantId”, “match”:“ecommerce”},
{“field”:“clientId”, “match”:“6b643e3ff209”},
{“field”:“useCase.name”, “match”:“useCase2”}
]
}
},
{“index”:“groupTestIndex”, “out” : “filteredUseCase”}
)
AND doc.type = “group”)
SELECT filteredDoc.entry.fields.useCase.externalIds AS externalIds,
filteredDoc.entry.id AS groupId FROM results AS filteredDoc;

Query Result
[
{
“externalIds”: [
“wb-98bded7a-95a0-4120-8f1d-007db7c30728”,
“wb-c335aa93-c8b4-40d9-9f50-ba01041d5dcd”
],
“groupId”: “dc26445b-e1d0-42be-b9a3-d0e18aa63ddf”
}
]

I am expecting the result to come as only

[
{
“externalIds”: [
“wb-c335aa93-c8b4-40d9-9f50-ba01041d5dcd”
],
“groupId”: “dc26445b-e1d0-42be-b9a3-d0e18aa63ddf”
}
]

Currently , It is correctly matching the doc and giving out the useCase.externalIds as collection of all externalIds array under useCase array, while my flow is to only return those externalIds values where the usecase matched which was passed in the query.

Thanks
Harinder Singh

vsr1 · January 8, 2022, 1:57pm

Query seems right. Following is simplified one.

SELECT smeta.id AS groupId, smeta.fields.useCase.externalIds
FROM `bucket-a` AS doc
LET smeta = SEARCH_META(filteredUseCase)
WHERE SEARCH(doc,
             { "size" : 1,
               "fields" : ["*"],
               "sort": [ { "by" : "field", "field" : "priority", "mode" : "max", "missing" : "last", "type": "number" },
                         { "by" : "field", "field" : "modifiedOn", "mode" : "max", "missing" : "last", "type": "number", "desc":TRUE } ],
                "query": {"conjuncts": [ {"field":"items", "match": "a2"},
                                         {"field":"tenantId", "match":"ecommerce"},
                                         {"field":"clientId", "match":"6b643e3ff209"},
                                         {"field":"useCase.name", "match":"useCase2"}
                                       ]
                          }
              },
              {"index":"groupTestIndex", "out" : "filteredUseCase"}
             )
         AND doc.type = "group";

@abhinav will able to help in results match/index definition

Also try add name as stored field

FIRST v.externalIds FOR v IN smeta.fields.useCase WHEN v.name = “useCase2” END

abhinav · January 8, 2022, 3:24pm

@kaintharinder Your index definition looks fine to me.
FTS flattens JSON arrays. Take the “useCase” field in your sample doc for example. It’s value is an array which has 2 objects within it. Now when you add useCase as a childMapping and name, externalIds as child fields within it, assuming you’re using the “keyword” analyzer - FTS registers 2 fields which are searchable as …

useCase.name         ...   ["useCase1", "useCase2"]
useCase.externalIds  ...   ["wb-98bded7a-95a0-4120-8f1d-007db7c30728", "wb-c335aa93-c8b4-40d9-9f50-ba01041d5dcd"]

FTS only returns document IDs as hits and every “stored” field content for the document ID if requested.
FTS does NOT offer a way (yet) to identify index positions within arrays, i.e if you search for useCase2 only written stored content wb-c335aa93-c8b4-40d9-9f50-ba01041d5dcd.

@vsr1 would you recommend a way to achieve this in N1QL where SELECT would iterate through the various sub objects within the “useCase” array of the document and obtaining useCase.externalIds of only that object whose useCase.name = “useCase2” for documents returned by the SEARCH() function.

I’m thinking not to index “useCase” at all within FTS, limiting your SEARCH function to just this …

SELECT *
FROM `bucket-a` AS doc
WHERE doc.type = "group"
AND SEARCH(doc,
             { "size" : 1,
               "sort": [ { "by" : "field", "field" : "priority", "mode" : "max", "missing" : "last", "type": "number" },
                         { "by" : "field", "field" : "modifiedOn", "mode" : "max", "missing" : "last", "type": "number", "desc":TRUE } ],
               "query": {"conjuncts": [ {"field":"items", "match": "a2"},
                                        {"field":"tenantId", "match":"ecommerce"},
                                        {"field":"clientId", "match":"6b643e3ff209"}
                                      ]
                        }
             },
             {"index":"groupTestIndex", "out" : "filteredUseCase"}
          );

vsr1 · January 8, 2022, 3:37pm

Based on @abhinav response store name also and try following

SELECT smeta.id AS groupId,  ARRAY smeta.fields.useCase.externalIds[pos] FOR pos:v IN smeta.fields.useCase.name WHEN v = "useCase2" END AS externalIds
FROM `bucket-a` AS doc
LET smeta = SEARCH_META(filteredUseCase)
WHERE SEARCH(doc,
             { "size" : 1,
               "fields" : ["*"],
               "sort": [ { "by" : "field", "field" : "priority", "mode" : "max", "missing" : "last", "type": "number" },
                         { "by" : "field", "field" : "modifiedOn", "mode" : "max", "missing" : "last", "type": "number", "desc":TRUE } ],
                "query": {"conjuncts": [ {"field":"items", "match": "a2"},
                                         {"field":"tenantId", "match":"ecommerce"},
                                         {"field":"clientId", "match":"6b643e3ff209"},
                                         {"field":"useCase.name", "match":"useCase2"}
                                       ]
                          }
              },
              {"index":"groupTestIndex", "out" : "filteredUseCase"}
             )
         AND doc.type = "group";

kaintharinder · January 8, 2022, 4:42pm

Thanks @vsr1 and @abhinav for your suggestions , I forgot to mention one thing the number of arguments under externalIds can vary under each usecase like first useCase1 can have 3 externalIds under the array while the second can have only one.Position I guess will map the position of externalIds-name one to one while the relation between usecase and externalIds is one to many

@abhinav … is it possible to store useCase.externals with index as well without flattening something like below

“useCase.externalIds”: [

               [ "wb-98bded7a-95a0-4120-8f1d-007db7c30728",
                "wb-70a1b5c2-b14e-4a37-8d3f-49ec55e30e86",
                ],
               ["wb-c335aa93-c8b4-40d9-9f50-ba01041d5dcd"]
            ]

abhinav · January 8, 2022, 5:31pm

@kaintharinder Unfortunately not.

FTS treats an array of strings as text, where analysis is applied separately on each element of the array (and sub array).
So if your JSON field contained …

"useCase.externalIds": [
  [
    "wb-98bded7a-95a0-4120-8f1d-007db7c30728",
    "wb-70a1b5c2-b14e-4a37-8d3f-49ec55e30e86",
    
  ],
  [
    "wb-c335aa93-c8b4-40d9-9f50-ba01041d5dcd"
  ]
]

FTS would apply analysis on each sub element of the array by flattening out the object to …

"useCase.externalIds": [
  "wb-98bded7a-95a0-4120-8f1d-007db7c30728",
  "wb-70a1b5c2-b14e-4a37-8d3f-49ec55e30e86",
  "wb-c335aa93-c8b4-40d9-9f50-ba01041d5dcd"
]

So your approach to getting what you need is following @vsr1 's suggestion from above - where you use N1QL to iterate over the elements of the array but NOT by storing the field within the FTS index.

Something like this …

SELECT groupId, useCase.externalIds[pos] FOR pos:v IN useCase.name WHEN v = "useCase2" END AS externalIds
FROM `bucket-a` AS doc
WHERE doc.type = "group"
AND SEARCH(doc,
             { "size" : 1,
               "sort": [ { "by" : "field", "field" : "priority", "mode" : "max", "missing" : "last", "type": "number" },
                         { "by" : "field", "field" : "modifiedOn", "mode" : "max", "missing" : "last", "type": "number", "desc":TRUE } ],
               "query": {"conjuncts": [ {"field":"items", "match": "a2"},
                                        {"field":"tenantId", "match":"ecommerce"},
                                        {"field":"clientId", "match":"6b643e3ff209"}
                                      ]
                        }
             },
             {"index":"groupTestIndex"}
          );

kaintharinder · January 8, 2022, 7:11pm

@abhinav

“useCase.externalIds”: [
“wb-98bded7a-95a0-4120-8f1d-007db7c30728”,
“wb-70a1b5c2-b14e-4a37-8d3f-49ec55e30e86”,
“wb-c335aa93-c8b4-40d9-9f50-ba01041d5dcd”
]

“useCase.name”: [
“useCase1”,
“useCase2”
]

Need some mapping between useCase1 and array of exrternalIds, like first two can be mapped to useCase1, or first and third can have mapping to useCase1. The position query maps useCase.name[0] - useCase.externalIds[0] and so on.

while useCase.name[0] - can map to multiple positions under useCase.externalIds

Json Document Mapping
“useCase”: [
{
“disabled”: true,
“externalIds”: [
“wb-98bded7a-95a0-4120-8f1d-007db7c30728”,
“wb-70a1b5c2-b14e-4a37-8d3f-49ec55e30e86”
],
“label”: “Use Case 1”,
“name”: “useCase1”
},
{
“disabled”: true,
“externalIds”: [
“wb-c335aa93-c8b4-40d9-9f50-ba01041d5dcd”
],
“label”: “Use Case 2”,
“name”: “useCase2”
}
]

abhinav · January 8, 2022, 7:20pm

@kaintharinder Sorry if I wasn’t clear before. I meant - to achieve what you’re looking for here, I recommend using N1QL directly on the content retrieved from Couchbase server (KV store) as opposed to retrieving stored content from the FTS index. This way N1QL would directly work with the JSON document mapping by using the document IDs filtered by the FTS index.

The query I shared in my previous comment does that.

vsr1 · January 8, 2022, 7:41pm

In that case only option is fetch the document use N1QL expression like this (FTS Search + N1QL)
NOTE: Removed “fields” : ["*"], in SEARCH()

SELECT META(doc).id AS groupId,
       ARRAY_FLATTEN(ARRAY v.externalIds FOR v IN doc.useCase WHEN v.name = "useCase2" END,1) AS externalIds
FROM `bucket-a` AS doc
WHERE doc.type = "group"
AND SEARCH(doc,
             { "size" : 1,
               "sort": [ { "by" : "field", "field" : "priority", "mode" : "max", "missing" : "last", "type": "number" },
                         { "by" : "field", "field" : "modifiedOn", "mode" : "max", "missing" : "last", "type": "number", "desc":TRUE } ],
                "query": {"conjuncts": [ {"field":"items", "match": "a2"},
                                         {"field":"tenantId", "match":"ecommerce"},
                                         {"field":"clientId", "match":"6b643e3ff209"},
                                         {"field":"useCase.name", "match":"useCase2"}
                                       ]
                          }
              },
              {"index":"groupTestIndex", "out" : "filteredUseCase"}
             );

kaintharinder · January 9, 2022, 10:10am

Thanks @vsr1 and @abhinav for your help.

SELECT META(doc).id AS groupId,
       ARRAY_FLATTEN(ARRAY v.externalIds FOR v IN doc.useCase WHEN v.name = "useCase1" END,1) AS externalIds
FROM `tesla-preview` AS doc
WHERE doc.type = "group"
AND SEARCH(doc,
             { "size" : 1,
               "sort": [ { "by" : "field", "field" : "priority", "mode" : "max", "missing" : "last", "type": "number" },
                         { "by" : "field", "field" : "modifiedOn", "mode" : "max", "missing" : "last", "type": "number", "desc":TRUE } ],
                "query": {"conjuncts": [ {"field":"items", "match": "a2"},
                                         {"field":"tenantId", "match":"ecommerce"},
                                         {"field":"clientId", "match":"6b643e3ff209"},
                                         {"field":"useCase.name", "match":"useCase1"}
                                       ]
                          }
              },
              {"index":"groupTestIndex"}
             );

Changed doc.name to v.name and the query gave expected result.