FTS sorting with keyword analyzer is broken for many words

I want to sort a dataset by one field. Since I do not want to apply any transformations like stemming to the value before sorting, I’ve choosen to use the keyword analyzer which leaves the field value unchanged. This works for most cases but there are lots of words that are always sorted wrong. They seem to not have been index properly.


How to reproduce

Create some data:

INSERT INTO `Test`(KEY, VALUE) VALUES
("a", {"a": "Aaa" }),
("b", {"a": "Bbb" }),
("c", {"a": "Ccc" }),
("d", {"a": "Ddd" }),
("e", {"a": "Eee" }),
("f", {"a": "Fff" }),
("g", {"a": "Ggg" }),
("h", {"a": "Hhh" }),
("i", {"a": "Iii" }),
("j", {"a": "Jjj" }),
("k", {"a": "Kkk" }),
("l", {"a": "Lll" }),
("m", {"a": "Mmm" }),
("n", {"a": "Nnn" }),
("o", {"a": "Ooo" }),
("p", {"a": "Ppp" }),
("q", {"a": "Qqq" }),
("r", {"a": "Rrr" }),
("s", {"a": "Sss" }),
("t", {"a": "Ttt" }),
("u", {"a": "Uuu" }),
("v", {"a": "Vvv" }),
("w", {"a": "Www" }),
("x", {"a": "Xxx" }),
("y", {"a": "Yyy" }),
("z", {"a": "Zzz" });

Now create a fulltext index on the data and analyze the a field with the “keyword” analyzer:
Json index definition

Finally issue a query towards the FTS index:

curl --request POST 'http://localhost:8094/api/index/test/query' \
--header 'Authorization: Basic <...>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "size": 30,
  "fields": [ "*" ],
  "sort": [ "a" ],
  "query": {
    "query": "a:*"
  }
}'

The result will contain most words sorted in alphabetical order. Just some words have ���������� in their sort array and are not sorted properly. According to the documentation this means that the field is not indexed:

{
  "status": {
    "total": 6,
    "failed": 0,
    "successful": 6
  },
  "request": {
    "query": {
      "query": "a:*"
    },
    "size": 30,
    "from": 0,
    "highlight": null,
    "fields": [
      "*"
    ],
    "facets": null,
    "explain": false,
    "sort": [
      "a"
    ],
    "includeLocations": false,
    "search_after": null,
    "search_before": null
  },
  "hits": [
    {
      "index": "test_439eb73d2e79c6c4_f4e0a48a",
      "id": "a",
      "score": 0.17139825364493735,
      "sort": [
        "Aaa"
      ]
    },
    {
      "index": "test_439eb73d2e79c6c4_54820232",
      "id": "b",
      "score": 0.21164339756999317,
      "sort": [
        "Bbb"
      ]
    },
    {
      "index": "test_439eb73d2e79c6c4_18572d87",
      "id": "c",
      "score": 0.17139825364493735,
      "sort": [
        "Ccc"
      ]
    },
    {
      "index": "test_439eb73d2e79c6c4_f4e0a48a",
      "id": "d",
      "score": 0.17139825364493735,
      "sort": [
        "Ddd"
      ]
    },
    {
      "index": "test_439eb73d2e79c6c4_6ddbfb54",
      "id": "e",
      "score": 0.21164339756999317,
      "sort": [
        "Eee"
      ]
    },
    {
      "index": "test_439eb73d2e79c6c4_18572d87",
      "id": "f",
      "score": 0.17139825364493735,
      "sort": [
        "Fff"
      ]
    },
    {
      "index": "test_439eb73d2e79c6c4_54820232",
      "id": "g",
      "score": 0.21164339756999317,
      "sort": [
        "Ggg"
      ]
    },
    {
      "index": "test_439eb73d2e79c6c4_54820232",
      "id": "h",
      "score": 0.21164339756999317,
      "sort": [
        "Hhh"
      ]
    },
    {
      "index": "test_439eb73d2e79c6c4_aa574717",
      "id": "i",
      "score": 0.21164339756999317,
      "sort": [
        "Iii"
      ]
    },
    {
      "index": "test_439eb73d2e79c6c4_6ddbfb54",
      "id": "j",
      "score": 0.21164339756999317,
      "sort": [
        "Jjj"
      ]
    },
    {
      "index": "test_439eb73d2e79c6c4_13aa53f3",
      "id": "k",
      "score": 0.21164339756999317,
      "sort": [
        "Kkk"
      ]
    },
    {
      "index": "test_439eb73d2e79c6c4_aa574717",
      "id": "l",
      "score": 0.21164339756999317,
      "sort": [
        "Lll"
      ]
    },
    {
      "index": "test_439eb73d2e79c6c4_f4e0a48a",
      "id": "m",
      "score": 0.17139825364493735,
      "sort": [
        "Mmm"
      ]
    },
    {
      "index": "test_439eb73d2e79c6c4_13aa53f3",
      "id": "n",
      "score": 0.21164339756999317,
      "sort": [
        "Nnn"
      ]
    },
    {
      "index": "test_439eb73d2e79c6c4_18572d87",
      "id": "o",
      "score": 0.17139825364493735,
      "sort": [
        "Ooo"
      ]
    },
    {
      "index": "test_439eb73d2e79c6c4_aa574717",
      "id": "p",
      "score": 0.21164339756999317,
      "sort": [
        "Ppp"
      ]
    },
    {
      "index": "test_439eb73d2e79c6c4_f4e0a48a",
      "id": "q",
      "score": 0.17139825364493735,
      "sort": [
        "Qqq"
      ]
    },
    {
      "index": "test_439eb73d2e79c6c4_6ddbfb54",
      "id": "y",
      "score": 0.21164339756999317,
      "sort": [
        "Yyy"
      ]
    },
    {
      "index": "test_439eb73d2e79c6c4_18572d87",
      "id": "z",
      "score": 0.17139825364493735,
      "sort": [
        "Zzz"
      ]
    },
    {
      "index": "test_439eb73d2e79c6c4_f4e0a48a",
      "id": "x",
      "score": 0.17139825364493735,
      "sort": [
        "����������"
      ]
    },
    {
      "index": "test_439eb73d2e79c6c4_6ddbfb54",
      "id": "v",
      "score": 0.21164339756999317,
      "sort": [
        "����������"
      ]
    },
    {
      "index": "test_439eb73d2e79c6c4_13aa53f3",
      "id": "r",
      "score": 0.21164339756999317,
      "sort": [
        "����������"
      ]
    },
    {
      "index": "test_439eb73d2e79c6c4_13aa53f3",
      "id": "w",
      "score": 0.21164339756999317,
      "sort": [
        "����������"
      ]
    },
    {
      "index": "test_439eb73d2e79c6c4_54820232",
      "id": "t",
      "score": 0.21164339756999317,
      "sort": [
        "����������"
      ]
    },
    {
      "index": "test_439eb73d2e79c6c4_aa574717",
      "id": "u",
      "score": 0.21164339756999317,
      "sort": [
        "����������"
      ]
    },
    {
      "index": "test_439eb73d2e79c6c4_18572d87",
      "id": "s",
      "score": 0.17139825364493735,
      "sort": [
        "����������"
      ]
    }
  ],
  "total_hits": 26,
  "max_score": 0.21164339756999317,
  "took": 9966662,
  "facets": null
}

Is this on purpose and I’ve misconfigured / misunderstood something in FTS or is this a bug?
I’ve tried this on Couchbase 6.5.1 enterprise and community.

Hi @aki,

This is sort of a known issue and I would suggest you try the “Sort by Object” option.
ref - https://docs.couchbase.com/server/current/fts/fts-sorting.html#fts-advanced-sort-options
You may try the sort field like below,

"sort": [
      {
       "by" : "field",
       "field" : "a",
       "mode" : "max",
       "missing" : "last",
       "type": "string"
      }

The trick is you should explicitly specify the type of the sort field here over the type field.
Since the field “a” is of text type, let’s set the “type” as string in this case.

Cheers!

1 Like