FTS Scoring Logic

Hi,
I’ve just tried the FTS node locally to test its functionality.
I loaded some documents (each containing name, dob, and email) and then indexed those 3 fields.

The top result scores 0.121 and the second one scores 0.070.
From my point of view, the second one should score higher, because it matches 3 words instead of 2.

My question:
What is the logic behind this scoring?
Why is the score for the second result lower than the first?

Thanks

Hi @Han_Chris1,

Your expectation is legitimate, and that is how it should work.
However, scores are computed at the index partition level: each partition calculates tf-idf from its own documents only, which can create such differences on smaller data sets.
With larger data sets, these idf differences settle down or become negligible.
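To make the per-partition effect concrete, here is a toy sketch in Python. It is not the actual FTS scorer: the smoothed idf formula, the document names, and the tokens are all assumptions chosen for illustration, and term frequency and length normalisation are left out. It only shows how a document matching 2 terms can outrank one matching 3 when each partition derives idf from its own documents.

```python
import math

def idf(term, docs):
    # Smoothed inverse document frequency over one collection of documents.
    df = sum(1 for tokens in docs.values() if term in tokens)
    return 1.0 + math.log(len(docs) / (df + 1))

def score_partition(docs, query):
    # Score each document with idf weights derived from this partition only.
    weights = {term: idf(term, docs) for term in query}
    return {
        name: sum(weights[term] for term in query if term in tokens)
        for name, tokens in docs.items()
    }

def score_index(partitions, query):
    # Merge per-partition scores, as a scatter-gather search would.
    merged = {}
    for docs in partitions:
        merged.update(score_partition(docs, query))
    return merged

query = ["john", "smith", "gmail"]

# Hypothetical data: the query terms are rare in partition A, common in B.
partition_a = {
    "two_matches": {"john", "smith", "yahoo"},
    "a1": {"alice", "brown", "yahoo"},
    "a2": {"bob", "green", "yahoo"},
    "a3": {"carol", "white", "yahoo"},
}
partition_b = {
    "three_matches": {"john", "smith", "gmail"},
    "b1": {"john", "lee", "gmail"},
    "b2": {"mary", "smith", "gmail"},
    "b3": {"john", "smith", "yahoo"},
}

per_partition = score_index([partition_a, partition_b], query)
single = score_index([{**partition_a, **partition_b}], query)

for label, scores in (("2 partitions", per_partition), ("1 partition", single)):
    print(label, round(scores["two_matches"], 3), round(scores["three_matches"], 3))
```

With two partitions, "two_matches" comes out ahead because its terms look rare inside partition A. With everything in one partition, the idf values are shared and "three_matches" ranks first.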

I guess you are using the default partition count of 6?
If you change this to 1, the scoring should match your expectations.
Changing the number of partitions is straightforward in recent releases (6.5+), i.e. it can be done from the UI.
In older releases you need to do this with REST calls via curl. Let me know if you need any help there.
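As a rough sketch of what the REST route involves, the Python below only prepares an updated index definition and makes no network call. It assumes the definition carries a planParams object with maxPartitionsPerPIndex and indexPartitions fields, and that the bucket has 1024 vBuckets; the index name is hypothetical. Please check these against the definition your own cluster returns before sending anything back.

```python
import copy
import math

NUM_VBUCKETS = 1024  # assumption: vBucket count of the source bucket

def partition_count(max_partitions_per_pindex, num_vbuckets=NUM_VBUCKETS):
    # Number of index partitions needed to cover every vBucket.
    return math.ceil(num_vbuckets / max_partitions_per_pindex)

def with_single_partition(index_def, num_vbuckets=NUM_VBUCKETS):
    # Return a copy whose plan lets one partition cover all vBuckets.
    updated = copy.deepcopy(index_def)
    plan = updated.setdefault("planParams", {})
    plan["maxPartitionsPerPIndex"] = num_vbuckets
    plan["indexPartitions"] = 1
    return updated

# Hypothetical, heavily trimmed index definition.
index_def = {
    "name": "people-idx",
    "type": "fulltext-index",
    "planParams": {"maxPartitionsPerPIndex": 171},
}

print(partition_count(171))  # 6
single = with_single_partition(index_def)
print(partition_count(single["planParams"]["maxPartitionsPerPIndex"]))  # 1
```

The function copies the definition rather than editing it in place, so the original stays available if the change needs to be reverted.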

You can find the scoring details here: https://docs.couchbase.com/server/current/fts/fts-troubleshooting.html
Also, there is a Show Scoring check box on the search page shown in your screenshot.

Cheers!


Hi @sreeks,

Thanks for your response.
Yes, I’ve just changed the total partition count to 1 and it works as expected.

So do you mean that with a small dataset I should use 1 partition, and with a large dataset I should switch to 6 partitions?
Or what’s the best practice for setting the total partition count?

Thanks

If your data set is small and your use case depends on tf-idf scores, then a single-partition index is an option. Please note that small/big is subjective and depends on the scenario, but a few million documents should be reasonable for a single-partition index.

You might need to revisit this once you have performance targets/SLAs to meet, as partitions help in parallelising the search/indexing workload.

The total partition setting falls under cluster scaling/sizing, and you might want to reach out to the support/solutions team for detailed help there.

Cheers!
