I am seeking a better understanding of why different scores would be returned for identical terms.
3-node 6.0.4 Cluster
Node SDK 2.6.9 AND REST API
Searching against an index on the “name” field in “entity” type docs:
Search term: “YAMA AUTOMOTIVE”
The results include nearly 100 matches, all with “xxx AUTOMOTIVE” contained within the name value. Many of the matches have names spelled identically to one another.
For example, given what we know about our data, we expected this sampling:
- 5 matches bear the name “JUMBO AUTOMOTIVE”
- 2 matches bear the name “SLIMS AUTOMOTIVE”
- 0 matches are expected to match our search term exactly or even closely
The set of matches returned was not a surprise. That is, the results include 5, 2, and 0 matches, respectively, for the cases above.
What we did not expect was a wide variance in score between identically named terms.
“SLIMS AUTOMOTIVE” was the 6th ranked term with a 2.6122 score. However, the term is found a second time ranked 48th with a 1.8993 score.
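To make the variance concrete, here is a minimal sketch of how the spread between identically named hits could be measured. The hit structure (a list of dicts with `name` and `score` keys) and the helper `score_spread_by_name` are assumptions for illustration only, not the SDK or REST response format; the two scores are the “SLIMS AUTOMOTIVE” values quoted above.

```python
from collections import defaultdict


def score_spread_by_name(hits):
    """Group hits by name; for names seen more than once,
    return {name: (lowest score, highest score, spread)}."""
    scores = defaultdict(list)
    for hit in hits:
        scores[hit["name"]].append(hit["score"])
    return {
        name: (min(vals), max(vals), round(max(vals) - min(vals), 4))
        for name, vals in scores.items()
        if len(vals) > 1
    }


# Hypothetical, simplified hit list: only the two scores reported above.
hits = [
    {"name": "SLIMS AUTOMOTIVE", "score": 2.6122},
    {"name": "SLIMS AUTOMOTIVE", "score": 1.8993},
]

spread = score_spread_by_name(hits)
print(spread)
```

For these two hits the spread is about 0.71, which is the gap we would like to understand.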
We are confident there is a reason that explains this, but we would like to know what it is. Most importantly, we need to know how to adjust our query so that the ranking makes more sense to the user. This is especially troublesome when the user expects the same results against the same set of data and finds that some terms near the cut-off point appear only intermittently.
We know we can sort these by name, but that defeats the purpose of the score. We expected identical terms to have largely matching scores, even if not exactly equal.
Thank you for your assistance.