Query Performance based on where clause

I have a situation where one of my data providers did a crappie job in providing clean data. In this case i get an APN in the data feed either as a 8 digit string ParcelNumber = ‘64944129’ or as a formatted string ParcelNumber = ‘649-441-29’.

So my question is can I build a index with a replace clause like REPLACE(ParcelNumber, “-”,“”) which would remove all dashes and then let me search by the clean string.

The other option is where ( ParcelNumber = ‘64944129’ or ParcelNumber = ‘649-441-29’)

Is there any advantage of using either over the other ?

CREATE INDEX ix1 ON mybucket ( REPLACE(ParcelNumber,"-",""));
SELECT *
FROM mybucket
WHERE REPLACE(ParcelNumber,"-","") = "64944129";

OR

SELECT *
FROM mybucket
WHERE REPLACE(ParcelNumber,"-","") = REPLACE($userinput,"-","");
 -- $userinput can be "64944129" OR "649-44-129"

Is there any major performance hit if i use the replace index ? I tested my or clause and its really fast but requires me to maintain the code on my app to always call with 2 values. The replace on the Server would make it simpler to maintain

It should n’t cause any performance impact. Also following will work

WHERE ParcelNumber IN ["64944129", "649-44-129"];

So, it’s not a great idea to use function indexes. This implies business logic is being held in the back end.

However, you are where you are:

  1. I would strongly suggest doing a data fix, but I have no idea about the volumes of affected data.
  2. I assume you already have an index on the “ParcelNumber” field, therefore WHERE ParcelNumber IN ["64944129", "649-44-129"] on the client side query is suffice.

This solution is not ideal from a business perspective. Going forward for perpetuity, this is now a new business rule, the side effect is that any new query touching the “ParcelNumber” field will have to be written using the “in” statement. Good luck on getting the devs to remember this rule over time. And that your test environments needs to reflect this new business rule.