Query Performance based on where clause

aponnath · August 19, 2022, 5:00pm

I have a situation where one of my data providers did a crappie job in providing clean data. In this case i get an APN in the data feed either as a 8 digit string ParcelNumber = ‘64944129’ or as a formatted string ParcelNumber = ‘649-441-29’.

So my question is can I build a index with a replace clause like REPLACE(ParcelNumber, “-”,“”) which would remove all dashes and then let me search by the clean string.

The other option is where ( ParcelNumber = ‘64944129’ or ParcelNumber = ‘649-441-29’)

Is there any advantage of using either over the other ?

vsr1 · August 19, 2022, 6:19pm

CREATE INDEX ix1 ON mybucket ( REPLACE(ParcelNumber,"-",""));
SELECT *
FROM mybucket
WHERE REPLACE(ParcelNumber,"-","") = "64944129";

OR

SELECT *
FROM mybucket
WHERE REPLACE(ParcelNumber,"-","") = REPLACE($userinput,"-","");
 -- $userinput can be "64944129" OR "649-44-129"

aponnath · August 19, 2022, 6:23pm

Is there any major performance hit if i use the replace index ? I tested my or clause and its really fast but requires me to maintain the code on my app to always call with 2 values. The replace on the Server would make it simpler to maintain

vsr1 · August 19, 2022, 6:34pm

It should n’t cause any performance impact. Also following will work

WHERE ParcelNumber IN ["64944129", "649-44-129"];

nzlconcerto · August 22, 2022, 10:08am

So, it’s not a great idea to use function indexes. This implies business logic is being held in the back end.

However, you are where you are:

I would strongly suggest doing a data fix, but I have no idea about the volumes of affected data.
I assume you already have an index on the “ParcelNumber” field, therefore WHERE ParcelNumber IN ["64944129", "649-44-129"] on the client side query is suffice.

This solution is not ideal from a business perspective. Going forward for perpetuity, this is now a new business rule, the side effect is that any new query touching the “ParcelNumber” field will have to be written using the “in” statement. Good luck on getting the devs to remember this rule over time. And that your test environments needs to reflect this new business rule.