N1QL Query times out using parametrized values in IN clause

mlblount45 · September 11, 2016, 11:03pm

@geraldss no the underlying problem still remains which is ::when running this query from the Java SDK it’s extremely slow. I’m just really confused because the same query runs quickly via CBQ and via cURL. I have 3 other queries which are executed in the same environments which run fine. All four queries always run together so any external issues such as network should effect all of them. Out of curiosity I ran this query alone without the other 3 it is coupled to and the problem remains. In an environment with about 20k docs it takes around 4 seconds in my environment with 11 million it’s timing out. I’ve tried the following SDK versions 2.2.7, 2.2.8, 2.3.3 even upgraded from couchbase 4.1 to 4.5

geraldss · September 11, 2016, 11:33pm

Hi @prasad @simonbasle @daschl.

Please take a look at this. The user is seeing good performance via HTTP and slowdown via Java SDK.

simonbasle · September 12, 2016, 8:45am

@mlblount45 can you share the code you’re using for that query? is it still the count(*) query from your original post?

I’m also interested in the way you open the Bucket object from which you run the query (since it’s a different bucket from the one you’re using).

Have you been running curl from a machine inside the cluster, or from the client instance machine?

Lastly, it could be useful if you could provide an example of what the N1QL response looks like in cbq, see if there is something in there that could make the parser fail ):

mlblount45 · September 15, 2016, 7:18am

This was moved to a private discussion but here is the Gist…

From me
"Looks like I may have found the Issue … it has to do with the parameterization of the values in the IN clause I made [changes hardcoding the in clause values] and the query was able to run as fast as CBQ. is my only option to drop parameterizing? as I said before I even get this bug in sdk 2.3.3 so its there on latest as well…"

From simon
if you saw that behavior with curl, then it is not a SDK bug (which relieves me ) but rather something on the server side, either on the index service or query service. So it’s not unexpected that even the latest SDK would display the bug.

good news is you have a workaround (don’t use parameterized for IN clause).

maybe we can move the discussion back into the public topic? report there that you found the IN clause in a parameterized query was causing a timeout, even when querying via curl?

_prasad _geraldss _keshav_m it sounds like a bug to me doesn’t it?

From marcog
"the underlying issue is that once you put the IN values in a parameter, the optimiser can’t assume the IN clause only has one value.
Since experienceId leads the index, we can only use an equality to travel down the index and start counting the keys that apply because we would have to skip and scan.
I would suggest creating the index as (docType, publishTimestamp, parentMsgId), that way we can still scan the index on the conditions set on docType, publishTimestamp, and apply the filter on experienceId later on in the pipeline."

Final outcome:
I took marcos suggestion and moved the parentMsgId to the end of the indexed fields. This bought me a small performance but in order to solve the bigger problem I had to remove the parameterization from the IN clause. This Optimized my queries from about 700 Seconds to 16 seconds.

geraldss · September 15, 2016, 5:55pm

What is your target performance number? Let’s get it to that…