Random IO error using CE 4.5

kev · October 20, 2017, 8:07pm

[Fri Oct 20 15:52:56.962760 2017] [php7:notice] [pid 6829] [client 127.0.0.1:42424] [cb,EROR] (http-io L:207 I:0) <rd.cb2.example.com:8093>Got error while performing I/O on HTTP stream. Err=0x2e
[Fri Oct 20 15:52:56.962888 2017] [php7:notice] [pid 6829] [client 127.0.0.1:42424] [cb,INFO] (connection L:465 I:0) <rd.cb2.example.com:8093> (SOCK=0x5606ff091aa0) Starting. Timeout=75000000us
[Fri Oct 20 15:52:57.339802 2017] [php7:notice] [pid 6829] [client 127.0.0.1:42424] [cb,INFO] (connection L:139 I:0) <rd.cb2.example.com:8093> (SOCK=0x5606ff091aa0) Connected established
[Fri Oct 20 15:52:57.795874 2017] [php7:notice] [pid 6829] [client 127.0.0.1:42424] [cb,EROR] (http-io L:351 I:0) <rd.cb2.example.com:8093> Failed to add redirect URL (om:8093/query/service/query/service)
[Fri Oct 20 15:52:57.795965 2017] [php7:notice] [pid 6829] [client 127.0.0.1:42424] [cb,WARN] (pcbc/n1ql L:51) Failed to decode N1QL row as JSON: json_last_error=4. I=0x5606ff0750f0
[Fri Oct 20 15:52:57.796200 2017] [php7:notice] [pid 6829] [client 127.0.0.1:42424] [cb,EROR] (pcbc/n1ql L:88) Failed to perform N1QL query. 301: . I=0x5606ff0750f0
[Fri Oct 20 15:52:57.796243 2017] [php7:notice] [pid 6829] [client 127.0.0.1:42424] [cb,INFO] (lcbio_mgr L:463 I:0) <rd.cb2.example.com:8093> (HE=0x5606ff092630) Placing socket back into the pool. I=0x5606ff0ad580,C=0x5606ff091aa0

This is what I get randomly when sending N1QL insert commands to my cluster. The issue is intermittent and cannot be easily reproduced. I verified the json being sent when this error occurred, and it was formatted correctly without any syntax errors.

Is it just that the IO error sent a partial part of the json which then caused a failure? Why would an error like this occur?

vsr1 · October 20, 2017, 8:14pm

Do you have exact error return by query service? Is there any errors in query.log

avsej · October 20, 2017, 8:16pm

it seems like N1QL service returns 301 Redirect with empty payload and the client cannot handle body as JSON. I will look at this deeper to come up with reproduction and the fix.

Are there changes to the cluster state at that moment (like rebalance, failover, other failures)?

kev · October 20, 2017, 8:33pm

All the errors I have I have provided. I checked the CB log page in the CB dashboard, but saw nothing of interest there.

There were no failovers, or rebalances. query.log also has no real interesting info in there during the failure.

johan_larson · October 22, 2017, 12:03pm

I don’t see an obvious problem. Are you running 4.5.1?

At this point, I think your best bet is to try to reproduce the problem on a minimal setup: one machine, simple data, simple client. If you got that it would let us repro the problem here and debug it.

avsej · October 22, 2017, 5:32pm

could you get tcpdump capture of the traffic on port 8093?