Search Node not stable

#21

sreeks, I configured it to 10 seconds, but I am still seeing the logs. I think these are heartbeat logs (at a 20-second interval) showing that the node is inactive. Should I take this seriously, sreeks and daschl?

-Divya

#22

@divya.garg121 How are you running the client? On a different EC2 node? Is there anything in between that could cut off connections?

#23

I am seeing these logs from the client:

  1. On my local machine.
  2. On an EC2 instance as well. The difference is that the cluster is set up in the WEST zone and the client is set up in the EAST zone.
  3. I just set up a client (with default settings) in the same zone as the cluster and I am still seeing the issue.

Thanks

#24

I ran telnet to the search node from the search node itself:

Thu Feb 1 12:10:25 UTC 2018
[ec2-user@ip-172-31-43-166 ~]$ telnet ec2-34-210-71-83.us-west-2.compute.amazonaws.com 8094
Trying 172.31.43.166...
Connected to ec2-34-210-71-83.us-west-2.compute.amazonaws.com.
Escape character is '^]'.
Connection closed by foreign host.
[ec2-user@ip-172-31-43-166 ~]$ date
Thu Feb 1 12:10:49 UTC 2018

The search node socket seems to have a timeout of 20 seconds, which I need increased to 2 minutes. Is there a way to do that?

-Divya
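
The telnet check above can be timed more precisely with a small script. The following is a minimal Python sketch, not a Couchbase tool: the function name `seconds_until_idle_close` and the `max_wait` parameter are made up for illustration. It connects, sends nothing, and reports how long the peer keeps the idle connection open.

```python
import socket
import time


def seconds_until_idle_close(host, port, max_wait=60.0):
    """Connect, stay silent, and time how long the peer keeps the socket open.

    Returns the elapsed seconds when the peer closes the connection, or None
    if a single read waits longer than max_wait without the peer closing.
    (Hypothetical helper for illustration only.)
    """
    with socket.create_connection((host, port), timeout=10.0) as sock:
        start = time.monotonic()
        sock.settimeout(max_wait)  # applies to each recv() call below
        while True:
            try:
                chunk = sock.recv(4096)
            except socket.timeout:
                return None  # still open after max_wait seconds of silence
            except ConnectionResetError:
                break  # peer reset the connection
            if not chunk:
                break  # empty read: peer closed the connection cleanly
        return time.monotonic() - start


# Example call (host name is a placeholder):
# print(seconds_until_idle_close("search-node.example.com", 8094))
```

Running it against several ports on the same host would show whether the drop is specific to 8094.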

#25

Can someone confirm whether the 20-second timeout is a default setting in CB for the search node?

-Divya

#26

Hey Divya, no. There is no 20-second timeout on the Search/FTS service side (it may be on the AWS side or the node).
But you already mentioned above that it is seen with other nodes as well, which supports this point.

You could try other network commands, or telnet to other ports, to confirm that this is not specific to the FTS service.
Please check the AWS TCP connection timeouts. It looks like this 20-second timeout is being enforced from outside.

#27

As I already mentioned, this looks like a heartbeat message, and it appears only when the search node is idle. Let me discuss internally whether we should work on this issue further. I will let you know.
Thanks for the support!
Divya Garg

#28

Sure, please do. (One purpose of any heartbeat message is to convey the liveness of the connection to the other entity and thereby prevent connection closure, so that the connection can be reused later.)
Cheers!

#29

Hi Sreeks,
I mentioned other nodes, but that was a different issue, which I haven’t seen recently.
Let's focus on this one.
I ran telnet to 8091, 8092 and 8093 (all in AWS only), and there is no drop after 20 seconds. It happens only for 8094, the search node. On the AWS side, the only networking change we made was opening the firewalls, nothing more, so the timeout setting should be the same for all ports. This suggests the issue is on the CB side. Can you please check?
I have also seen it on my local machine.
Thanks

#30

Hi Divya, can you please tell me which version of CB you are currently running?

#31

Enterprise Edition 5.0.1 build 5003 (free version). Thanks

#32

I see.
From 5.0 onward, the Search service has a configurable 20-second ReadTimeOut at the HTTP level, which is the maximum duration for reading the entire request, including the body. It is meant to prevent untrusted clients from holding a connection open for a long time.
So, if your queries are not affected by this, you may ignore it for the time being. If your query responses are affected or clipped by it, then we can configure “httpReadTimeout” to a higher value, or disable it by setting “httpReadTimeout” to 0 when parameterising the search process.

Cheers!
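
To illustrate why an HTTP-level read timeout closes an idle telnet session but leaves normal queries alone, here is a small Python analogy. It is not the Search service's implementation: `TimedHandler`, `start_server`, `idle_connection_lifetime` and `quick_request_status` are hypothetical names, the 0.5-second value stands in for the 20 seconds discussed above, and Python's handler timeout applies per socket read rather than to the whole request.

```python
import http.client
import http.server
import socket
import threading
import time


class TimedHandler(http.server.BaseHTTPRequestHandler):
    # Per-read socket timeout in seconds (stand-in for the 20 s value).
    timeout = 0.5

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def start_server():
    """Start the demo server on a free local port and return it."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), TimedHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def idle_connection_lifetime(port):
    """Open a connection, send nothing (like telnet), time the server close."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        start = time.monotonic()
        sock.settimeout(5.0)
        sock.recv(1)  # returns b"" once the server gives up and closes
        return time.monotonic() - start


def quick_request_status(port):
    """Send a complete request right away; the timeout never comes into play."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5.0)
    conn.request("GET", "/")
    status = conn.getresponse().status
    conn.close()
    return status
```

In this sketch a request that arrives promptly gets its normal response, while a connection that never sends a request is dropped once the timeout expires, which matches the telnet behaviour reported earlier in the thread.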