Search Node not stable

#1

Hi everyone,
I see the following lines in my Spring microservice logs every now and then, mostly whenever I run an FTS query (54.218.6.25 is my search node):

2018-01-31 10:42:04.791 INFO 18684 --- [ cb-io-1-2] com.couchbase.client.core.node.Node : Disconnected from Node 54.218.6.25/ec2-54-218-6-25.us-west-2.compute.amazonaws.com
2018-01-31 10:42:04.860 INFO 18684 --- [ cb-io-1-4] com.couchbase.client.core.node.Node : Connected to Node 54.218.6.25/ec2-54-218-6-25.us-west-2.compute.amazonaws.com

I have a four-node cluster. Whenever I remove the search node, the cluster looks stable: the Rebalance button becomes disabled. When I add the search node back, the button stays enabled, and it stays enabled even after I click it. From the command line, rebalance reports 100% completed every time. Could you suggest some steps to troubleshoot this issue? Please let me know if you need any further information.

Thanks
Divya Garg

#2

@divya.garg121 which Java SDK version are you on?

#3

My Couchbase client version is 2.5.4, per this Gradle entry:

compile ('com.couchbase.client:java-client:2.5.4')

Should I set up a separate cluster for the search node to resolve this issue? Can anyone share the best practice for this?
Thanks in advance!
Divya Garg

#4

That should be good. Could you enable DEBUG logging and share the full log file? I'd like to see what's going on inside the client when this happens. Thanks :slight_smile:

#5

Please find attached the debug log from /opt/couchbase/var/lib/couchbase/logs/debug.log. Do I need to enable anything explicitly for this?

-Divya

debug.zip (799 Bytes)

#6

Can someone please give an update on this thread?
Thanks

#7

Hi, I'm not sure exactly what problem you are facing at the moment.
- Is your search node responding to search requests after you add it through the rebalance?
- Is the node disconnection affecting your searches? I don't know the SDK/client side of this.
There is a chance that one side is closing the connection after a keep-alive or idle timeout.

#8

Thanks for the update!
Currently I am running searches on 1 million records, and they work fine functionally.
But I want to make sure the response times of my FTS queries (exact match, wildcard, and fuzzy) are not affected by this issue. Here are the times taken:

Exact or Default Search: within 1 second
Fuzzy Search: within 2 seconds
Wildcard Search: within 12 seconds
Blank Search (Match all): within 12 seconds

Thanks

#9

Hey,
At a quick glance I do see some settings here: https://developer.couchbase.com/documentation/server/current/sdk/java/client-settings.html, such as the socket keepalive. You may want to experiment with them to see whether they have any impact on the connection closure issue.

The impact of establishing a new connection may or may not be significant, depending on the use case and the current requirements. But it certainly has a cost too.

Regarding your figures, these again depend on many factors, such as the system hardware configuration, current load, nature of the JSON data, query types, and other advanced storage-level configuration. In other words, we can't classify these figures as good or bad performance right away. But keep monitoring the stats page graphs of the respective index as the performance of the above queries varies. That will give you some hints about what is happening in the background, and you can track and adjust the configuration further from there.
Cheers!

#10

daschl, did you find the root cause of this issue? Why are these lines being printed in the log?

Thanks
Divya Garg

#11

@divya.garg121 actually I was referring to the debug logs of your application, not the Couchbase server. I wanted to get insight into how the SDK is operating, and the SDK logs to your app logger. Sorry for the confusion!

#12

OK. Please find the corresponding logs attached. Thanks

debug.zip (1.5 KB)

#13

And those log lines repeat every 20 seconds. Thanks
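
For anyone who wants to check the period in their own application log, here is a minimal sketch that measures the gap between consecutive "Disconnected from Node" lines. It assumes the timestamp layout seen in the lines quoted in post #1 (`yyyy-MM-dd HH:mm:ss.SSS` in the first 23 characters of each line). The class and method names are made up for illustration and are not part of the Couchbase SDK.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
class DisconnectIntervals {
    // Assumed timestamp layout: the first 23 characters of each log line.
    private static final int TS_LENGTH = 23;
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss.SSS");

    /** Returns the gaps in milliseconds between consecutive disconnect lines. */
    static List<Long> gapsMillis(List<String> logLines) {
        List<Long> gaps = new ArrayList<>();
        LocalDateTime previous = null;
        for (String line : logLines) {
            if (line.length() < TS_LENGTH || !line.contains("Disconnected from Node")) {
                continue;
            }
            LocalDateTime current;
            try {
                current = LocalDateTime.parse(line.substring(0, TS_LENGTH), TS);
            } catch (DateTimeParseException e) {
                continue; // line does not start with a timestamp, skip it
            }
            if (previous != null) {
                gaps.add(Duration.between(previous, current).toMillis());
            }
            previous = current;
        }
        return gaps;
    }

    public static void main(String[] args) throws IOException {
        // Reads the application log from standard input.
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        System.out.println(gapsMillis(lines));
    }
}
```

On Java 11 or later this can be run as `java DisconnectIntervals.java < application.log`, where `application.log` stands for your own log file. Gaps that cluster around 20000 would match the 20-second period described above. If more than one node is disconnecting, filter the lines by node address first, because the sketch does not separate nodes.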

#14

Yeah, so for some reason the sockets to port 8094 get disconnected and then the SDK reestablishes them again quickly.

Couple of questions:

  • Do you have any failing requests or other service impact from this behavior?

  • How do you configure the CouchbaseCluster/CouchbaseEnvironment (especially around the pool config for search)?

  • If you keep the defaults, it might be worth trying to set, for example, .searchEndpoints(4) (ignore the deprecation warning for now) on the DefaultCouchbaseEnvironment.Builder and then passing the env into CouchbaseCluster to see if it makes a difference. Like this:

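    import com.couchbase.client.java.Cluster;
    import com.couchbase.client.java.CouchbaseCluster;
    import com.couchbase.client.java.env.CouchbaseEnvironment;
    import com.couchbase.client.java.env.DefaultCouchbaseEnvironment;

    // Set the number of search (FTS) endpoints explicitly instead of keeping the default
    CouchbaseEnvironment env = DefaultCouchbaseEnvironment
        .builder()
        .searchEndpoints(4) // deprecated; ignore the warning for now
        .build();

    Cluster cluster = CouchbaseCluster.create(env, "host");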

#15

Thanks for the feedback daschl.
I have not seen any failing requests so far, and I am not making a high number of requests yet. But I have seen similar disconnect and connect log lines at warning level for other nodes too.
I am using the Spring Boot way of creating the bucket bean and using SearchQuery. These are the entries in my Spring application.properties:

spring.couchbase.bootstrap-hosts=four AWS node DNS/IP
spring.couchbase.bucket.name=searchBucket
spring.couchbase.bucket.password=mypassword

Code:
@Autowired
private Bucket bucket;
bucket.query(new SearchQuery(indexName, SearchQuery.match(searchText)));

This is my setup. Is this not the standard way of using a Couchbase clustered environment?
I have four nodes (all data, one search, one index and one query). Please let me know your comments.

Thanks
Divya Garg

#16

Yes, so that's fine, and you are using the defaults. Spring Boot allows you to override the default config, which should give you a way to set custom environment params!

#17

Yes, I am using the defaults. This is everything Spring provides for the environment, including timeouts:

COUCHBASE (CouchbaseProperties)

spring.couchbase.env.endpoints.key-value=1 # Number of sockets per node against the Key/value service.
spring.couchbase.env.endpoints.query=1 # Number of sockets per node against the Query (N1QL) service.
spring.couchbase.env.endpoints.view=1 # Number of sockets per node against the view service.
spring.couchbase.env.ssl.enabled= # Enable SSL support. Enabled automatically if a “keyStore” is provided unless specified otherwise.
spring.couchbase.env.ssl.key-store= # Path to the JVM key store that holds the certificates.
spring.couchbase.env.ssl.key-store-password= # Password used to access the key store.
spring.couchbase.env.timeouts.connect=5000 # Bucket connections timeout in milliseconds.
spring.couchbase.env.timeouts.key-value=2500 # Blocking operations performed on a specific key timeout in milliseconds.
spring.couchbase.env.timeouts.query=7500 # N1QL query operations timeout in milliseconds.
spring.couchbase.env.timeouts.socket-connect=1000 # Socket connect connections timeout in milliseconds.
spring.couchbase.env.timeouts.view=7500 # Regular and geospatial view operations timeout in milliseconds.

Let me try increasing socket-connect

Thanks

#18

I did not say to increase socket-connect, but to fix the number of search endpoints as shown in my example above :slight_smile:

#19

Hi Divya,

As per the link I shared earlier, the Socket Keepalive Interval defaults to 30000 ms, which is longer than the 20 seconds at which you see the connection break.
Why don't you set the Socket Keepalive Interval to something less than 20 seconds, say 10 or 5 seconds?
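
The reasoning here can be sketched as a toy model. It assumes that something on the connection path closes a socket after 20 seconds without traffic (inferred only from the observed period, not confirmed anywhere in this thread), that a keepalive counts as traffic, and that the client reconnects immediately. It does not model how the SDK or the server actually behaves, and `KeepAliveModel` and `idleDisconnects` are hypothetical names.

```java
// Hypothetical toy model, not part of any library.
class KeepAliveModel {
    /**
     * Counts idle closes within windowMs on a socket that carries no
     * application traffic, given a keepalive interval and an idle timeout.
     */
    static int idleDisconnects(long keepAliveMs, long idleTimeoutMs, long windowMs) {
        if (keepAliveMs <= 0 || idleTimeoutMs <= 0) {
            throw new IllegalArgumentException("intervals must be positive");
        }
        int disconnects = 0;
        long lastActivity = 0; // time of the last traffic on the current socket
        while (true) {
            long nextKeepAlive = lastActivity + keepAliveMs;
            long nextIdleClose = lastActivity + idleTimeoutMs;
            if (nextKeepAlive < nextIdleClose) {
                // The keepalive fires first and resets the idle clock.
                if (nextKeepAlive > windowMs) {
                    break;
                }
                lastActivity = nextKeepAlive;
            } else {
                // The idle timeout fires first (a tie counts as a close).
                if (nextIdleClose > windowMs) {
                    break;
                }
                disconnects++;
                lastActivity = nextIdleClose; // fresh socket after the reconnect
            }
        }
        return disconnects;
    }

    public static void main(String[] args) {
        // Default 30 s keepalive against an assumed 20 s idle timeout, over 2 minutes.
        System.out.println(idleDisconnects(30_000, 20_000, 120_000)); // prints 6
        // 10 s keepalive against the same assumed idle timeout.
        System.out.println(idleDisconnects(10_000, 20_000, 120_000)); // prints 0
    }
}
```

Under these assumptions a keepalive interval longer than the idle timeout never gets a chance to fire, which would give one disconnect per timeout period. Any interval shorter than the timeout avoids the idle closes in the model.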

#20

I have fixed the search endpoints at 4 as instructed by daschl, and the inactive logs are now repeating every now and then, not on a 20-second period. Please find the logs attached.

debug.zip (3.8 KB)