SocketTimeoutException on Couchbase Lite Android

connections

#1

On some Android clients, we get timeout exception. Please see the logs below.

The call which I captured in fiddler is :- http://CantTellYouServerName:4984/embarkation/_changes?feed=longpoll&heartbeat=30000&style=all_docs&since=38439

ChangeTracker

{http://CantTellYouServerName:4984/embarkation/, OneShot, @ee57e5}
: Exception in change tracker
java.net.SocketTimeoutException: timeout
at okio.Okio$4.newTimeoutException(Okio.java:227)
at okio.AsyncTimeout.exit(AsyncTimeout.java:284)
at okio.AsyncTimeout$2.read(AsyncTimeout.java:240)
at okio.RealBufferedSource.indexOf(RealBufferedSource.java:325)
at okio.RealBufferedSource.indexOf(RealBufferedSource.java:314)
at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:210)
at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189)
at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:67)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:179)
at okhttp3.RealCall.execute(RealCall.java:63)
at com.couchbase.lite.replicator.ChangeTracker.runLoop(ChangeTracker.java:336)
at com.couchbase.lite.replicator.ChangeTracker.run(ChangeTracker.java:246)
at java.lang.Thread.run(Thread.java:818)
Caused by: java.net.SocketException: Socket closed
at libcore.io.Posix.recvfromBytes(Native Method)
at libcore.io.Posix.recvfrom(Posix.java:189)
at libcore.io.BlockGuardOs.recvfrom(BlockGuardOs.java:250)
at libcore.io.IoBridge.recvfrom(IoBridge.java:549)
at java.net.PlainSocketImpl.read(PlainSocketImpl.java:481)
at java.net.PlainSocketImpl.access$000(PlainSocketImpl.java:37)
at java.net.PlainSocketImpl$PlainSocketInputStream.read(PlainSocketImpl.java:237)
at okio.Okio$2.read(Okio.java:138)
at okio.AsyncTimeout$2.read(AsyncTimeout.java:236)
at okio.RealBufferedSource.indexOf(RealBufferedSource.java:325)
at okio.RealBufferedSource.indexOf(RealBufferedSource.java:314)
at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:210)
at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189)
at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:67)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)

What could be the reason for this.


#2

can you access this address from your android client’s browser?
may be it is a network problem.


#3

Hi @atom_yang

I can acess the sync gateway url but the complete url takes time.

If i run http://server/db/

The result comes in mili second.
But the complete feed url takes very long …

But this url is also taking time from local host. Which in turn means this url does something else.


#4

can you log * as following in sync_gateway.json

"log": ["*"],

and post the sync gateway’s log?


#5

Verbose log is very difficult to parse. Is it possible to set some specific logging.


#6

yes, you can,FYI

Comma-separated list of log keys to enable for diagnostic logging. Use ["*"] to enable logging for all log keys.
(Default: [“HTTP”])
Acceptable values are:
Access - Anytime an access() call is made in the sync function.
Attach - Attachment processing.
Auth - Authentication.
Bucket - Sync Gateway interactions with the bucket (verbose logging).
Cache - Interactions with Sync Gateway’s in-memory channel cache.
Changes - Processing of _changes requests.
CRUD - Updates made by Sync Gateway to documents.
DCP - DCP-feed processing (verbose logging).
Events - Event processing (webhooks).
Feed - Server-feed processing.
HTTP - All requests made to the Sync Gateway REST APIs (Sync and Admin). Note that the log keyword HTTP is always enabled, which means that HTTP requests and error responses are always logged (in a non-verbose manner). HTTP+ provides more verbose HTTP logging.
Replicate - Log messages related to replications between Sync Gateways (using sg-replicate). This tag cannot be used for replications initiated by Couchbase Lite.

for example

"log": ["HTTP","Feed","Changes","Auth","Access"],

#7

I know a bit about logging but as you asked for sync gateway logs[quote=“atom_yang, post:4, topic:12688, full:true”]
can you log * as following in sync_gateway.json

“log”: ["*"],

and post the sync gateway’s log?
[/quote]

So I said [quote=“pankaj.sharma, post:5, topic:12688”]
Verbose log is very difficult to parse
[/quote]

Hence is there any specific log which can give us info about the issue. Because “*” gives thousand of line which doesn’t look good.


#8

those should be enough.


#9

Hi @atom_yang

Following are some logs

2017-05-03T12:49:24.162Z HTTP: #4570: POST /embarkation/_changes?feed=normal&heartbeat=30000&style=all_docs&since=38439
2017-05-03T12:49:24.162Z Changes: MultiChangesFeed({}, {Since:38439 Limit:0 Conflicts:true IncludeDocs:false Wait:false Continuous:false Terminator:0xc42212f920 HeartbeatMs:30000 TimeoutMs:300000 ActiveOnly:false}) …
2017-05-03T12:49:24.162Z Changes: MultiChangesFeed done
2017-05-03T12:49:24.196Z HTTP: #4571: POST /embarkation/_changes?feed=longpoll&heartbeat=30000&style=all_docs&since=38439
2017-05-03T12:49:24.196Z Changes: MultiChangesFeed({
}, {Since:38439 Limit:0 Conflicts:true IncludeDocs:false Wait:true Continuous:false Terminator:0xc4215a62a0 HeartbeatMs:30000 TimeoutMs:300000 ActiveOnly:false}) …

2017-05-03T12:44:53.881Z HTTP: #4260: POST /embarkation/_changes?feed=longpoll&heartbeat=30000&style=all_docs&since=38439&active_only=true
2017-05-03T12:44:53.882Z Changes: MultiChangesFeed({*}, {Since:38439 Limit:0 Conflicts:true IncludeDocs:false Wait:true Continuous:false Terminator:0xc4257ac780 HeartbeatMs:30000 TimeoutMs:300000 ActiveOnly:true}) …
2017-05-03T12:44:57.846Z Changes: Connection lost from client: (as GUEST)

2017-05-03T12:50:37.382Z HTTP: #4804: POST /embarkation/_changes?feed=longpoll&heartbeat=30000&style=all_docs&since=38439&active_only=true
2017-05-03T12:50:37.382Z Changes: MultiChangesFeed({*}, {Since:38439 Limit:0 Conflicts:true IncludeDocs:false Wait:true Continuous:false Terminator:0xc422e64000 HeartbeatMs:30000 TimeoutMs:300000 ActiveOnly:true}) …

It looks like Connection lost from client can be an issue?


Sync Failing with Large Channel Calculations
#10

it might be.
how about send the request on the SG local host?


#11

But what should we do to handle this in the app?

Because it looks like this is causing sync to halt for some time.

Lets say if this error comes , Then Should we need to establish connection again? And how to do that in Android.

I will check this with local sync gateway tomorrow .


#12

CBLite will auto reconnect. maybe @andy can help to check the log.


#13

Hi @pankaj.sharma

SocketTimeoutException could be caused by the heartbeat messages were not delivered to the Android device.

Background:

  • CBL Android set Socket Timeout with around 40 sec. Socket throws Exception if no data is delivered within 40sec.
  • Sync Gateway sends heartbeat ("\r\n") to the client every 30 sec.
  • If a heartbeat is delivered to the Android, CBL should not throw Exception.

What could cause this?

  • Between SG and CBL, some network error occurs. But Android Socket can not detect the network error.

#14

Hi @hideki

So should we handle this in our code that if there is a timeout exception we should try to reconnect or it is done automatically?

Can you give a best example code of android which handles this.

Regards
Pankaj Sharma


#15

@pankaj.sharma,
The app does not need to handle this Exception. This is the transient error, the replicator automatically retries.


#16

Hi @hideki,

We are getting this error in one shot and that fails the complete sync.

Because we are starting one shot and then once it is stopped we are moving to continuous. In one shot if the response is not received in 40 seconds, what will happen?

Currently it will go in a loop because the replicator will retry it and eventually become offline. But In my scenario its very common that I will get response in more then 40 seconds. Specially when the sync gateway is on AWS.

In 1.2.1 it was 5 mins , why it was moved to 40 seconds. because if

This is true then how one shot for large number of documents will wok. Do I have to stop using one shot ? Or I can increase the timeout setting any how.

Please help @hideki.

Regards
Pankaj


#17

Hi @pankaj.sharma,

This is not an ideal solution. But you could experiment with different timeout value for a read operation. DEFAULT_READ_TIMEOUT of CouchbaseLiteHttpClientFactory.


#18

Hi,

Sorry to dig an old post but I’m facing the same issue using Sync Gateway 1.5 and CBL 1.4.0. I need to perform a one shot replication on 3 different buckets and SocketTimeoutException happen sometimes on one of the replication, causing the one shot to terminate.

@pankaj.sharma Did you finally found a solution to handle it on the client side (Android app)? When looking to https://github.com/couchbase/couchbase-lite-java-core/blob/release/1.4.0/src/main/java/com/couchbase/lite/support/CouchbaseLiteHttpClientFactory.java#L46-L52 as @hideki mentioned I see that the DEFAULT_SO_TIMEOUT_SECONDS is marked as deprecated, is it still useful to go this way?

Restarting the replication might be a solution but I would instead prefer to avoid the exception because

  1. The restart of the replication might cause UX issues (I’m using the changeCount and completedChangeCount to inform the user of the progress of the replication and those would be reseted)
  2. It might happen several time on a single one shot replication.

Side question: do you plan that this issue won’t happen anymore with Couchbase Lite 2.0 and the new replication protocol?

Thank’s


#19

Increase the channel cache which will improve the performance of sync gateway. And it will improve the response time of one shot on sync gateway sent by sync gateway. Current time out is 40 seconds in Android. If it is taking more then 40 seconds even after increasing cache size like I changed it to 100000 then you need to break data into multiple channels and then break the calls. But I guest changing the cache size will improve things.

Although this problem doesn’t come in continuous mode when the network is available. So after all this even if you get the error do note it is only noise and warning not an actuall erro.