CBL 2.6 with wss:// fails, works with https://

Setup:
Android 9
nginx/1.10.3 (Ubuntu)
Sync Gateway/2.6.0(127;b4c828d) CE

The Android client connects to an nginx box, which in turn forwards requests to one of several SG boxes. When the client connects to https://mydomain.com:4984/, everything works fine. Connecting to wss://mydomain.com:4984/ produces these two error messages in Android’s logcat:

E/CouchbaseLite/NETWORK: {N8litecore4repl12C4SocketImplE#1}==> N8litecore4repl12C4SocketImplE wss://mydomain.com:4984/_blipsync @0x79406dbd70
E/CouchbaseLite/NETWORK: {N8litecore4repl12C4SocketImplE#1} No response received after 15 sec -- disconnecting

With the firewall on the nginx box turned off, connecting again gives this error:

W/CouchbaseLite/NETWORK: C4Socket.open() socket -> 520700289152
W/CouchbaseLite/NETWORK: C4Socket.open() clazz -> com.couchbase.lite.internal.replicator.CBLWebSocket
E/CouchbaseLite/NETWORK: CBLWebSocket.socket_open()
W/CouchbaseLite/NETWORK: WebSocketListener.onFailure() response -> null: java.net.ConnectException: Failed to connect to mydomain.com/<IP>:4984
W/CouchbaseLite/NETWORK: C4Socket.dispose() handle -> 520700289152
E/CouchbaseLite/REPLICATOR: {Repl#4}==> N8litecore4repl10ReplicatorE /data/user/0/com.my.app/files/couchbase_database.cblite2/ ->wss://mydomain.com:4984/_blipsync @0x794093b5c8
E/CouchbaseLite/REPLICATOR: {Repl#4} Got LiteCore error: POSIX error 111 "Connection refused"
W/System.err: CouchbaseLiteException{POSIXErrorDomain,111,'Connection refused'}
W/System.err:     at com.couchbase.lite.CBLStatus.convertException(CBLStatus.java:11)
W/System.err:     at com.couchbase.lite.CBLStatus.convertException(CBLStatus.java:1)
W/System.err:     at com.couchbase.lite.CBLStatus.convertException(CBLStatus.java:4)
W/System.err:     at com.couchbase.lite.AbstractReplicator.updateStateProperties(AbstractReplicator.java:5)
W/System.err:     at com.couchbase.lite.AbstractReplicator.c4StatusChanged(AbstractReplicator.java:19)
W/System.err:     at com.couchbase.lite.AbstractReplicator$ReplicatorListener.a(AbstractReplicator.java:1)
W/System.err:     at com.couchbase.lite.i.run(Unknown Source:4)
W/System.err:     at com.couchbase.lite.internal.AndroidExecutionService$SerialExecutor.a(AndroidExecutionService.java:1)
W/System.err:     at com.couchbase.lite.internal.c.run(Unknown Source:4)
W/System.err:     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
W/System.err:     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
W/System.err:     at java.lang.Thread.run(Thread.java:764)
W/System.err: Caused by: LiteCoreException{domain=2, code=111, msg=Connection refused}
W/System.err:     at com.couchbase.lite.CBLStatus.convertException(CBLStatus.java:3)
W/System.err: 	... 9 more
W/LanguageCookieService: Default CookieHandler was null
W/System.err: CouchbaseLiteException{POSIXErrorDomain,111,'Connection refused'}
W/System.err:     at com.couchbase.lite.CBLStatus.convertException(CBLStatus.java:11)
W/System.err:     at com.couchbase.lite.CBLStatus.convertException(CBLStatus.java:1)
W/System.err:     at com.couchbase.lite.CBLStatus.convertException(CBLStatus.java:4)
W/System.err:     at com.couchbase.lite.AbstractReplicator.updateStateProperties(AbstractReplicator.java:5)
W/System.err:     at com.couchbase.lite.AbstractReplicator.c4StatusChanged(AbstractReplicator.java:19)
W/System.err:     at com.couchbase.lite.AbstractReplicator$ReplicatorListener.a(AbstractReplicator.java:1)
W/System.err:     at com.couchbase.lite.i.run(Unknown Source:4)
W/System.err:     at com.couchbase.lite.internal.AndroidExecutionService$SerialExecutor.a(AndroidExecutionService.java:1)
W/System.err:     at com.couchbase.lite.internal.c.run(Unknown Source:4)
W/System.err:     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
W/System.err:     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
W/System.err:     at java.lang.Thread.run(Thread.java:764)
W/System.err: Caused by: LiteCoreException{domain=2, code=111, msg=Connection refused}
W/System.err:     at com.couchbase.lite.CBLStatus.convertException(CBLStatus.java:3)
W/System.err: 	... 9 more

Troubleshooting so far:

The nginx version is new enough to support WebSockets. The SSL cert is from Let’s Encrypt, and I couldn’t find any mention that wss needs additional config.
Testing the connection on my dev cluster in the LAN worked without issues. The nginx config is similar on the dev and prod boxes, apart from the SSL settings.
/var/log/nginx/access.log has many entries for “CouchbaseLite/1.3 (1.4.4 …” but none for CouchbaseLite/2. There is nothing in /var/log/nginx/error.log.
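On the “additional config” question: a wss:// connection starts as an ordinary HTTPS request carrying `Upgrade: websocket` and `Connection: Upgrade`. Because those are hop-by-hop headers, nginx does not pass them to the upstream unless the proxy config sets them explicitly (nginx’s WebSocket proxying documentation uses `proxy_http_version 1.1` plus `proxy_set_header` for both). If the upgrade gets through, the backend answers `101 Switching Protocols` with a `Sec-WebSocket-Accept` header derived from the client’s `Sec-WebSocket-Key`. The Java sketch below only illustrates that derivation as specified in RFC 6455; it is not Couchbase Lite or Sync Gateway code, and the class name is made up.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class WebSocketAccept {
    // Fixed GUID defined by RFC 6455, section 1.3.
    static final String WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

    // Computes the Sec-WebSocket-Accept value a server must return for the
    // client's Sec-WebSocket-Key: base64(SHA-1(key + GUID)).
    static String acceptFor(String secWebSocketKey) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(
                    (secWebSocketKey + WS_GUID).getBytes(StandardCharsets.US_ASCII));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        // Sample key from RFC 6455, section 1.3.
        System.out.println(acceptFor("dGhlIHNhbXBsZSBub25jZQ=="));
    }
}
```

If the proxy drops the upgrade headers, the backend sees a plain HTTP request and no `101` with this header ever comes back, which is why WebSocket traffic can fail through a proxy that handles ordinary HTTPS fine.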

Any pointers are greatly appreciated!

That’s not a valid replication URL — there is no database name in the path.

However, the result should then be a 4xx response from SG, not a timeout or connection failure. Maybe the URL trips some kind of filter in nginx, and it responds by just shutting down the socket? This looks like an nginx issue, not a Couchbase Mobile issue.
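Jens’s distinction can be turned into a quick triage rule: a 4xx means Sync Gateway answered, while a refused connection, a timeout or a failed TLS handshake each mean the request died before reaching it. A minimal sketch in plain Java, assuming the failure surfaces as a standard `java.net` or `javax.net.ssl` exception (as in the `WebSocketListener.onFailure()` lines above); `ConnectFailures` and `classify` are hypothetical names, not Couchbase Lite API.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import javax.net.ssl.SSLHandshakeException;

public class ConnectFailures {
    enum Kind { REFUSED, TIMEOUT, TLS_HANDSHAKE, OTHER }

    // Looks through the exception and its causes (bounded, in case of cycles)
    // for the first well-known networking failure.
    static Kind classify(Throwable t) {
        int depth = 0;
        for (Throwable c = t; c != null && depth < 16; c = c.getCause(), depth++) {
            if (c instanceof ConnectException) {
                // TCP connect rejected: nothing listening on host:port, or a firewall rejecting.
                return Kind.REFUSED;
            }
            if (c instanceof SocketTimeoutException) {
                // No answer in time: packets silently dropped, e.g. by a firewall.
                return Kind.TIMEOUT;
            }
            if (c instanceof SSLHandshakeException) {
                // TCP connected, but client and server could not agree on TLS.
                return Kind.TLS_HANDSHAKE;
            }
        }
        return Kind.OTHER;
    }

    public static void main(String[] args) {
        System.out.println(classify(new ConnectException("Connection refused")));
        System.out.println(classify(
                new IOException("wrapped", new SSLHandshakeException("Handshake failed"))));
    }
}
```

Against the logs in this thread: the `ConnectException` with the firewall off is REFUSED, and the later `SSLHandshakeException` is TLS_HANDSHAKE. The 15-second timeout with the firewall on comes from LiteCore’s own timer rather than a Java exception, but it belongs in the TIMEOUT bucket conceptually.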

Hi Jens,

Sorry, I removed the database name when I pasted it here. It should read wss://mydomain.com:4984/my_database/_blipsync

I’ve made some further progress. First, a correction: https works with CBL 1.x, not with CBL 2.x. I also found a warning logged just before the error messages in my first post. Here they are together:

2019-10-14 18:10:14.157 W/CouchbaseLite/NETWORK: WebSocketListener.onFailure() response -> null: javax.net.ssl.SSLHandshakeException: Handshake failed
2019-10-14 18:10:14.189 W/CouchbaseLite/NETWORK: C4Socket.dispose() handle -> 521078923072
2019-10-14 18:10:14.191 E/CouchbaseLite/REPLICATOR: {Repl#1}==> N8litecore4repl10ReplicatorE /data/user/0/com.my.app/files/couchbase_database.cblite2/ ->wss://mydomain.com:4984/my_database/_blipsync @0x79403095c8
2019-10-14 18:10:14.199 E/CouchbaseLite/REPLICATOR: {Repl#1} Got LiteCore error: WebSocket error 1001 "WebSocket connection closed by peer"

I got it working on a new remote box with the firewall off, using ws instead of wss. I agree that it fails earlier, before reaching SG. I’ll keep digging!

So it seems that nginx’s SSL endpoint and CBL 2’s SSL client don’t like each other, and the handshake fails. I don’t know why CBL 1.x would work when 2.x fails; perhaps they use different SSL implementations? @blake.meike, any ideas?

One idea: check which versions of the TLS protocol your nginx supports. It may be that CBL 2 is configured to require newer versions only, for security reasons. For example, if nginx only goes up to TLS 1.1 while CBL 2 requires TLS 1.2, that would produce symptoms like this.
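To make that hypothesis concrete: a TLS connection can only use a protocol version that both sides have enabled. The sketch below models just the version-matching part (real handshakes also have to agree on cipher suites, certificates and extensions), using JSSE protocol names. It also prints the protocols the local Java runtime enables by default via `SSLContext.getDefault().getDefaultSSLParameters()`. The version lists in `main` are illustrative, not measured from nginx or CBL, and `TlsVersionMatch` is a hypothetical name.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import javax.net.ssl.SSLContext;

public class TlsVersionMatch {
    // JSSE protocol names, ordered from oldest to newest.
    static final List<String> ORDER = Arrays.asList("TLSv1", "TLSv1.1", "TLSv1.2", "TLSv1.3");

    // Toy model of version negotiation: pick the newest version enabled on
    // both sides; an empty result means the handshake cannot succeed.
    static Optional<String> highestCommon(List<String> client, List<String> server) {
        for (int i = ORDER.size() - 1; i >= 0; i--) {
            String v = ORDER.get(i);
            if (client.contains(v) && server.contains(v)) {
                return Optional.of(v);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Protocols this Java runtime enables by default.
        String[] local = SSLContext.getDefault().getDefaultSSLParameters().getProtocols();
        System.out.println("local defaults: " + Arrays.toString(local));

        // Illustrative lists only: a client that insists on TLS 1.2 or newer ...
        List<String> client = Arrays.asList("TLSv1.2", "TLSv1.3");
        // ... against a server capped at TLS 1.1, and one that also offers TLS 1.2.
        System.out.println(highestCommon(client, Arrays.asList("TLSv1", "TLSv1.1")));
        System.out.println(highestCommon(client, Arrays.asList("TLSv1.1", "TLSv1.2")));
    }
}
```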

(It’s too bad Java isn’t giving us more detail on the failure than just “Handshake failed”. Blake, if there’s more detail available, like “incompatible ciphers”, it would be good to log that or put it in the exception somehow.)
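When a lower layer does supply more detail, it is usually attached as a cause of the exception rather than in the top-level message. A minimal sketch of a helper that flattens the whole cause chain into one log line; `describeChain` is a hypothetical name, and whether the Android TLS stack attaches a useful cause for any particular failure is not guaranteed.

```java
import javax.net.ssl.SSLHandshakeException;
import javax.net.ssl.SSLProtocolException;

public class ExceptionDetail {
    // Flattens a Throwable and its causes (bounded, in case of cycles) into one
    // line, so a log shows the underlying error and not just the outer message.
    static String describeChain(Throwable t) {
        StringBuilder sb = new StringBuilder();
        int depth = 0;
        for (Throwable c = t; c != null && depth < 16; c = c.getCause(), depth++) {
            if (sb.length() > 0) {
                sb.append(" <- caused by: ");
            }
            sb.append(c.getClass().getSimpleName()).append(": ").append(c.getMessage());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Illustrative chain; the inner message is invented for the example.
        SSLHandshakeException e = new SSLHandshakeException("Handshake failed");
        e.initCause(new SSLProtocolException("example cause"));
        System.out.println(describeChain(e));
    }
}
```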

Good idea! I tested it with https://www.ssllabs.com: TLS 1.2 is supported.
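SSL Labs tests from its own servers with its own client. A complementary check is to run the handshake from Java and see what gets negotiated on each port. A minimal sketch, assuming the host and the two ports from this thread (443 and 4984); `TlsProbe` is a hypothetical name. On a desktop JVM this uses the JSSE provider, while Android uses its own Conscrypt-based provider, so results can differ from the device. A bare `SSLSocket` also does not verify the hostname, so a successful probe says nothing about whether the certificate matches the host.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import javax.net.ssl.SSLSession;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class TlsProbe {
    // Connects to host:port, performs a TLS handshake with this runtime's
    // default settings, and reports the negotiated protocol and cipher suite.
    static String probe(String host, int port, int timeoutMs) throws IOException {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket socket = (SSLSocket) factory.createSocket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs);
            socket.startHandshake();
            SSLSession session = socket.getSession();
            return session.getProtocol() + " / " + session.getCipherSuite();
        }
    }

    public static void main(String[] args) {
        // Placeholder host from this thread; pass the real one as an argument.
        String host = args.length > 0 ? args[0] : "mydomain.com";
        for (int port : new int[] {443, 4984}) {
            try {
                System.out.println(port + ": " + probe(host, port, 5000));
            } catch (IOException e) {
                // e.g. ConnectException (refused) or SSLHandshakeException.
                System.out.println(port + ": " + e);
            }
        }
    }
}
```

A `ConnectException` on a port here corresponds to the `Connection refused` seen in logcat: the TCP connection never reached a TLS listener on that port.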

Good idea about fixing the error message. Definitely will do that.

@benjamin_glatzeder : did you attend to this:

I got this:

<?xml version="1.0" encoding="utf-8"?>
<network-security-config>
    <base-config cleartextTrafficPermitted="true">
        <trust-anchors>
            <certificates src="system"/>
        </trust-anchors>
    </base-config>
</network-security-config>

I’ll need to read up on <certificates src="system"/>. But the certificate is from Let’s Encrypt and was created automatically with their tools. No customization on my part.

Solved it! I had been using wss://mydomain.com:4984/my_database. That must have worked on my dev cluster because nginx and Sync Gateway run on the same box there, and no firewall is running. The solution is to use the URI without the port number, like so: wss://mydomain.com/my_database. nginx then takes care of forwarding it to Sync Gateway. Thanks @jens and @blake.meike for the troubleshooting ideas!
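Why dropping the port fixes it: with no explicit port, a wss:// URI connects to 443 (the RFC 6455 default), presumably where nginx terminates TLS in this setup, whereas `:4984` sends the client to port 4984 on the nginx box itself. A small sketch with `java.net.URI` that shows the host and port each form resolves to; `EffectivePort` is a hypothetical helper, not part of Couchbase Lite.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

public class EffectivePort {
    // Returns the TCP port a ws:// or wss:// URI will actually connect to,
    // applying the RFC 6455 defaults (80 for ws, 443 for wss) when none is given.
    static int effectivePort(URI uri) {
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        String scheme = uri.getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("no scheme: " + uri);
        }
        switch (scheme.toLowerCase(Locale.ROOT)) {
            case "ws":
                return 80;
            case "wss":
                return 443;
            default:
                throw new IllegalArgumentException("not a WebSocket URI: " + uri);
        }
    }

    public static void main(String[] args) throws URISyntaxException {
        String[] urls = {"wss://mydomain.com:4984/my_database", "wss://mydomain.com/my_database"};
        for (String s : urls) {
            URI uri = new URI(s);
            System.out.println(s + " -> " + uri.getHost() + ":" + effectivePort(uri));
        }
    }
}
```

The first URL goes straight to port 4984 on the public host; the second goes to 443, so the request reaches nginx, which proxies it to Sync Gateway.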