Required changes from version 6.6 to 7.0?

I have recently updated from version 6.6 to 7.0 (Community Edition) - or rather the servers were automatically updated… - a “gift” during holiday :face_with_raised_eyebrow:). Anyway, things seem to run mostly Ok.

However, I have started seeing issues in my application (web application using Java SDK v.3.1.3 to access Couchbase) where it crashes with (I’ve kept the full stack trace to not leave anything important out):

8/3/21 7:52 AM: Exception Thrown
Context Path: /fangst.nsf
Page Name: /index.xsp
javax.faces.el.EvaluationException: javax.faces.el.EvaluationException: Error getting property 'count' from bean of type dk.dtu.aqua.catchlog.bean.BoastCatchesBean: com.couchbase.client.core.error.RequestCanceledException: QueryRequest, Reason: NO_MORE_RETRIES (CHANNEL_CLOSED_WHILE_IN_FLIGHT) {"cancelled":true,"completed":true,"coreId":"0x3df54ed500000009","idempotent":false,"lastDispatchedFrom":"172.23.130.47:6688","lastDispatchedTo":"db1.aqua.dtu.dk:8093","reason":"NO_MORE_RETRIES (CHANNEL_CLOSED_WHILE_IN_FLIGHT)","requestId":55375650,"requestType":"QueryRequest","retried":0,"service":{"operationId":"a3279e05-eea7-4580-98cf-80b292cabc39","statement":"SELECT t2.*, t1.* FROM data AS t1 LEFT JOIN data AS t2 ON 'Catch:Private:'||t1.`key` = META(t2).id AND t2.type='Private' WHERE t1.type='Catch' AND (t1.visibility='2')","type":"query"},"timeoutMs":75000}
	at com.sun.faces.el.ValueBindingImpl.getValue(ValueBindingImpl.java:184)
	at com.sun.faces.el.ValueBindingImpl.getValue(ValueBindingImpl.java:134)
	at com.ibm.xsp.page.compiled.ExpressionEvaluatorImpl.getBindingValue(ExpressionEvaluatorImpl.java:151)
	at xsp.BoastCatches$BoastCatchesPage.createPanel2(BoastCatches.java:188)
	at xsp.BoastCatches$BoastCatchesPage.createComponent(BoastCatches.java:114)
	at com.ibm.xsp.page.compiled.AbstractCompiledPage.buildComponent(AbstractCompiledPage.java:265)
	at com.ibm.xsp.page.compiled.CompiledComponentBuilder.buildChildren(CompiledComponentBuilder.java:115)
	at com.ibm.xsp.page.compiled.CompiledComponentBuilder.buildAll(CompiledComponentBuilder.java:84)
	at com.ibm.xsp.component.UIIncludeComposite.buildPageContents(UIIncludeComposite.java:692)
	at com.ibm.xsp.page.compiled.AbstractCompiledPage.initComponent(AbstractCompiledPage.java:340)
	at com.ibm.xsp.page.compiled.AbstractCompiledPage.createTree(AbstractCompiledPage.java:256)
	at com.ibm.xsp.page.compiled.AbstractCompiledPage.addComponent(AbstractCompiledPage.java:389)
	at com.ibm.xsp.component.UIIncludeComposite.buildContents(UIIncludeComposite.java:453)
	at com.ibm.xsp.page.compiled.AbstractCompiledPage.initComponent(AbstractCompiledPage.java:334)
	at com.ibm.xsp.page.compiled.CompiledComponentBuilder.buildChildren(CompiledComponentBuilder.java:123)
	at com.ibm.xsp.page.compiled.CompiledComponentBuilder.buildAll(CompiledComponentBuilder.java:84)
	at com.ibm.xsp.component.UIIncludeComposite.buildPageContents(UIIncludeComposite.java:692)
	at com.ibm.xsp.page.compiled.AbstractCompiledPage.initComponent(AbstractCompiledPage.java:340)
	at com.ibm.xsp.page.compiled.AbstractCompiledPage.createTree(AbstractCompiledPage.java:256)
	at com.ibm.xsp.page.compiled.AbstractCompiledPage.addComponent(AbstractCompiledPage.java:389)
	at com.ibm.xsp.component.UIIncludeComposite.buildContents(UIIncludeComposite.java:453)
	at com.ibm.xsp.page.compiled.AbstractCompiledPage.initComponent(AbstractCompiledPage.java:334)
	at com.ibm.xsp.page.compiled.CompiledComponentBuilder.buildFacet(CompiledComponentBuilder.java:205)
	at com.ibm.xsp.component.UICallback.buildContents(UICallback.java:180)
	at com.ibm.xsp.page.compiled.AbstractCompiledPage.initComponent(AbstractCompiledPage.java:334)
	at com.ibm.xsp.page.compiled.CompiledComponentBuilder.buildChildren(CompiledComponentBuilder.java:123)
	at com.ibm.xsp.page.compiled.CompiledComponentBuilder.buildAll(CompiledComponentBuilder.java:84)
	at com.ibm.xsp.page.compiled.AbstractCompiledPage.initComponent(AbstractCompiledPage.java:345)
	at com.ibm.xsp.page.compiled.CompiledComponentBuilder.buildChildren(CompiledComponentBuilder.java:123)
	at com.ibm.xsp.page.compiled.CompiledComponentBuilder.buildAll(CompiledComponentBuilder.java:84)
	at com.ibm.xsp.page.compiled.AbstractCompiledPage.initComponent(AbstractCompiledPage.java:345)
	at com.ibm.xsp.page.compiled.CompiledComponentBuilder.buildFacet(CompiledComponentBuilder.java:205)
	at com.ibm.xsp.component.UICallback.buildContents(UICallback.java:180)
	at com.ibm.xsp.page.compiled.AbstractCompiledPage.initComponent(AbstractCompiledPage.java:334)
	at com.ibm.xsp.page.compiled.CompiledComponentBuilder.buildChildren(CompiledComponentBuilder.java:123)
	at com.ibm.xsp.page.compiled.CompiledComponentBuilder.buildAll(CompiledComponentBuilder.java:84)
	at com.ibm.xsp.page.compiled.AbstractCompiledPage.initComponent(AbstractCompiledPage.java:345)
	at com.ibm.xsp.page.compiled.CompiledComponentBuilder.buildChildren(CompiledComponentBuilder.java:123)
	at com.ibm.xsp.page.compiled.CompiledComponentBuilder.buildAll(CompiledComponentBuilder.java:84)
	at com.ibm.xsp.page.compiled.AbstractCompiledPage.initComponent(AbstractCompiledPage.java:345)
	at com.ibm.xsp.page.compiled.CompiledComponentBuilder.buildChildren(CompiledComponentBuilder.java:123)
	at com.ibm.xsp.page.compiled.CompiledComponentBuilder.buildAll(CompiledComponentBuilder.java:84)
	at com.ibm.xsp.component.UIIncludeComposite.buildPageContents(UIIncludeComposite.java:692)
	at com.ibm.xsp.page.compiled.AbstractCompiledPage.initComponent(AbstractCompiledPage.java:340)
	at com.ibm.xsp.page.compiled.AbstractCompiledPage.createTree(AbstractCompiledPage.java:256)
	at com.ibm.xsp.page.compiled.AbstractCompiledPage.addComponent(AbstractCompiledPage.java:389)
	at com.ibm.xsp.component.UIIncludeComposite.buildContents(UIIncludeComposite.java:453)
	at com.ibm.xsp.page.compiled.AbstractCompiledPage.initComponent(AbstractCompiledPage.java:334)
	at com.ibm.xsp.page.compiled.CompiledComponentBuilder.buildChildren(CompiledComponentBuilder.java:123)
	at com.ibm.xsp.page.compiled.CompiledComponentBuilder.buildAll(CompiledComponentBuilder.java:84)
	at com.ibm.xsp.component.UIIncludeComposite.buildPageContents(UIIncludeComposite.java:692)
	at com.ibm.xsp.page.compiled.AbstractCompiledPage.initComponent(AbstractCompiledPage.java:340)
	at com.ibm.xsp.page.compiled.AbstractCompiledPage.createTree(AbstractCompiledPage.java:256)
	at com.ibm.xsp.page.compiled.AbstractCompiledPage.addComponent(AbstractCompiledPage.java:389)
	at com.ibm.xsp.component.UIIncludeComposite.buildContents(UIIncludeComposite.java:453)
	at com.ibm.xsp.page.compiled.AbstractCompiledPage.initComponent(AbstractCompiledPage.java:334)
	at com.ibm.xsp.page.compiled.CompiledComponentBuilder.buildChildren(CompiledComponentBuilder.java:123)
	at com.ibm.xsp.page.compiled.CompiledComponentBuilder.buildAll(CompiledComponentBuilder.java:84)
	at com.ibm.xsp.component.UIViewRootEx.buildContents(UIViewRootEx.java:1649)
	at com.ibm.xsp.component.UIViewRootEx2.buildContents(UIViewRootEx2.java:247)
	at com.ibm.xsp.page.compiled.AbstractCompiledPage.initComponent(AbstractCompiledPage.java:334)
	at com.ibm.xsp.page.compiled.AbstractCompiledPage.createTree(AbstractCompiledPage.java:256)
	at com.ibm.xsp.page.compiled.AbstractCompiledPage.createViewRoot(AbstractCompiledPage.java:167)
	at com.ibm.xsp.application.ViewHandlerExImpl._createViewRoot(ViewHandlerExImpl.java:521)
	at com.ibm.xsp.application.ViewHandlerExImpl.createViewRoot(ViewHandlerExImpl.java:567)
	at com.ibm.xsp.application.ViewHandlerExImpl.doCreateView(ViewHandlerExImpl.java:142)
	at com.ibm.xsp.application.ViewHandlerEx.createView(ViewHandlerEx.java:90)
	at dk.dtu.aqua.catchlog.control.AppViewHandler.createView(AppViewHandler.java:190)
	at com.ibm.xsp.webapp.FacesServlet.serviceView(FacesServlet.java:250)
	at com.ibm.xsp.webapp.FacesServletEx.serviceView(FacesServletEx.java:157)
	at com.ibm.xsp.webapp.FacesServlet.service(FacesServlet.java:159)
	at com.ibm.xsp.webapp.FacesServletEx.service(FacesServletEx.java:138)
	at com.ibm.xsp.webapp.DesignerFacesServlet.service(DesignerFacesServlet.java:103)
	at com.ibm.designer.runtime.domino.adapter.ComponentModule.invokeServlet(ComponentModule.java:588)
	at com.ibm.domino.xsp.module.nsf.NSFComponentModule.invokeServlet(NSFComponentModule.java:1335)
	at com.ibm.designer.runtime.domino.adapter.ComponentModule$AdapterInvoker.invokeServlet(ComponentModule.java:865)
	at com.ibm.designer.runtime.domino.adapter.ComponentModule$ServletInvoker.doService(ComponentModule.java:808)
	at com.ibm.designer.runtime.domino.adapter.ComponentModule.doService(ComponentModule.java:577)
	at com.ibm.domino.xsp.module.nsf.NSFComponentModule.doService(NSFComponentModule.java:1319)
	at com.ibm.domino.xsp.module.nsf.NSFService.doServiceInternal(NSFService.java:662)
	at com.ibm.domino.xsp.module.nsf.NSFService.doService(NSFService.java:482)
	at com.ibm.designer.runtime.domino.adapter.LCDEnvironment.doService(LCDEnvironment.java:357)
	at com.ibm.designer.runtime.domino.adapter.LCDEnvironment.service(LCDEnvironment.java:313)
	at com.ibm.domino.xsp.bridge.http.engine.XspCmdManager.service(XspCmdManager.java:272)
Caused by: javax.faces.el.EvaluationException: Error getting property 'count' from bean of type dk.dtu.aqua.catchlog.bean.BoastCatchesBean: com.couchbase.client.core.error.RequestCanceledException: QueryRequest, Reason: NO_MORE_RETRIES (CHANNEL_CLOSED_WHILE_IN_FLIGHT) {"cancelled":true,"completed":true,"coreId":"0x3df54ed500000009","idempotent":false,"lastDispatchedFrom":"172.23.130.47:6688","lastDispatchedTo":"db1.aqua.dtu.dk:8093","reason":"NO_MORE_RETRIES (CHANNEL_CLOSED_WHILE_IN_FLIGHT)","requestId":55375650,"requestType":"QueryRequest","retried":0,"service":{"operationId":"a3279e05-eea7-4580-98cf-80b292cabc39","statement":"SELECT t2.*, t1.* FROM data AS t1 LEFT JOIN data AS t2 ON 'Catch:Private:'||t1.`key` = META(t2).id AND t2.type='Private' WHERE t1.type='Catch' AND (t1.visibility='2')","type":"query"},"timeoutMs":75000}
	at com.sun.faces.el.PropertyResolverImpl.getValue(PropertyResolverImpl.java:104)
	at com.ibm.xsp.el.PropertyResolverImpl.getValue(PropertyResolverImpl.java:144)
	at com.sun.faces.el.impl.ArraySuffix.evaluate(ArraySuffix.java:182)
	at com.sun.faces.el.impl.ComplexValue.evaluate(ComplexValue.java:163)
	at com.sun.faces.el.impl.BinaryOperatorExpression.evaluate(BinaryOperatorExpression.java:175)
	at com.sun.faces.el.impl.ExpressionEvaluatorImpl.evaluate(ExpressionEvaluatorImpl.java:257)
	at com.sun.faces.el.ValueBindingImpl.getValue(ValueBindingImpl.java:150)
	... 83 more
Caused by: com.couchbase.client.core.error.RequestCanceledException: QueryRequest, Reason: NO_MORE_RETRIES (CHANNEL_CLOSED_WHILE_IN_FLIGHT) {"cancelled":true,"completed":true,"coreId":"0x3df54ed500000009","idempotent":false,"lastDispatchedFrom":"172.23.130.47:6688","lastDispatchedTo":"db1.aqua.dtu.dk:8093","reason":"NO_MORE_RETRIES (CHANNEL_CLOSED_WHILE_IN_FLIGHT)","requestId":55375650,"requestType":"QueryRequest","retried":0,"service":{"operationId":"a3279e05-eea7-4580-98cf-80b292cabc39","statement":"SELECT t2.*, t1.* FROM data AS t1 LEFT JOIN data AS t2 ON 'Catch:Private:'||t1.`key` = META(t2).id AND t2.type='Private' WHERE t1.type='Catch' AND (t1.visibility='2')","type":"query"},"timeoutMs":75000,"timings":{"totalMicros":6843033}}
	at com.couchbase.client.java.AsyncUtils.block(AsyncUtils.java:51)
	at com.couchbase.client.java.Cluster.query(Cluster.java:393)
	at dk.dtu.aqua.catchlog.base.BaseCouchbaseDAO.getAllJoinedSubFromDb(BaseCouchbaseDAO.java:596)
	at dk.dtu.aqua.catchlog.dao.CouchbaseCatchDAO.getCatches(CouchbaseCatchDAO.java:146)
	at dk.dtu.aqua.catchlog.dao.CouchbaseCatchDAO.getAllBoastCatches(CouchbaseCatchDAO.java:159)
	at dk.dtu.aqua.catchlog.dao.facade.CatchCRUDFacade.getAllBoastCatches(CatchCRUDFacade.java:104)
	at dk.dtu.aqua.catchlog.dao.facade.CatchCRUDFacade.getAllBoastCatches(CatchCRUDFacade.java:100)
	at dk.dtu.aqua.catchlog.bean.DataBean.getAllBoastCatches(DataBean.java:477)
	at dk.dtu.aqua.catchlog.bean.BoastCatchesBean.getAllCatches(BoastCatchesBean.java:21)
	at dk.dtu.aqua.catchlog.bean.BoastCatchesBean.getRecentCatches(BoastCatchesBean.java:39)
	at dk.dtu.aqua.catchlog.bean.BoastCatchesBean.getRecentCatches(BoastCatchesBean.java:34)
	at dk.dtu.aqua.catchlog.bean.BoastCatchesBean.getCount(BoastCatchesBean.java:30)
	at sun.reflect.GeneratedMethodAccessor1638.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
	at java.lang.reflect.Method.invoke(Method.java:508)
	at com.sun.faces.el.PropertyResolverImpl.getValue(PropertyResolverImpl.java:96)
	... 89 more
	Suppressed: java.lang.Exception: The above exception was originally thrown by another thread at the following location.
		at com.couchbase.client.core.msg.BaseRequest.cancel(BaseRequest.java:172)
		at com.couchbase.client.core.retry.RetryOrchestrator.lambda$maybeRetry$0(RetryOrchestrator.java:79)
		at com.couchbase.client.core.retry.RetryOrchestrator$$Lambda$408.00000000642C6680.accept(Unknown Source)
		at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:771)
		at java.util.concurrent.CompletableFuture.uniWhenCompleteStage(CompletableFuture.java:789)
		at java.util.concurrent.CompletableFuture.whenComplete(CompletableFuture.java:2151)
		at com.couchbase.client.core.retry.RetryOrchestrator.maybeRetry(RetryOrchestrator.java:62)
		at com.couchbase.client.core.io.netty.chunk.ChunkedMessageHandler.channelInactive(ChunkedMessageHandler.java:238)
		at com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
		at com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
		at com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
		at com.couchbase.client.core.deps.io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelInactive(CombinedChannelDuplexHandler.java:418)
		at com.couchbase.client.core.deps.io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:389)
		at com.couchbase.client.core.deps.io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:354)
		at com.couchbase.client.core.deps.io.netty.handler.codec.http.HttpClientCodec$Decoder.channelInactive(HttpClientCodec.java:311)
		at com.couchbase.client.core.deps.io.netty.channel.CombinedChannelDuplexHandler.channelInactive(CombinedChannelDuplexHandler.java:221)
		at com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
		at com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
		at com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
		at com.couchbase.client.core.deps.io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
		at com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
		at com.couchbase.client.core.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
		at com.couchbase.client.core.deps.io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
		at com.couchbase.client.core.deps.io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:819)
		at com.couchbase.client.core.deps.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
		at com.couchbase.client.core.deps.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
		at com.couchbase.client.core.deps.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
		at com.couchbase.client.core.deps.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
		at com.couchbase.client.core.deps.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
		at com.couchbase.client.core.deps.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
		at java.lang.Thread.run(Thread.java:811)

This could indicate that I need to adjust some timing parameters or similar… How can I best troubleshoot this further and are there any obvious changes required that I did not come across by this “unplanned” update?

@jda the SDK should just work fine actually. What happened in your case I can see from the exception that while the request was in-flight (on the network), the socket got closed the the request got cancelled. From the logs you should be able to tell around the same time why/how the socket got closed.

Did you see this only once during the upgrade or do you see that constantly?

Hi @daschl

I see this several times a day (3 times over the last 9-10 hours).

I have also had some issues with time sync. (chrony/NTP) on the database servers. This issue has most likely been there all the time but started reporting with the update to 7.0 - not sure if this could impact the above issue? But then I should have seen the crash issue more often.

I just had the issue again and looked in the Couchbase logs and found this:

Client-side error-report for user "dbadmin" on node 'ns_1@db1.aqua.dtu.dk':
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36
Got unhandled javascript error:
name: Error;
message: [ngRepeat:dupes] Duplicates in a repeater are not allowed. Use 'track by' expression to specify unique keys. Repeater: label in vm.labels | filter: vm.mnSearchValue track by label, Duplicate key: sg_access_x1, Duplicate value: sg_access_x1
https://errors.angularjs.org/1.8.0/ngRepeat/dupes?p0=label%20in%20vm.labels%20%7C%20filter%3A%20vm.mnSearchValue%20track%20by%20label&p1=sg_access_x1&p2=sg_access_x1;
stack: Error: [ngRepeat:dupes] Duplicates in a repeater are not allowed. Use 'track by' expression to specify unique keys. Repeater: label in vm.labels | filter: vm.mnSearchValue track by label, Duplicate key: sg_access_x1, Duplicate value: sg_access_x1
https://errors.angularjs.org/1.8.0/ngRepeat/dupes?p0=label%20in%20vm.labels%20%7C%20filter%3A%20vm.mnSearchValue%20track%20by%20label&p1=sg_access_x1&p2=sg_access_x1
at http://db1.aqua.dtu.dk:8091/ui/web_modules/common/index-729b989c.js:6:824
at http://db1.aqua.dtu.dk:8091/ui/web_modules/common/index-729b989c.js:6:169305
at http://db1.aqua.dtu.dk:8091/ui/web_modules/common/index-729b989c.js:6:99984
at c.$digest (http://db1.aqua.dtu.dk:8091/ui/web_modules/common/index-729b989c.js:6:100669)
at c.$apply (http://db1.aqua.dtu.dk:8091/ui/web_modules/common/index-729b989c.js:6:102535)
at http://db1.aqua.dtu.dk:8091/ui/web_modules/common/index-729b989c.js:6:65922
at E (http://db1.aqua.dtu.dk:8091/ui/web_modules/common/index-729b989c.js:6:68826)
at XMLHttpRequest.g.onload [as __zone_symbol__ON_PROPERTYload] (http://db1.aqua.dtu.dk:8091/ui/web_modules/common/index-729b989c.js:6:68189)
at XMLHttpRequest.wrapFn (http://db1.aqua.dtu.dk:8091/ui/libs/zone.js:1279:43)
at ZoneDelegate.invokeTask (http://db1.aqua.dtu.dk:8091/ui/libs/zone.js:431:35);

Not sure what this is all about. It seems to have something to do with the sync.gateway based on the index name…???

Looking at the indexes it seems that exactly that index exists twice…??? That should not be possible as far as I know…

What could have caused this? And what is the best way to resolve it? Remove the sg_... indexes and restart sync.gateway to recreate them?

Ok, I stopped the sync.gateway service, I deleted all sg_.... indexes and restarted the sync.gateway service. Indexes are being recreated now - and only one of each. Will monitor this to see if this resolves the issue…

thanks @jda for keeping us up to date. I don’t know what could’ve causes an index to be created twice, this is weird (or maybe a glitch in the UI?)

@jda also at the time of the issue it makes sense to check the query error log since it might explain why the socket got disconnected. From the top of my head I don’t know an issue that could cause it in the SDK, but also going to 3.2.0 should allow you to play it safe.

Thanks @daschl. Where do I find the query logs?

Good to know that I should have a look at SDK 3.2.0 - and test it as a possible solution.

I have seen the issue a couple of times again since re-creating the SG indexes…

Where do I find the “Query error logs”?

@jda on your server (I assume linux?) they should be under /opt/couchbase/var/lib/couchbase/logs

Thanks. Will look… (yes, I’m on CentOS 7.9)

@jda also does your application log give more information about the underlying connections from the SDK side? Since the socket is closed mid-request flight, I assume there must be plenty of warnings and reconnects that might also help pin it down a bit.

I have just looked at the query.log and this does not seem “alarming” to me:

grep '2021-08-04T15:09'  query.log
2021-08-04T15:09:04.892+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] 3ee03b6b-52e8-4ae4-a0c0-8e24bc0d3249 new temp file ... /opt/couchbase/var/lib/couchbase/tmp/scan-results1722419151746
2021-08-04T15:09:04.988+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] request(3ee03b6b-52e8-4ae4-a0c0-8e24bc0d3249) removing temp file /opt/couchbase/var/lib/couchbase/tmp/scan-results1722419151746 ...
2021-08-04T15:09:05.761+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] c66dc2d9-7f34-4122-800d-cd6ee25786e3 new temp file ... /opt/couchbase/var/lib/couchbase/tmp/scan-results1722035951353
2021-08-04T15:09:05.993+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] request(c66dc2d9-7f34-4122-800d-cd6ee25786e3) removing temp file /opt/couchbase/var/lib/couchbase/tmp/scan-results1722035951353 ...
2021-08-04T15:09:09.982+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] 686af0f5-3e7b-4c54-b213-feb08db8352c new temp file ... /opt/couchbase/var/lib/couchbase/tmp/scan-results1722203456516
2021-08-04T15:09:10.056+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] 904c680b-7586-4f74-ba16-b2b2271dedd5 new temp file ... /opt/couchbase/var/lib/couchbase/tmp/scan-results1722047977363
2021-08-04T15:09:10.258+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] request(686af0f5-3e7b-4c54-b213-feb08db8352c) removing temp file /opt/couchbase/var/lib/couchbase/tmp/scan-results1722203456516 ...
2021-08-04T15:09:10.306+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] request(904c680b-7586-4f74-ba16-b2b2271dedd5) removing temp file /opt/couchbase/var/lib/couchbase/tmp/scan-results1722047977363 ...
2021-08-04T15:09:14.532+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] 51a761ec-5357-4085-b76c-53b575c1b3ca new temp file ... /opt/couchbase/var/lib/couchbase/tmp/scan-results1722207117270
2021-08-04T15:09:14.708+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] request(51a761ec-5357-4085-b76c-53b575c1b3ca) removing temp file /opt/couchbase/var/lib/couchbase/tmp/scan-results1722207117270 ...
2021-08-04T15:09:20.802+02:00 [Info] [Queryport-connpool:db1.aqua.dtu.dk:9101] active conns 0, free conns 7
2021-08-04T15:09:21.771+02:00 [Info] [Queryport-connpool:db2.aqua.dtu.dk:9101] active conns 0, free conns 5
2021-08-04T15:09:28.426+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] 713a8afb-752c-414f-8c13-619ecadc1625 new temp file ... /opt/couchbase/var/lib/couchbase/tmp/scan-results1722200903485
2021-08-04T15:09:28.427+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] request(713a8afb-752c-414f-8c13-619ecadc1625) removing temp file /opt/couchbase/var/lib/couchbase/tmp/scan-results1722200903485 ...
2021-08-04T15:09:28.549+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] 1021f270-a716-4dba-a17c-dfd65209c55a new temp file ... /opt/couchbase/var/lib/couchbase/tmp/scan-results1722647174520
2021-08-04T15:09:28.608+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] request(1021f270-a716-4dba-a17c-dfd65209c55a) removing temp file /opt/couchbase/var/lib/couchbase/tmp/scan-results1722647174520 ...
2021-08-04T15:09:31.361+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] 49e18772-8018-41d4-8b5a-baeae2621adb new temp file ... /opt/couchbase/var/lib/couchbase/tmp/scan-results1722254428279
2021-08-04T15:09:31.643+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] request(49e18772-8018-41d4-8b5a-baeae2621adb) removing temp file /opt/couchbase/var/lib/couchbase/tmp/scan-results1722254428279 ...
2021-08-04T15:09:31.744+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] e3662f39-d9d9-4700-91e5-1cbdffbb40e2 new temp file ... /opt/couchbase/var/lib/couchbase/tmp/scan-results1722124252522
2021-08-04T15:09:31.838+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] 85d099e5-5350-44f4-8b8a-904e6ba7b331 new temp file ... /opt/couchbase/var/lib/couchbase/tmp/scan-results1722335982529
2021-08-04T15:09:32.080+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] 27767003-94dc-4ddd-9910-b86aebedc700 new temp file ... /opt/couchbase/var/lib/couchbase/tmp/scan-results1722180909100
2021-08-04T15:09:32.082+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] ff2257e8-fec3-4605-9b5b-a8a1429ec0f9 new temp file ... /opt/couchbase/var/lib/couchbase/tmp/scan-results1722655525787
2021-08-04T15:09:32.084+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] request(ff2257e8-fec3-4605-9b5b-a8a1429ec0f9) removing temp file /opt/couchbase/var/lib/couchbase/tmp/scan-results1722655525787 ...
2021-08-04T15:09:32.134+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] a538392d-c4f6-48af-b7f3-0a590487490f new temp file ... /opt/couchbase/var/lib/couchbase/tmp/scan-results1722546528830
2021-08-04T15:09:32.230+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] request(e3662f39-d9d9-4700-91e5-1cbdffbb40e2) removing temp file /opt/couchbase/var/lib/couchbase/tmp/scan-results1722124252522 ...
2021-08-04T15:09:32.347+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] request(85d099e5-5350-44f4-8b8a-904e6ba7b331) removing temp file /opt/couchbase/var/lib/couchbase/tmp/scan-results1722335982529 ...
2021-08-04T15:09:32.523+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] f4ccff44-4922-40d0-b1b9-08149c6bf679 new temp file ... /opt/couchbase/var/lib/couchbase/tmp/scan-results1722481628805
2021-08-04T15:09:32.573+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] 256e8b8e-de0c-4617-b915-1ee6944fa974 new temp file ... /opt/couchbase/var/lib/couchbase/tmp/scan-results1722993479200
2021-08-04T15:09:32.796+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] request(27767003-94dc-4ddd-9910-b86aebedc700) removing temp file /opt/couchbase/var/lib/couchbase/tmp/scan-results1722180909100 ...
2021-08-04T15:09:32.815+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] 6ff2f8a8-a4fd-4e2e-8e38-c2e8181b6aa2 new temp file ... /opt/couchbase/var/lib/couchbase/tmp/scan-results1722404676351
2021-08-04T15:09:32.816+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] request(6ff2f8a8-a4fd-4e2e-8e38-c2e8181b6aa2) removing temp file /opt/couchbase/var/lib/couchbase/tmp/scan-results1722404676351 ...
2021-08-04T15:09:32.864+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] request(a538392d-c4f6-48af-b7f3-0a590487490f) removing temp file /opt/couchbase/var/lib/couchbase/tmp/scan-results1722546528830 ...
2021-08-04T15:09:33.358+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] request(f4ccff44-4922-40d0-b1b9-08149c6bf679) removing temp file /opt/couchbase/var/lib/couchbase/tmp/scan-results1722481628805 ...
2021-08-04T15:09:33.414+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] request(256e8b8e-de0c-4617-b915-1ee6944fa974) removing temp file /opt/couchbase/var/lib/couchbase/tmp/scan-results1722993479200 ...
2021-08-04T15:09:47.536+02:00 [Info] connected with 2 indexers
2021-08-04T15:09:47.536+02:00 [Info] client stats current counts: current: 48, not current: 0
2021-08-04T15:09:47.545+02:00 [Info] num concurrent scans {0}
2021-08-04T15:09:47.545+02:00 [Info] average scan response {88 ms}
2021-08-04T15:09:47.603+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] logstats "data" {"gsi_scan_count":85430,"gsi_scan_duration":4731193719515,"gsi_throttle_duration":138215740517,"gsi_prime_duration":967969115875,"gsi_blocked_duration":3021574898545,"gsi_total_temp_files":12226}
2021-08-04T15:09:48.674+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] 742dc12b-0181-4283-b9f2-f7f95c24c20a new temp file ... /opt/couchbase/var/lib/couchbase/tmp/scan-results1722519653458
2021-08-04T15:09:48.675+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] request(742dc12b-0181-4283-b9f2-f7f95c24c20a) removing temp file /opt/couchbase/var/lib/couchbase/tmp/scan-results1722519653458 ...
2021-08-04T15:09:56.537+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] 110e706f-7b6d-454b-9dec-3dcc4df20523 new temp file ... /opt/couchbase/var/lib/couchbase/tmp/scan-results1722045972873
2021-08-04T15:09:56.538+02:00 [Info] GSIC[default/data-_default-_default-1627733128966886879] request(110e706f-7b6d-454b-9dec-3dcc4df20523) removing temp file /opt/couchbase/var/lib/couchbase/tmp/scan-results1722045972873 ...

Is this the right place I look?

The application server is on the same network as the two database servers. The IP addresses have been set in the local hosts file on the servers - so no DNS lookup or whatsoever playing with us. There are no indications of any “real” connection issues.

The time issues had been solved prior to the timeouts that I saw yesterday. The OS also had new updates applied (this is where the Couchbase server update also sneaked in) - so there could be other changes as well.

@jda ok so can you share the full application log file which includes the SDK logs? Of course you can share them via direct message if you don’t feel comfortable linking them in this public forum - and I can take a look.

edit: also did you upgrade to 3.2.0 and can you share the code on how you have configured the SDK (if not default configs)?

@daschl just a quick word about our setup:

We have two database servers in cluster. When all of this was set up I expected to be able to have failover if one node stopped. Subsequently, I have found that I need at least two active nodes for the cluster to keep working - so right now we just have a failure if one nodes stops for whatever reason.

Next to these we have one sync.gateway server running against the cluster. Originally we considered adding one more node - but left that out from the beginning to minimise complexity - so only one SG.

And then we have the application server running IBM Domino XPages (variant of JSF) with underlying logic in Java using the Java SDK. The Domino server has not been updated recently so the Java version we are using is 1.8.

The two queries that timed out yesterday were very simple:

SELECT *
FROM data
WHERE type='WebInfo'
    AND ((inactive IS NOT VALUED
            OR NOT inactive)
        AND (infotype='9'))

Took 40ms to run manually now.

And:

SELECT *
FROM data
WHERE type='Association'
    AND (unid='546F2600F0210439C12585450030B71E')

which took <8ms to run now.

But one of the other queries reporting a timeout is more complex:

SELECT t2.*,
       t1.*
FROM data AS t1 LEFT
    JOIN data AS t2 ON 'Catch:Private:'||t1.`key` = META(t2).id
    AND t2.type='Private'
WHERE t1.type='Catch'
    AND (t1.visibility='2')

which took 338ms when I ran it manually.

From the stack traces I can see that the timeouts are reported from both of the database servers…

Question: Can I clear the “logs” directory to get rid of the logs from the last 2-3 years?

@jda one thing that would be interesting to understand if you can make that work: if you write a standalone java program which just runs those queries at a representative interval and i.e. dump the timings and watch the logs – it will help us to understand if the issue is indeed caused by something inside the application server or outside.

I’m not sure how to do the log cleaning safely, but I’ll ask.

Also, can you share the logs as well as the configuration code as mentioned before?

@daschl I had missed your message about logs and code :slightly_smiling_face:

I’m not sure that writing a separate program helps. I have 39 DAO classes - and each of them has at least 3-4 (and some up to 8-10) different queries - and on top of the save and delete operations. So just writing some code with the above queries would be very random… The problem probably arose prior to those queries being called.

A better option may be to get the needed information from the “real” queries.

I’m not sure that you can use our internal application log for anything? That is entirely controlled by what we decide to write to it. But I may be misunderstanding you on this?

Let me show you some of the Java code. This is the code that will return the “WebInfos” from above:

	private WebInfo loadFromDocument(JsonObject node) {
		// Create a new object and load it with data
		WebInfo webinfo = new WebInfo();
		webinfo.setKey(node.getString(FN_KEY));
		webinfo.setTitle(node.getString(FN_TITLE));
		webinfo.setTitleEN(node.getString(FN_TITLE_EN));
		webinfo.setTitleDE(node.getString(FN_TITLE_DE));
		webinfo.setSummary(node.getString(FN_SUMMARY));
		webinfo.setSummaryEN(node.getString(FN_SUMMARY_EN));
		webinfo.setSummaryDE(node.getString(FN_SUMMARY_DE));
		webinfo.setHtml(node.getString(FN_HTML));
		webinfo.setHtmlEN(node.getString(FN_HTML_EN));
		webinfo.setHtmlDE(node.getString(FN_HTML_DE));
		webinfo.setImageUrl(node.getString(FN_IMAGE_URL));
		webinfo.setLink(node.getString(FN_LINK));
		webinfo.setSpeciesKey(node.getString(FN_SPECIES_KEY));
		webinfo.setDate(toDate(node.getString(FN_DATE)));
		webinfo.setInfoTypeValue(node.getString(FN_INFO_TYPE));
		webinfo.setInActive(toBoolean(node.getBoolean(FN_INACTIVE)));
		readRevisionInfo(node, webinfo);
		return webinfo;
	}

	public Map<String, WebInfo> getAllWebInfos() {
		return getAllWebInfos((String) null);
	}

	public Map<String, WebInfo> getAllWebInfos(boolean activeOnly) {
		String condition = (activeOnly ? ONLY_ACTIVE_CONDITION : null);
		return getAllWebInfos(condition);
	}

	public Map<String, WebInfo> getAllWebInfos(WebInfo.InfoType type) {
		String condition = (null == type ? null : FN_INFO_TYPE + "='" + type.getValue() + "'");
		return getAllWebInfos(condition);
	}

	public Map<String, WebInfo> getAllWebInfos(WebInfo.InfoType type, boolean activeOnly) {
		String condition = (activeOnly ? ONLY_ACTIVE_CONDITION : null);
		if (null != type) {
			condition = (StringUtil.isNotEmpty(condition) ? condition + " and " : "") + "(" + FN_INFO_TYPE + "='" + type.getValue() + "')";
		}
		return getAllWebInfos(condition);
	}

	private Map<String, WebInfo> getAllWebInfos(String condition) {
		Map<String, WebInfo> all = new ConcurrentHashMap<String, WebInfo>();
		for (JsonObject row : getAllFromDb(FORM_NAME, condition).rowsAsObject()) {
			WebInfo webinfo = loadFromDocument(row.getObject(DataBean.CB_DATA_STORE));
			all.put(webinfo.getKey(), webinfo);
		}
		Util.trace("Found " + all.size() + " rows");
		return all;
	}

The getAllFromDb() method is from a base class:

	protected QueryResult getAllFromDb(String formName, String condition) {
		String q = "SELECT * FROM " + DataBean.CB_DATA_STORE + " WHERE " + FN_TYPE + "='" + formName + "'" + (StringUtil.isEmpty(condition) ? "" : " AND (" + condition + ")");
		Util.trace(q);
		QueryResult result = getCluster().query(q, getQueryParams());
		//Util.trace(result.toString());
		return result;
	}

The getCluster() method gets the object from an application scoped bean:

	private Collection db = null;
	private Cluster cluster = null;
	private final String timeoutOpts = "timeout.kvTimeout=60s";

	// Couchbase DB methods
	private Collection getDefaultCollection(Cluster clust) {
		Collection coll = null;
		if (null != clust) {
			Util.trace("Get handle to Couchbase DB/collection");
			Bucket bucket = clust.bucket(CB_DATA_STORE);
			coll = bucket.defaultCollection();
			Util.trace("Got handle to default collection");
		}
		return coll;
	}

	public Cluster getCluster() {
		if (null == cluster) {
			cluster = Cluster.connect(ConfigurationBean.get().getDatabaseServerNames() + "?" + timeoutOpts, CB_DATA_USER, CB_DATA_PASSWORD);
			Util.info("Got handle to cluster: " + ConfigurationBean.get().getDatabaseServerNames());
			db = getDefaultCollection(cluster);
		}
		return cluster;
	}

The method getDatabaseServers() just returns the string: "db1.aqua.dtu.dk,db2.aqua.dtu.dk"

Finally, the getQueryParams() from the getAllFromDb() method is:

	protected QueryOptions getQueryParams() {
		if (!Util.isServiceContext()) {
			if (SecurityBean.get().isForceDbRead()) {
				SecurityBean.get().cancelForceDbRead(); // Reset to optimize for performance again...
				Util.trace("Force re-read");
				return QueryOptions.queryOptions().scanConsistency(QueryScanConsistency.REQUEST_PLUS);
			}
		}
		return QueryOptions.queryOptions();
	}

I basically use that to force a re-read in some case where a document has been created - and needs to be read right after. This is required in some cases e.g. in the user creation process - or at least I have not been able to find another way :slight_smile:

Which of the log files from the Couchbase servers do you want me to send to you?

Does the SDK itself write to some log files? The way the application works is with a “simple” setup where the JAR files are dumped to the jvm/lib/ext to make them available to my code (above). So no pooling, caching, logging, etc. from a “framework”.

@jda yes the SDK writes to the log files as well, and this is what I’ve been looking for. It uses SLF4J automatically and picks up whatever logger config you have in place. That should be set to INFO level at least so you can see whats going on inside the sdk to debug.

Note that this here:

				return QueryOptions.queryOptions().scanConsistency(QueryScanConsistency.REQUEST_PLUS);

maybe that could be contributing? When you execute a query from the UI it is not using REQUEST_PLUS, so those queries might take longer depending on the mutation workload you have.

Hmmmm… The Domino server is a little special when it comes to Java. All code/design is also contained in the NSF (Notes database). So you don’t have a “normal” Java deployment environment. As far as I know there is no default logger - but I’ll check if it can be configured.

Is there a better way to force a “correct” read after an update has been performed? I had the issue some time back and ideally just wanted to force the re-scan relevant for that single save. But the only way I found was to raise a flag and let that hit the next read… This approach feels wrong - but I couldn’t find another way that worked. But the SDK has been updated since so perhaps there is a better way…? If not, then I suppose I could improve my “mechanism” to include the type of information and only add it to the query parameters on the next read of that type…

Also note that I really use QueryScanConsistency.REQUEST_PLUS sparsely - so absolutely not on every save - only when the application depends on having the latest updates available…

This functionality has been used for at least a couple of years - and we have not seen these issues before. I had to re-do some of the logic when I installed SDK 3.1 (e.g. when getting the collection etc.) - but that is still a while back. The system should not really be heavily used at the moment (it’s an “angler’s app” and not highest season right now). But perhaps some timings etc. have changed with version 7.0 that is now tipping us over the top?

@jda re “rather the servers were automatically updated”
Couchbase can’t automatically update itself. An update has to be triggered.
Based on my experience, it’s someone (human or automation) patching the OS - if there is a couchbase repository on the system and the patching was done like yum -y update.
You can verify the time of update in the package manager logs, most probably it’s /var/log/yum.log if you’re using a RHEL/CentOS system. Just open this file and look for the latest couchbase entry. Then you can co-relate the time to auth system log entries and find out who was patching.