TimeoutException on bucket.get()

For a simple bucket.get(), we are getting TimeoutExceptions.

java.lang.RuntimeException: java.util.concurrent.TimeoutException
at com.couchbase.client.java.util.Blocking.blockForSingle(Blocking.java:75) ~[java-client-2.2.5.jar:na]
at com.couchbase.client.java.CouchbaseBucket.get(CouchbaseBucket.java:129) ~[java-client-2.2.5.jar:na]
at com.couchbase.client.java.CouchbaseBucket.get(CouchbaseBucket.java:124) ~[java-client-2.2.5.jar:na]

We increased the kvTimeout to 10 seconds:
CouchbaseEnvironment couchbaseEnvironment = DefaultCouchbaseEnvironment
    .builder()
    .autoreleaseAfter(couchbaseAutoReleaseAfter) // milliseconds
    .kvTimeout(TIMEOUT_10_SEC)                   // kvTimeout is in milliseconds, so 10000 here
    .build();
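For reference, the failing call is just a plain blocking get; the SDK also offers a get overload that takes an explicit per-call timeout (a sketch, the key name is illustrative):

// Plain blocking get: bounded by the environment's kvTimeout (10 s here).
JsonDocument doc = bucket.get("some-key");

// Same call with an explicit per-operation timeout (java-client 2.x overload).
JsonDocument doc2 = bucket.get("some-key", 10, TimeUnit.SECONDS);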

We are still getting Timeouts.

What can you say about the underlying environment? Are the app servers and the cluster in the same LAN? Are any of them VMs? In particular, are any on cloud instances like EC2 micro instances?

Another recommendation is that you can increase the log level if you’d like to see details as things progress. That may help you identify the issue.

The timeouts are happening in the production environment.
Where should the log level be increased?

Can you reproduce the problem in isolation and capture a detailed SDK log of it?
See http://developer.couchbase.com/documentation/server/4.5/sdk/java/collecting-information-and-logging.html

You'd have to configure the logging level to be the most detailed, TRACE in Log4j for instance.

Since it is the production environment, we don't want to increase the log level. Thinking off on a tangent, we are using roughly this chain:

bucket.async()
    .upsert(...)
    .flatMap(...)
    .toList()
    .toBlocking()
    .single();

Does toBlocking() cause the timeout?
I tried removing the toBlocking() and used a forEach() instead (the documentation says forEach is blocking). But I see that the performance is higher with forEach (tested on a local environment). Can I substitute forEach for toBlocking? Am I missing something?

forEach in this case is non-blocking and asynchronous, meaning that the rest of your code will continue executing immediately…
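To illustrate with a generic RxJava sketch (not your code; the only imports needed are rx.Observable and java.util.concurrent.TimeUnit):

// forEach is just subscribe: it returns immediately and the items print later on another thread.
Observable.interval(100, TimeUnit.MILLISECONDS)
        .take(3)
        .forEach(i -> System.out.println("async: " + i));
System.out.println("reached before the async items are printed");

// toBlocking().forEach(...) really does block until the source completes.
Observable.interval(100, TimeUnit.MILLISECONDS)
        .take(3)
        .toBlocking()
        .forEach(i -> System.out.println("blocking: " + i));
System.out.println("reached only after the three blocking items");

Thread.sleep(500); // give the first, asynchronous subscription time to finish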
What is the flatMap used for?

Roughly:

Observable.from(keyMap)
    .flatMap(...)   // bucket.async().upsert(...)
    .flatMap(...)   // update via a N1QL query
    .flatMap(...)   // bucket.async().upsert(...)
    .groupBy(...)
    .forEach(...);

When I removed toBlocking(), I saw that the records were not upserted in Couchbase (I checked in Couchbase about 2 minutes after the code ran). That was expected, because it was async. When I added the groupBy and forEach, I could see the records were inserted immediately. I thought it was because forEach was blocking.
So is using forEach better than toBlocking? Does forEach have any other catches?

The groupBy seems to group all the records, so does that mean all my keys from the map have been processed by Couchbase by the time the forEach is done?

I need a result set of whether each upsert was a success or a failure, so I can store it and use it later to save the result in Couchbase.

I don’t think groupBy is used correctly / doing what you think it’s doing here. It’ll create an Observable of GroupedObservables according to a criterion that produces a “key” for each group.
You then need to subscribe to all of these GroupedObservables, on which you can call getKey() to see which group they represent.
The caveat is that GroupedObservable can cache values, so you need to subscribe to all of them even if you’re not interested in a particular group.

However, they will emit values as they come, won’t wait for all values to be emitted before emitting something.

This short example demonstrates the use of groupBy (which, again, I don’t think you need):

Observable.interval(0, 1, TimeUnit.SECONDS)
        .take(10)
        .groupBy(i -> i % 2)
        .flatMap(group -> group.getKey() == 0L ? group : group.ignoreElements())
        .subscribe(System.out::println,
            e -> e.printStackTrace(), () -> System.out.println("DONE"));

Thread.sleep(10000);

This produces two groups: even numbers and odd numbers. It then subscribes to the two groups (using flatMap) but ignores the “odd” group’s elements. Prints out:

0 //immediately
2 //after 2 seconds (1 is emitted after 1sec but ignored in the flatMap)
4
6
8 //after 8 seconds...
// 9 is emitted after 9s, ignored, and then all the streams complete
DONE

:warning: Note that forEach is just an alias for the more commonly used subscribe!
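If what you actually need is to block until every upsert has been attempted and to know which ones failed, something along these lines is probably closer than groupBy (a sketch only: documents stands for a List<JsonDocument> built from your keyMap, and error handling is simplified):

List<String> failedIds = Observable
        .from(documents)
        .flatMap(doc -> bucket.async()
                .upsert(doc)
                .map(upserted -> "")              // success: emit an empty marker
                .onErrorReturn(e -> doc.id()))    // failure: emit the document id instead
        .filter(id -> !id.isEmpty())
        .toList()
        .toBlocking()                             // wait until the whole batch has been attempted
        .single();

toBlocking().single() only returns once every inner Observable has completed, so at that point each upsert has either succeeded or had its id recorded in failedIds.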

We are trying to reproduce the problem and capture the logs in the lower environments.

We use logback. Is setting TRACE level for the class that sets up the DefaultCouchbaseEnvironment good enough?

Do you also want us to add runtimeMetricsCollectorConfig, and also .defaultMetricsLoggingConsumer(false, CouchbaseLogLevel.TRACE)?

Yes, TRACE level is good. Using .defaultMetricsLoggingConsumer(true, CouchbaseLogLevel.TRACE) on the environment could also dump useful information.
Hopefully you can reproduce your original issue in 1 quick run. Don’t forget to attach the code you used for testing :wink:
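For reference, a minimal environment sketch with that consumer enabled (the SDK's loggers all live under the com.couchbase.client package, so that is the logger to raise to TRACE in your logback configuration):

// imports (roughly): com.couchbase.client.core.logging.CouchbaseLogLevel,
// com.couchbase.client.java.env.CouchbaseEnvironment, com.couchbase.client.java.env.DefaultCouchbaseEnvironment
CouchbaseEnvironment env = DefaultCouchbaseEnvironment
        .builder()
        .kvTimeout(10000)                                             // ms, as you already set
        .defaultMetricsLoggingConsumer(true, CouchbaseLogLevel.TRACE) // log runtime + latency metrics
        .build();
// runtimeMetricsCollectorConfig / networkLatencyMetricsCollectorConfig can additionally
// tune how often those metrics are emitted (hourly by default).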

1) We have the network metrics and runtime metrics printed every hour. In one of the clusters, we see "UpsertRequest" entries for only 3 of the 4 Couchbase Server nodes. In the other 2 clusters, we see "UpsertRequest" entries for all 4 nodes. Is that something to be looked at?

2) We see this log often. Is it something to be looked at?
08:59:40.664 [cb-core-3-2] DEBUG c.c.c.c.c.refresher.CarrierRefresher - Could not fetch config from bucket "bucket-name" against "server-name"/162.140.107.13.
com.couchbase.client.core.RequestCancelledException: Could not dispatch request, cancelling instead of retrying.
at com.couchbase.client.core.retry.RetryHelper.retryOrCancel(RetryHelper.java:51) [core-io-1.2.5.jar:na]
at com.couchbase.client.core.RequestHandler.onEvent(RequestHandler.java:206) [core-io-1.2.5.jar:na]
at com.couchbase.client.core.RequestHandler.onEvent(RequestHandler.java:76) [core-io-1.2.5.jar:na]
at com.couchbase.client.deps.com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:128) [core-io-1.2.5.jar:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1157) [na:1.7.0]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:627) [na:1.7.0]
at com.couchbase.client.deps.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) [core-io-1.2.5.jar:na]
at java.lang.Thread.run(Thread.java:801) [na:1.7.0]
08:59:40.664 [putations-50] DEBUG c.c.c.c.config.ConfigurationProvider - New Bucket "bucket-name" config proposed.
08:59:40.665 [putations-50] DEBUG c.c.c.c.c.refresher.CarrierRefresher - Completed refreshing config for bucket "bucket-name"

08:59:40.664 [cb-core-3-2] DEBUG c.c.c.c.c.refresher.CarrierRefresher - Could not fetch config from bucket "bucket-name" against "server-name"/162.140.107.13. com.couchbase.client.core.RequestCancelledException: Could not dispatch request, cancelling instead of retrying.

I think you should look deeper into that particular server you don’t see requests for in your 4 node setup. I bet it is the same one as the server mentioned in the exception above, and that there is something wrong with it.

Have you had a look at that particular server’s logs?

In the metrics I do see requests going to that server/node, though. Where can I post the logs?

What exact log are we looking for on that server?

Could you post an example of the metrics dump here? (If you want, edit out the IPs/hostnames by replacing each with its own fake hostname.)

I’m not sure which logs exactly would be best to look at… maybe @ingenthr or @daschl have an idea?

hostname3/555.555.555.13:11210":{“BINARY”:{“TouchRequest”:{“SUCCESS”:{“metrics”:{“min”:362,“max”:364,“count”:1,“timeUnit”:“MICROSECONDS”,“percentiles”:{“50.0”:364,“90.0”:364,“95.0”:364,“99.0”:364,“99.9”:364}}}},“KeepAliveRequest”:{“SUCCESS”:{“metrics”:{“min”:268,“max”:827,“count”:61,“timeUnit”:“MICROSECONDS”,“percentiles”:{“50.0”:440,“90.0”:548,“95.0”:573,“99.0”:577,“99.9”:827}}}},“GetBucketConfigRequest”:{“SUCCESS”:{“metrics”:{“min”:344,“max”:2932,“count”:119,“timeUnit”:“MICROSECONDS”,“percentiles”:{“50.0”:561,“90.0”:675,“95.0”:720,“99.0”:1531,“99.9”:2932}}}}}},“hostname1/555.555.555.11:11210”:{“BINARY”:{“TouchRequest”:{“SUCCESS”:{“metrics”:{“min”:466,“max”:468,“count”:1,“timeUnit”:“MICROSECONDS”,“percentiles”:{“50.0”:468,“90.0”:468,“95.0”:468,“99.0”:468,“99.9”:468}}}},“KeepAliveRequest”:{“SUCCESS”:{“metrics”:{“min”:278,“max”:1179,“count”:60,“timeUnit”:“MICROSECONDS”,“percentiles”:{“50.0”:438,“90.0”:536,“95.0”:1028,“99.0”:1146,“99.9”:1179}}}},“GetBucketConfigRequest”:{“SUCCESS”:{“metrics”:{“min”:380,“max”:3588,“count”:116,“timeUnit”:“MICROSECONDS”,“percentiles”:{“50.0”:569,“90.0”:679,“95.0”:733,“99.0”:3162,“99.9”:3588}}}}}},“event”:{“name”:“NetworkLatencyMetrics”,“type”:“METRIC”},“hostname3/555.555.555.13:8093”:{“QUERY”:{“KeepAliveRequest”:{“SUCCESS”:{“metrics”:{“min”:378,“max”:2113,“count”:119,“timeUnit”:“MICROSECONDS”,“percentiles”:{“50.0”:557,“90.0”:675,“95.0”:790,“99.0”:1466,“99.9”:2113}}}}}},“hostname3/555.555.555.13:8092”:{“VIEW”:{“KeepAliveRequest”:{“SUCCESS”:{“metrics”:{“min”:724,“max”:1785,“count”:120,“timeUnit”:“MICROSECONDS”,“percentiles”:{“50.0”:1003,“90.0”:1155,“95.0”:1245,“99.0”:1564,“99.9”:1785}}}}}},“hostname1/555.555.555.11:8092”:{“VIEW”:{“KeepAliveRequest”:{“SUCCESS”:{“metrics”:{“min”:765,“max”:1654,“count”:120,“timeUnit”:“MICROSECONDS”,“percentiles”:{“50.0”:987,“90.0”:1163,“95.0”:1179,“99.0”:1253,“99.9”:1654}}}}}},“hostname2/555.555.555.12:11210”:{“BINARY”:{“KeepAliveRequest”:{“SUCCESS”:{“metrics”:{“min”:230,“max”:1318,“count”:119,“timeUnit”:“MICROSECONDS”,“percentiles”:{“50.0”:399,“90.0”:481,“95.0”:507,“99.0”:643,“99.9”:1318}}}},“ObserveRequest”:{“SUCCESS”:{“metrics”:{“min”:305,“max”:307,“count”:1,“timeUnit”:“MICROSECONDS”,“percentiles”:{“50.0”:307,“90.0”:307,“95.0”:307,“99.0”:307,“99.9”:307}}}},“UpsertRequest”:{“SUCCESS”:{“metrics”:{“min”:483,“max”:933,“count”:2,“timeUnit”:“MICROSECONDS”,“percentiles”:{“50.0”:485,“90.0”:933,“95.0”:933,“99.0”:933,“99.9”:933}}}},“GetRequest”:{“SUCCESS”:{“metrics”:{“min”:415,“max”:417,“count”:1,“timeUnit”:“MICROSECONDS”,“percentiles”:{“50.0”:417,“90.0”:417,“95.0”:417,“99.0”:417,“99.9”:417}}}}}},“hostname0/555.555.555.10:8092”:{“VIEW”:{“KeepAliveRequest”:{“SUCCESS”:{“metrics”:{“min”:679,“max”:1818,“count”:120,“timeUnit”:“MICROSECONDS”,“percentiles”:{“50.0”:962,“90.0”:1073,“95.0”:1114,“99.0”:1802,“99.9”:1818}}}}}},“hostname2/555.555.555.12:8093”:{“QUERY”:{“KeepAliveRequest”:{“SUCCESS”:{“metrics”:{“min”:372,“max”:2162,“count”:119,“timeUnit”:“MICROSECONDS”,“percentiles”:{“50.0”:548,“90.0”:626,“95.0”:696,“99.0”:888,“99.9”:2162}}}}}},“hostname0/555.555.555.10:11210”:{“BINARY”:{“ReplaceRequest”:{“SUCCESS”:{“metrics”:{“min”:311,“max”:380,“count”:2,“timeUnit”:“MICROSECONDS”,“percentiles”:{“50.0”:313,“90.0”:380,“95.0”:380,“99.0”:380,“99.9”:380}}}},“KeepAliveRequest”:{“SUCCESS”:{“metrics”:{“min”:203,“max”:1368,“count”:60,“timeUnit”:“MICROSECONDS”,“percentiles”:{“50.0”:411,“90.0”:485,“95.0”:532,“99.0”:634,“99.9”:1368}}}},“ObserveRequest”:{“SUCCESS”:{“metrics”:{“min”:282,“max”:284,“count”:1,“timeUnit”:“MICROSECONDS”,“percentiles”:{“50.0”:284,“90.0”:28
4,“95.0”:284,“99.0”:284,“99.9”:284}}}},“UpsertRequest”:{“SUCCESS”:{“metrics”:{“min”:389,“max”:557,“count”:3,“timeUnit”:“MICROSECONDS”,“percentiles”:{“50.0”:518,“90.0”:557,“95.0”:557,“99.0”:557,“99.9”:557}}}},“GetRequest”:{“SUCCESS”:{“metrics”:{“min”:236,“max”:514,“count”:6,“timeUnit”:“MICROSECONDS”,“percentiles”:{“50.0”:309,“90.0”:444,“95.0”:514,“99.0”:514,“99.9”:514}}}},“GetBucketConfigRequest”:{“SUCCESS”:{“metrics”:{“min”:323,“max”:5406,“count”:125,“timeUnit”:“MICROSECONDS”,“percentiles”:{“50.0”:532,“90.0”:663,“95.0”:720,“99.0”:3620,“99.9”:5406}}}}}}}

We have this kind of log printed every hour.