Consistent 'failed to access view' error on minor server load


#1

Please help me fix this ‘random’ error.

We use 3.0.2-1603 Enterprise Edition (build-1603-rel) under OS: x86_64-unknown-linux-gnu, Java client 1.4.7 with JDK8.
We are trying out Couchbase for a feature that requires consistent CRUD operations. As a result, we query the view with Stale.FALSE after every data change (to guarantee consistency), so that read operations can access views with Stale.OK.
The issue is that we have ~100 unit tests that connect to Couchbase to cover our functionality, and on roughly 95% of runs one or a few random tests fail with one of two errors:
net.spy.memcached.internal.CheckedOperationTimeoutException: Timed out waiting for operation - failing node:
java.lang.RuntimeException: Failed to access the view - OperationException: GENERAL

Re-running any failing test makes it pass. All tests are executed in a single thread, generating up to 10 ops/s, with documents smaller than 500 B.

The test machine is on the same local network as the server instance. We tried running the client on the same machine and on a different machine, and switched between iOS and Win clients, but the issue still reproduced. We even tried another server instance with another set of clients, and saw the same issue.

What logs do you need to help me fix this?
What additional data do you need?


#2

Some details:
java.lang.RuntimeException: Timed out waiting for operation
at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:142)
Caused by:
Timed out waiting for operation - failing node:

This happens on asyncCas(PersistTo.MASTER).get().
But again, the load is about 10 ops/sec, so I would not expect it to lag this way.
Also, I’m not clear why the node is named ?
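
For anyone working around the timeout while the root cause is investigated, here is a minimal sketch of waiting on the future with an explicit per-attempt timeout and a few retries. It uses only java.util.concurrent and assumes the future returned by asyncCas implements java.util.concurrent.Future. The class name BoundedRetry, the method getWithRetry, and the timeout and backoff values are hypothetical and not part of the Couchbase client.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper, not part of the Couchbase client.
final class BoundedRetry {

    /**
     * Starts the operation via attempt.get() and waits at most timeoutMs for it.
     * On timeout the future is cancelled and the operation is started again,
     * up to maxAttempts times, sleeping backoffMs * attemptNumber in between.
     */
    static <T> T getWithRetry(Supplier<Future<T>> attempt, int maxAttempts,
                              long timeoutMs, long backoffMs)
            throws InterruptedException, ExecutionException, TimeoutException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        TimeoutException last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            Future<T> future = attempt.get();
            try {
                return future.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true);   // give up on this attempt
                last = e;
                if (i < maxAttempts) {
                    Thread.sleep(backoffMs * i);   // simple linear backoff
                }
            }
        }
        throw last;   // every attempt timed out
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a client call: the first attempt never completes, the second succeeds.
        int[] calls = {0};
        String result = getWithRetry(
                () -> ++calls[0] < 2
                        ? new CompletableFuture<String>()
                        : CompletableFuture.completedFuture("stored"),
                3, 50, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

One caveat: an operation that timed out on the client may still have been applied on the server, so a blind retry of a CAS write can fail with a CAS mismatch. In that case the supplier would need to re-read the document and its CAS value before each attempt.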

Second error:
java.lang.RuntimeException: Failed to access the view
Caused by: OperationException: GENERAL

Here we’re doing the following:
query.setStale(Stale.FALSE).setLimit(1).setReduce(false);
client.query(view, query);

while the index queue holds 2-5 documents, each smaller than 500 B.


#3

Do you see any issues in the server logs (admin console > Logs)?
thanks
-cihan


#4

No log entries exist for that time (if you mean the /index.html#sec=log tab in the web console).
The last entry is from 5 days ago, when I changed the bucket cache configuration back to value eviction.


#5

One more detail: the very next test works perfectly fine, so it looks like a temporary lag.
During this lag, connections to other buckets on the same server instance also work perfectly fine. We didn’t notice any server disconnects in the admin console.


#6

We are seeing the same issue in our tests. With a very light load (a few ops per sec) we sometimes see this:
java.lang.RuntimeException: Timed out waiting for operation
at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:142)

It usually happens on a set() or delete() operation in CouchbaseClient, when calling get() on the returned Future.

Nothing in the Couchbase logs; running version 3.0.1-1444-rel.

Couchbase runs locally on the same machine as the client app (a development machine).

Any ideas?

br,
jpv


#7

It seems that the issue is somewhere here:
frequently asking to persist data to HDD while also triggering Stale.FALSE view access.
The only other thing we can see is that during these lags the graphs for the CB node are not rendered.
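
If that guess is right, one possible way to reduce the pressure is to stop forcing an index update after every write and instead run a single Stale.FALSE query lazily, on the first read after any number of writes. The sketch below is only an illustration under that assumption: LazyIndexRefresh, markWritten, read, and the two suppliers are hypothetical names, and the suppliers stand in for client.query calls configured with Stale.FALSE and Stale.OK. It assumes single-threaded use, as in the tests described above.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical wrapper, not part of the Couchbase client.
final class LazyIndexRefresh<R> {
    private final AtomicBoolean dirty = new AtomicBoolean(false);
    private final Supplier<R> freshQuery;   // stand-in for a view query with Stale.FALSE
    private final Supplier<R> cachedQuery;  // stand-in for a view query with Stale.OK

    LazyIndexRefresh(Supplier<R> freshQuery, Supplier<R> cachedQuery) {
        this.freshQuery = freshQuery;
        this.cachedQuery = cachedQuery;
    }

    /** Call after every successful write instead of querying the view immediately. */
    void markWritten() {
        dirty.set(true);
    }

    /** The first read after one or more writes forces an index update; later reads do not. */
    R read() {
        if (dirty.compareAndSet(true, false)) {
            try {
                return freshQuery.get();
            } catch (RuntimeException e) {
                dirty.set(true);   // refresh failed, so the next read must try again
                throw e;
            }
        }
        return cachedQuery.get();
    }
}
```

With this shape, a burst of N writes followed by reads costs one forced index update rather than N, while reads after a write still see the updated index.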