Timeouts at view query after a few days of runtime


#1

Hi,
When I start the application, it runs fine for a few days. After 2 to 8 days it hangs with the following error:

java.lang.RuntimeException: java.util.concurrent.TimeoutException
at com.couchbase.client.java.util.Blocking.blockForSingle(Blocking.java:71) ~[java-client-2.3.1.jar:?]
at com.couchbase.client.java.CouchbaseBucket.query(CouchbaseBucket.java:610) ~[java-client-2.3.1.jar:?]
at com.couchbase.client.java.CouchbaseBucket.query(CouchbaseBucket.java:567) ~[java-client-2.3.1.jar:?]
at com.dao.couchbase.util.QueryUtils.executeViewQuery(QueryUtils.java:294) ~[dao-1.5.16.jar:1.5.16]
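
The trace shows the `TimeoutException` arriving wrapped in a `RuntimeException`, so code that wants to react to the first timeout (for example, to trigger diagnostics) has to look at the cause chain rather than the outer exception type. A minimal sketch of such a check follows; the class and method names are hypothetical and not part of the Couchbase SDK:

```java
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for illustration; not part of the Couchbase SDK. */
final class TimeoutCauses {
    private TimeoutCauses() {}

    /** Returns true if the throwable or any of its causes is a TimeoutException. */
    static boolean isTimeout(Throwable t) {
        // Bound the walk so a cyclic cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 32; depth++) {
            if (t instanceof TimeoutException) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }
}
```

A caller could wrap the view query in a `try`/`catch (RuntimeException e)` and, when `TimeoutCauses.isTimeout(e)` is true, record the time of the first failure before rethrowing.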

The rate of view requests has been stable recently: about 1 request every 5 seconds.

The CB server version is 3.0.3. I have tried CB client 2.2.7 and 2.3.1.
If two identical instances of the application are launched at the same time and one of them starts getting timeouts, the second one keeps working for about an hour.

The timeouts are thrown on a view query with params.
After the first timeout, the application cannot perform any request to that view without timing out.
After an application restart it runs fine for a few days again.
The timeout is set to 5 seconds.
Also, in netstat I see that most connections are in the ESTABLISHED state, but 4 are in TIME_WAIT and one is in CLOSE_WAIT.

UPD: when the application hangs, CPU usage on the CB server nodes rises to ~80%. Once the application is killed, CPU usage returns to the normal 10-15%. Could it be some kind of deadlock?

Execute view query method:

public static void executeViewQuery(Bucket bucket, String designDocumentName, String viewName, ViewParams viewParams,
        Stale stale, ViewResultMapper resultMapper) {
    if (null == bucket) {
        throw new IllegalArgumentException("Bucket cannot be null");
    }
    if (null == designDocumentName || designDocumentName.trim().isEmpty()) {
        throw new IllegalArgumentException("DesignDocument name cannot be null or empty");
    }
    if (null == viewName || viewName.trim().isEmpty()) {
        throw new IllegalArgumentException("View name cannot be null or empty");
    }
    if (null == stale) {
        throw new IllegalArgumentException("Stale cannot be null");
    }
    if (null == resultMapper) {
        throw new IllegalArgumentException("Mapper cannot be null");
    }

    ViewQuery query = ViewQuery.from(designDocumentName, viewName);
    if (null != viewParams) {
        if (null != viewParams.getKeys() && !viewParams.getKeys().isEmpty()) {
            query.keys(JsonArray.from(viewParams.getKeys().toArray()));
        }
        if (null != viewParams.getMinValue()) {
            if (viewParams.getMinValue() instanceof Long) {
                query.startKey(((Long) viewParams.getMinValue()));
            } else if (viewParams.getMinValue() instanceof Integer) {
                query.startKey((Integer) viewParams.getMinValue());
            } else if (viewParams.getMinValue() instanceof String) {
                if (!((String) viewParams.getMinValue()).isEmpty()) {
                    query.startKey((String) viewParams.getMinValue());
                }
            } else {
                query.startKey(viewParams.getMinValue().toString());
            }
        }
        if (null != viewParams.getMaxValue()) {
            if (viewParams.getMaxValue() instanceof Long) {
                query.endKey(((Long) viewParams.getMaxValue()));
            } else if (viewParams.getMaxValue() instanceof Integer) {
                query.endKey((Integer) viewParams.getMaxValue());
            } else if (viewParams.getMaxValue() instanceof String) {
                if (!((String) viewParams.getMaxValue()).isEmpty()) {
                    query.endKey((String) viewParams.getMaxValue());
                }
            } else {
                query.endKey(viewParams.getMaxValue().toString());
            }
        }
        query.inclusiveEnd(viewParams.isIncludeMaxValue());
    } else {
        query.inclusiveEnd(true);
    }
    query.stale(stale);
    // The timeout occurs here:
    ViewResult viewResult = bucket.query(query);
    if (!viewResult.success()) {
        String msg = viewResult.error().toString();
        logger.error("View-query exception occurred {}.", msg);
        throw new RuntimeException("View-query exception: " + msg);
    }
    Iterator<ViewRow> iterator = viewResult.rows();
    while (iterator.hasNext()) {
        resultMapper.extract(iterator.next());
    }
}

#2

If you still have the netstat output, can you check only the connections to port 8092 (the view port)?
I don’t think the connection states are worrying, but it’s worth a check limited to the view port.
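
If the output was saved to a file, the filtering can be done after the fact. The sketch below counts TCP connection states for remote port 8092 only. It assumes Linux-style `netstat -an` lines, where the remote address is the second-to-last column and the state is the last one; the class and method names are made up for illustration:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: summarizes saved netstat output for the view port. */
final class ViewPortConnections {
    /** Remote-address suffix for the view port mentioned above. */
    static final String VIEW_PORT_SUFFIX = ":8092";

    private ViewPortConnections() {}

    /** Counts connection states of TCP lines whose remote address ends in :8092. */
    static Map<String, Integer> countStates(List<String> netstatLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : netstatLines) {
            String[] cols = line.trim().split("\\s+");
            // Skip headers, blank lines, and non-TCP entries.
            if (cols.length < 4 || !cols[0].regionMatches(true, 0, "tcp", 0, 3)) {
                continue;
            }
            // Assumption: "... <local> <remote> <state>" column order.
            String remote = cols[cols.length - 2];
            String state = cols[cols.length - 1];
            if (remote.endsWith(VIEW_PORT_SUFFIX)) {
                counts.merge(state, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Feeding it the lines of the saved file (for example via `java.nio.file.Files.readAllLines`) gives one count per state, which makes a lingering CLOSE_WAIT on the view port easy to spot.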

Did you by any chance collect the server-side logs when the timeouts started occurring, to see if something is degrading there?

When you say timeouts are thrown on a view query with params, does it mean that queries without params do not time out?

The CPU rising to 80% is interesting… Would you be able to reproduce the issue in an environment where you can get a thread dump during timeouts? Even better if you can throw in some sort of packet tracing.
If you’re running in a non-production environment on an Oracle JVM, you can also look at JFR (Java Flight Recorder)…
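
Running `jstack <pid>` against the process is the usual way to get the thread dump. If attaching a tool at the right moment is hard given how long the issue takes to appear, the application could capture one itself when the first timeout is seen. Here is a rough sketch using the JDK's `ThreadMXBean`; the class name is hypothetical, and the deadlock check only covers Java-level locks inside the client JVM, so it says nothing about what the server is doing:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: captures an in-process thread dump as text. */
final class ThreadDumps {
    private ThreadDumps() {}

    static String capture() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // Request lock details only where the JVM supports them.
        ThreadInfo[] infos = threads.dumpAllThreads(
                threads.isObjectMonitorUsageSupported(),
                threads.isSynchronizerUsageSupported());
        for (ThreadInfo info : infos) {
            if (info == null) {
                continue; // defensive: skip entries for threads that vanished
            }
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        // Both calls return null when no deadlock is found.
        long[] deadlocked = threads.isSynchronizerUsageSupported()
                ? threads.findDeadlockedThreads()
                : threads.findMonitorDeadlockedThreads();
        out.append("JVM-level deadlocked threads: ")
           .append(deadlocked == null ? 0 : deadlocked.length).append('\n');
        return out.toString();
    }
}
```

Logging the result of `ThreadDumps.capture()` once, on the first timeout, would be enough; dumping on every failed query would flood the log.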

(PS: 2-8 days to reproduce is going to be hard.)


#3

The application runs queries both with and without params. When the application hangs, both kinds of queries time out.

I looked through the server logs but didn’t see anything wrong. Could you suggest what I should look for?

Here are the results of netstat and profiling: View_Timeout.zip (277.7 KB)