CBES 4.0: Configuring the Write Request "Refresh Policy" for Bulk Requests

Hi there,

We’re currently using CBES 4.0 and ES 6.4.2 and would like to know whether there’s any way to configure the connector to do “synchronous” writes by setting the “IMMEDIATE” refresh policy on the bulk request. Please see the documentation here:
https://artifacts.elastic.co/javadoc/org/elasticsearch/elasticsearch/6.4.2/org/elasticsearch/action/support/WriteRequest.RefreshPolicy.html

Is there any way to enforce this other than the refresh policy? The behavior I’d like is for newly indexed documents to be immediately searchable after a bulk index request.
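For reference, the three refresh policies correspond to values of the `refresh` query parameter in the REST API: NONE is `false` (the default), IMMEDIATE is `true`, and WAIT_UNTIL is `wait_for`. The sketch below is a minimal illustration, assuming Java 11+ (`java.net.http`). It mirrors that mapping in a standalone enum instead of using the Elasticsearch client, and the class names and cluster address are hypothetical. It only builds the bulk request; it does not send it.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class BulkRefreshExample {

    // Hypothetical standalone mirror of WriteRequest.RefreshPolicy's REST values
    // (this is NOT the Elasticsearch class itself).
    enum RefreshPolicy {
        NONE("false"),          // default: changes become visible at the next scheduled refresh
        IMMEDIATE("true"),      // force a refresh as part of this request
        WAIT_UNTIL("wait_for"); // block until a scheduled refresh makes the changes visible

        private final String value;

        RefreshPolicy(String value) { this.value = value; }

        String getValue() { return value; }
    }

    // Builds (but does not send) a _bulk request carrying the given refresh policy.
    static HttpRequest bulkRequest(String baseUrl, String ndjsonBody, RefreshPolicy policy) {
        URI uri = URI.create(baseUrl + "/_bulk?refresh=" + policy.getValue());
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/x-ndjson")
                .POST(HttpRequest.BodyPublishers.ofString(ndjsonBody))
                .build();
    }

    public static void main(String[] args) {
        // One index action plus its source; a bulk body must end with a newline.
        String body = "{\"index\":{\"_index\":\"example\",\"_type\":\"_doc\",\"_id\":\"1\"}}\n"
                + "{\"field\":\"value\"}\n";
        HttpRequest req = bulkRequest("http://localhost:9200", body, RefreshPolicy.IMMEDIATE);
        System.out.println(req.method() + " " + req.uri()); // prints "POST http://localhost:9200/_bulk?refresh=true"
    }
}
```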

Thanks,
Alex

Hi Alex,

As appealing as immediate mode may appear, I’m not sure it makes sense in the context of the connector. For one thing, it has poor performance characteristics. According to the Javadoc for IMMEDIATE:

Force a refresh as part of this request. This refresh policy does not scale for high indexing or search throughput but is useful to present a consistent view to for indices with very low traffic. And it is wonderful for tests!

Also, from the ES tuning guide:

The operation that consists of making changes visible to search - called a refresh - is costly, and calling it often while there is ongoing indexing activity can hurt indexing speed.

I can’t find a reference at the moment, but the last time we looked into this I recall reading that too-frequent refreshing can also have a negative impact on the internal structure of the index, since each refresh produces a new, small Lucene segment that must eventually be merged.

So that’s the performance angle. Now let’s consider what immediate writes would actually do for you.

Another strike against immediate writes is that replication from Couchbase to ES is asynchronous. Consequently, the time between when a document is written to Couchbase and when it is indexed in ES is unpredictable. Even if the connector refreshed the index immediately after every bulk write, there would be no guarantee about when a given Couchbase document becomes visible in ES, since external code has no way of knowing when the connector indexed it.

TL;DR: Immediate writes are not as awesome as they may seem. In most cases, tuning the index’s refresh interval (the index.refresh_interval setting) or forcing an ad-hoc refresh is a better option.
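As a rough sketch of those two alternatives, assuming Java 11+ and a hypothetical cluster at `localhost:9200` with a hypothetical index name: the first request changes the index’s `refresh_interval` via `PUT /<index>/_settings` (the Elasticsearch default is `1s`), and the second forces an ad-hoc refresh via `POST /<index>/_refresh`. The requests are only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class RefreshTuningExample {

    // Builds a PUT /<index>/_settings request that changes index.refresh_interval.
    static HttpRequest setRefreshInterval(String baseUrl, String index, String interval) {
        String body = "{\"index\":{\"refresh_interval\":\"" + interval + "\"}}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_settings"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    // Builds a POST /<index>/_refresh request that makes everything indexed so far searchable.
    static HttpRequest refresh(String baseUrl, String index) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_refresh"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        String es = "http://localhost:9200"; // hypothetical cluster address
        String index = "travel-sample";      // hypothetical index name

        // e.g. a longer interval to reduce refresh overhead during heavy indexing
        HttpRequest slower = setRefreshInterval(es, index, "30s");
        // e.g. in a test, once the connector has indexed the fixture documents
        HttpRequest adHoc = refresh(es, index);

        System.out.println(slower.method() + " " + slower.uri()); // prints "PUT http://localhost:9200/travel-sample/_settings"
        System.out.println(adHoc.method() + " " + adHoc.uri());   // prints "POST http://localhost:9200/travel-sample/_refresh"
    }
}
```

To send either request, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. Keep in mind that an ad-hoc refresh only makes visible what the connector has already written, so it does not remove the replication delay described above.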

Please let me know if there’s an aspect to this problem we haven’t considered, or a specific use case where immediate mode does make sense.

Thanks,
David