Couchbase - Elasticsearch full document


#1

From reading the documentation, I understand that when querying Elasticsearch the result contains only the keys of the relevant docs. Did I understand correctly?

If so, what is the usual pattern? Query ES -> iterate over the results (build a key list) -> bulk get from Couchbase -> iterate over the results -> send the response to the client?

If the above is true, my additional question is:

Is there a way to store full docs within Elasticsearch so that the query pattern looks like this: query ES -> iterate over the results -> send the response to the client? This approach eliminates the need for an additional fetch from Couchbase.

I want to add that I do not want to write docs directly to Elasticsearch and would rather use the cb-es plugin for syncing docs.


#2

The usual pattern is to store the documents in Couchbase and use ES for its indexing and querying capabilities. The reason is that Couchbase does very well at managing high-throughput, low-latency document access through its managed cache. I suspect that if you follow this pattern, you’ll actually see better performance as well.

Yes, there’s an “additional fetch”, but it’ll be to a system that is very good at keeping the working set of requests available for quick access. Also, note that you can frequently parallelize the requests with the Couchbase SDK and use things like bulk requests.
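A minimal sketch of that two-step pattern, in Python, under stated assumptions: the response dict follows the usual Elasticsearch search shape (`hits.hits[]._id`), and `fetch_one` is a hypothetical placeholder for whatever key lookup your Couchbase SDK exposes, not a real SDK call. An in-memory dict stands in for the bucket so the example runs without a cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def ids_from_hits(es_response):
    """Collect document IDs from an Elasticsearch-style search response."""
    return [hit["_id"] for hit in es_response["hits"]["hits"]]


def fetch_all(keys, fetch_one, max_workers=8):
    """Fetch every key in parallel, keeping the order of the search hits.

    fetch_one is a placeholder (assumption) for a key-value lookup such as
    a Couchbase get; swap in the call your SDK actually provides.
    """
    if not keys:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input keys.
        return list(pool.map(fetch_one, keys))


# Made-up stand-ins for a bucket and a search response.
fake_bucket = {"user::1": {"name": "Ada"}, "user::2": {"name": "Linus"}}
es_response = {"hits": {"hits": [{"_id": "user::2"}, {"_id": "user::1"}]}}

keys = ids_from_hits(es_response)
docs = fetch_all(keys, fake_bucket.get)
print(docs)  # [{'name': 'Linus'}, {'name': 'Ada'}]
```

If your SDK offers a native bulk or multi-get operation, that would likely replace the thread pool here; the point is only that the second hop can be issued for all keys at once rather than one by one.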

The only argument I can think of for storing the full document in ES would be if you frequently change how you index the data. Then it may be useful to have a copy locally. We don’t often see that case.


#3

I am not sure this is a full or accurate answer.

It seems that documents are stored in Elasticsearch (_source).

Please review the answer from Shay Banon (co-founder of Elasticsearch).

It doesn’t make sense to fetch 50 docs when ES yields 50 pointers. As far as the link below suggests, the ES query can indeed return the FULL docs with no extra hops.
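To make the idea concrete, here is a rough Python sketch of reading documents straight from the search hits. It assumes the usual Elasticsearch response shape, where each hit may carry a `_source` field; whether `_source` actually holds the whole document depends on how the index mapping is configured, so the sketch also collects the IDs of hits that would still need a lookup. The response data is made up.

```python
def docs_from_hits(es_response):
    """Split hits into full documents and IDs that still need a lookup."""
    full_docs, missing_ids = [], []
    for hit in es_response["hits"]["hits"]:
        source = hit.get("_source")
        if source:
            # The hit already carries the document body: no extra hop.
            full_docs.append(source)
        else:
            # Only a pointer came back: remember the key for a later fetch.
            missing_ids.append(hit["_id"])
    return full_docs, missing_ids


# Made-up response: one hit with _source, one with only an ID.
es_response = {
    "hits": {
        "hits": [
            {"_id": "user::1", "_source": {"name": "Ada"}},
            {"_id": "user::2"},
        ]
    }
}

full_docs, missing_ids = docs_from_hits(es_response)
print(full_docs)    # [{'name': 'Ada'}]
print(missing_ids)  # ['user::2']
```

When `missing_ids` comes back empty, the client never has to know about the store behind ES, which is the decoupling being asked for here.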

For the sake of the community, I suggest investigating this more deeply and getting back with a complete code example that shows how one can avoid using CB for the extra fetch. The fetch is redundant, and in some cases it is not an issue of performance but of decoupling ES from CB: I want to be able to query ES without even knowing that CB is working behind the scenes. So can someone please comment on the below?

http://www.couchbase.com/communities/q-and-a/copying-documents-couchbase-elasticsearch-indexing


#4

In the testing Couchbase has done, we get better latency and throughput fetching docs from Couchbase. That makes sense, as it’s designed for consistent low latency and high throughput. Of course, each user should test against their own app’s needs.