About ElasticSearch Plugin

Hi & Merry Christmas :smile:

I’d like some of my document’s fields to be indexed but not analyzed by default in ES. I know this should be done in couchbase_template.json, but I’m not sure how, as I’m new to ES. I’d appreciate any comments.

Afshin

Hi Afshin,

Let’s make sure I understand what you want to accomplish:
You want to have specific fields in your document indexed so that you can search them, but not analyzed as full text. You want only those fields searchable, and not any others. Correct?

To do this, we need to combine two features of ElasticSearch mappings:

  1. We need to disable dynamic mappings so that only the fields you specify appear in the mapping and any additional fields are ignored. This is done by setting the dynamic property on the root doc object to false.
  2. We need to configure the indexing for each of the specified fields appropriately. This is done by setting the index property on each of the mapped fields to one of the following values: “no”, “analyzed”, or “not_analyzed”. Here, “no” means the field is not indexed at all, so it won’t be searchable; “analyzed” is the default and means the string will be run through the analyzer and indexed as full text; and “not_analyzed” means the value will be indexed and searchable, but only as a single exact value rather than as full text.
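
To make the two settings concrete, here is a minimal Python sketch that builds such a mapping as a plain dict. The helper name build_doc_mapping is made up for illustration and is not part of the plugin or of ElasticSearch; only the dynamic and index properties come from the mapping options described above.

```python
import json

# The three values the "index" property accepts, as listed above.
VALID_INDEX_OPTIONS = {"no", "analyzed", "not_analyzed"}


def build_doc_mapping(fields):
    """Build a mapping for the "doc" object (hypothetical helper).

    `fields` maps a field name to a dict of mapping properties,
    e.g. {"name": {"type": "string", "index": "not_analyzed"}}.
    """
    for name, props in fields.items():
        index = props.get("index", "analyzed")  # "analyzed" is the default
        if index not in VALID_INDEX_OPTIONS:
            raise ValueError("bad index option for %s: %r" % (name, index))
    return {
        "dynamic": False,            # step 1: ignore fields that are not listed
        "properties": dict(fields),  # step 2: per-field index settings
    }


mapping = build_doc_mapping({
    "name": {"type": "string", "index": "not_analyzed"},
    "user_id": {"type": "long", "index": "no"},
})
print(json.dumps(mapping, indent=2))
```

The printed JSON is the kind of fragment that goes under the doc object in the template.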

So let’s say that, for example, we have a document with the following fields and specifications:

  • date: indexed as date format
  • name: indexed but not as full text
  • tweet: indexed as full text in English
  • user_id: not indexed at all

We want only those fields, and if our document contains other fields, or if the schema ever changes to include new fields, we don’t want them indexed by ElasticSearch.

We’ll need to customize the template (based on couchbase_template.json that comes with the plugin) to something like this:

{
    "template" : "*",
    "order" : 10,
    "mappings" : {
        "couchbaseCheckpoint" : {
            "_source" : {
                "includes" : ["doc.*"]
            },
            "dynamic_templates": [
                {
                    "store_no_index": {
                        "match": "*",
                        "mapping": {
                            "store" : "no",
                            "index" : "no",
                            "include_in_all" : false
                        }
                    }
                }
            ]
        },
        "couchbaseDocument" : {
            "_source" : {
                "includes" : ["meta.*"]
            },
            "properties" : {
                "meta" : {
                    "type" : "object",
                    "include_in_all" : false
                },
                "doc": {
                    "dynamic": false,
                    "properties": {
                        "date": {
                            "type": "date",
                            "format": "dateOptionalTime"
                        },
                        "name": {
                            "type": "string",
                            "index": "not_analyzed"
                        },
                        "tweet": {
                            "type": "string",
                            "analyzer": "english"
                        },
                        "user_id": {
                            "type": "long",
                            "index": "no"
                        }
                    }
                }
            }
        }
    }
}
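
Once the template is saved to a file, it has to be sent to ElasticSearch's index template endpoint (PUT /_template/<name>). Below is a rough Python sketch of that step using only the standard library. The host http://localhost:9200, the template name "couchbase", and both function names are assumptions for illustration, so adjust them to your setup.

```python
import json
import urllib.request


def build_template_request(template, host="http://localhost:9200", name="couchbase"):
    """Prepare (but do not send) a PUT /_template/<name> request.

    `host` and `name` are assumed defaults, not values from the plugin.
    """
    body = json.dumps(template).encode("utf-8")
    return urllib.request.Request(
        url="%s/_template/%s" % (host, name),
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def install_template(path, **kwargs):
    """Read the template JSON from `path` and send it to ElasticSearch."""
    with open(path) as f:
        template = json.load(f)  # fails early if the JSON is malformed
    request = build_template_request(template, **kwargs)
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling install_template("couchbase_template.json") would upload the file. Keep in mind that index templates are only applied when an index is created, so the target index needs to be created (or re-created) after the template is installed.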

Note that we replaced the _default_ type with the specific couchbaseDocument type here, which is the default type that the plugin gives to all documents replicated from Couchbase.

Hope this helps, and please let me know if you need to dive deeper into this.

Hi David,

Thanks for the useful information provided here. I was able to update the Couchbase template by following the example here.

I have a requirement:
(1) In Couchbase we have different types of documents (different JSON structures).
(2) I want to use ElasticSearch to query a few attributes.
(3) Not all documents in Couchbase store these attributes.
(4) So replicating (XDCR) all of the documents to ElasticSearch is unnecessary.
(5) Is there any way to filter out the irrelevant documents before the plugin puts them into ElasticSearch?

Thanks a lot
Sandeep

Hi Sandeep,

Could you please share the solution to this, if you have found one?

Thanks,
Umang