FTS mapping strategy for dynamic fields

FTS looks like a great addition to Couchbase. Are there any plans to support dynamic fields for index mapping? By that I mean cases where document structures are not ‘fixed’, so we would provide a mapping function that spits out the content to index based on our own criteria?

As a simple example, suppose I have a dynamic set of properties, where the name of each key/value pair is undetermined at design time:

{ "dynamic_property": { "dynamic_field_name_1": "value", "dynamic_field_name_2": "value" } }

I would like to provide a mapping function that covers both “dynamic” fields under one “fixed” queryable field name. Or, is there a way to cover this mapping that I overlooked?

Thanks!


Dynamic fields are supported, but maybe not for the exact cases you have in mind. Let me know.

For the sake of example, let’s create an index and use the “default” mapping on the travel-sample data set that ships with Couchbase Server 4.5. A default mapping indexes any document without regard for the document’s type field. (You can still do what I’m showing here with a type mapping, that is, if you limit the mapping to documents where type = “hotel”. I’m just trying to keep it simple.)

I did one other thing that you probably don’t want to do in production, but it makes the example easier to run. I expanded the advanced options and clicked “store_dynamic”, so when the search service builds the index, it stores all of the fields. That way it returns them with your search results and gives you result snippets with matches highlighted.

When you query, use the _all field, which you can rename if you want. The _all field is what the server searches if you don’t specify any field scope in a query string query. So, when you go to the “index” tab in the web admin and search for “gorilla”, that is equivalent to “_all:gorilla”.
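To see that equivalence from the command line, here is a minimal sketch that builds both query-string request bodies. It assumes the index is named sir-mix-alot (as below) and that the search service listens on localhost:8094; the SEND variable is a switch invented for this sketch, so by default it only prints the bodies.

```shell
#!/bin/sh
# Sketch: an unscoped query-string query vs. one explicitly scoped to _all.
# Assumes an index named sir-mix-alot whose default_field is _all,
# served by the search service on localhost:8094.
# SEND is a hypothetical switch: bodies are only printed unless SEND=1.

unscoped='{"query": {"query": "gorilla"}}'
scoped='{"query": {"query": "_all:gorilla"}}'

for body in "$unscoped" "$scoped"; do
  printf '%s\n' "$body"
  if [ "${SEND:-0}" = 1 ]; then
    curl -s -XPOST -H "Content-Type: application/json" \
      http://localhost:8094/api/index/sir-mix-alot/query \
      -d "$body"
  fi
done
```

Run with SEND=1 against a live node; both requests should return the same hits.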

Below is a curl command you can run from your command line to create the index I just described. This is equivalent to creating it in the web console. This index uses the “travel-sample” data set. You can query for things like “holiday inn victoria emelie dione” and see results being matched in multiple fields. At the end of the article, there’s a curl command that will run that query, in case you’re a command line type person.

OK, a couple more things. In your example, you showed dynamic properties in an embedded object. You can search the sample index we created (called “sir-mix-alot” because we mix up all the fields, naturally). If you search for “inn rooftop”, you will get matches on the embedded “geo” object, which looks like this in a hotel document:

"geo": { "accuracy": "ROOFTOP", "lat": 51.35785, "lon": 0.55818 },

If you want to be more specific with how you index a document, normally you need to “insert a child mapping” (that’s what it’s called in the custom mapping UI). I’ve linked a video below that explains how to create custom maps, and it’s also described in the 4.5 Beta documentation.
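As a rough sketch (not taken from the 4.5 documentation) of what a child mapping looks like in index definition JSON, the fragment below would go under params.mapping. It assumes a type mapping for documents whose type is "hotel", a child mapping for the embedded geo object, and the keyword analyzer on geo.accuracy so that a value like ROOFTOP is indexed as one token. The layout mirrors the type mapping in the petojurkovic index definition further down this thread. Because dynamic is false at each level, only geo.accuracy would be indexed for hotels under this sketch.

```shell
#!/bin/sh
# Sketch of a "types" fragment with a child mapping for the embedded geo
# object. Assumptions: type field "type", documents of type "hotel", and the
# keyword analyzer for geo.accuracy. Merge it into params.mapping of an
# index definition; it is not a complete definition on its own.

types_fragment=$(cat <<'EOF'
"types": {
  "hotel": {
    "enabled": true,
    "dynamic": false,
    "properties": {
      "geo": {
        "enabled": true,
        "dynamic": false,
        "properties": {
          "accuracy": {
            "enabled": true,
            "dynamic": false,
            "fields": [
              {
                "name": "accuracy",
                "type": "text",
                "analyzer": "keyword",
                "index": true,
                "store": true,
                "include_in_all": true
              }
            ]
          }
        }
      }
    }
  }
}
EOF
)

printf '%s\n' "$types_fragment"
```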

Hope that helps!


Index definition
curl -XPUT -H "Content-Type: application/json" \
 http://localhost:8094/api/index/sir-mix-alot \
 -d '{
  "type": "fulltext-index",
  "name": "sir-mix-alot",
  "sourceType": "couchbase",
  "sourceName": "travel-sample",
  "planParams": {
    "maxPartitionsPerPIndex": 32,
    "numReplicas": 0,
    "hierarchyRules": null,
    "nodePlanParams": null,
    "pindexWeights": null,
    "planFrozen": false
  },
  "params": {
    "mapping": {
      "byte_array_converter": "json",
      "default_analyzer": "standard",
      "default_datetime_parser": "dateTimeOptional",
      "default_field": "_all",
      "default_mapping": {
        "display_order": "0",
        "dynamic": true,
        "enabled": true
      },
      "default_type": "_default",
      "index_dynamic": true,
      "store_dynamic": true,
      "type_field": "type"
    },
    "store": {
      "kvStoreName": "forestdb"
    }
  },
  "sourceParams": {
    "clusterManagerBackoffFactor": 0,
    "clusterManagerSleepInitMS": 0,
    "clusterManagerSleepMaxMS": 2000,
    "dataManagerBackoffFactor": 0,
    "dataManagerSleepInitMS": 0,
    "dataManagerSleepMaxMS": 2000,
    "feedBufferAckThreshold": 0,
    "feedBufferSizeBytes": 0
  }
}'


Sample query
curl -XPOST -H "Content-Type: application/json" \
 http://localhost:8094/api/index/sir-mix-alot/query \
 -d '{
  "explain": true,
  "fields": [
    "*"
  ],
  "highlight": {},
  "query": {
    "query": "holiday inn victoria emelie dion"
  }
}'


Videos that cover the very basics of custom index mapping: the first is conceptual, the second is a demo.


@WillGardella
I would also need to use FTS capabilities for dynamic fields. Consider the following:

{
  "email": "email@address.tld",
  "data": {
    "dynamic_property": "ANY_TYPE",
    "dynamic_property2": {
      "property": "searchableValue"
    }
  }
}

The data field itself is fixed, but its value is dynamic. What I would like to do (using N1QL) is:

SELECT * FROM bucket WHERE email = '<email>'  AND data LIKE '%searchableValue%' ;

I would expect the result set to contain all the documents whose data field contains the substring searchableValue. Is this possible with the built-in FTS service? Also, would it be enough to create an FTS index over just the data field?

Thanks for the help

@petojurkovic,
There are two parts to the question: (1) how to filter based on the specific email and (2) how to index and search the data field.

For (1), index the field email using the keyword analyzer, which will return the contents of the email field as one single token. I’m assuming you sanitize your input and that the email field contains only a single, well-formed email address.

Then, your search will be a compound search that always includes a term search that must match the email address. A term search isn’t analyzed, so again, you want to make sure you just pass the entire valid email as input.
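A minimal sketch of that compound search in the JSON query syntax, assuming the petojurkovic index defined below on 127.0.0.1:8094. The conjuncts list requires every sub-query to match: the term query is not analyzed, so it has to equal the keyword-analyzed email exactly, while the match query is analyzed and, with no field given, searches the index’s default field (_all). SEND is a switch invented for this sketch.

```shell
#!/bin/sh
# Sketch: conjunction query = exact email AND full-text match.
# Assumes the petojurkovic index (email indexed with the keyword analyzer)
# on 127.0.0.1:8094. SEND is a hypothetical switch: set SEND=1 to send.

email='email@address.tld'   # validate/sanitize this before substituting it

body=$(cat <<EOF
{
  "query": {
    "conjuncts": [
      { "term": "${email}", "field": "email" },
      { "match": "searchableValue" }
    ]
  }
}
EOF
)

printf '%s\n' "$body"

if [ "${SEND:-0}" = 1 ]; then
  curl -s -XPOST -H "Content-Type: application/json" \
    http://127.0.0.1:8094/api/index/petojurkovic/query \
    -d "$body"
fi
```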

For (2), I read this a few times and I think what you want is to find the string “searchableValue” no matter where it appears in an underlying structure under “data”. The dynamic mapping should work for this. In the example you gave with the LIKE, I’m not sure exactly what kind of full text search behavior you want to see, but you can play around with that part of the query to get what you want. Regular expression search doesn’t perform as well as match or match phrase, so I recommend you use those if they meet your needs. (More about FTS query types.)
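For comparison, a sketch of two request bodies that look for searchableValue in the default field (_all) of the same petojurkovic index. The match query is analyzed like the indexed text. The regexp query is closer to LIKE '%...%', but it is not analyzed and has to match a whole indexed term; because the standard analyzer lowercases terms at index time, the pattern is written in lowercase. The index name, host, and SEND switch are assumptions of this sketch.

```shell
#!/bin/sh
# Sketch: match query vs. regexp query against the default field (_all).
# Assumes the petojurkovic index on 127.0.0.1:8094 with the standard
# analyzer, which lowercases "searchableValue" to "searchablevalue".
# SEND is a hypothetical switch: set SEND=1 to actually send the requests.

# Analyzed like the indexed text; generally the cheaper option.
match_body='{"query": {"match": "searchableValue"}}'

# Not analyzed; must match an entire indexed term, hence the leading and
# trailing .* and the lowercase pattern. Slower than match.
regexp_body='{"query": {"regexp": ".*searchable.*"}}'

for body in "$match_body" "$regexp_body"; do
  printf '%s\n' "$body"
  if [ "${SEND:-0}" = 1 ]; then
    curl -s -XPOST -H "Content-Type: application/json" \
      http://127.0.0.1:8094/api/index/petojurkovic/query \
      -d "$body"
  fi
done
```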

I created a document with ID petojurkovic and "type": "petojurkovic" in my travel-sample bucket to illustrate this (a bit lazy of me!). My REST API calls for the index definition and the query are below. For your query, you probably want to use one of the SDKs instead of the REST call (because the SDKs know the cluster topology, and they are probably easier to use, too).

For the index definition, to make testing easier, I turned on “Store Dynamic Fields” in the advanced index settings. This just writes a copy of the data in every field into the index, so when you search in the web admin, you see result snippets and highlighting. I also turned off dynamic indexing at the top level and only turned it on for the data field, which makes your index a little more selective / efficient.

curl -XPUT -H "Content-Type: application/json" \
 http://127.0.0.1:8094/api/index/petojurkovic \
 -d '{
  "type": "fulltext-index",
  "name": "petojurkovic",
  "uuid": "3830cac09bb9ffb4",
  "sourceType": "couchbase",
  "sourceName": "travel-sample",
  "sourceUUID": "3dd7f72189ec1a3952e2c267bc5a061d",
  "planParams": {
    "maxPartitionsPerPIndex": 32,
    "numReplicas": 0,
    "hierarchyRules": null,
    "nodePlanParams": null,
    "pindexWeights": null,
    "planFrozen": false
  },
  "params": {
    "doc_config": {
      "mode": "type_field",
      "type_field": "type"
    },
    "mapping": {
      "default_analyzer": "standard",
      "default_datetime_parser": "dateTimeOptional",
      "default_field": "_all",
      "default_mapping": {
        "display_order": "1",
        "dynamic": true,
        "enabled": false
      },
      "default_type": "_default",
      "index_dynamic": true,
      "store_dynamic": true,
      "type_field": "type",
      "types": {
        "petojurkovic": {
          "display_order": "0",
          "dynamic": false,
          "enabled": true,
          "properties": {
            "data": {
              "display_order": "0",
              "dynamic": true,
              "enabled": true
            },
            "email": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "analyzer": "keyword",
                  "display_order": "0",
                  "include_in_all": true,
                  "include_term_vectors": true,
                  "index": true,
                  "name": "email",
                  "store": true,
                  "type": "text"
                }
              ]
            }
          }
        }
      }
    },
    "store": {
      "kvStoreName": "mossStore"
    }
  },
  "sourceParams": {
    "clusterManagerBackoffFactor": 0,
    "clusterManagerSleepInitMS": 0,
    "clusterManagerSleepMaxMS": 2000,
    "dataManagerBackoffFactor": 0,
    "dataManagerSleepInitMS": 0,
    "dataManagerSleepMaxMS": 2000,
    "feedBufferAckThreshold": 0,
    "feedBufferSizeBytes": 0
  }
}' 

This is how you would do what’s described above with a query string query (easiest for me, possibly not the type of query you want).

curl -XPOST -H "Content-Type: application/json" \
 http://127.0.0.1:8094/api/index/petojurkovic/query \
 -d '{
  "explain": true,
  "fields": [
    "*"
  ],
  "highlight": {},
  "query": {
    "query": "+email:email@address.tld searchableValue"
  }
}'

Hope that helps you get started right; let me know if you have more questions. Good luck!


Sorry, I thought I should also include my test document, since it’s slightly different from what you showed above:

{
  "type": "petojurkovic",
  "email": "email@address.tld",
  "data": {
    "dynamic_property": "ANY_TYPE",
    "dynamic_property2": {
      "property": "searchableValue"
    }
  }
}

For (2), I read this a few times and I think what you want is to find the string “searchableValue” no matter where it appears in an underlying structure under “data”.

Correct.

I’m not sure exactly what kind of full text search behavior you want to see, but you can play around with that part of the query to get what you want.

It doesn’t necessarily need to be “full-text search”. I just need to find all documents whose data object contains a given string; in other words, exactly what SQL LIKE does. If I am not mistaken, this is not possible in Couchbase without creating an FTS index.

Also, it is still not quite clear to me how to do this using N1QL, because the following query does not retrieve the document, even though it contains the substring:

SELECT * FROM bucket WHERE data LIKE '%searchableValue%' ;

It only works when data is a string:

{ "data" : "some searchableValue in the data field"}
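One possible N1QL approach, sketched under assumptions rather than tested: LIKE only applies to string values, which is consistent with the behavior above, so the query has to look at the strings nested inside data. As we understand it, the ANY ... WITHIN ... SATISFIES ... END collection operator walks nested objects and arrays recursively, and only the string values can satisfy LIKE. The bucket name (`bucket`, taken from the query above), the query service address localhost:8093, the CB_USER/CB_PASS credentials, and the SEND switch are all assumptions. The query also needs an index it can use, such as a primary index, and a leading % cannot narrow an index scan, so this may be slow on a large bucket.

```shell
#!/bin/sh
# Sketch: find documents whose nested "data" values contain a substring,
# using N1QL instead of FTS. Assumptions (hypothetical): bucket named
# "bucket", query service on localhost:8093, credentials in CB_USER and
# CB_PASS, and an index the query can use (for example a primary index).
# SEND is a hypothetical switch: set SEND=1 to send the statement.

# ANY ... WITHIN descends into nested objects and arrays under b.data;
# LIKE only matches string values, so objects and numbers never satisfy it.
statement='SELECT b.* FROM `bucket` b
WHERE b.email = "email@address.tld"
  AND ANY v WITHIN b.data SATISFIES v LIKE "%searchableValue%" END'

printf '%s\n' "$statement"

if [ "${SEND:-0}" = 1 ]; then
  curl -s -u "${CB_USER}:${CB_PASS}" \
    http://localhost:8093/query/service \
    --data-urlencode "statement=${statement}"
fi
```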

Thank you!