Full-Text Search on a small subset of a data bucket


#1

Hi all, we are testing out Couchbase 5 full-text search. I’ve set up the type mapping to index documents of a certain type A, and chosen a few specific fields to index. However, I noticed that the search index is indexing all documents of type A. Is there a way to index only a subset of the A documents within a bucket?


#2

Hello @famanson,

The default mapping will index all documents, so a new type mapping is required. Here is an example using the beer-sample bucket, which contains two types of documents: beer and brewery.

I’m going to create a new search index on the description field of the beer documents only.

Remember to disable the default mapping.

Next, add the field you want to index, in this case description:
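Since the screenshots did not survive, here is a rough sketch of what the resulting index definition might look like as JSON, built from a Python dict. The index name "beer-desc" is made up for illustration, and the exact set of keys is an assumption based on how FTS index definitions are usually laid out; compare it with the definition your own console shows before relying on it.

```python
import json

# Sketch of an FTS index definition for the beer-sample bucket.
# "beer-desc" is a hypothetical index name; key names should be
# checked against the definition shown by your own cluster.
index_definition = {
    "type": "fulltext-index",
    "name": "beer-desc",
    "sourceType": "couchbase",
    "sourceName": "beer-sample",
    "params": {
        # Pick the type mapping from each document's "type" field.
        "doc_config": {"mode": "type_field", "type_field": "type"},
        "mapping": {
            # Disabled, so documents without a matching type mapping
            # (here: brewery documents) do not contribute to the index.
            "default_mapping": {"enabled": False, "dynamic": True},
            "types": {
                "beer": {
                    "enabled": True,
                    "dynamic": False,  # index only the fields listed below
                    "properties": {
                        "description": {
                            "enabled": True,
                            "fields": [
                                {"name": "description", "type": "text", "index": True}
                            ],
                        }
                    },
                }
            },
        },
    },
}

print(json.dumps(index_definition, indent=2))
```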

Now test it by searching for “outdoor beer garden”, a phrase that appears in the description of one of the breweries. Because brewery documents are not indexed, no results should turn up:
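The same check can be run outside the console. The sketch below only builds the HTTP request for the FTS query endpoint and does not send it. The host, port 8094, and the index name "beer-desc" are assumptions for illustration.

```python
import json
import urllib.request


def build_search_request(base_url, index_name, phrase, field):
    """Build (but do not send) a POST request for the FTS query endpoint."""
    body = {"query": {"match_phrase": phrase, "field": field}, "size": 10}
    return urllib.request.Request(
        url=f"{base_url}/api/index/{index_name}/query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and index name.
request = build_search_request(
    "http://localhost:8094", "beer-desc", "outdoor beer garden", "description"
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

To actually run it you would add credentials for your cluster and pass the request to urllib.request.urlopen. If the mapping works as intended, the total_hits value in the response should be 0 for this phrase.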

I hope that helps.


#3

Hey,

Yes, I got that part. However, the FTS indexer still goes through every single doc in the bucket, right? What we want is to index only items that have doc_type=Document and an attribute significant=True. Is that possible?


#4

I don’t think it’s currently possible to filter based on two fields. @steve/@mschoch, is that correct?

One workaround is to include the significant field in the index itself and then at query time filter on significant=True.

This does mean the index will include all the documents with doc_type=Document.
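As a rough sketch of that workaround, the mapping part of the index definition could look something like the following. The text field name "body" is hypothetical (the thread never names the searchable fields), and the key layout is an assumption to be checked against your own index definition.

```python
def indexed_field(name, field_type):
    """Return a child mapping that indexes one field with the given type."""
    return {
        "enabled": True,
        "fields": [{"name": name, "type": field_type, "index": True}],
    }


# Only the "params" portion of an index definition is shown here.
document_index_params = {
    # Select the type mapping from each document's "doc_type" field.
    "doc_config": {"mode": "type_field", "type_field": "doc_type"},
    "mapping": {
        "default_mapping": {"enabled": False},
        "types": {
            "Document": {
                "enabled": True,
                "dynamic": False,
                "properties": {
                    # "body" is a made-up name for the searchable text field.
                    "body": indexed_field("body", "text"),
                    # Indexed so it can be used as a filter at query time.
                    "significant": indexed_field("significant", "boolean"),
                },
            }
        },
    },
}

print(sorted(document_index_params["mapping"]["types"]["Document"]["properties"]))
```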


#5

@pvarley sorry for the late reply

“This does mean the index will include all the documents with doc_type=Document”

Well, as I said above, this is exactly what we don’t want… It seems to be a very common use case though? Or am I missing something here?


#6

Hey there, I’m trying to achieve the same right now - how would the filtering at query time work? Thanks


#7

Hi – If your index mapping only defines a type mapping for doc_type=Document, only the subset of docs whose doc_type is Document will contribute to the index. (Under the hood, the FTS engine will examine every doc, but the resulting index will represent only that subset.)

But, FTS does not have a feature to also have indexing-time filtering on some other field value, like your significant=True example.
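To make the distinction concrete, here is a toy model in plain Python. It is not how FTS is implemented; the documents and function names are invented purely to show where each filter applies.

```python
# Invented sample documents.
docs = [
    {"id": "d1", "doc_type": "Document", "significant": True, "body": "quarterly report"},
    {"id": "d2", "doc_type": "Document", "significant": False, "body": "draft report"},
    {"id": "d3", "doc_type": "Image", "significant": True, "body": "report cover"},
]


def build_index(all_docs, doc_type):
    """Examine every doc, but keep only those matching the type mapping."""
    return [d for d in all_docs if d.get("doc_type") == doc_type]


def search(index, term, significant=None):
    """Match a term, optionally filtering on 'significant' at query time."""
    hits = [d for d in index if term in d["body"].split()]
    if significant is not None:
        hits = [d for d in hits if d.get("significant") is significant]
    return [d["id"] for d in hits]


index = build_index(docs, "Document")
print(len(index))                                  # d1 and d2 are indexed, d3 is not
print(search(index, "report"))                     # both indexed docs match
print(search(index, "report", significant=True))   # filtered at query time
```

The index still holds the non-significant document d2; only the query results exclude it.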

Hope that makes sense (and, yes, also keeping ears open to see how many folks have this kind of need)


#8

Hi @noob1337

As I understand it, the query-time filtering workaround mentioned above means the following:

You index the extra field (significant, in the case above) along with the other indexed fields. Then, at query time, you pass an extra constraint such as “significant=True” together with the other search parameters, so that only documents whose significant field is True are returned.
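A minimal sketch of such a query body, assuming significant was indexed as a boolean field and that your server version supports a boolean field query inside a conjunction. The text field name "body" is hypothetical.

```python
import json


def significant_only(text, text_field="body"):
    """Build a query body that requires both a text match and significant=true."""
    return {
        "query": {
            # Every clause in "conjuncts" must match.
            "conjuncts": [
                {"match": text, "field": text_field},
                # Assumes "significant" is indexed as a boolean field.
                {"bool": True, "field": "significant"},
            ]
        }
    }


print(json.dumps(significant_only("report"), indent=2))
```

If significant is stored as a string such as "True" rather than a JSON boolean, it would presumably need to be indexed as text and matched with a term or match clause instead.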

thanks,
Sreekanth