Search by null or not nulls

Sample docs

{
	"type":"car"
	"color":null,
	"id":"1"
	"doors":0
}

{
	"type":"car"
	"color":"black"
	"id":"2",
	"doors":null
}

The color is test and doors in a integer.

  1. I want to be able to find all car with no color defined or defined.
  2. I want to able to find all doors with no door defined or defined.

@dipen_patel Searching null values is something we donā€™t support yet, but is on our road map.

Until such a time, you could use boolean queries with the knowledge of all values ā€œcolorā€ can take, and possible values for ā€œdoorsā€. Hereā€™re examples ā€¦

  • match all cars with color not black (a disjunct for all colors you donā€™t want to match)
curl -XPOST -H "Content-type:application/json" http://<username>:<password>@<ip>:8094/api/index/<index_name>/query -d '{"query": {"must": {"conjuncts": [{"match": "car", "field": "type"}]}, "must_not": {"disjuncts": [{"match": "black", "field": "color"}]}}}'
  • match all cars with color black (a disjunct for all colors you want to match)
curl -XPOST -H "Content-type:application/json" http://<username>:<password>@<ip>:8094/api/index/<index_name>/query -d '{"query": {"must": {"conjuncts": [{"match": "car", "field": "type"}, {"field": "color", "match": "black"}]}}}'
  • to match all cars with a defined set of doors (assuming that doors takes a positive value here)
curl -XPOST -H "Content-type:application/json" http://<username>:<password>@<ip>:8094/api/index/<index_name>/query -d '{"query": {"must": {"conjuncts": [{"match": "car", "field": "type"}, {"field": "doors", "min": 0, "inclusive_min": true}]}}}'
  • to match all cars where doors isnā€™t defined ā€¦
curl -XPOST -H "Content-type:application/json" http://<username>:<password>@<ip>:8094/api/index/<index_name>/query -d '{"query": {"must": {"conjuncts": [{"match": "car", "field": "type"}]}, "must_not": {"disjuncts": [{"field": "doors", "min": 0, "inclusive_min": true}]}}}'

@abhinav
any updates regarding this issue? is it available now?

Iā€™m using the safe delete strategy on the project. I have the field called deletedAt and it has a string value(datetime, means itā€™s deleted, not available) or null (means the document is not deleted, available). So, when I use FTS index to search an appropriate document I want to be sure that Iā€™m searching directly on documents where deletedAt = null.
Also, I need to be able to get facets with this condition, but I donā€™t see how to do that using REST-api, couchnode (npm package) or raw query(N1QL) :frowning:

I will be grateful for any idea

Hey @rmaidanskyi,

The null indexing/searching feature isnā€™t yet available or picked up for implementation.

Based on my understanding of your use case,
If you query for all the documents which have a deletedAt value less than the current timestamp, then you would get all the documents that got deleted until now. Now if this condition is put to a must_not clause of a boolean query you would all the non-deleted documents until the current timestamp passed in.

curl -XPOST -H "Content-Type: application/json" -u<UserName:Passwd>  http://host:port/api/index/<indexName>/query -d '{
"query": {"must_not": {"disjuncts": [{"end": "2020-12-28","inclusive_end": true,"field": "deletedAt"}]}}}'  

The problem here would be the performance of the query with heavy loads.
Let us know how it goes.

Cheers!

Hi, @sreeks
thx for your idea

I tested it but it doesnā€™t work, looks like it completely ignores null value
I need to search on nullable documents(where deletedAt = null)

Hey,

For me, this approach works.
I get all the documents which are having "updated": null in the beer-sample bucket with a sample index using the above query.
Sample docs

doc1 => {
  "deletedAt": "2010-07-22 20:00:20"
}

doc2=> {
  "deletedAt": null
}

Can you share the index definition/sample doc for checking this further?

Cheers!

Here is a document example:

{
  id: "some_UUID",
  type: "user",
  name: "John",
  age: 23,
  updatedAt: "2020-08-14T14:53:50.248Z",
  deletedAt: null,
}

I have 10 records in a basket. 8 of them has deletedAt: null, 2 of them has deletedAt: "2020-08-14T14:53:50.248Z" .

I have specified FTS index based on type field. Also, next fields are included:

  • name(type: text, analyzer: inherit)
  • age(type: number)
  • deletedAt(type: datetime, datetimeParser: inherit) // maybe the problem is here, maybe this format is not parsed with default parser???

I tried the next query:

curl -XPOST -H "Content-Type: application/json" -u<UserName:Passwd>  http://host:port/api/index/<indexName>/query -d '{
"query": {"must_not": {"disjuncts": [{"end": "2020-10-10T10:00:50.248Z","inclusive_end": true,"field": "deletedAt"}]}}}'  

Also, I tried different combinations of conjuncts, disjuncts, must, must_not - all of them returns all 10 records from the bucket.

could you please share a query for the beer bucket? and could you try to search against FTS?
thx

@rmaidanskyi,

Can you share the whole index definition json copied from the index listing page? (there are options to copy your index defn there). Your query is working for me with the same sample document and a mock index defn as yours.

And in meantime, if you are a licensed user then please go ahead and create a support ticket for this so that we can debug this further.

Sure

here are examples:

{
      "age": 40,
      "deletedAt": "2021-01-04T10:00:00.000Z",
      "id": "07d6548c-f5a0-4def-827d-e4734e03ac67",
      "name": "Jack",
      "text": "I am too old to play football",
      "type": "user",
      "updatedAt": "2020-08-14T14:53:50.248Z"
    }, {
      "age": 23,
      "deletedAt": null,
      "id": "7a251009-d181-4944-81f7-b0573e4261ae",
      "name": "John",
      "text": "I like to play football",
      "type": "user",
      "updatedAt": "2020-08-14T14:53:50.248Z"
    }, {
      "age": 58,
      "deletedAt": null,
      "id": "7a251009-d181-4944-81f7-b0573e4261ae",
      "name": "Jason",
      "text": "I like chess",
      "type": "user",
      "updatedAt": "2020-08-14T14:53:50.248Z"
    }

here is an index definition

curl -XPUT -H "Content-Type: application/json" \
-u <username>:<password> http://localhost:8094/api/index/fts-user -d \
'{
  "type": "fulltext-index",
  "name": "fts-user",
  "uuid": "",
  "sourceType": "couchbase",
  "sourceName": "users",
  "sourceUUID": "",
  "planParams": {
    "maxPartitionsPerPIndex": 171,
    "indexPartitions": 6
  },
  "params": {
    "doc_config": {
      "docid_prefix_delim": "",
      "docid_regexp": "",
      "mode": "type_field",
      "type_field": "type"
    },
    "mapping": {
      "analysis": {},
      "default_analyzer": "standard",
      "default_datetime_parser": "dateTimeOptional",
      "default_field": "_all",
      "default_mapping": {
        "dynamic": true,
        "enabled": false
      },
      "default_type": "_default",
      "docvalues_dynamic": true,
      "index_dynamic": true,
      "store_dynamic": false,
      "type_field": "_type",
      "types": {
        "user": {
          "dynamic": false,
          "enabled": true,
          "properties": {
            "age": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "docvalues": true,
                  "include_in_all": true,
                  "include_term_vectors": true,
                  "index": true,
                  "name": "age",
                  "type": "number"
                }
              ]
            },
            "deletedAt": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "docvalues": true,
                  "include_in_all": true,
                  "include_term_vectors": true,
                  "index": true,
                  "name": "deletedAt",
                  "type": "datetime"
                }
              ]
            },
            "text": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "docvalues": true,
                  "include_in_all": true,
                  "include_term_vectors": true,
                  "index": true,
                  "name": "text",
                  "type": "text"
                }
              ]
            }
          }
        }
      }
    },
    "store": {
      "indexType": "scorch"
    }
  },
  "sourceParams": {}
}'

I need to be able to search nullable docs using the query similar to this:

SELECT * FROM `Users` AS x USE INDEX (`fts-user` USING FTS)
WHERE SEARCH(x, {"query":{
        'conjunction': [{
            "query": "football" 
       },{
            'field': 'deletedAt',
            'match': null
        }]
}});

@rmaidanskyi The FTS service will not index fields that have nil values.

But seeing that youā€™re using N1QL to do your FTS searches, N1QL does offer a way that you can leverage.
For your index definition and type of documents, rewrite your query to the following and give it a go ā€¦

SELECT * FROM `Users` AS x
WHERE x.type="user"
AND SEARCH(x, "football")
AND x.deletedAt IS NULL;

thx @abhinav
but it is not that what Iā€™m looking for
I need to be able to get facets. To do that I have to use REST-api request or couchnode npm package (for Node.js), because I didnā€™t find a way how to get facets using N1QL .

So, to correctly get facets on my documents I need to be able to search on null (using REST or couchnode)!!!

I see that the structure of the second argument for SEARCH function is the same as for REST-api, so, I guess if we have a working example in N1QL then it should work in REST as well, thatā€™s why Iā€™ve shown an example in N1QL because after that I gonna use this part {"query": {CONDITIONS_HERE}} in couchnode or REST

I see. If youā€™re referring to using facetted searches using Couchbaseā€™s FTS, we donā€™t quite support it from within N1QL yet (it is on the roadmap).

Since within FTS we donā€™t support indexing NULL values (yet), I donā€™t see a way for you to achieve your results using just the FTS service.

Youā€™d need to either implement that logic within your application or try FTS in combination with N1QL like I pointed out earlier. You will be able to put in multiple conditions within ā€œqueryā€ but the NULL check will need to be outside the SEARCH function ā€¦

SELECT * FROM `Users` AS x
WHERE x.type = "user"
AND SEARCH(x, {"query": {"conjuncts": [{..}, {..}]}})
AND x.deletedAt is NULL;

But, again - facetted FTS queries arenā€™t supported from within N1QL yet. weā€™ll plan for it for an upcoming release.
Iā€™ve set up this ticket for it - https://issues.couchbase.com/browse/MB-43445

1 Like

@abhinav thank you, it is a great news
it will be awesome

hey @abhinav, @rmaidanskyi,

There is a way for this to work at least in this case as specified earlier.

The same index definition and the docs search works fine for me.
But since he has the deletedAt dates set like 4th 2021, his query should also have the latest (query time) timestamp (current time as I mentioned in the beginning comment) Or an advanced enough date for this to work.

My query and response for the same documents are as below,

curl -XPOST -H "Content-Type: application/json" -uAdministrator:asdasd  http://localhost:9200/api/index/fts-user/query -
d '{"fields":["age", "text"], "query": {"must_not": {"disjuncts": 
[{"end": "2021-01-05T10:00:50.248Z","inclusive_end":true,"field": "deletedAt"}]}}}'|jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   866  100   721  100   145   352k  72500 --:--:-- --:--:-- --:--:--  422k
{
  "status": {
    "total": 6,
    "failed": 0,
    "successful": 6
  },
  "request": {
    "query": {
      "must_not": {
        "disjuncts": [
          {
            "start": "0001-01-01T00:00:00Z",
            "end": "2021-01-05T10:00:50Z",
            "inclusive_end": true,
            "field": "deletedAt"
          }
        ],
        "min": 0
      }
    },
    "size": 10,
    "from": 0,
    "highlight": null,
    "fields": [
      "age",
      "text"
    ],
    "facets": null,
    "explain": false,
    "sort": [
      "-_score"
    ],
    "includeLocations": false,
    "search_after": null,
    "search_before": null
  },
  "hits": [
    {
      "index": "fts-user_37f249825a7265a7_aa574717",
      "id": "user6",
      "score": 1,
      "sort": [
        "_score"
      ],
      "fields": {
        "age": 23,
        "text": "I like to play football"
      }
    },
    {
      "index": "fts-user_37f249825a7265a7_f4e0a48a",
      "id": "user7",
      "score": 1,
      "sort": [
        "_score"
      ],
      "fields": {
        "age": 58,
        "text": "I like chess"
      }
    }
  ],
  "total_hits": 2,
  "max_score": 1,
  "took": 145595,
  "facets": null
}

For this to work, (which is already met in this scenario)
All the documents in the bucket need to be of type ā€˜userā€™ and all the documents should have either a null/valid value for the ā€˜deletedAtā€™ field.

1 Like

wow! it works!
I tested it with a different scenario and looks like it is exactly what I need
@sreeks thank you very much

@sreeks @abhinav
One more question
I have added a bunch of documents with another type to this bucket.
And now I have two types of documents id Users bucket (the structure of the document doesnā€™t matter,
but the fieldsā€™ names should not match with the names of fields in User doc).
Currently, I have 10 documents in the bucket. FTS index is the same, I didnā€™t change, did not rebuild, or update it.
Now Iā€™m searching against the FTS index with next query:

curl -X POST -H 'Content-Type: application/json' \
-u USERNAME:PASSWORD \
http://localhost:YOUR_PORT/api/index/fts-user/query -d \
'{
  "query": {
    "must_not": {
      "disjuncts": [{
        "end": "2021-01-06T10:00:50.248Z", // current date or tomorrow date
        "inclusive_end":true,
        "field": "deletedAt"
      }]
    }
  }
}' | jq

In other words, I want to get only docs with deletedAt: null.

And I see an interesting thing in response:

...
"total_hits": 10,
...

but Iā€™m searching against an FTS indexā€¦ Why my query was matched against all docs in the bucket? Did I miss something?
And of course it returns all records (ids, score, etc.) in "hits": [{id}, {id}, ...]
But the facets section in the response body is built correctly, probably because documents have different fields (field names).

Is it a bug?
Can I be sure that Iā€™ll get facets built based only on fields from User documents?

@rmaidanskyi.

I had already mentioned the limitation of this approach in the previous comment.

For this to work, (which is already met in this scenario)
All the documents in the bucket need to be of type ā€˜userā€™ and 
all the documents should have either a null/valid value for the ā€˜deletedAtā€™ field. 
1 Like

@rmaidanskyi,
As mentioned earlier, this approach may not scale well with many millions and billions of data. But if your data set size is limited for this query requirements, then you may put them into a specific bucket for this workaround.

With the upcoming release of Couchbase Server, you will have the flexibility of putting data into smaller granular Collections and creating FTS indexes on them to enable this without worrying about exceeding the limits on the number of buckets in the system.

Cheers!

@sreeks thx
itā€™ll be very helpful