TONUMBER() type-conversion function performance issues

#1

Hi there,

I’m experiencing performance problem with using TONUMBER() type-conversion function. It might be that I’m doing something wrong and maybe someone can point me to the right direction.

Here is a very typical task, I store user sessions and each session document has userid attribute in it, an example would be:

{
  "session": "jk34h51jk3h4r5kj1hrkjqwhefkjqwhefkljqh34jrh134kljhrkj3",
  "userid": "123456789",
  "ip": "127.0.0.1",
  "created": "2016-05-07 20:26:12",
  "last_updated": "2016-05-14 20:54:02",
  "expires": "2016-05-14 20:26:12"
}

To be able to query user sessions I created the only index in the default bucket:

CREATE INDEX `user-session-idx` ON `default`(userid, meta().id) USING GSI;

Now I’m able to run such queries:

select meta().id from default where userid="123456789";

With 50k documents in the database these queries run in about 6ms, all cool.

But then I noticed that if there are session documents in the database where userid attribute is a JSON number, not string, then the query would not return those documents. This was quite a catch for me, coming from MySQL, where it doesn’t matter if you query as:

select * from sessions where userid=123456789;

OR

select * from sessions where userid="123456789";

Anyway, to be on the safe side I decided to use TONUMBER() function. And the immediate problem I got was “No primary index on keyspace default” error:

cbq> select meta().id from default where TONUMBER(`userid`)=123456789;
{
    "requestID": "ea0393c6-603c-409b-9343-85518f1e0797",
    "errors": [
        {
            "code": 4000,
            "msg": "No primary index on keyspace default. Use CREATE PRIMARY INDEX to create one."
        }
    ],
    "status": "fatal",
    "metrics": {
        "elapsedTime": "5.015269ms",
        "executionTime": "4.907528ms",
        "resultCount": 0,
        "resultSize": 0,
        "errorCount": 1
    }
}

And this is my first question:

  • Why do I have to create primary index to be able to call TONUMBER() function?

After creating primary index I tried to run the same query again.

select meta().id from default where TONUMBER(`userid`)=123456789;

The query found all documents but took over 4 sec comparing to 6-7 ms when searching without type conversion. The difference is really huge!

Clearly, something is not right here.
Why am I experiencing this problem and how would you recommend me to filter by attributes which could be JSON strings and numbers at the same time?

System details:

  • OS: Debian wheezy
  • Couchbase server: couchbase-server-community-4.0.0-4051-1
#2

Hello,

Please do the following:

CREATE INDEX `user-session-idx` ON `default`(TO_NUMBER(userid), meta().id);

SELECT meta().id FROM default WHERE TO_NUMBER(userid)=123456789;

BTW, I would be surprised if MySQL treats strings and numbers as the same thing.

#3

Thank you for your reply.

MySQL performs implicit conversion for numbers to strings, and vice versa: http://dev.mysql.com/doc/refman/5.7/en/type-conversion.html

If I understood correctly Couchbase requires primary index to exist if there is no other index query could use. In that sense your suggestion helped. After I created the index you specified, the query returns result even when primary index is missing. The only problem that the result is empty, the index is used but no matches could be found. Why is it so, where can I look for the reason?

#4

Are you sure there is a match in the data? You can create a primary index, USE INDEX hints to use the primary index, and then verify that there should be a result.

#5

Just did a fresh test on the database with no data in it and no indexes.
Here is a list of steps:

  • Inserted 50k session documents with user attribute being inserted as string.
  • Created a primary index:
CREATE PRIMARY INDEX ON default;
  • Created another index as you suggested:
CREATE INDEX `user-session-idx` ON `default`(TO_NUMBER(`user`), meta().id);
  • This query returns empty set:
SELECT meta().id FROM `default` WHERE TO_NUMBER(`user`)=123456789;
  • Another query for the same user, which uses primary index finds 13 documents:
SELECT meta().id FROM `default` USE INDEX (`#primary` USING GSI) WHERE TO_NUMBER(`user`)=123456789;
  • If I compare user attribute with a string value I find the same 13 documents:
SELECT meta().id FROM `default` WHERE `user`="123456789";

Looks like a bug?

1 Like
#6

Looks like a bug. Are you able to test this on 4.5 or 4.1? If not, we will test it here.

Thanks,
Gerald

#7

I’m trying to do the test with couchbase-4.5 (4.1 is available as an Enterprise Edition only, can’t try it) and got another problem…
Here is what happened:

  • I removed couchbase-server-community-4.0 and installed couchbase-server-enterprise_4.5.0.
~# dpkg -i couchbase-server-enterprise_4.5.0-beta-debian7_amd64.deb
  • Created a default bucket with all default settings except the amount of available RAM.
  • Inserted 50k documents.
  • To my surprise I discovered that the indexes I created in the previous installation continue to exist in the current one: #primary and user-session-idx.
  • When I try to delete any of these indexes I get the following error:
cbq> DROP PRIMARY INDEX ON default;
{
    "requestID": "ceaf0db8-2c69-4334-bbf9-7bd1cd2cb56c",
    "signature": null,
    "results": [
    ],
    "errors": [
        {
            "code": 5000,
            "msg": "GSI Drop() - cause: FDB_RESULT_KEY_NOT_FOUND"
        }
    ],
    "status": "errors",
    "metrics": {
        "elapsedTime": "1.587294ms",
        "executionTime": "1.539492ms",
        "resultCount": 0,
        "resultSize": 0,
        "errorCount": 1
    }
}
  • If I try to create a new primary index I get this error:
cbq> CREATE PRIMARY INDEX ON default;
{
    "requestID": "b4bd68ae-5bc1-4038-8299-26a7d6fff862",
    "signature": null,
    "results": [
    ],
    "errors": [
        {
            "code": 5000,
            "msg": "GSI CreatePrimaryIndex() - cause: Index #primary already exists."
        }
    ],
    "status": "errors",
    "metrics": {
        "elapsedTime": "1.481408ms",
        "executionTime": "1.397952ms",
        "resultCount": 0,
        "resultSize": 0,
        "errorCount": 1
    }
}

So how do I proceed with this situation? Another bug?

#8

OK, after purging all couchbase packages and files and doing a completely fresh installation, I’m able to test with couchbase-server-4.5.0.
The problem is still there.
I repeated all steps from my this post and reproduced the problem.

#9

Hi @dmitryb,

Thanks for your help troubleshooting this. We will file a bug to investigate and fix this shortly after 4.5.

Thanks,
Gerald

#10

I have tried with upcoming 4.5 GA version and it works fine.

Another workaround is convert the number to string as suggested below.

CREATE INDEX ix1 ON `default`(`user`) USING GSI;
SELECT meta().id FROM `default` WHERE `user`= TO_STRING(123456789);

FYI, In the index meta().id is implicit, omitting from index keys the index makes leaner and query can still use meta().id with same benefits.

#11

Hi @vsr1

Does TO_NUMBER() also work with 4.5 GA?

#12

TO_NUMBER() also works with upcoming 4.5 GA

insert into default values(“v001”,{“user”:“123456789”});
CREATE INDEX `user-session-idx` ON `default`(TO_NUMBER(`user`));
SELECT meta().id FROM `default` WHERE TO_NUMBER(`user`)=123456789;

1 Like