GSI on parsed document key

#1

Hello!

I have a question about building GSI on document keys (with some key transformation), will it be more efficient than building GSI on document fields? And what is the right way to do so via N1QL?

I have documents of 2 types in my bucket:

  1. user

with keys like
user:userId
for example
user:1234

and json data like
{
“docType”: “user”,
“userId”: 1234,
“userattr1”: “abc”,

}

  1. user device

with keys like
device:userId:deviceType:deviceId
for example
device:1234:MOB:12345-12345-12345

and json data like
{
“docType”: “device”,
“userId”: 1234,
“deviceId”: “12345-12345-12345”,
“deviceType”: “MOB”,
“deviceattr1”: “abc”,

}

And I have the following queries:

  1. get user by key
  2. get device by key
  3. get devices by user id

Document updates happen more often than document creates.
There is no problem with query by key performance, but when we got about 50 mln documents in bucket, MR view updates became too slow, because we require strong view query consistency.

So I need to query devices by user id and I want to try N1QL indexes instead of MR views.

I’ve built an index like that
CREATE INDEX idx_user_doc ON mybucket(userId, docType)

and use query like
SELECT * FROM mybucket WHERE userId = 12345 AND docType = ‘device’

But for my purpose, I don’t really need to use json fields, all info that I need is in the document key (user id).
I wonder maybe it would be more efficient to build index on just document key, so that Couchbase doesn’t need to parse json data and do that on each document attribute update, so the index will be updated only when new document is created or when document is deleted - not on each document attribute update (these won’t change the key).

What is the right way to build index in this case?
Can I parse meta.id somehow when creating index so that I can extract userId from device document key and build index on that?
Will it be more efficient than building index on json attributes?
Yeah and again, I’d like to have consistent reads when it comes to query devices by user id, but its ok if json itself is a bit outdated, what I don’t want is to miss document itself (so key consistency is required, but data consistency is not)

I have a feeling that maybe I need to use primary index instead, but how to parse keys and would it be efficient?

Edit: I have found SPLIT function so that I can split key by ":’, I guess, but still wonder if I should do it at all, if its GSI and not primary index.

Thanks in advance,
Cho

#2

You could use it.
Example:

CREATE INDEX idx_user_doc ON mybucket(SPLIT(META().id,":")[1]) WHERE SPLIT(META().id,":")[0] = "device";
SELECT RAW META(t).id FROM mybucket  AS t WHERE SPLIT(META(t).id,":")[0] = "device" AND   SPLIT(META(t).id,":")[1]  = "12345";
   SELECT RAW t FROM mybucket  AS t WHERE SPLIT(META(t).id,":")[0] = "device" AND   SPLIT(META(t).id,":")[1]  = "12345";

Remember userId is through document key is string.
If you need whole document then project document key and use SDKs to fetch document asynchronously. or Don’t try to access any field in the document, If you do that the expressions on document key can be over head then document fields options might be better.
UPDATE always need whole document. If that is majority of queries document fields might be better. You can also consider SDKs subdoc API to update.
Don’t create primary index

You can experiment and measure the performance and see which one performs better

https://blog.couchbase.com/create-right-index-get-right-performance/

#3

Thanks a lot, it works!
I’ll run more tests when the indexing of production-size test data set is finished.
Cheers!