N1QL mutation best practices


#1

I have a situation where I need to DELETE and UPDATE millions of documents. What is the best way to do it? Is N1QL good for this?

I have two kinds of documents and need to join them to get results. When I have many reads and few updates, I think it is better to denormalize the data instead of joining. That means: if I have two kinds of documents, User and Message, and each user has User.image, then when I fetch messages I need User.image too, so I think it is better to denormalize and copy User.image into Message.image as well.
But when a user changes their image, I need to update millions of Messages.
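For example, propagating one user's new image would mean an UPDATE along these lines (the bucket name myData and the field names here are just assumptions for illustration):

UPDATE myData
SET image = "new_image_url"
WHERE type = "message" AND user_id = "john.smith234"

This statement has to visit and rewrite every matching message document, which is why it gets expensive when a user has millions of messages.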


#2

The idea of embedding a separate image in each message, when all messages from a single user have the same image, seems strange to me. Well-indexed joins aren’t expensive. Embed the document id of the user’s image in the user document, and you’ll only pay for one extra fetch (for the image) per message.

If you need to be more efficient than that, you should probably try to cache the images outside the database.
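As a sketch of the join approach, a lookup join along these lines would return each message together with its sender's image (assuming user documents are keyed by the same value stored in user_id; the names are illustrative, not your exact schema):

SELECT m.*, u.image
FROM myData m
JOIN myData u ON KEYS m.user_id
WHERE m.type = "message"

With an index on the message side, each message costs only one extra keyed fetch for its user document.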


#3

@johan_larson What about DELETE? When a user is deleted, I need to delete all associated messages, and some users have millions of messages.
What is the best way to delete them?


#4

Assuming your message documents look something like this:

{
    msg_id: "JSDLFKJDSLFK",
    user_id: "john.smith234",
    type: "message",
   ...
}

When deleting everything about user John Smith, you run this N1QL statement:

DELETE FROM myData WHERE user_id = "john.smith234" AND type = "message"

To run this efficiently, you’ll need an index like this:

CREATE INDEX message_user_id_idx ON myData(user_id) WHERE type = "message"

#5

@johan_larson, I am talking about DELETEing millions of documents with a single query and an optimized INDEX.

If I create CREATE INDEX message_user_id_idx ON myData(user_id) WHERE type = "message" and then run DELETE FROM myData WHERE user_id = "john.smith234" AND type = "message", there is no issue when John Smith has few messages. But when he has millions, it takes a long time, more than 15 minutes.

I need a faster solution, less than a minute.


#6

That delete statement, indexed as I described, is as fast as N1QL is going to get with that data model. To get faster, you need a data model with fewer documents to delete. You might try batching messages into compound documents like this:

{
   "user_id" : "john.smith234",
   "first_msg_timestamp" : ....,
   "last_msg_timestamp": ...., 
   "messages" : [
       {...}, {...},  ...],
   "type" : "message_batch"
}
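With that model, deleting a user's messages touches one document per batch rather than one per message, so something like this (index name and statement are illustrative sketches for the batch model) should run far faster:

CREATE INDEX message_batch_user_id_idx ON myData(user_id) WHERE type = "message_batch"

DELETE FROM myData WHERE user_id = "john.smith234" AND type = "message_batch"

If each batch holds, say, a thousand messages, a million-message user becomes roughly a thousand document deletions instead of a million.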