Make sync to ElasticSearch faster

Hello everyone,

The time it takes to sync my data between Couchbase and ES keeps increasing. It is especially painful when the ES mapping changes and we have to resync everything. So my goal is to make this sync faster.

The idea is to have 2 XDCR replications so that my data is split across 2 ES indices.

So what I would like to do is:
Documents with keys starting with a, b, or c go to index 1.
All other documents go to index 2.

My issue is that the XDCR filter regex does not support the ‘not contains’ construct, (?!).

Do you have an idea how to achieve my goal?

Thanks a lot

Unfortunately, the regular expression parser used by XDCR does not support negative lookahead (?!). This has come up before. The two suggestions in that thread were to enumerate all the prefixes in the “not a, b, or c” category, or to rename all the “not a, b, or c” documents so their keys start with a “d” prefix that would be easy to filter on (which would simplify the task of enumerating them).

If those solutions are not practical, you can try using the code from this StackOverflow post to generate a suitable regular expression.

In the long term, perhaps it would be nice to have an “Invert match” checkbox. I’ve filed MB-27941 as an enhancement request.

Thanks,
David

David is correct. XDCR filtering does not support negative lookahead.

Your use case does not require negative lookahead, though.

For documents starting with ‘a’ or ‘b’ or ‘c’, use filtering expression “^[abc]”.
For documents not starting with ‘a’ or ‘b’ or ‘c’, use expression “^[^abc]”.
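
To sanity-check that these two expressions split the key space cleanly, here is a small sketch using Python's `re` module. This assumes Python's regex semantics agree with the XDCR filter engine for these simple anchored character classes; the sample keys and the `index_1` / `index_2` names are made up for illustration.

```python
import re

# The two filter expressions suggested above.
STARTS_WITH_ABC = re.compile(r"^[abc]")
NOT_STARTS_WITH_ABC = re.compile(r"^[^abc]")


def target_index(key: str) -> str:
    """Return the (hypothetical) index name a key would be routed to."""
    if STARTS_WITH_ABC.search(key):
        return "index_1"
    if NOT_STARTS_WITH_ABC.search(key):
        return "index_2"
    # Only an empty key matches neither expression.
    return "unmatched"


# Hypothetical document keys, for illustration only.
sample_keys = ["apple::1", "bob::2", "cart::3", "user::4", "Zed::5", "1234"]
routing = {key: target_index(key) for key in sample_keys}

for key, index in routing.items():
    print(f"{key} -> {index}")
```

Every non-empty key matches exactly one of the two expressions, so no document is replicated twice or skipped. The match is case-sensitive: a key starting with an uppercase ‘A’ falls into the second group.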

Thanks for your answers.

Updating keys is not something we can do. Maybe I will try the code from the StackOverflow post you mentioned.

Otherwise, @david.nault, I agree with the idea of an invert match checkbox. I think it would not be complicated to implement, and the value is high :)

As I mentioned, there is no need to update document keys or to use the workaround for negative lookahead. The simple expressions in my previous post should be sufficient.