Got memcached EINVAL! Aborted (core dumped)


#1

Hi, I get this error randomly causing my client to crash, the code just updates a document (get => replace) and if relaunched it eventually finishes successfully.

I’ve tried to set LCB_LOGLEVEL to 5 but nothing useful seems to come the logs, any ideas?

636ms [I0] {29455} [DEBUG] (lcbio_mgr - L:416) <10.131.12.180:11210> (HE=0x20a28a0) Creating new connection because none are available in the pool 636ms [I0] {29455} [DEBUG] (lcbio_mgr - L:321) <10.131.12.180:11210> (HE=0x20a28a0) Starting connection on I=0x20db700 636ms [I0] {29455} [INFO] (connection - L:450) <10.131.12.180:11210> (SOCK=0x2145440) Starting. Timeout=1000000000us 636ms [I0] {29455} [INFO] (lcbio_mgr - L:407) <10.131.13.134:11210> (HE=0x208efd0) Found ready connection in pool. Reusing socket and not creating new connection 636ms [I0] {29455} [DEBUG] (lcbio_mgr - L:416) <10.131.14.54:11210> (HE=0x2051d10) Creating new connection because none are available in the pool 636ms [I0] {29455} [DEBUG] (lcbio_mgr - L:321) <10.131.14.54:11210> (HE=0x2051d10) Starting connection on I=0x208b910 636ms [I0] {29455} [INFO] (connection - L:450) <10.131.14.54:11210> (SOCK=0x2035170) Starting. Timeout=1000000000us 636ms [I0] {29455} [DEBUG] (lcbio_mgr - L:223) <10.131.13.134:11210> (HE=0x208efd0) Assigning R=0x21466e0 SOCKET=0x2094730 636ms [I0] {29455} [DEBUG] (ioctx - L:101) <10.131.13.134:11210> (CTX=0x20352e0,unknown) Pairing with SOCK=0x2094730 636ms [I0] {29455} [DEBUG] (server - L:543) <10.131.13.134:11210> (SRV=0x2052de0,IX=1) Setting initial timeout=999856ms 637ms [I0] {29455} [TRACE] (connection - L:344) <10.131.14.54:11210> (SOCK=0x2035170) Received completion handler. Status=0. errno=0 637ms [I0] {29455} [INFO] (connection - L:116) <10.131.14.54:11210> (SOCK=0x2035170) Connected 637ms [I0] {29455} [WARN] (lcbio - L:103) FIXME: Unknown iops/os error code 95. Using NETWORK_ERROR 637ms [I0] {29455} [INFO] (connection - L:121) <10.131.14.54:11210> (SOCK=0x2035170) Couldn't set TCP_NODELAY 637ms [I0] {29455} [DEBUG] (lcbio_mgr - L:271) <10.131.14.54:11210> (HE=0x2051d10) Received result for I=0x208b910,C=0x2035170; E=0x0 637ms [I0] {29455} [DEBUG] (lcbio_mgr - L:223) <10.131.14.54:11210> (HE=0x2051d10) Assigning R=0x208b710 SOCKET=0x2035170 637ms [I0] {29455} [TRACE] (server - L:522) <10.131.14.54:11210> (SRV=0x20b8ff0) Session not yet negotiated. Negotiating 637ms [I0] {29455} [DEBUG] (ioctx - L:101) <10.131.14.54:11210> (CTX=0x20e27b0,unknown) Pairing with SOCK=0x2035170 637ms [I0] {29455} [TRACE] (connection - L:344) <10.131.12.180:11210> (SOCK=0x2145440) Received completion handler. Status=0. errno=0 637ms [I0] {29455} [INFO] (connection - L:116) <10.131.12.180:11210> (SOCK=0x2145440) Connected 637ms [I0] {29455} [WARN] (lcbio - L:103) FIXME: Unknown iops/os error code 95. Using NETWORK_ERROR 637ms [I0] {29455} [INFO] (connection - L:121) <10.131.12.180:11210> (SOCK=0x2145440) Couldn't set TCP_NODELAY 637ms [I0] {29455} [DEBUG] (lcbio_mgr - L:271) <10.131.12.180:11210> (HE=0x20a28a0) Received result for I=0x20db700,C=0x2145440; E=0x0 637ms [I0] {29455} [DEBUG] (lcbio_mgr - L:223) <10.131.12.180:11210> (HE=0x20a28a0) Assigning R=0x208ae90 SOCKET=0x2145440 637ms [I0] {29455} [TRACE] (server - L:522) <10.131.12.180:11210> (SRV=0x208fb60) Session not yet negotiated. Negotiating 637ms [I0] {29455} [DEBUG] (ioctx - L:101) <10.131.12.180:11210> (CTX=0x209c320,unknown) Pairing with SOCK=0x2145440 Got memcached EINVAL! Aborted (core dumped)

Thanks in advance,
Juan


#2

Which version of the server is in use and which version of the SDK? From a quick look at the code, it seems EINVAL can happen from other things internal to the client. @brett19: any advice here?


#3

Hey @juanmanuel.ventura,
Is there any chance that you would be able to inspect the core file and provide a backtrace such that we can try and narrow down the specific issue? Additionally, the specific SDK version and it’s configuration (how you installed it) would be good information to have.
Cheers, Brett


#4

Hi @brett19, @ingenthr, I’m afraid it was my fault after all, I’ve found the problematic docs to have corrupted keys (18 in a mass import of ~15 millions items) result of a bad encoding, so memcached was right on refusing to accept that string as a valid key. Nevertheless it would be nice to have the sdk to handle this kind of exceptions not crashing.

Thank you all for the help,
Juan


#5

Hey @juanmanuel.ventura,

I agree that this should definitely not cause the application to crash with a core dump. Would it be possible to get the binary representation of one of the keys which caused this issue to occur? This will allow us to build a test-case to identify the cause.

Cheers, Brett


#6

Memcached’s EINVAL essentially indicates a bug with the client (in this case, libcouchbase) as it should filter and validate the input. There is an actual abort() call and it seems to be a debugging leftover - although do note that receiving an EINVAL is quite severe.

This bug was introduced in 2.5.6 of libcouchbase, so it’s fairly recent.

https://issues.couchbase.com/browse/CCBC-689

However I would be interested in seeing what sort of input caused the EINVAL