EnsureIPEndPointsAreLoaded() is not thread-safe

Hi,

We recently upgraded our Couchbase .NET client library to version 2.7.18. Since then we started seeing the following exception

    System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
   at System.ThrowHelper.ThrowInvalidOperationException(ExceptionResource resource)
   at System.Collections.Generic.List`1.Enumerator.MoveNextRare()
   at System.Linq.Enumerable.Any[TSource](IEnumerable`1 source)
   at Couchbase.Configuration.Server.Serialization.VBucketServerMap.EnsureIPEndPointsAreLoaded() in C:\Jenkins\workspace\dotnet\sdk\couchbase-net-client-scripted-build-pipeline\couchbase-net-client\Src\Couchbase\Configuration\Server\Serialization\VBucketServerMap.cs:line 77
   at Couchbase.Configuration.Server.Serialization.VBucketServerMap.get_IPEndPoints() in C:\Jenkins\workspace\dotnet\sdk\couchbase-net-client-scripted-build-pipeline\couchbase-net-client\Src\Couchbase\Configuration\Server\Serialization\VBucketServerMap.cs:line 44
   at Couchbase.Core.VBucket.LocatePrimary() in C:\Jenkins\workspace\dotnet\sdk\couchbase-net-client-scripted-build-pipeline\couchbase-net-client\Src\Couchbase\Core\VBucket.cs:line 40
   at Couchbase.Core.Buckets.CouchbaseRequestExecuter.SendWithRetry[T](IOperation`1 operation) in C:\Jenkins\workspace\dotnet\sdk\couchbase-net-client-scripted-build-pipeline\couchbase-net-client\Src\Couchbase\Core\Buckets\CouchbaseRequestExecuter.cs:line 657
   at Couchbase.CouchbaseBucket.Get[T](String key, TimeSpan timeout) in C:\Jenkins\workspace\dotnet\sdk\couchbase-net-client-scripted-build-pipeline\couchbase-net-client\Src\Couchbase\CouchbaseBucket.cs:line 978

In EnsureIPEndPointsAreLoaded() function

internal void EnsureIPEndPointsAreLoaded()
        {
            if (_ipEndPoints == null || !_ipEndPoints.Any())
            {
                lock (_syncObj)
                {
                    if (_ipEndPoints == null || !_ipEndPoints.Any())
                    {
                        _ipEndPoints = new List<IPEndPoint>();
                        foreach (var server in ServerList)
                        {
                            _ipEndPoints.Add(IPEndPointExtensions.GetEndPoint(server));
                        }
                    }
                }
            }
        }

The lock is acquired after Any() is called.
This could cause 1 thread to change the list _ipEndPoints while another thread is executing Any(). Causing the exception Collection was modified; enumeration operation may not execute.

To avoid lock contention, in version 2.7.12 the lock was moved after the null and Any() check.

But this has made the function, not thread-safe.

Would greatly appreciate it if this bug could be fixed!

1 Like

_ipEndPoints = ServerList.Select(server => IPEndPointExtensions.GetEndPoint(server)).ToList();

may be worth considering to ensure that the Add calls are occurring prior to the assignment to _ipEndPoints .

The check outside the lock is actually very intentional for performance reasons. This avoids every single call taking out a lock and causing lock contention between threads.

However, you are both correct that it does have some semantic bugs. It should use .Count instead of .Any() (prior to .NET 5 this is more performant anyway), which would be thread-safe. Also, it shouldn’t assign the shared variable until after the Add calls are complete.

I’ve filed an issue to track: https://issues.couchbase.com/browse/NCBC-2607

https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1?view=netcore-3.1#thread-safety states Any instance members are not guaranteed to be thread safe.

So while Count may usually work, it isn’t guaranteed to work (since it is explicitly documented as not being safe). https://stackoverflow.com/questions/1352360/safe-to-get-count-value-from-generic-collection-without-locking-the-collection.

You can reasonably safely do the null check outside of the lock (standard double checked lock pattern). And if you use my earlier _ipEndPoints = ServerList.Select(server => IPEndPointExtensions.GetEndPoint(server)).ToList(); suggestion then you can remove the need for the Any / Count check outside the lock (since _ipEndPoints will only be assigned the data once it is populated).

That might look like:

    {
        if (_ipEndPoints == null)
        {
            lock (_syncObj)
            {
                if (_ipEndPoints == null || !_ipEndPoints.Any()) // or use Count == 0
                {
                    _ipEndPoints = ServerList.Select(server => IPEndPointExtensions.GetEndPoint(server)).ToList();
                }
            }
        }
    }

I could be wrong. Just an idea.

That would also work. Though Count is thread-safe so long as you don’t rely on it being 100% accurate (i.e. checking it again inside the lock). It’s just returning a raw integer, which is a small enough type to avoid tearing.

It is safe in terms of returning something in terms of the current implementations? Yes. Is it guaranteed to do that in future? No. It is possible to get quite odd results from Count (e.g. negative) - albeit difficult. :wink:

Either way, the code I suggested gets rid of the need to do an Any or Count on a risky list (i.e. one being written to in a different thread - i.e. outside the lock). So may be worth considering (and once you are inside the lock your suggested Count optimisation seems quite prudent).

Either way, Pranavi and I look forward to the fix. If there is any further assistance we can give, please let us know.

Is there a rough ETA on when this fix will be available on NuGet?

@Matthew_Wills -

Should be up in the next couple of days - 2.7.20.

-Jeff

1 Like