Analytics remote link randomly disconnecting and reconnecting

Hi to all members,
In our dev environment we have a Couchbase cluster of three data nodes and two separate Couchbase nodes for Analytics. The Analytics remote link, created on the Analytics nodes and pointing at the data cluster, is randomly disconnecting and reconnecting. The firewall between the nodes allows connections on any TCP port. When the link and datasets are created, the data is replicated without any trouble; after some time, the remote link starts disconnecting and reconnecting.
Maybe somebody has faced such problems too. I would appreciate any advice…

Hi @ion.z,
The issue might be data related if one of your Analytics collections has a filter that expects a JSON field to be of a certain JSON type (e.g. an object) but some documents coming from the Data Service have that field as a different type (e.g. an array). You can check the analytics_info.log file to see whether an error message related to that is logged.
What version of CB Server are you using? Would you mind sharing the DDLs you used to create the Analytics collections?

Hi @mhubail,
Thank you for the fast reply.
Yes, you are right, there was a type mismatch error (expected value of type object, but got the value of type string). I’ve removed the filter from the dataset and the link isn’t disconnecting any more. It would be nice to know which document it doesn’t like, because I can’t find that information in analytics_info.log. Do you know if there are logs where I could find the ID of the document that is causing such errors?

Hi @ion.z
You can try running the following function which would retrieve details about the record that failed to be ingested.
SELECT * FROM dcp_ingestion_failure_report() AS x;

Availability of this function would depend on the CB Server version you are using. As @mhubail stated, providing the CB Server version would help in providing a more accurate solution.

Hi @Hussain.Towaileb ,
Thank you for your reply, I will try your select. Sorry, that was my mistake, the version of Couchbase is 6.6.1.

@ion.z,
The query @Hussain.Towaileb provided will give you those document ids but only for the documents that were encountered before the link got disconnected. If you would like to know all such documents, you can use the new dataset without filters that you created and execute a query like this:

SELECT META().id
FROM full_dataset d
WHERE IS_OBJECT(d.suspected_field) = false;

After that, you can fix those documents in the Data Service, and when you recreate the filtered dataset the issue should not recur.
We have many type functions that you can use in a similar manner to validate your data.
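As a rough sketch of that idea, a query along these lines groups the documents by the type of the suspect field, so you can see at a glance how many values are not objects. The names full_dataset and suspected_field are the same placeholders used above, not real names from your cluster, and the query has not been run against 6.6.1, so treat it as a starting point.

```sql
-- Count documents per JSON type of the suspect field.
-- full_dataset and suspected_field are placeholder names.
SELECT field_type, COUNT(*) AS doc_count
FROM full_dataset d
GROUP BY CASE
           WHEN IS_OBJECT(d.suspected_field)  THEN "object"
           WHEN IS_ARRAY(d.suspected_field)   THEN "array"
           WHEN IS_STRING(d.suspected_field)  THEN "string"
           WHEN IS_NUMBER(d.suspected_field)  THEN "number"
           WHEN IS_BOOLEAN(d.suspected_field) THEN "boolean"
           WHEN IS_NULL(d.suspected_field)    THEN "null"
           ELSE "missing"  -- field is absent from the document
         END AS field_type;
```

Any group other than "object" (and possibly "missing", depending on your filter) points at documents worth inspecting before recreating the filtered dataset.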

If you don’t care about those documents that have the field as a string rather than an object and don’t want them to be ingested in Analytics, you can use the function to_object(field_name) on the field that might have the wrong string type when creating the dataset. For example, if the field name “suspected_field” is supposed to be an object but due to some data issue could be something else and you don’t care about the documents in which “suspected_field” is not an object, you can use a DDL like this:

CREATE DATASET filtered_dataset
ON some_bucket
WHERE to_object(suspected_field).nested_field = "some_value";

Now the link shouldn’t be disconnected when “suspected_field” is not an object, but those documents won’t be ingested in Analytics.
Of course, the better approach is to fix your data in the Data Service by either fixing or deleting the invalid documents.
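If you want to see how to_object treats different input types before relying on it in a filter, a small query on literal values can help. This is only a sketch: the aliases are made up for illustration, and the comments describe the behavior as I understand it from the documentation, so please verify the output on your version.

```sql
-- Compare to_object on an object, a string, and an array.
-- Expectation (to be verified on your version): the object is returned
-- unchanged, while the non-object inputs become an empty object, so a
-- nested field access on them yields MISSING and the filter does not match.
SELECT to_object({"nested_field": "some_value"}) AS from_object,
       to_object("some text")                    AS from_string,
       to_object([1, 2, 3])                      AS from_array;
```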

Hi @mhubail ,
Thanks, you’ve helped me a lot and I’m grateful for that. I didn’t expect to find the solution so quickly. Many thanks to all who gave advice.

Hi @Hussain.Towaileb.
Is there any documentation for functions like dcp_ingestion_failure_report(), or maybe there is a command to list them all?