ThresholdLogReporter customization?

lenix · November 11, 2018, 3:56pm

Hello,

Sometimes in my server I get warning logs from ThresholdLogReporter, e.g.

[{“operation_name”:“n1ql”,“last_local_id”:“721DC6733B48777F/0000000013681C14”,“last_local_address”:“127.0.0.1:39504”,“last_remote_address”:“127.0.0.1:8093”,“last_dispatch_us”:1116600,“last_operation_id”:“856ad234-b07b-43d1-b22d-932d01c0ff93”,“total_us”:1117716},{“operation_name”:“n1ql”,“last_local_id”:“721DC6733B48777F/0000000026B1756E”,“last_local_address”:“127.0.0.1:39362”,“last_remote_address”:“127.0.0.1:8093”,“last_dispatch_us”:34705,“last_operation_id”:“986edc62-68ab-4277-9c03-d7be0e16c66b”,“total_us”:1450410}],“service”:“n1ql”,“count”:2}]

This lets me know the top10 slow operations but I’m doing many operations and I’m not 100% sure what’s the related query or KV operation. So I thought I should create my own tracer, supply it to the couchbase environment and log more useful data, because the data is there but it’s just not being logged, like the query or the key, they’re there in the couchbase request (or even the tags)

But it looks to me like the developers didn’t design with this in mind because I can’t extend ThresholdLogTracer or ThresholdLogReporter.

Can anyone advise how to do this?

edit: Also in the json above, according to the documentation I should receive “server_us” which shows me how long the server took, but it’s not there? Maybe it was and then got removed? Maybe I have to configure something? Not sure.

Thanks,

Couchbase 5.5.2 with sdk 2.7

MikeGoldsmith · November 12, 2018, 11:12am

Hi @lenix

The peer.service tag is the Couchbase service type, eg n1ql is Query, KV would be KeyValue. Also, from the screenshoot above, you can see the db.statement tag with a SELECT clause, which means it’s a N1QL query. KV operations should have the couchbase.document_key tag instead instead.

server_us is the time taken to execute the operation on the server and is returned with the response back to the client. Are you on Community or Enterprise edition?

lenix · November 12, 2018, 12:33pm

I already understand this but as I explained earlier, my problems are as follows:

server_us is not always included in the warning logs, most of the logs have this field missing.
in the warning logs, the n1ql statement or the key are not written, so I’m not sure which operation exactly I’m getting warned about. I understand there is last_operation_id but I’m not sure where can I co-relate this field? I tried the Query Monitor (from web ui) but the ids dont match.
Regarding point #2 above, I thought I can create my own tracer and supply it to the environment by the tracer() method but I can’t inheirt ThresholdLogReporter or ThresholdLogTracer… so I’m stuck with the existing tracer+reporter and customization on the existing one is pretty limited.

The above screenshot was taken from memory by setting a break-point. My point is, the data is there but it’s not logged and I can’t customize the tracer.

I’m running enterprise edition.

ingenthr · November 13, 2018, 1:58am

Our intent was that the built in Tracer implementation be simple and add value as is, but that people would plug in their own OpenTracing implementation like Jaeger, OpenZipkin, etc. to use the rest of the fields we’re including in the tracing span.

With the built in Tracer, we were trying to be cautious to not put a lot of information in the logs by default, just enough to help us identify the most common issues*. In other words, it’s not so much a query performance profiling tool (the query metrics and profiler do that as covered in the docs, albeit not through OpenTracing spans).

Your desire to extend the built in Tracer makes sense though-- we might be able to enhance it for that.

I hope that makes sense. You might want to check out some of the bigger Tracers.

* some of which we wrote up in the sdk-rfc, lot of details there though

lenix · November 13, 2018, 8:48am

@ingenthr Ok understood. So I can provide my own implementation through the environment tracer() method which is basically an implementation of io.opentracing.Tracer but currently I don’t have the time to do a full implementation thats why I was thinking of extending your implementation and logging the fields I want to see, however the objects in couchbase-jvm-core are very strict, e.g.

I can’t extend any of your classes (because of no default constructors)
I can’t create an instance of com.couchbase.client.core.tracing.ThresholdLogSpan because the constructor is protected. This means even if I do a full implementation and create my own span, my span will not have the CouchbaseRequest because setting the request to the span is done in AbstractCouchbaseRequest which strictly checks for instances of com.couchbase.client.core.tracing.ThresholdLogSpan … but I think I understand if your intent is not exposing the couchbase request to people that provides their implementation.

Anyway thanks for your response and your efforts

ingenthr · November 13, 2018, 7:17pm

Yes, we’d need to enhance it to make it easier. Of course, you can probably work around this a bit through copy/paste and putting it in the right namespace. I’ve opened JVMCBC-583 to look at opening up the interface.

Were it me though, I think the bigger benefit is there in trying Jaeger, since you can then trace outside Couchbase even and iterate further. In my opinion, it’s the correlation across services and even up into application code that really makes Tracing valuable. We built Response Time Observability to kind of address two things at once: get our basics done internally and let people plug into the wider world of OpenTracing.