FAQ: OpenTelemetry & Distributed Tracing

Hi everybody,

I’m a senior product manager at Tyk working on observability. We have been getting many questions from our users about distributed tracing, the sunset of OpenTracing and our upcoming support for OpenTelemetry.

I have summarised all the answers to those questions below.
Is anything missing? Don’t hesitate to ask.

If you are new to observability and/or OpenTelemetry, you should probably read those two articles first: Observability Primer | OpenTelemetry and What is OpenTelemetry? | OpenTelemetry.

Q: Is OpenTelemetry support coming to Tyk?

Yes, distributed tracing support with OpenTelemetry is on the near-term roadmap for Tyk API Gateway. If this is a valuable feature for you, please leave a comment below saying as much.

Here are a couple of things we’d love to learn from you:

  • Why would you like to get distributed tracing from Tyk API Gateway? How will this make your life easier?
  • Which observability platform/tool are you using (Datadog, Dynatrace, New Relic, Elastic, HoneyComb, Splunk, Lightstep, Jaeger, Grafana Tempo, …)?
  • Do you have any specific requirements (e.g. format used for trace-context propagation, granularity of the spans we will export, sampling, baggage, …)?

Q: Will OpenTelemetry help me to monitor and troubleshoot GraphQL and UDG queries?

Yes! Let us know what you are struggling with at the moment (e.g. federation) and we will look into your use cases.

Q: Now that OpenTracing is being sunsetted, can I still use OpenTracing with Tyk?

Yes! Here’s what you should know:

Tyk API Gateway implements the OpenTracing specification using the OpenTracing Go library and exports the trace data using either the client libraries from Jaeger or from Zipkin. This can be configured in the Gateway (see Jaeger or Zipkin for details on the configuration options).
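To make that concrete, here is a rough sketch of what enabling the Jaeger exporter looks like in the Gateway configuration file. The keys under "options" mirror the Jaeger Go client configuration, so treat this as an illustration and check the Tyk docs linked above for the authoritative layout and the values that fit your environment:

```json
{
  "tracing": {
    "enabled": true,
    "type": "jaeger",
    "options": {
      "serviceName": "tyk-gateway",
      "sampler": { "type": "const", "param": 1 },
      "reporter": { "localAgentHostPort": "jaeger-agent:6831" }
    }
  }
}
```

Switching "type" to "zipkin" (with the matching Zipkin client options) selects the Zipkin exporter instead.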

The CNCF (Cloud Native Computing Foundation) has archived the OpenTracing project and Jaeger has also deprecated its client libraries. This means that no new pull requests or feature requests are accepted into the OpenTracing or Jaeger client repositories. This makes sense, because the whole community has moved to OpenTelemetry and is making great progress!

OpenTelemetry support is on our near-term roadmap (see above). Until it is available, you can definitely leverage OpenTracing to get Gateway timing and data in your traces. We are of course still supporting this functionality - let us know if you are having any issues with it.

Q: Which information does the current implementation with OpenTracing export?

Right now, you get timings for the time spent in the Tyk Gateway (version check, rate limit check, middleware, …) and for the time spent in upstream services.

There is room for improvement (errors, HTTP status codes, …) and we plan to use the semantic conventions from OpenTelemetry to guide us. Let us know if you are missing relevant insights.
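As an illustration of what those conventions cover, the OpenTelemetry semantic conventions for HTTP define span attributes along these lines (attribute names as published at the time of writing; the route value is a hypothetical example):

```
http.method: GET
http.status_code: 502
http.route: /orders/{id}
```

Following these names would make the Gateway spans line up with the spans emitted by instrumented upstream services.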

Q: Can I use the OpenTelemetry collector to translate the spans exported with OpenTracing to the OpenTelemetry format?

The OpenTelemetry Collector (the component responsible for collecting, processing and forwarding telemetry data) has the concept of a receiver. A receiver accepts data in a specific format, translates it into the internal format and passes it to the processors and exporters defined in the collector.

This means that you can take the traces exported by Tyk in the Jaeger or Zipkin format and have the collector translate them into the OTLP format (or any other format required by your observability backend) by using the receiver for Jaeger or for Zipkin.
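For illustration, a minimal collector configuration that accepts Zipkin-formatted spans from Tyk and forwards them as OTLP could look roughly like this - the endpoints and the backend address are placeholders for your own environment:

```yaml
receivers:
  zipkin:
    endpoint: 0.0.0.0:9411      # point the Tyk Zipkin reporter at this address

processors:
  batch:                        # batch spans before exporting

exporters:
  otlp:
    endpoint: my-observability-backend:4317   # placeholder OTLP endpoint
    tls:
      insecure: true            # only for backends without TLS, e.g. a local demo

service:
  pipelines:
    traces:
      receivers: [zipkin]
      processors: [batch]
      exporters: [otlp]
```

Swapping the zipkin receiver for the jaeger receiver works the same way if you have configured Tyk to export in the Jaeger format.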

Attention: while it could be beneficial for your use case to be able to export the traces in the OTLP format and import them into the tool of your choice, you will still be missing one piece to achieve true end-to-end tracing: a common format for context propagation.

Q: What is context propagation? Do you support W3C Trace Context?

When using distributed tracing, each service and component exports its own spans (parts of the trace). A unique identifier (trace ID) is needed to stitch those spans together to get the end-to-end distributed trace.

In the past, each observability vendor and tool implemented its own format to express this trace ID (B3, Jaeger native propagation, …). A couple of years ago, a new standard was created: W3C Trace Context, which is now the format recommended by OpenTelemetry.
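Concretely, W3C Trace Context defines an HTTP header called traceparent that carries the trace ID from one service to the next. The example value below is taken from the W3C specification; the four dash-separated fields are the version, the trace ID, the parent span ID and the trace flags:

```
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```

As long as every hop reads and forwards this header, the spans from the Gateway and from your services end up in the same trace.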

Here is a good video to learn more about context propagation with OpenTelemetry: Context Propagation makes OpenTelemetry awesome - YouTube.

With our OpenTracing support in Tyk API Gateway, we support B3 propagation with Zipkin and Jaeger native propagation with Jaeger. We plan to use W3C Trace Context in our upcoming support for OpenTelemetry.


Hi there Sonja,

Any news on OpenTelemetry support?


Hi @Radu_Popa!

We now have a working demo that enables us to validate the instrumentation. I was playing with New Relic last week - take a look, it’s pretty cool!


The team is now busy with another topic, and once that is wrapped up they will be able to proceed with OpenTelemetry. Right now we are looking at a release in Q2 or Q3.

@Radu_Popa do you have any specific needs for the OpenTelemetry integration? Which observability tools are you using?


Just setting up, actually - I can go for either Jaeger or Zipkin… any advice?

I personally prefer Jaeger as an open-source observability back-end, but the integrations with Jaeger and Zipkin provide the same amount of information.

Just note that, at the moment, the traces we send to Jaeger (using OpenTracing) are somewhat limited (e.g. we do not record any information about errors, only timings for the different middleware). This is something we will improve with OpenTelemetry in the future.

So, as of now the best option is Jaeger, and support will improve later with OpenTelemetry?
Also, how can I get actual error information?

Exactly!

Right now you cannot get error information from the OpenTracing integration, but this will be available with the OpenTelemetry integration.

In the meantime, you can still look at errors directly in Tyk Manager, or, if you want to export error information, you can use Tyk Pump to export logs or metrics for the errors to different observability back-ends.

It all comes down to which observability tool you are using at the moment and where you would like to see your data (in Tyk or in one of your own observability tools).


Are there any preview bits available to test out the OpenTelemetry solution? Any update on the release timeline of the OTel project?

Any progress on distributed tracing?

Hi @Gregor_Jelenc, and welcome to the community :slight_smile:

We have an internal preview version that we use for demos and for validating the data we are exporting. Right now, it is scheduled to be released in Q3.

Do you have any specific requirements? What’s your use case and your observability tool of choice?

Hi @Alexander_Chesser, sorry I missed your comment here. I believe we had a call together in the meantime, so I hope your questions were answered.

Right now the expected timeline for OpenTelemetry support is Q3. Wishing you a great day!

We’re using Datadog and would like APM traces so that we can trace calls through our microservices.

Any updates on ETA of OpenTelemetry? Is it still Q3?

Yes, looking good for Q3! I can’t wait for us to release it :slight_smile:
Any specific data you are interested in?

Yes, a bit more info would be great.

For context, we’re trying to integrate our Tyk Open Source pilot with Datadog APM. We’re following these instructions. We have a few questions:

  1. We’d like to be able to track the Golden Signals on the Tyk stuff. Any advice on which metrics we might consider?
  2. Can we track stuff like requests and errors on a per-key basis? The pump docs don’t list api_key, but this example does.

Any advice would be much-appreciated!

Interested in using OpenTelemetry with DataDog and UDG requests. Specifically we want better monitoring of request times and response codes of the downstream services and how they are impacting UDG responses. Improved insight into Redis requests per individual API and UDG would be great as well. Optional monitoring of transformations, and other plugin execution times as part of the trace stack would also be great to have.


Sounds like OpenTelemetry will really help make your life easier. We will post examples and documentation, including a discussion of the most important metrics.

In the meantime, when using Pump, the data we can export is documented here: GitHub - TykTechnologies/tyk-pump: Tyk Analytics Pump to move analytics data from Redis to any supported back end (multiple back ends can be written to at once). However, I don’t see api_key listed among the export options in the documentation of the DogStatsD pump. Each Pump integration might have different options.
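For reference, here is roughly what the DogStatsD section of the Pump configuration looks like, based on the tyk-pump README at the time of writing. The field names and tag options may change, so please double-check the repository before relying on this sketch (and note that api_key is indeed not among the documented tag options):

```json
"dogstatsd": {
  "type": "dogstatsd",
  "meta": {
    "address": "localhost:8125",
    "namespace": "tyk",
    "sample_rate": 0.5,
    "tags": ["method", "response_code", "api_name", "api_id", "org_id", "path", "oauth_id"]
  }
}
```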

One other resource for you: a blog post about SLO/SLI and RED metrics using Prometheus Pump: Service objectives for your APIs with Tyk, Prometheus & Grafana
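If you go down the Prometheus Pump route, a per-API error-rate query in the spirit of the RED method could look something like the sketch below. This assumes the tyk_http_status counter (labelled by code and api) that the Prometheus Pump exposed at the time of writing - please verify the metric names against the blog post and your Pump version:

```
sum by (api) (rate(tyk_http_status{code=~"5.."}[5m]))
  /
sum by (api) (rate(tyk_http_status[5m]))
```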

I hope this helps!

Yes to all of this! We might not be able to get it all into the first release, but this is definitely the goal :slight_smile:

@sonja Hi, if I need to deploy a tracing solution, what are the available options for me now until the OpenTelemetry release in Q3?

Jaeger, Zipkin and New Relic, as outlined here: Distributed Tracing

New Relic can also be set up on the Gateway via Instrumentation.

Here is a quick preview of what will be possible with our upcoming support for OTel: (Preview) What are the key components of an effective API Observability dashboard? - feedback always appreciated :hibiscus: