Tyk pump stops working intermittently

Hi,

I have a working setup of Tyk Gateway (5.0.0) and Tyk Pump (1.8).
It pushes analytics data to Kafka and MongoDB. The setup works as expected, but every so often the pump suddenly stops generating data. Restarting the pump containers gets it working again. This has happened a couple of times.
I checked the resources of the pods and they are fine.

Below is the last log message:

time="Aug 31 14:07:58" level=warning msg="Pump Kafka Pump is taking more time than the value configured of purge_delay. You should try to set a timeout for this pump." prefix=main

Any ideas or pointers on how to debug this issue?

Hello @Sarvesh_Jain and welcome to the community :partying_face:

I remember seeing a similar issue. Depending on the severity, you may only lose the analytics record being written at that moment, or you may lose further analytics because your pump instance halts completely. A restart is usually the only way to recover the system, as you may have experienced.

From the logs, it appears the Kafka pump is taking a long time to complete its write to your Kafka data sink. A potential cause is that your Kafka back end was down at the time or experiencing some kind of network issue. The crux of the issue is that the pump is struggling to send analytics data to your target.

As the log message mentions, you would need to set a timeout value. I recommend setting a timeout for both the pump and the Kafka engine responsible for writing the data.

Note
Please note that the Kafka engine timeout is a duration, not an int, meaning that 4s, 1m, and 100ms are valid values (a fix to sync the type of the timeout values across the board is out in pump v1.8.3).

Also ensure that purge_delay > pump_timeout > kafka_engine_timeout
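
As a rough sketch of how those three values might sit in a pump.conf, assuming the pump is configured through the JSON file rather than environment variables: the top-level `purge_delay` and the pump-level `timeout` are integers in seconds, while the `timeout` under the Kafka `meta` block is the duration string described in the note. The broker address, topic name, and the numbers themselves are placeholders, not recommendations, so adjust them to your setup and check the key names against the docs for your pump version.

```json
{
  "purge_delay": 10,
  "pumps": {
    "kafka": {
      "type": "kafka",
      "timeout": 5,
      "meta": {
        "broker": ["localhost:9092"],
        "topic": "tyk-analytics",
        "timeout": "3s"
      }
    }
  }
}
```

Here 10s (purge_delay) > 5s (pump timeout) > 3s (Kafka engine timeout), so the Kafka write gives up before the pump's own timeout fires, and both finish before the next purge cycle starts.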

Let us know if this helps