Tyk Gateway fetching API Definition Timeout from Tyk Dashboard

fattah.emir · July 21, 2023, 3:53am

Hi everyone, I'm currently stuck and don't know where else to go, so I'll leave this here.
We ran into a problem with Tyk Gateway trying to fetch API definitions from Tyk Pro.
Our setup is as follows:
Tyk Pro (Docker based):
tyk-gateway 5.0.3
tyk-dashboard 5.0.3
tyk-pump 1.8.1
postgresql 14

We also have a Tyk CE running the same version (Docker based):
tyk-gateway 5.0.3
storage: writes to a file

Chronology: we recently upgraded Tyk from version 4.3.1 to 5.0.3. We did this because we wanted to migrate our Tyk Pro from MongoDB to PostgreSQL, and we ran into bugs with that on 4.0.3, so we upgraded to 5.0.3.

Fast forward to yesterday: we were running tyk-sync to update the definitions on the dashboard when one of our gateways suddenly restarted. We got the following messages in the Docker logs for the gateway:
time="Jul 20 15:25:47" level=debug msg="Calling: http://xxx-tyk-dashboard-staging.xxxx.xxx:3000/system/apis"
time="Jul 20 15:25:47" level=debug msg="Using: NodeID: 5418e175-8fc4-xxxx-xxxx-xxxxxxxxx"
time="Jul 20 15:26:17" level=error msg="failed to load API specs: failed to decode body: context deadline exceeded (Client.Timeout or context cancellation while reading body) body was: "
time="Jul 20 15:26:17" level=error msg="Error during syncing apis:failed to decode body: context deadline exceeded (Client.Timeout or context cancellation while reading body) body was: " prefix=main
We tried restarting again, but to no avail.

We noticed a consistent pattern: 30 seconds after the first "Using: NodeID" line, the gateway either fails with "failed to load API specs" or succeeds and continues. We think this is related to some kind of timeout setting on the gateway itself. We also ran tcpdump and saw an RST packet sent from the gateway to the dashboard:

connection reset from gateway 10 > dashboard 241
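
For anyone wanting to confirm the 30-second gap from their own logs, here is a minimal sketch that computes the interval between two of the log lines quoted above. The helper names (`parse_time`, `gap_seconds`) are made up for illustration, and the year 2023 is an assumption taken from the date of this post, since the log timestamps carry no year.

```python
import re
from datetime import datetime

# Two log lines copied from the post (NodeID redacted as in the original).
LOG_LINES = [
    'time="Jul 20 15:25:47" level=debug msg="Using: NodeID: 5418e175-8fc4-xxxx-xxxx-xxxxxxxxx"',
    'time="Jul 20 15:26:17" level=error msg="failed to load API specs: failed to decode body: '
    'context deadline exceeded (Client.Timeout or context cancellation while reading body) body was: "',
]

TIME_RE = re.compile(r'time="([^"]+)"')

# Assumption: the logs are from 2023 (the timestamps themselves have no year).
ASSUMED_YEAR = 2023


def parse_time(line: str) -> datetime:
    """Extract the time="..." field from a gateway log line."""
    match = TIME_RE.search(line)
    if match is None:
        raise ValueError(f"no time field in line: {line!r}")
    return datetime.strptime(f"{ASSUMED_YEAR} {match.group(1)}", "%Y %b %d %H:%M:%S")


def gap_seconds(first: str, second: str) -> float:
    """Seconds elapsed between two log lines."""
    return (parse_time(second) - parse_time(first)).total_seconds()


print(gap_seconds(LOG_LINES[0], LOG_LINES[1]))  # prints 30.0
```

A gap of exactly 30 seconds on every failed attempt points to a fixed client-side timeout rather than a random network failure.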

This problem only occurs on our Tyk Pro and doesn't affect our Tyk CE.

Olu · July 27, 2023, 5:06pm

Hello @fattah.emir and welcome to the community.

We have a similar commercial ticket open for this. Is this related to that?

Regardless, here are some general suggestions that might help:

Increase timeout limits: Ensure that your Tyk Gateways and Dashboard have sufficient timeout settings in their configurations to handle large requests or potential issues with slow APIs. In v5.0.3 we have a new gateway config option to assist with this. Simply set the value for db_app_conf_options.connection_timeout or TYK_GW_DBAPPCONFOPTIONS_CONNECTIONTIMEOUT
Resource limitations: Verify that your environment or pod has enough resources, such as memory and CPU, to handle the load from multiple running Tyk instances. If you're running on Kubernetes, check the resource limits set by the kubelet or through the resources section of a custom CronJob manifest file
Network latency: Check that the network connections between components (gateway, dashboard, DB, etc.) do not have very high latency
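
As a rough way to check the network latency point, here is a small sketch that times TCP connects from the gateway host to the dashboard. The function name `tcp_connect_latency` and its parameters are hypothetical, not part of any Tyk tooling. Note that it only measures the TCP handshake, not the time needed to read the /system/apis response body, which is where the error in the logs occurs.

```python
import socket
import time


def tcp_connect_latency(host: str, port: int, attempts: int = 5,
                        timeout: float = 3.0) -> list[float]:
    """Return the time in seconds taken by each TCP connect to host:port.

    Raises OSError if a connection is refused or times out.
    """
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        # Open and immediately close a connection; only the handshake is timed.
        with socket.create_connection((host, port), timeout=timeout):
            samples.append(time.perf_counter() - start)
    return samples
```

Run from inside the gateway container, a call such as `tcp_connect_latency("<your-dashboard-host>", 3000)` gives a handful of samples to compare. Port 3000 is the one shown in the log URL above, and the hostname is a placeholder for your own dashboard address.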

fattah.emir · July 27, 2023, 10:40pm

Hi Olu,

Yes, this is related to that commercial case, and it has also been answered there.
This can be considered resolved.