Some connections are failing or being dropped by Tyk while proxying

We enabled Tyk in production last week, and since then we have constantly been seeing dropped connections with errors. Some of our clients have also complained about our API performance right after this change.

Some of the common errors are:

  1. Context cancelled
  2. http: proxy error: EOF
  3. http: proxy error during body copy: unexpected EOF
 
Mar 24, 2024 07:01:11.263
time="Mar 24 07:01:11" level=error msg="http: proxy error: EOF" api_id=YWR0L3ZveWFnZXMtYXBpLW1hc3Rlci1hcGktdm9ydGV4YS1jb20 api_name=voyages-api-master mw=ReverseProxy org_id= prefix=proxy server_name="adt-voyages-api-master.adt.svc.cluster.local:3002" 
gateway-tyk-gateway-tyk-headless-5b665688d4-hczm8
 
Mar 24, 2024 07:01:11.263
time="Mar 24 07:01:11" level=error msg="http: proxy error: read tcp 10.3.98.100:45796->172.20.225.229:3002: read: connection reset by peer" api_id=YWR0L3ZveWFnZXMtYXBpLW1hc3Rlci1hcGktdm9ydGV4YS1jb20 api_name=voyages-api-master mw=ReverseProxy org_id= prefix=proxy server_name="adt-voyages-api-master.adt.svc.cluster.local:3002" 
gateway-tyk-gateway-tyk-headless-5b665688d4-sxdsv
 
Mar 24, 2024 07:01:11.263
time="Mar 24 07:01:11" level=error msg="http: proxy error: EOF" api_id=YWR0L3ZveWFnZXMtYXBpLW1hc3Rlci1hcGktdm9ydGV4YS1jb20 api_name=voyages-api-master mw=ReverseProxy org_id= prefix=proxy server_name="adt-voyages-api-master.adt.svc.cluster.local:3002" 
gateway-tyk-gateway-tyk-headless-5b665688d4-hczm8
 
Mar 24, 2024 07:01:28.713
time="Mar 24 07:01:28" level=error msg="http: proxy error during body copy: unexpected EOF" api_id=YWR0L3ZveWFnZXMtYXBpLW1hc3Rlci1hcGktdm9ydGV4YS1jb20 api_name=voyages-api-master mw=ReverseProxy org_id= prefix=proxy
gateway-tyk-gateway-tyk-headless-5b665688d4-hczm8
 
Mar 24, 2024 07:01:28.713
time="Mar 24 07:01:28" level=error msg="http: proxy error: EOF" api_id=YWR0L3ZveWFnZXMtYXBpLW1hc3Rlci1hcGktdm9ydGV4YS1jb20 api_name=voyages-api-master mw=ReverseProxy org_id= prefix=proxy server_name="adt-voyages-api-master.adt.svc.cluster.local:3002" 
gateway-tyk-gateway-tyk-headless-5b665688d4-hczm8
 
Mar 24, 2024 07:01:28.714
time="Mar 24 07:01:28" level=error msg="http: proxy error: EOF" api_id=YWR0L3ZveWFnZXMtYXBpLW1hc3Rlci1hcGktdm9ydGV4YS1jb20 api_name=voyages-api-master mw=ReverseProxy org_id= prefix=proxy server_name="adt-voyages-api-master.adt.svc.cluster.local:3002" 
gateway-tyk-gateway-tyk-headless-5b665688d4-69n8z
 
Mar 24, 2024 09:48:37.035
time="Mar 24 09:48:37" level=error msg="http: proxy error: context canceled" api_id=YWR0L3ZveWFnZXMtYXBpLW1hc3Rlci1hcGktdm9ydGV4YS1jb20 api_name=voyages-api-master mw=ReverseProxy org_id= prefix=proxy server_name="adt-voyages-api-master.adt.svc.cluster.local:3002" 
gateway-tyk-gateway-tyk-headless-5b665688d4-sxdsv
 
Mar 24, 2024 09:52:02.534
time="Mar 24 09:52:02" level=error msg="http: proxy error: context canceled" api_id=YWR0L3ZveWFnZXMtYXBpLW1hc3Rlci1hcGktdm9ydGV4YS1jb20 api_name=voyages-api-master mw=ReverseProxy org_id= prefix=proxy server_name="adt-voyages-api-master.adt.svc.cluster.local:3002" 
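
Before chasing a root cause, it can help to see which error dominates. The following is a minimal Python sketch, assuming the gateway log lines keep the key=value layout shown above; the helper names parse_fields and tally_errors are made up for this illustration and are not part of Tyk.

```python
import re
from collections import Counter

# Matches key=value pairs where the value is either double-quoted or a bare
# token (possibly empty, as in "org_id= ").
_FIELD = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S*)')


def parse_fields(line):
    """Return the key/value fields found in one gateway log line."""
    return {key: value.strip('"') for key, value in _FIELD.findall(line)}


def tally_errors(lines):
    """Count error-level log lines, grouped by their msg field."""
    counts = Counter()
    for line in lines:
        fields = parse_fields(line)
        if fields.get("level") == "error":
            counts[fields.get("msg", "<no msg>")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        'time="Mar 24 07:01:11" level=error msg="http: proxy error: EOF" '
        'api_name=voyages-api-master mw=ReverseProxy org_id= prefix=proxy',
        'time="Mar 24 09:48:37" level=error msg="http: proxy error: context canceled" '
        'api_name=voyages-api-master mw=ReverseProxy org_id= prefix=proxy',
    ]
    for msg, count in tally_errors(sample).most_common():
        print(count, msg)
```

Grouping by msg separates the errors where the upstream closed the connection (EOF, connection reset by peer) from the context canceled ones, which usually need a different fix.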

I am unable to figure out the root cause of these errors. Would appreciate some help.

The only thing I can think of to resolve this is to retry the request from a middleware in case of an unexpected HTTP error from the upstream, but I am not quite sure how to do that.
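
The retry idea itself can be sketched independently of any gateway. The Python below is only an illustration of retrying on a dropped connection, not a Tyk middleware or a Tyk setting; call_with_retry, its parameters, and the choice of which exceptions count as retriable are all assumptions made for this example.

```python
import time

# Exceptions treated as "the upstream dropped the connection" in this sketch.
# ConnectionResetError and http.client.RemoteDisconnected are both subclasses
# of ConnectionError.
RETRIABLE = (ConnectionError, EOFError)


def call_with_retry(send, attempts=3, backoff_seconds=0.1, sleep=time.sleep):
    """Call send() until it succeeds or the attempts run out.

    Only use this for idempotent requests (GET, HEAD, ...): replaying a POST
    after a dropped connection may apply the same change twice.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except RETRIABLE:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            # Exponential backoff: 0.1s, 0.2s, 0.4s, ... with the defaults.
            sleep(backoff_seconds * 2 ** (attempt - 1))
```

A caller would wrap the actual request, for example call_with_retry(lambda: urllib.request.urlopen(url).read()). Retrying hides the symptom only; the upstream pods closing connections under load still need to be investigated.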

The http: proxy error: EOF error happens when the API endpoint upstream from Tyk closes the connection abruptly before the API call is complete. It can be a resource limit in the upstream service but most often it’s caused by the upstream resource crashing.

Tyk is unable to continue with the request, so it returns the error "There was a problem proxying the request" to the caller. The first place to investigate is the logs of that upstream service.

Thanks for your reply @Olu. We increased resources for the upstream service and the errors have reduced significantly. They have not gone away, though: the upstream service runs as pods, and whenever a pod’s resource utilization spikes we still see these errors and failed API requests.

This did not happen previously because Nginx retries on its own in such cases.

Are there any parameters or configurations in Tyk that I am missing that could help here?

I just noticed you were also observing both downstream (http: proxy error: context canceled) and upstream (http: proxy error: EOF) proxy errors.

http: proxy error: context canceled indicates that the downstream (client) connection was closed before Tyk finished proxying the request, for example when the client gives up or hits its own timeout. See more details in this document.

What version are you on? There were some performance regressions in v5.0.9, v5.0.10 and v5.2.5. v5.2.6 had some performance improvements added to it that could help.

Our planning for production doc can help you get the best out of your gateway. The general rule would be to ensure downstream timeouts are the same as upstream timeouts.

Hope this helps

I am testing v5.2.6.

I don’t understand what you mean by “make sure downstream timeouts are the same as upstream timeouts”.

Do you mean setting the same values for "TYK_GW_PROXYDEFAULTTIMEOUT" and "TYK_GW_HTTPSERVEROPTIONS_READTIMEOUT" in the gateway?

Yes, the write timeout, read timeout and proxy timeout should be the same. I call it a general rule, but it’s really just for simplicity’s sake: if all three have the same value, there is no confusion about which one caused a timeout.
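
As a quick sanity check of that rule, the three values can be compared before deploying. This is a rough Python sketch: check_timeouts is a made-up helper, and TYK_GW_HTTPSERVEROPTIONS_WRITETIMEOUT is assumed here to be the write-timeout counterpart of the read-timeout variable named above, so verify the exact names against the gateway configuration reference.

```python
import os

# The first two names appear earlier in this thread; the third is assumed by
# analogy with the read-timeout variable.
TIMEOUT_VARS = (
    "TYK_GW_PROXYDEFAULTTIMEOUT",
    "TYK_GW_HTTPSERVEROPTIONS_READTIMEOUT",
    "TYK_GW_HTTPSERVEROPTIONS_WRITETIMEOUT",
)


def check_timeouts(env):
    """Return (ok, values): ok is True only if all three are set and equal."""
    values = {name: env.get(name) for name in TIMEOUT_VARS}
    missing = [name for name, value in values.items() if value is None]
    distinct = set(values.values())
    return (not missing) and len(distinct) == 1, values


if __name__ == "__main__":
    ok, values = check_timeouts(os.environ)
    for name, value in values.items():
        print(f"{name}={value}")
    print("aligned" if ok else "timeouts differ or are unset")
```

Run inside the gateway container (or with the same environment), it prints the three values and whether they line up.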