Unexpected End of Stream Error on Long Response Times

Hello Tyk community,

I hope you’re all doing well. We are currently facing a critical issue with our Tyk Gateway setup running in a Docker and Kubernetes cluster environment. The problem arises specifically when the backend service takes more than 3 minutes to return the response. We are seeking some guidance and insights from the community to help us resolve this issue.

Problem Description: We have deployed Tyk Gateway as part of our API infrastructure in a Dockerized Kubernetes cluster. The Tyk Gateway is responsible for proxying requests to an upstream backend API. When the backend service takes more than 3 minutes to process the request and return the response, we encounter an “unexpected end of stream” error. This issue is consistent across different clients and tools, as we encountered the same problem when making requests from both a Java client and Postman.

Deployment Environment:

  • Kubernetes Cluster Environment
  • Dockerized Tyk Gateway Deployment

Observations:

  • The Tyk Gateway logs indicate that the request is being successfully received, and it forwards the request to the upstream backend API. The backend API also seems to receive the request.
  • However, when the response time from the backend API exceeds 3 minutes, the response from Tyk Gateway is not being read correctly by the client (Java code and Postman). Instead, it throws an “unexpected end of stream” exception.
  • Notably, the response time recorded in the Tyk Gateway logs is consistent with the backend response time, which exceeds 3 minutes for requests that encounter the error.

What We’ve Tried:

  • We have thoroughly inspected our Java client code and Postman configurations to rule out any issues on the client-side.
  • We increased the timeout settings in the Tyk Gateway (e.g., response_timeout) to values exceeding 3 minutes, but the issue persists for long response times.
  • We verified the backend API responses by making direct requests to the backend API through Postman, and the responses appear well-formed and as expected for both short and long response times.

We seek your expertise and suggestions to help us understand if there are any specific Tyk Gateway configurations or Kubernetes settings that might be contributing to this problem. Any insights or recommendations to address this issue would be greatly appreciated.

Thank you in advance for your support.

Hello @atul_mittal and welcome to the community.

You might want to set the downstream timeouts to values large enough, or more than sufficient, for your long responses.

  • http_server_options.read_timeout
  • http_server_options.write_timeout

If I am not wrong, what is happening here is that the downstream connection is being closed just before the response comes back from the upstream server. The default values are 120 seconds, so responses that take longer than that get cut off.

There isn't an issue with the connection from Tyk to the upstream, as the gateway usually waits for a response from the server and doesn't have a timeout (it waits forever) unless one is specified via proxy_default_timeout.

So try setting http_server_options.read_timeout and http_server_options.write_timeout to values that roughly match, or exceed, your longest response time, and let us know if this resolves it.
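
For illustration, a tyk.conf sketch for an upstream that can take up to around 4 minutes might look like the fragment below. The 300-second values are only illustrative, on the assumption that these fields (like the 120-second defaults mentioned above) are expressed in seconds:

  "http_server_options": {
    "read_timeout": 300,
    "write_timeout": 300
  },
  "proxy_default_timeout": 300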

Hello Olu,

Thank you for your prompt response. I truly appreciate your support.

As you suggested, I’ve made adjustments to the downstream timeouts as follows:
"http_server_options": {
  "enable_websockets": true,
  "read_timeout": 0,
  "write_timeout": 0
}
I also attempted to increase the timeout values to 3600 milliseconds, as shown below:
"read_timeout": 3600,
"write_timeout": 3600

However, despite these changes, the upstream timeout issue still persists.

In an attempt to isolate the issue and confirm whether it consistently occurs within Tyk, I bypassed Nginx and directly hit Tyk. Unfortunately, the upstream timeout problem still occurs.

Hi @atul_mittal

Sorry for the issue you’re experiencing with this.

  • What is your Gateway version?
  • Please share the gateway logs.
  • Please share the gateway config file as well.

@atul_mittal -
Can you confirm that you set the downstream timeouts to 3600 milliseconds? If so, the connection will close in just under 4 seconds, but you said your upstream server can take more than 3 minutes to respond.

If that's not the case, could you please share the logs as suggested by Ubong?

Thanks

What is your Gateway version? - tykio/tyk-gateway:v4.2
Please share the gateway logs.

time="Aug 01 23:42:31" level=debug msg=Started api_id=myapp-custom-client-swagger-api api_name=skulibrary-custom-client-swagger-api mw=RateCheckMW org_id=sku origin=10.244.4.1 path="/test" ts=1690933351598473265
time="Aug 01 23:42:31" level=debug msg=Finished api_id=skulibrary-custom-client-swagger-api api_name=skulibrary-custom-client-swagger-api code=200 mw=RateCheckMW ns=41403 org_id=sku origin=10.244.4.1 path="/test"
time="Aug 01 23:42:31" level=debug msg="Started proxy"
time="Aug 01 23:42:31" level=debug msg="Stripping proxy listen path: /custom-client-work/"
time="Aug 01 23:42:31" level=debug msg="Upstream path is: /test}"
time="Aug 01 23:42:31" level=debug msg=Started api_id=skulibrary-custom-client-swagger-api api_name=skulibrary-custom-client-swagger-api mw=ReverseProxy org_id=sku ts=1690933351598770982
time="Aug 01 23:42:31" level=debug msg="Upstream request URL: {someUrl}}" api_id=skulibrary-custom-client-swagger-api api_name=skulibrary-custom-client-swagger-api mw=ReverseProxy org_id=sku
time="Aug 01 23:42:31" level=debug msg="Outbound request URL: {someUrl}}" api_id=skulibrary-custom-client-swagger-api api_name=skulibrary-custom-cl
[cors] 2023/08/01 23:42:31 Handler: Actual request
[cors] 2023/08/01 23:42:31 Actual request no headers added: missing origin
time="Aug 01 23:

time="Aug 01 23:47:09" level=debug msg=Finished api_id=skulibrary-custom-client-swagger-api api_name=skulibrary-custom-client-swagger-api mw=ReverseProxy ns=278267966452 org_id=sku
time="Aug 01 23:47:09" level=debug msg="Upstream request took (ms): 278268.006454"
time="Aug 01 23:47:09" level=debug msg="Adding Healthcheck to: skulibrary-custom-client-swagger-api.Request"
time="Aug 01 23:47:09" level=debug msg="Val is: 278268"
time="Aug 01 23:47:09" level=debug msg="Set value to: 1690933629866803138.278268"
time="Aug 01 23:47:09" level=debug msg="Done proxy"
time="Aug 01 23:47:09" level=debug msg="Incrementing raw key: skulibrary-custom-client-swagger-api.Request"
time="Aug 01 23:47:09" level=debug msg="keyName is: skulibrary-custom-client-swagger-api.Request"
time="Aug 01 23:47:09" level=debug msg="Now is:2023-08-01 23:47:09.869051065 +0000 UTC m=+1317.356071217"

Please share the gateway config file as well.

apiVersion: v1
data:
  tyk.conf: |
    {
      "listen_port": 8080,
      "secret": "",
      "template_path": "/tyk-gateway/templates",
      "tyk_js_path": "/tyk-gateway/js/tyk.js",
      "middleware_path": "/tyk-gateway/middleware",
      "use_db_app_configs": false,
      "app_path": "/tyk-gateway/apps/",
      "storage": {
        "type": "redis",
        "host": "myredishost",
        "port": 6380,
        "username": "",
        "password": "",
        "use_ssl": true,
        "database": 0,
        "optimisation_max_idle": 2000,
        "optimisation_max_active": 4000,
        "timeout": 3600
      },
      "enable_analytics": false,
      "analytics_config": {
        "type": "csv",
        "csv_dir": "/tmp",
        "mongo_url": "",
        "mongo_db_name": "",
        "mongo_collection": "",
        "purge_delay": -1,
        "ignored_ips": []
      },
      "health_check": {
        "enable_health_checks": true,
        "health_check_value_timeouts": 60
      },
      "optimisations_use_async_session_write": true,
      "enable_non_transactional_rate_limiter": true,
      "enable_sentinel_rate_limiter": false,
      "enable_redis_rolling_limiter": false,
      "allow_master_keys": false,
      "policies": {
        "policy_source": "file",
        "policy_record_name": "/tyk-gateway/policies/policies.json"
      },
      "hash_keys": true,
      "close_connections": false,
      "http_server_options": {
        "enable_websockets": true,
        "read_timeout": 3600,
        "write_timout": 3600
      },
      "allow_insecure_configs": true,
      "coprocess_options": {
        "enable_coprocess": true,
        "coprocess_grpc_server": ""
      },
      "enable_bundle_downloader": true,
      "bundle_base_url": "",
      "global_session_lifetime": 100,
      "force_global_session_lifetime": false,
      "max_idle_connections_per_host": 500,
      "enable_hashed_keys_listing": true,
      "enable_detailed_recording": false,
      "proxy_default_timeout": 3600
    }
kind: ConfigMap
metadata:
  name: tyk-stg-gateway-conf
  namespace: stg


Hi @atul_mittal,

Thank you for sharing these.

If "proxy_default_timeout": 3600 is taking effect on the gateway, the message below should appear in the logs just before the level=debug msg="Outbound request URL: {someUrl}}" line:

2023-08-03 11:14:39 time="Aug 03 10:14:39" level=debug msg="Setting timeout for outbound request to: 3600"

It should be there on the first proxy the gateway handles after it is started.

Are you able to restart the gateway and check? It would help us confirm that the settings are taking effect.
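As a rough sketch of one way to check (this assumes kubectl access to the gateway; the deployment name is a placeholder, so substitute your own):

  kubectl logs deployment/tyk-gateway -n stg | grep "Setting timeout for outbound request"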
You can also use the environment variables to apply the timeout.
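If you go the environment-variable route, the settings would be applied in the gateway container's environment (for example, the env section of its Deployment). The variable names below assume Tyk's standard TYK_GW_ mapping of config keys, and the 300-second values are only illustrative, so please verify them against the docs for your gateway version:

  TYK_GW_HTTPSERVEROPTIONS_READTIMEOUT=300
  TYK_GW_HTTPSERVEROPTIONS_WRITETIMEOUT=300
  TYK_GW_PROXYDEFAULTTIMEOUT=300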

Also, please share the response when you call /hello on the gateway.