Hi there!
We are conducting extensive load testing of the open-source Tyk Gateway 5.1.
In short, the setup is as follows:
- three containerized Tyk instances in an ECS cluster
- an ElastiCache Redis instance (node type cache.m5.2xlarge, running with 1 shard and 2 nodes)
- several sample upstream service containers (based on Mockbin) running in the same cluster; they mock real upstream services and keep the upstream response time minimal
- the test runs with a k6 client that increases the load in steps of +100 req/sec until it reaches 2000 req/sec at the end of the test
We observed that Tyk slows down after a while because ElastiCache starts exceeding its network allowance ("Allowance Exceed"). Looking more closely, we found that Tyk reads a very high volume of data from ElastiCache. More precisely:
In the screenshot we can see the following:
- “Response timing -95th”: the 95th-percentile response time increases significantly once ElastiCache starts exceeding its allowance
- “Elasticache Network”: about 1500 connections are open to ElastiCache
- “Received (Tyk)”: 14.31 GiB of data is received from ElastiCache every second
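As a rough sanity check on the “Received (Tyk)” figure, the following Go sketch converts 14.31 GiB/s into Gbit/s and into the amount of data read per proxied request. The 1000 req/sec used for the per-request figure is an assumption (it is the throughput we mention below), not a rate measured at the same moment as the screenshot.

```go
package main

import "fmt"

// bytesPerGiB is the number of bytes in one gibibyte (2^30).
const bytesPerGiB = 1 << 30

// gibPerSecToGbit converts a throughput in GiB/s into decimal gigabits per second.
func gibPerSecToGbit(gib float64) float64 {
	return gib * bytesPerGiB * 8 / 1e9
}

// mibPerRequest returns how many MiB are read from Redis per proxied request,
// assuming all of the read traffic is attributable to the given request rate.
func mibPerRequest(gibPerSec, reqPerSec float64) float64 {
	return gibPerSec * 1024 / reqPerSec
}

func main() {
	const observedGiB = 14.31 // value from the "Received (Tyk)" panel
	const assumedRPS = 1000.0 // assumption: request rate at the time of the reading

	fmt.Printf("%.1f Gbit/s\n", gibPerSecToGbit(observedGiB))
	fmt.Printf("%.2f MiB per request\n", mibPerRequest(observedGiB, assumedRPS))
}
```

Under these assumptions it prints about 122.9 Gbit/s and 14.65 MiB per request, which seems far more than per-request session or rate-limit lookups should need.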
We monitored the communication between Tyk and the Redis database and found that the following commands are executed every second:
1693410667.617965 [0 172.18.0.2:47430] "set" "redis-test-b8917b54-5c8f-4746-ab7a-413accbe07f7" "test" "ex" "1"
1693410667.618609 [0 172.18.0.2:47430] "get" "redis-test-b8917b54-5c8f-4746-ab7a-413accbe07f7"
1693410667.619361 [0 172.18.0.2:37174] "set" "redis-test-242f7d15-0f92-4b90-bde0-5c2991c1e60d" "test" "ex" "1"
1693410667.619833 [0 172.18.0.2:37174] "get" "redis-test-242f7d15-0f92-4b90-bde0-5c2991c1e60d"
1693410667.620414 [0 172.18.0.2:37184] "set" "redis-test-6d1a27d4-b262-4a12-9be4-64c2312e584d" "test" "ex" "1"
1693410667.620867 [0 172.18.0.2:37184] "get" "redis-test-6d1a27d4-b262-4a12-9be4-64c2312e584d"
Our Tyk config looks like this:
{
  "listen_port": 8080,
  "secret": "<redacted>",
  "template_path": "/opt/tyk-gateway/templates",
  "tyk_js_path": "/opt/tyk-gateway/js/tyk.js",
  "middleware_path": "/opt/tyk-gateway/middleware",
  "use_db_app_configs": false,
  "app_path": "/opt/tyk-gateway/apps/",
  "storage": {
    "type": "redis",
    "enable_cluster": true,
    "addrs": [ "clustercfg.<redacted>.cache.amazonaws.com:6379" ],
    "port": 6379,
    "username": "appservices-user-testing",
    "password": "<redacted>%",
    "use_ssl": true,
    "database": 0,
    "optimisation_max_idle": 2000,
    "optimisation_max_active": 4000
  },
  "enable_analytics": false,
  "analytics_config": {
    "type": "redis",
    "csv_dir": "/tmp",
    "mongo_url": "",
    "mongo_db_name": "",
    "mongo_collection": "",
    "purge_delay": -1,
    "ignored_ips": []
  },
  "health_check": {
    "enable_health_checks": false,
    "health_check_value_timeouts": 60
  },
  "optimisations_use_async_session_write": false,
  "enable_non_transactional_rate_limiter": true,
  "enable_sentinel_rate_limiter": false,
  "enable_redis_rolling_limiter": false,
  "allow_master_keys": false,
  "policies": {
    "policy_source": "file",
    "policy_record_name": "/opt/tyk-gateway/policies/policies.json"
  },
  "hash_keys": true,
  "enable_hashed_keys_listing": true,
  "close_connections": false,
  "http_server_options": {
    "enable_websockets": true
  },
  "allow_insecure_configs": true,
  "coprocess_options": {
    "enable_coprocess": false,
    "coprocess_grpc_server": ""
  },
  "enable_bundle_downloader": true,
  "bundle_base_url": "",
  "global_session_lifetime": 100,
  "force_global_session_lifetime": false,
  "max_idle_connections_per_host": 500,
  "enable_jsvm": true
}
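To compare the pool settings with the ~1500 connections on the “Elasticache Network” panel, the sketch below reads the two pool-related fields of the storage section and multiplies them by the number of gateway containers. It assumes, hypothetically, that each gateway keeps a single Redis pool bounded by optimisation_max_idle and optimisation_max_active; we have not confirmed how Tyk 5.1 applies these limits internally.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// storageConfig mirrors only the pool-related fields of the "storage" section.
type storageConfig struct {
	MaxIdle   int `json:"optimisation_max_idle"`
	MaxActive int `json:"optimisation_max_active"`
}

type gatewayConfig struct {
	Storage storageConfig `json:"storage"`
}

// poolBounds returns the combined idle and active connection limits across all
// gateway instances, assuming (hypothetically) one pool per instance that
// honours these limits. The real gateway may open more than one pool.
func poolBounds(raw []byte, instances int) (idle, active int, err error) {
	var cfg gatewayConfig
	if err = json.Unmarshal(raw, &cfg); err != nil {
		return 0, 0, err
	}
	return cfg.Storage.MaxIdle * instances, cfg.Storage.MaxActive * instances, nil
}

func main() {
	raw := []byte(`{"storage": {"optimisation_max_idle": 2000, "optimisation_max_active": 4000}}`)
	idle, active, err := poolBounds(raw, 3) // three Tyk containers in the ECS cluster
	if err != nil {
		panic(err)
	}
	fmt.Printf("up to %d idle / %d active Redis connections across the cluster\n", idle, active)
}
```

Under that assumption the limits allow up to 6000 idle and 12000 active connections, so the observed ~1500 connections stay well below them.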
Even though we disabled the Tyk health check through the config, Redis is still bombarded with health-check commands, and the /hello endpoint is still available, although the API docs state that it would be disabled.
Could you please give us some hints on what to check and how to optimize/minimize the communication between ElastiCache and Tyk? With the current setup we reached only 1000 rps at a reasonable 50 ms response time, and I am sure Tyk is capable of performing much better than this.
Thank you in advance,
Jozsef Kercso