DRL rate limiting fails 1-2 calls in edge cases in Tyk v4.3.4

Hi All

We are testing Tyk v4.3.4 and verifying some rate-limiting scenarios under the DRL (Distributed Rate Limiter) scheme.
Setup details:
Number of Tyk nodes: 3 (load distributed round-robin)
VM config: 4 CPUs, 8 GB RAM

Following are the observations:

Number of APIs - 2

Case 1
Rate Limit - 30 tps in both APIs
Result - Both APIs returned success responses for up to 30 calls each and failed on the 31st call.

Case 2
Rate Limit - 30 tps in 1st API, 20 tps in 2nd API
Result - The 1st API returned all success responses, but in the 2nd API 18 calls succeeded and 2 calls failed.

Case 3
Rate Limit - 30 tps in 1st API, 21 tps in 2nd API
Result - The APIs intermittently return errors for 1 or 2 calls. Sometimes the 1st API gives 1 error out of 30, and sometimes the 2nd API gives 1-2 errors.

Understanding: We think that in case 2 the limit of 20 cannot be split evenly across the 3 Tyk nodes: 20/3 = 6.66, so each node may be allowed only 6 calls, which adds up to the 18 successful calls we observed.

But in case 3, both rate limits (30 and 21) divide evenly across the 3 Tyk nodes, yet 1-2 calls still fail.
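The arithmetic behind this guess can be checked with a short sketch. It assumes, purely as a hypothesis from this post and not as confirmed Tyk behaviour, that each node is allowed the floor of the limit divided by the node count. The function names are made up for illustration.

```python
def per_node_share(rate_limit: int, nodes: int) -> int:
    """Hypothetical per-node allowance: integer part of rate_limit / nodes."""
    return rate_limit // nodes


def effective_limit(rate_limit: int, nodes: int) -> int:
    """Total calls allowed across all nodes under the floor-division hypothesis."""
    return per_node_share(rate_limit, nodes) * nodes


if __name__ == "__main__":
    # The three limits used in cases 1-3, on a 3-node cluster.
    for limit in (30, 20, 21):
        print(limit, per_node_share(limit, 3), effective_limit(limit, 3))
```

Under this assumption a limit of 20 yields 18 allowed calls, matching case 2, while 30 and 21 lose nothing to rounding, so rounding alone would not account for the failures in case 3.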

With DRL (leaky bucket algorithm), we would expect 1-2 extra calls to succeed, but here the opposite happens: 1-2 calls fail.

Can anyone please explain how DRL works here?

Thanks!

Hi @Mohit_Kumar

Let me try this over the weekend and get back to you.

@Mohit_Kumar Apologies that it has taken so long to get back to you. Could you share your gateway config for review? I want to be sure I am replicating the issue accurately.

Hi @Olu, please find the Tyk gateway config below:

{
  "listen_port": 8081,
  "node_secret": "af1cf2ec-6d88-4058-83f5-50321c5d67c1",
  "secret": "af1cf2ec-6d88-4058-83f5-50321c5d67c1",
  "template_path": "/opt/tyk-gateway/templates",
  "use_logstash": false,
  "use_db_app_configs": false,
  "db_app_conf_options": {
    "connection_string": "",
    "node_is_segmented": false,
    "tags": []
  },
  "disable_dashboard_zeroconf": true,
  "app_path": "/opt/tyk-gateway/apps",
  "middleware_path": "/opt/tyk-gateway/middleware",
  "storage": {
    "type": "redis",
    "enable_cluster": true,
    "host": "localhost",
    "hosts": {"redis-member-1": "6379", "redis-member-2": "6379", "redis-member-3": "6379", "redis-member-4": "6379", "redis-member-5": "6379", "redis-member-6": "6379"},
    "port": 6379,
    "username": "",
    "password": "",
    "database": 0,
    "optimisation_max_idle": 2000,
    "optimisation_max_active": 4000
  },
  "enable_analytics": true,
  "analytics_config": {
    "type": "mongo",
    "pool_size": 100,
    "csv_dir": "/tmp",
    "mongo_url": "",
    "mongo_db_name": "",
    "mongo_collection": "",
    "purge_delay": 100,
    "ignored_ips": [],
    "enable_detailed_recording": false,
    "enable_geo_ip": false,
    "geo_ip_db_path": "",
    "storage_expiration_time": 60,
    "normalise_urls": {
      "enabled": true,
      "normalise_uuids": true,
      "normalise_numbers": true,
      "custom_patterns": []
    }
  },
  "health_check": {
    "enable_health_checks": false,
    "health_check_value_timeouts": 60
  },
  "optimisations_use_async_session_write": true,
  "allow_master_keys": true,
  "policies": {
    "policy_source": "",
    "policy_connection_string": "",
    "policy_record_name": "tyk_policies",
    "allow_explicit_policy_id": true
  },
  "hash_keys": true,
  "suppress_redis_signal_reload": false,
  "use_redis_log": false,
  "close_connections": true,
  "enable_non_transactional_rate_limiter": true,
  "enable_sentinel_rate_limiter": false,
  "experimental_process_org_off_thread": false,
  "enforce_org_quotas": false,
  "enforce_org_data_detail_logging": false,
  "local_session_cache": {
    "disable_cached_session_state": false
  },
  "http_server_options": {
    "use_ssl": true,
    "enable_websockets": true,
    "certificates": [
    ],
    "ssl_insecure_skip_verify": true
  },
  "uptime_tests": {
    "disable": true,
    "config": {
      "enable_uptime_analytics": false,
      "failure_trigger_sample_size": 3,
      "time_wait": 1,
      "checker_pool_size": 50
    }
  },
  "hostname": "",
  "enable_custom_domains": true,
  "enable_jsvm": false,
  "oauth_redirect_uri_separator": ";",

  "coprocess_options": {
    "enable_coprocess": true,
    "coprocess_grpc_server": "unix:///tmp/grpc-go.sock"
  },
  "enable_bundle_downloader": false,
  "bundle_base_url": "",
  "pid_file_location": "./tyk-gateway.pid",

  "allow_insecure_configs": true,
  "public_key_path": "",
  "close_idle_connections": false,
  "allow_remote_config": false,

  "global_session_lifetime": 100,
  "force_global_session_lifetime": false,
  "max_idle_connections_per_host": 100,

  "proxy_default_timeout": 600
}

@Mohit_Kumar How are you testing the requests?

I was able to reproduce the issue not only with case 3 but also with case 2 (after many tries).

I asked internally and confirmed, based on the docs, that DRL is not 100% accurate: it produces approximate results. It's a trade-off between speed and accuracy.
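One way to picture why an approximate, per-node scheme can reject calls below the global limit is the toy model below. It is only an illustration under stated assumptions: each node is assumed to enforce an equal local share of the limit, and the requests in a one-second window are assumed to land slightly unevenly across nodes. Neither assumption is taken from the Tyk source, and the function name is hypothetical.

```python
def count_rejections(requests_per_node: list[int], rate_limit: int) -> int:
    """Count rejected calls if every node enforces rate_limit // nodes locally.

    requests_per_node holds how many calls each node received in one window.
    This is a simplified model, not the actual DRL implementation.
    """
    local_share = rate_limit // len(requests_per_node)
    return sum(max(0, received - local_share) for received in requests_per_node)


if __name__ == "__main__":
    # 21 calls spread perfectly evenly over 3 nodes.
    print(count_rejections([7, 7, 7], 21))
    # The same 21 calls, but one node sees 8 and another sees 6 in the window.
    print(count_rejections([8, 7, 6], 21))
```

In this model a perfectly even split rejects nothing, while a skew of a single request rejects one call even though the total stays within the limit, which resembles the intermittent 1-2 failures seen in case 3.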

If you are looking for accuracy, then the Redis rate limiter may be what you need, but be wary of the performance considerations.

Thanks @Olu for testing. We ran the tests using JMeter scripts.
We are prioritizing performance, so we are ignoring these corner cases in DRL for now.