Gateway connection to Redis in Kubernetes

Hi guys,

This problem is driving me crazy. I am trying to set up the Tyk Gateway in Kubernetes. I have already set up a Redis cluster running as 6 instances (3 masters and 3 slaves), and everything seems OK.

On the Tyk side, I am using the tykio/tyk-gateway image and deploying it based on the tyk-oss-k8s-deployment GitHub YAMLs. I set up storage with type redis and cluster enabled. The problem is that I keep seeing these messages continuously:

[Sep 21 14:34:20] DEBUG Creating new Redis connection pool
[Sep 21 14:34:20]  INFO --> [REDIS] Creating cluster client
[Sep 21 14:34:29] ERROR main: Redis health check failed error=storage: Redis is either down or was not configured liveness-check=true
[Sep 21 14:34:29] DEBUG host-check-mgr: No Primary instance found, assuming control
[Sep 21 14:34:29] ERROR cannot set key in pollerCacheKey error=storage: Redis is either down or was not configured
[Sep 21 14:34:29] DEBUG HOST CHECKER: Host list reset
[Sep 21 14:34:29] ERROR pub-sub: Connection to Redis failed, reconnect in 10s error=storage: Redis is either down or was not configured
[Sep 21 14:34:39] ERROR main: Redis health check failed error=storage: Redis is either down or was not configured liveness-check=true
[Sep 21 14:34:39] DEBUG host-check-mgr: No Primary instance found, assuming control
[Sep 21 14:34:39] ERROR cannot set key in pollerCacheKey error=storage: Redis is either down or was not configured
[Sep 21 14:34:39] ERROR pub-sub: Connection to Redis failed, reconnect in 10s error=storage: Redis is either down or was not configured
[Sep 21 14:34:49] ERROR main: Redis health check failed error=storage: Redis is either down or was not configured liveness-check=true
[Sep 21 14:34:49] DEBUG host-check-mgr: No Primary instance found, assuming control
[Sep 21 14:34:49] ERROR cannot set key in pollerCacheKey error=storage: Redis is either down or was not configured
[Sep 21 14:34:49] ERROR pub-sub: Connection to Redis failed, reconnect in 10s error=storage: Redis is either down or was not configured
[Sep 21 14:34:59] DEBUG host-check-mgr: No Primary instance found, assuming control

My tyk.conf (maybe still a little messed up) looks like this:

    {
      "listen_address": "",
      "listen_port": 8081,
      "secret": "352d20ee67be67f6340b4c0605b044b7",
      "template_path": "/opt/tyk/templates",
      "tyk_js_path": "/opt/tyk/js/tyk.js",
      "middleware_path": "/opt/tyk/middleware",
      "use_db_app_configs": false,
      "db_app_conf_options": {
          "connection_string": "http://tyk-dashboard.tyk.svc.cluster.local:3000",
          "node_is_segmented": false,
          "tags": ["test2"]
      },
      "app_path": "/opt/tyk/apps/",
      "storage": {
        "type": "redis",
        "enable_cluster": true,
        "addrs": [
            "redis-cluster-0.redis-cluster.redis.svc.cluster.local:6379",
            "redis-cluster-3.redis-cluster.redis.svc.cluster.local:6379",
            "redis-cluster-1.redis-cluster.redis.svc.cluster.local:6379",
            "redis-cluster-4.redis-cluster.redis.svc.cluster.local:6379",
            "redis-cluster-2.redis-cluster.redis.svc.cluster.local:6379",
            "redis-cluster-5.redis-cluster.redis.svc.cluster.local:6379"
        ],
        "database": 0,
        "optimisation_max_idle": 2000,
        "optimisation_max_active": 4000,
        "username": "",
        "password": "",
        "use_ssl": false
      },
      "enable_analytics": false,
      "analytics_config": {
          "type": "mongo",
          "csv_dir": "/tmp",
          "mongo_url": "",
          "mongo_db_name": "",
          "mongo_collection": "",
          "purge_delay": -1,
          "ignored_ips": []
      },
      "optimisations_use_async_session_write": true,
      "enable_non_transactional_rate_limiter": true,
      "enable_sentinel_rate_limiter": false,
      "enable_redis_rolling_limiter": false,
      "allow_master_keys": false,
      "hash_keys": true,
      "close_connections": true,
      "http_server_options": {
        "enable_websockets": true
      },
      "allow_insecure_configs": true,
      "coprocess_options": {
        "enable_coprocess": false,
        "coprocess_grpc_server": ""
      },
      "enable_bundle_downloader": true,
      "bundle_base_url": "",
      "global_session_lifetime": 100,
      "force_global_session_lifetime": false,
      "max_idle_connections_per_host": 100,
      "use_ssl": false,
      "redis_addrs": [
        "redis-cluster-0.redis-cluster.redis.svc.cluster.local:6379",
        "redis-cluster-3.redis-cluster.redis.svc.cluster.local:6379",
        "redis-cluster-1.redis-cluster.redis.svc.cluster.local:6379",
        "redis-cluster-4.redis-cluster.redis.svc.cluster.local:6379",
        "redis-cluster-2.redis-cluster.redis.svc.cluster.local:6379",       
        "redis-cluster-5.redis-cluster.redis.svc.cluster.local:6379"
      ]     
    }

I think I have tried everything at this point.

I tried different versions with the same results; currently I am on Tyk Gateway 4.2 and Redis 6.0.16.
I also tried "use_ssl": false and true.

Connectivity between the pods seems OK. From inside the Tyk pod:

[email protected]:~# nc -vz redis-cluster-0.redis-cluster.redis.svc.cluster.local 6379
Connection to redis-cluster-0.redis-cluster.redis.svc.cluster.local (10.244.2.194) 6379 port [tcp/redis] succeeded!
[email protected]:~# nc -vz redis-cluster-1.redis-cluster.redis.svc.cluster.local 6379
Connection to redis-cluster-1.redis-cluster.redis.svc.cluster.local (10.244.4.207) 6379 port [tcp/redis] succeeded!
[email protected]:~# nc -vz redis-cluster-2.redis-cluster.redis.svc.cluster.local 6379
Connection to redis-cluster-2.redis-cluster.redis.svc.cluster.local (10.244.3.139) 6379 port [tcp/redis] succeeded!
[email protected]:~# nc -vz redis-cluster-3.redis-cluster.redis.svc.cluster.local 6379
Connection to redis-cluster-3.redis-cluster.redis.svc.cluster.local (10.244.4.209) 6379 port [tcp/redis] succeeded!
[email protected]:~# nc -vz redis-cluster-4.redis-cluster.redis.svc.cluster.local 6379
Connection to redis-cluster-4.redis-cluster.redis.svc.cluster.local (10.244.3.140) 6379 port [tcp/redis] succeeded!
[email protected]:~# nc -vz redis-cluster-5.redis-cluster.redis.svc.cluster.local 6379
Connection to redis-cluster-5.redis-cluster.redis.svc.cluster.local (10.244.2.195) 6379 port [tcp/redis] succeeded!

I also installed redis-cli on the gateway pod and ran some commands just to check that I could access Redis without problems.
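For the record, these are the kind of checks I ran (the key name is just an example):

```shell
# From inside the gateway pod:
redis-cli -h redis-cluster-0.redis-cluster.redis.svc.cluster.local -p 6379 PING
# cluster_state should be "ok":
redis-cli -h redis-cluster-0.redis-cluster.redis.svc.cluster.local -p 6379 CLUSTER INFO | grep cluster_state
# A write/read round-trip; -c follows MOVED redirections between nodes:
redis-cli -h redis-cluster-0.redis-cluster.redis.svc.cluster.local -p 6379 -c SET probe_key "probe value"
redis-cli -h redis-cluster-0.redis-cluster.redis.svc.cluster.local -p 6379 -c GET probe_key
```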

Please, could you give me some clues?
Thank you

EDIT - some more info:

[email protected]:~# netstat -tunlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:33763         0.0.0.0:*               LISTEN      1/tyk
tcp6       0      0 :::8081                 :::*                    LISTEN      1/tyk
[email protected]:~# netstat -plnt
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:6379            0.0.0.0:*               LISTEN      13/redis-server *:6
tcp        0      0 0.0.0.0:16379           0.0.0.0:*               LISTEN      13/redis-server *:6
tcp6       0      0 :::6379                 :::*                    LISTEN      13/redis-server *:6
tcp6       0      0 :::16379                :::*                    LISTEN      13/redis-server *:6

I should also mention that the Redis and Tyk pods are in different namespaces, but that shouldn't be a problem, right?
EDIT2: Tried with Redis and the Tyk Gateway in the same namespace. The problem persists.

I attach complete log:

time="Sep 23 06:41:07" level=debug msg="Using /opt/tyk-gateway/tyk.conf for configuration" prefix=main
time="Sep 23 06:41:07" level=info msg="Tyk API Gateway 4.2.0" prefix=main
time="Sep 23 06:41:07" level=warning msg="Insecure configuration allowed" config.allow_insecure_configs=true prefix=checkup
time="Sep 23 06:41:07" level=warning msg="Default secret should be changed for production." config.secret=352d20ee67be67f6340b4c0605b044b7 prefix=checkup
time="Sep 23 06:41:07" level=error msg="Could not set version in versionStore" error="storage: Redis is either down or was not configured" prefix=main
time="Sep 23 06:41:07" level=debug msg="No Primary instance found, assuming control" prefix=host-check-mgr
time="Sep 23 06:41:07" level=error msg="cannot set key in pollerCacheKey" error="storage: Redis is either down or was not configured"
time="Sep 23 06:41:07" level=info msg="Starting Poller" prefix=host-check-mgr
time="Sep 23 06:41:07" level=debug msg="---> Initialising checker" prefix=host-check-mgr
time="Sep 23 06:41:07" level=debug msg="[HOST CHECKER] Config:TriggerLimit: 3"
time="Sep 23 06:41:07" level=debug msg="[HOST CHECKER] Config:Timeout: ~10"
time="Sep 23 06:41:07" level=debug msg="[HOST CHECKER] Config:WorkerPool: 2"
time="Sep 23 06:41:07" level=info msg="Rich plugins are disabled" prefix=coprocess
time="Sep 23 06:41:07" level=debug msg="Notifier will not work in hybrid mode" prefix=main
time="Sep 23 06:41:07" level=debug msg="[HOST CHECKER] Init complete"
time="Sep 23 06:41:07" level=debug msg="---> Starting checker" prefix=host-check-mgr
time="Sep 23 06:41:07" level=debug msg="[HOST CHECKER] Starting..."
time="Sep 23 06:41:07" level=debug msg="[HOST CHECKER] Check loop started..."
time="Sep 23 06:41:07" level=debug msg="[HOST CHECKER] Host reporter started..."
time="Sep 23 06:41:07" level=debug msg="---> Checker started." prefix=host-check-mgr
time="Sep 23 06:41:07" level=info msg="PIDFile location set to: /var/run/tyk/tyk-gateway.pid" prefix=main
time="Sep 23 06:41:07" level=warning msg="The control_api_port should be changed for production" prefix=main
time="Sep 23 06:41:07" level=debug msg="Initialising default org store" prefix=main
time="Sep 23 06:41:07" level=info msg="Initialising Tyk REST API Endpoints" prefix=main
time="Sep 23 06:41:07" level=error msg="Connection to Redis failed, reconnect in 10s" error="storage: Redis is either down or was not configured" prefix=pub-sub
time="Sep 23 06:41:07" level=debug msg="Creating new Redis connection pool"
time="Sep 23 06:41:07" level=info msg="--> [REDIS] Creating cluster client"
time="Sep 23 06:41:07" level=debug msg="Loaded API Endpoints" prefix=main
time="Sep 23 06:41:07" level=info msg="--> Standard listener (http)" port=":8081" prefix=main
time="Sep 23 06:41:07" level=warning msg="Starting HTTP server on:[::]:8081" prefix=main
time="Sep 23 06:41:07" level=info msg="Initialising distributed rate limiter" prefix=main
time="Sep 23 06:41:07" level=debug msg="DRL: Setting node ID: solo-dad9e877-b232-44fe-961e-1e3b5d451c2b|tyk-gateway-65df8fdb9c-6znfx"
time="Sep 23 06:41:07" level=info msg="Tyk Gateway started (4.2.0)" prefix=main
time="Sep 23 06:41:07" level=info msg="--> Listening on address: (open interface)" prefix=main
time="Sep 23 06:41:07" level=info msg="--> Listening on port: 8081" prefix=main
time="Sep 23 06:41:07" level=info msg="--> PID: 1" prefix=main
time="Sep 23 06:41:07" level=info msg="Loading policies" prefix=main
time="Sep 23 06:41:07" level=debug msg="No policy record name defined, skipping..." prefix=main
time="Sep 23 06:41:07" level=info msg="Loading API Specification from /opt/tyk/apps/app_sample.json"
time="Sep 23 06:41:07" level=info msg="Starting gateway rate limiter notifications..."
time="Sep 23 06:41:07" level=debug msg="Checking for transform paths..."
time="Sep 23 06:41:07" level=debug msg="Checking for transform paths..."
time="Sep 23 06:41:07" level=info msg="Detected 1 APIs" prefix=main
time="Sep 23 06:41:07" level=info msg="Loading API configurations." prefix=main
time="Sep 23 06:41:07" level=info msg="Tracking hostname" api_name="Tyk Test API" domain="(no host)" prefix=main
time="Sep 23 06:41:07" level=info msg="Initialising Tyk REST API Endpoints" prefix=main
time="Sep 23 06:41:07" level=debug msg="Loaded API Endpoints" prefix=main
time="Sep 23 06:41:07" level=info msg="API bind on custom port:0" prefix=main
time="Sep 23 06:41:07" level=debug msg="Initializing API" api_id=1 api_name="Tyk Test API" org_id=default
time="Sep 23 06:41:07" level=debug msg="Batch requests enabled for API" prefix=main
time="Sep 23 06:41:07" level=debug msg=Init api_id=1 api_name="Tyk Test API" mw=VersionCheck org_id=default
time="Sep 23 06:41:07" level=debug msg=Init api_id=1 api_name="Tyk Test API" mw=RateCheckMW org_id=default
time="Sep 23 06:41:07" level=info msg="Checking security policy: Token" api_id=1 api_name="Tyk Test API" org_id=default
time="Sep 23 06:41:07" level=debug msg=Init api_id=1 api_name="Tyk Test API" mw=AuthKey org_id=default
time="Sep 23 06:41:07" level=debug msg=Init api_id=1 api_name="Tyk Test API" mw=KeyExpired org_id=default
time="Sep 23 06:41:07" level=debug msg=Init api_id=1 api_name="Tyk Test API" mw=AccessRightsCheck org_id=default
time="Sep 23 06:41:07" level=debug msg=Init api_id=1 api_name="Tyk Test API" mw=GranularAccessMiddleware org_id=default
time="Sep 23 06:41:07" level=debug msg=Init api_id=1 api_name="Tyk Test API" mw=RateLimitAndQuotaCheck org_id=default
time="Sep 23 06:41:07" level=debug msg=Init api_id=1 api_name="Tyk Test API" mw=VersionCheck org_id=default
time="Sep 23 06:41:07" level=debug msg=Init api_id=1 api_name="Tyk Test API" mw=KeyExpired org_id=default
time="Sep 23 06:41:07" level=debug msg=Init api_id=1 api_name="Tyk Test API" mw=AccessRightsCheck org_id=default
time="Sep 23 06:41:07" level=debug msg="Rate limit endpoint is: /tyk-api-test/tyk/rate-limits" api_id=1 api_name="Tyk Test API" org_id=default
time="Sep 23 06:41:07" level=debug msg="Setting Listen Path: /tyk-api-test/" api_id=1 api_name="Tyk Test API" org_id=default
time="Sep 23 06:41:07" level=info msg="API Loaded" api_id=1 api_name="Tyk Test API" org_id=default prefix=gateway server_name=-- user_id=-- user_ip=--
time="Sep 23 06:41:07" level=debug msg="Checker host list" prefix=main
time="Sep 23 06:41:07" level=info msg="Loading uptime tests..." prefix=host-check-mgr
time="Sep 23 06:41:07" level=debug msg="--- Setting tracking list up" prefix=host-check-mgr
time="Sep 23 06:41:07" level=debug msg="Reset initiated" prefix=host-check-mgr
time="Sep 23 06:41:07" level=debug msg="[HOST CHECKER] Checker reset queued!"
time="Sep 23 06:41:07" level=debug msg="Checker host Done" prefix=main
time="Sep 23 06:41:07" level=info msg="Initialised API Definitions" prefix=main
time="Sep 23 06:41:07" level=info msg="API reload complete" prefix=main
time="Sep 23 06:41:08" level=debug msg="Creating new Redis connection pool"
time="Sep 23 06:41:08" level=info msg="--> [REDIS] Creating cluster client"
time="Sep 23 06:41:08" level=debug msg="Creating new Redis connection pool"
time="Sep 23 06:41:08" level=info msg="--> [REDIS] Creating cluster client"
time="Sep 23 06:41:17" level=error msg="Redis health check failed" error="storage: Redis is either down or was not configured" liveness-check=true prefix=main
time="Sep 23 06:41:17" level=debug msg="No Primary instance found, assuming control" prefix=host-check-mgr
time="Sep 23 06:41:17" level=error msg="cannot set key in pollerCacheKey" error="storage: Redis is either down or was not configured"
time="Sep 23 06:41:17" level=error msg="Connection to Redis failed, reconnect in 10s" error="storage: Redis is either down or was not configured" prefix=pub-sub
time="Sep 23 06:41:18" level=debug msg="[HOST CHECKER] Host list reset"
time="Sep 23 06:41:27" level=debug msg="No Primary instance found, assuming control" prefix=host-check-mgr
time="Sep 23 06:41:27" level=error msg="cannot set key in pollerCacheKey" error="storage: Redis is either down or was not configured"
time="Sep 23 06:41:27" level=error msg="Redis health check failed" error="storage: Redis is either down or was not configured" liveness-check=true prefix=main
time="Sep 23 06:41:27" level=error msg="Connection to Redis failed, reconnect in 10s" error="storage: Redis is either down or was not configured" prefix=pub-sub
time="Sep 23 06:41:37" level=debug msg="No Primary instance found, assuming control" prefix=host-check-mgr
time="Sep 23 06:41:37" level=error msg="cannot set key in pollerCacheKey" error="storage: Redis is either down or was not configured"
time="Sep 23 06:41:37" level=error msg="Redis health check failed" error="storage: Redis is either down or was not configured" liveness-check=true prefix=main
time="Sep 23 06:41:37" level=error msg="Connection to Redis failed, reconnect in 10s" error="storage: Redis is either down or was not configured" prefix=pub-sub
time="Sep 23 06:41:47" level=error msg="Redis health check failed" error="storage: Redis is either down or was not configured" liveness-check=true prefix=main
time="Sep 23 06:41:47" level=debug msg="No Primary instance found, assuming control" prefix=host-check-mgr
time="Sep 23 06:41:47" level=error msg="cannot set key in pollerCacheKey" error="storage: Redis is either down or was not configured"
time="Sep 23 06:41:47" level=error msg="Connection to Redis failed, reconnect in 10s" error="storage: Redis is either down or was not configured" prefix=pub-sub
time="Sep 23 06:41:57" level=debug msg="No Primary instance found, assuming control" prefix=host-check-mgr
time="Sep 23 06:41:57" level=error msg="cannot set key in pollerCacheKey" error="storage: Redis is either down or was not configured"
time="Sep 23 06:41:57" level=error msg="Redis health check failed" error="storage: Redis is either down or was not configured" liveness-check=true prefix=main
time="Sep 23 06:41:57" level=error msg="Connection to Redis failed, reconnect in 10s" error="storage: Redis is either down or was not configured" prefix=pub-sub
time="Sep 23 06:42:07" level=debug msg="No Primary instance found, assuming control" prefix=host-check-mgr
time="Sep 23 06:42:07" level=error msg="cannot set key in pollerCacheKey" error="storage: Redis is either down or was not configured"
time="Sep 23 06:42:07" level=error msg="Redis health check failed" error="storage: Redis is either down or was not configured" liveness-check=true prefix=main
time="Sep 23 06:42:07" level=error msg="Connection to Redis failed, reconnect in 10s" error="storage: Redis is either down or was not configured" prefix=pub-sub

Hi @Gabriel_Pu,

Welcome to the community :tada:

We’re taking a look at this and will get back to you soon.

Great effort btw :sunglasses:

Hi @Gabriel_Pu,

A quick review of your tyk.conf

  • “redis_addrs” is not a Gateway config option; it’s a Dashboard one.
  • The “use_ssl” just above it is not a valid Gateway option either. The one within “storage” is the relevant one.

But those are not responsible for the error.

Yep, you are right!
As I said, I have tried a thousand different things, so I know the config file is a bit messed up.

I also tried putting “whatever:6379” inside “addrs”, and the result is the same, so it seems the gateway is completely ignoring it. However, I am pretty sure Tyk is reading this config file, as other changes are reflected.

Another thing I tried was injecting environment variables from the deployment.yml (with the same result):

            - name: TYK_GW_STORAGE_ADDRS
              value: "redis-cluster-0.redis-cluster.redis.svc.cluster.local:6379,redis-cluster-1.redis-cluster.redis.svc.cluster.local:6379,redis-cluster-2.redis-cluster.redis.svc.cluster.local:6379,redis-cluster-3.redis-cluster.redis.svc.cluster.local:6379,redis-cluster-4.redis-cluster.redis.svc.cluster.local:6379,redis-cluster-5.redis-cluster.redis.svc.cluster.local:6379"
            - name: TYK_GW_STORAGE_HOSTS
              value: "redis-cluster-0.redis-cluster.redis.svc.cluster.local:6379,redis-cluster-1.redis-cluster.redis.svc.cluster.local:6379,redis-cluster-2.redis-cluster.redis.svc.cluster.local:6379,redis-cluster-3.redis-cluster.redis.svc.cluster.local:6379,redis-cluster-4.redis-cluster.redis.svc.cluster.local:6379,redis-cluster-5.redis-cluster.redis.svc.cluster.local:6379"
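A sketch of how to double-check that the variables actually reach the container (the deployment name and namespace here are assumptions, adjust to your manifests):

```shell
# Print the effective TYK_GW_STORAGE_* variables inside the gateway container:
kubectl exec -n tyk deploy/tyk-gateway -- env | grep TYK_GW_STORAGE
```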

How have you set up your Redis cluster? Can you share?

Here is my Statefulset:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-cluster
  namespace: redis
spec:
  serviceName: redis-cluster
  replicas: 6
  selector:
    matchLabels:
      app: redis-cluster
  template:
    metadata:
      labels:
        app: redis-cluster
    spec:
      schedulerName: redisScheduler               
      initContainers:
      - name: config
        image: redis:6.0.16
        command: [ "sh", "-c" ]
        args: ["cp -a \"/data/files/.\" \"/etc/redis/\""]     
        volumeMounts:
        - name: redis-config
          mountPath: /etc/redis/
        - name: config
          mountPath: /data/files/   
      containers:
      - name: redis
        image: redis:latest
        ports:
        - containerPort: 6379
          name: client
        - containerPort: 16379
          name: gossip
        command: [ "sh", "-c" ]
        args: ["/etc/redis/update-node.sh;redis-server /etc/redis/redis.conf"]
        volumeMounts:
        - name: pv-data-redis
          mountPath: /data
          readOnly: false
        - name: redis-config
          mountPath: /etc/redis/          
      volumes:
      - name: redis-config
        emptyDir: {}
      - name: config
        configMap:
          name: redis-config
          defaultMode: 0744
  volumeClaimTemplates:
  - metadata:
      name: pv-data-redis
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: "local-storage"
      resources:
        requests:
          storage: 500Mi

update-node.sh does a “meet” to the other instances that are up. I think it’s only required if an instance falls and its IP changes.
redisScheduler is a custom scheduler that just pins each instance to a concrete worker node of my convenience.
At bootstrap I run a job that makes sure the cluster is created and each instance has its role. Something like this:

redis-cli -h `dig +short redis-cluster-0.redis-cluster.redis.svc.cluster.local` -p 6379 PING;
redis-cli --cluster create \
                  `dig +short redis-cluster-0.redis-cluster.redis.svc.cluster.local`:6379 \
                  `dig +short redis-cluster-1.redis-cluster.redis.svc.cluster.local`:6379 \
                  `dig +short redis-cluster-2.redis-cluster.redis.svc.cluster.local`:6379 --cluster-yes; \
redis-cli -h `dig +short redis-cluster-0.redis-cluster.redis.svc.cluster.local` -p 6379 --cluster add-node \
                  `dig +short redis-cluster-3.redis-cluster.redis.svc.cluster.local`:6379 \
                  `dig +short redis-cluster-0.redis-cluster.redis.svc.cluster.local`:6379 --cluster-slave; \
redis-cli -h `dig +short redis-cluster-0.redis-cluster.redis.svc.cluster.local` -p 6379 --cluster add-node \
                  `dig +short redis-cluster-4.redis-cluster.redis.svc.cluster.local`:6379 \
                  `dig +short redis-cluster-1.redis-cluster.redis.svc.cluster.local`:6379 --cluster-slave; \
redis-cli -h `dig +short redis-cluster-0.redis-cluster.redis.svc.cluster.local` -p 6379 --cluster add-node \
                  `dig +short redis-cluster-5.redis-cluster.redis.svc.cluster.local`:6379 \
                  `dig +short redis-cluster-2.redis-cluster.redis.svc.cluster.local`:6379 --cluster-slave;
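After the job runs, a sanity check along these lines confirms the cluster actually formed (same hostnames as above):

```shell
# cluster_state should be "ok" and cluster_slots_assigned should be 16384:
redis-cli -h `dig +short redis-cluster-0.redis-cluster.redis.svc.cluster.local` -p 6379 CLUSTER INFO
# Verifies slot coverage and that all nodes agree on the topology:
redis-cli --cluster check `dig +short redis-cluster-0.redis-cluster.redis.svc.cluster.local`:6379
```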

Thank you very much

Hey,

Sorry for the delayed reply.

Have you checked that the cluster is running as it should? i.e. write a key to a master and see it replicated to the slaves?

You can also restart the Gateway while keeping the Redis cluster running.
If all services are started together, the Redis cluster can take longer than Tyk to come up, meaning the gateway is looking for a service that doesn’t exist yet. Restarting the Tyk Gateway once the cluster is ready means it is connecting to something that exists, and might work.
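For example, something along these lines (the deployment name and namespace are assumptions based on your setup):

```shell
# Restart only the gateway; the Redis cluster keeps running:
kubectl rollout restart deployment/tyk-gateway -n tyk
# Wait for the new pod to become ready:
kubectl rollout status deployment/tyk-gateway -n tyk
```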

Can you test if the same issue happens when using Docker?

Thank you guys for all your help

Let’s see…

Have you checked that the cluster is running as it should? i.e. write a key to a master and see it replicated to the slaves?

It seems OK, unless I am missing something:

[email protected]:/data# redis-cli -h redis-cluster-0.redis-cluster.redis.svc.cluster.local -c
redis-cluster-0.redis-cluster.redis.svc.cluster.local:6379> set this_is_a_key "this is a value"
-> Redirected to slot [15834] located at 10.244.3.139:6379
OK
10.244.3.139:6379>
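A more direct way to see the replication is to read the key back from the replica itself; per my create script, redis-cluster-5 is the slave of redis-cluster-2, and READONLY has to be issued on the same connection as the GET (redis-cli reads commands from stdin when not attached to a TTY):

```shell
# READONLY lets a replica serve reads instead of redirecting to its master:
redis-cli -h redis-cluster-5.redis-cluster.redis.svc.cluster.local -p 6379 <<'EOF'
READONLY
GET this_is_a_key
EOF
```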

I see this log in redis-cluster-2 (10.244.3.139):

116:C 26 Sep 2022 06:33:17.808 * DB saved on disk
116:C 26 Sep 2022 06:33:17.809 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
13:M 26 Sep 2022 06:33:17.902 * Background saving terminated with success

Log from redis-cluster-5 (its slave):

13:S 26 Sep 2022 06:33:17.748 * 1 changes in 3600 seconds. Saving...
13:S 26 Sep 2022 06:33:17.749 * Background saving started by pid 81
81:C 26 Sep 2022 06:33:17.757 * DB saved on disk
81:C 26 Sep 2022 06:33:17.759 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
13:S 26 Sep 2022 06:33:17.851 * Background saving terminated with success

You can also restart the Gateway while keeping the Redis cluster running.
If all services are started together, the Redis cluster can take longer than Tyk to come up, meaning the gateway is looking for a service that doesn’t exist yet. Restarting the Tyk Gateway once the cluster is ready means it is connecting to something that exists, and might work.

I am doing it this way all the time.

Can you test if the same issue happens when using Docker?

I need to investigate this a bit; I don’t know how to do it right now.

Hi @Gabriel_Pu,

If you’re open to using Docker, please check this repo. My colleague created it very recently. It is a stripped-down version of our official Tyk Pro Docker Demo repo, modified to work with a Redis Cluster.

You can remove any services you won’t use, like the Dashboard, Pump, Mongo, etc., from the docker-compose file.

Below is some other info that would be helpful:


In the first step, you’ll need to add 6 Redis nodes to your cluster as individual services that use the Bitnami Redis Cluster image (I’m using this because it does all the ‘grunt’ work of the individual Redis configuration for us). At the time of writing, Redis 6 is the latest version supported by Tyk.

In the file we also initiate the master node first (denoted by 'REDIS_CLUSTER_CREATOR=yes'), as you’ll get soft errors if it’s not in this sequence. We also expose port 6380 in this step so that once the cluster is up and running, we can easily test it from the terminal.

It’s also really important that each node belongs to the Tyk network, otherwise you’ll get “no such host” errors in your logs.

Each node also needs to be aware of the other nodes in the system, so we have to list all of the nodes in each node’s environment variables. There you can also set passwords, empty passwords (as above), or nothing at all; for Bitnami images the default password is bitnami. For testing purposes, it’s easier without a password.
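As a rough sketch of that wiring with plain docker commands instead of compose (the environment variable names come from the Bitnami redis-cluster image; node names and network name are illustrative):

```shell
# All six nodes join the same network and learn about each other via REDIS_NODES.
docker network create tyk
NODES="redis-node-0 redis-node-1 redis-node-2 redis-node-3 redis-node-4 redis-node-5"
# Start the five non-creator nodes first...
for n in redis-node-1 redis-node-2 redis-node-3 redis-node-4 redis-node-5; do
  docker run -d --name "$n" --network tyk \
    -e ALLOW_EMPTY_PASSWORD=yes -e "REDIS_NODES=$NODES" \
    bitnami/redis-cluster:6.0
done
# ...then the creator node, which assembles the cluster (3 masters + 3 replicas).
# Port 6380 is published so the cluster can be tested from the host.
docker run -d --name redis-node-0 --network tyk -p 6380:6379 \
  -e ALLOW_EMPTY_PASSWORD=yes -e "REDIS_NODES=$NODES" \
  -e REDIS_CLUSTER_REPLICAS=1 -e REDIS_CLUSTER_CREATOR=yes \
  bitnami/redis-cluster:6.0
```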

Troubleshooting:

  1. Each Redis node has to be in the same network, i.e. tyk.
  2. The Redis masters must be up before the replicas.
  3. You need a minimum of 6 nodes in the cluster (3 masters : 3 replicas).
  4. You can hit race conditions when standing everything up, i.e. Tyk comes up before Redis, which makes Tyk collapse/throw errors. It’s best to bring the Redis cluster up before Tyk, so that when Tyk starts it can connect straight away. If you have everything in one docker-compose.yml file, you can simply restart your Dashboard afterwards; everything should then work fine.
  5. All nodes listen on the default port (as opposed to each node using its own port).
  6. Make sure the Redis version is one supported by Tyk (otherwise you’ll end up with a lot of unknown errors).
  7. Connect directly to your master node and try to set a random key/value pair; if you get an error, there’s something wrong with your Redis setup. You can connect either with docker exec or with redis-cli -h localhost -p 6380 (this is why we exposed 6380 earlier in the docker-compose file). Once connected, you can run the following commands:
     • keys * — lists all keys; if everything is set up properly you should see some Tyk keys.
     • set a 1 — if there’s something wrong with your Redis nodes, you’ll see an error similar to: (error) MOVED 15495 192.178.300.10:6379

Let us know how you get on.

One question:

  Connect directly to your master node and try to set a random key/value pair; if you get an error, there’s something wrong with your Redis setup. You can connect either with docker exec or with redis-cli -h localhost -p 6380 (this is why we exposed 6380 earlier in the docker-compose file). Once connected, you can run the following commands:
     • keys * — lists all keys; if everything is set up properly you should see some Tyk keys.
     • set a 1 — if there’s something wrong with your Redis nodes, you’ll see an error similar to: (error) MOVED 15495 192.178.300.10:6379

Shouldn’t it be like this (note the -c):

redis-cli -h localhost -p 6380 -c

since Redis is in cluster mode?
I am asking because I got the same kind of error, (error) MOVED 15495 192.178.300.10:6379, when trying to access without -c (cluster mode), but Redis worked fine with that option, as shown in my last post.
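To illustrate with the same example:

```shell
# Without -c, a key hashing to another node's slot surfaces the redirect
# as an error, e.g.: (error) MOVED 15495 192.178.300.10:6379
# With -c, redis-cli follows the redirection transparently:
redis-cli -h localhost -p 6380 -c set a 1
```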

Thank you again

Hey @Gabriel_Pu,

Yes you’re right. It’s an error, thank you.

Hey Gabriel,

Thanks so much for pointing out the -c option. You’re absolutely correct that it’s needed for redis-cli to know we’re talking to a cluster as opposed to a standalone instance.

Can the pods reach one another? i.e. curl {gateway_url}:{port}/hello and/or list keys (keys *) after adding a random key. This will tell us whether the pods can communicate with each other, and will also help us understand whether your Redis connection is storing proper values. I noticed your earlier comment about different/same namespaces, but I just want to reconfirm the above again ^.

  • Please post the output of keys * from the master.
  • Also, what is returned from your {gateway_url}:{port}/hello endpoint?
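Concretely, something like this (the gateway service name and port are assumptions based on your config):

```shell
# Gateway health endpoint; it reports the Redis connection status:
curl http://tyk-gateway.tyk.svc.cluster.local:8081/hello
# On the master, in cluster mode; note that KEYS only lists the keys held
# by the node you are connected to, not the whole cluster:
redis-cli -h redis-cluster-0.redis-cluster.redis.svc.cluster.local -p 6379 -c set testkey 1
redis-cli -h redis-cluster-0.redis-cluster.redis.svc.cluster.local -p 6379 -c keys '*'
```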

I also notice from the gateway config you provided that you have addrs inside storage and then also a redis_addrs section. Can you confirm you deleted the redis_addrs section entirely?

Alternatively, I’d also recommend isolating the problem a bit further. At the moment we’re not entirely sure whether it’s the gateway or Redis that’s at fault (obviously the connection isn’t being made, but isolating it would get us closer to the solution). One way of doing this is stripping away the cluster entirely and trying a standalone Redis instance (this removes a few complications). If this works properly, the takeaway is that the cluster was incorrectly configured; if the connection is still not made, the takeaway is that Tyk is at fault.
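A minimal sketch of that standalone test (the image tag mirrors your cluster’s Redis version; the relevant change on the Tyk side is turning cluster mode off):

```shell
# Single standalone Redis, no cluster mode:
docker run -d --name redis-standalone -p 6379:6379 redis:6.0.16
# Then point the gateway at it, e.g. in tyk.conf:
#   "storage": { "type": "redis", "enable_cluster": false, "addrs": ["<redis-host>:6379"] }
```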

Additionally, are you able to share a public repo with all the files you have? Alongside that, can you share the steps (commands included) you’ve taken, so we can reproduce what you’re seeing in a 1:1 environment? The more detail you share, the faster we can reproduce the issue, which in turn will help us find a solution faster.

We’ll get to the bottom of this!

Valmir

Before you attempt the above ^^^

Can you also set this to true: tyk-helm-chart/values.yaml at master · TykTechnologies/tyk-helm-chart · GitHub.