gRPC Load Balancing issue with K8s Headless Service

Branch/Environment/Version

  • Branch/Version: Release 5.8.7
  • Environment: On-prem

Describe the bug
We are currently experiencing an issue where gRPC traffic is not being load-balanced across our backend pods as expected. Despite using a Headless Service, the Tyk Gateway seems to stick to a single backend connection.
It is my understanding that the “smart gRPC Client” in Tyk should resolve the multiple IPs returned by the headless service, establish multiple connections, and round-robin the requests. However, I cannot verify this behaviour.
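For reference, the backend is exposed via a headless Service along the following lines (a minimal sketch reconstructed from the target_url further below; the selector label is an assumption and must match the actual backend Deployment):

    apiVersion: v1
    kind: Service
    metadata:
      name: grpc-smoketest-headless
      namespace: pe-tools
    spec:
      clusterIP: None           # headless: DNS returns one A record per ready Pod
      selector:
        app: grpc-smoketest     # assumed label
      ports:
        - name: grpc
          port: 50051
          targetPort: 50051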

Reproduction steps
Steps to reproduce the behaviour:

  1. Deploy a gRPC backend service running with multiple replicas
  2. Create an API configured to use load balancing (see below)

Actual behaviour
Only one backend service replica receives traffic.

Expected behaviour
The traffic should be distributed round-robin between the available replicas.

Configuration (tyk config file):
Environment variables (configured via the Tyk Helm chart)

      - env:
        - name: TYK_GW_LISTENPORT
          value: "8080"
        - name: TYK_GW_OAS_VALIDATE_EXAMPLES
          value: "false"
        - name: TYK_GW_OAS_VALIDATE_SCHEMA_DEFAULTS
          value: "false"
        - name: TYK_GW_ENABLEFIXEDWINDOWRATELIMITER
          value: "false"
        - name: TYK_GW_STORAGE_TLSMAXVERSION
        - name: TYK_GW_STORAGE_TLSMINVERSION
        - name: REDIGOCLUSTER_SHARDCOUNT
          value: "128"
        - name: TYK_GW_STORAGE_TYPE
          value: redis
        - name: TYK_GW_STORAGE_ADDRS
          value: master.***:6379
        - name: TYK_GW_STORAGE_ENABLECLUSTER
          value: "false"
        - name: TYK_GW_STORAGE_DATABASE
          value: "0"
        - name: TYK_GW_STORAGE_PASSWORD
          valueFrom:
            secretKeyRef:
              key: redisPass
              name: secrets-trip-api
        - name: TYK_GW_STORAGE_USESSL
          value: "true"
        - name: TYK_GW_SECRET
          valueFrom:
            secretKeyRef:
              key: APISecret
              name: secrets-trip-api
        - name: TYK_GW_NODESECRET
          valueFrom:
            secretKeyRef:
              key: APISecret
              name: secrets-trip-api
        - name: TYK_GW_POLICIES_ALLOWEXPLICITPOLICYID
          value: "true"
        - name: TYK_GW_HTTPSERVEROPTIONS_USESSL
          value: "false"
        - name: TYK_GW_TEMPLATEPATH
          value: /opt/tyk-gateway/templates
        - name: TYK_GW_TYKJSPATH
          value: /opt/tyk-gateway/js/tyk.js
        - name: TYK_GW_MIDDLEWAREPATH
          value: /mnt/tyk-gateway/middleware
        - name: TYK_GW_APPPATH
          value: /mnt/tyk-gateway/apps
        - name: TYK_GW_POLICIES_POLICYPATH
          value: /mnt/tyk-gateway/policies
        - name: TYK_GW_STORAGE_MAXIDLE
          value: "1000"
        - name: TYK_GW_ENABLENONTRANSACTIONALRATELIMITER
          value: "true"
        - name: TYK_GW_POLICIES_POLICYSOURCE
          value: file
        - name: TYK_GW_ENABLEANALYTICS
          value: "true"
        - name: TYK_GW_ANALYTICSCONFIG_TYPE
        - name: TYK_GW_POLICIES_POLICYRECORDNAME
          value: /mnt/tyk-gateway/policies/policies.json
        - name: TYK_GW_HASHKEYS
          value: "true"
        - name: TYK_GW_HASHKEYFUNCTION
          value: murmur128
        - name: TYK_GW_HTTPSERVEROPTIONS_ENABLEWEBSOCKETS
          value: "true"
        - name: TYK_GW_HTTPSERVEROPTIONS_MINVERSION
          value: "771"
        - name: TYK_GW_HTTPSERVEROPTIONS_CERTIFICATES
          value: '[{"cert_file":"/etc/certs/tyk-gateway/tls.crt","domain_name":"*","key_file":"/etc/certs/tyk-gateway/tls.key"}]'
        - name: TYK_GW_HTTPSERVEROPTIONS_SSLINSECURESKIPVERIFY
          value: "false"
        - name: TYK_GW_ALLOWINSECURECONFIGS
          value: "true"
        - name: TYK_GW_COPROCESSOPTIONS_ENABLECOPROCESS
          value: "true"
        - name: TYK_GW_MAXIDLECONNSPERHOST
          value: "500"
        - name: TYK_GW_ENABLECUSTOMDOMAINS
          value: "true"
        - name: TYK_GW_PIDFILELOCATION
          value: /mnt/tyk-gateway/tyk.pid
        - name: TYK_GW_DBAPPCONFOPTIONS_NODEISSEGMENTED
          value: "false"
        - name: TYK_GW_HTTPSERVEROPTIONS_ENABLEHTTP2
          value: "true"
        - name: TYK_GW_HTTPSERVEROPTIONS_FLUSHINTERVAL
          value: "1"
        - name: TYK_GW_PROXYENABLEHTTP2
          value: "true"
        - name: TYK_GW_LOGLEVEL
          value: info
        - name: TYK_GW_HTTPSERVEROPTIONS_READTIMEOUT
          value: "660"
        - name: TYK_GW_HTTPSERVEROPTIONS_WRITETIMEOUT
          value: "660"
        - name: TYK_GW_OAUTHTOKENEXPIREDRETAINPERIOD
          value: "1800"
        - name: TYK_GW_OAUTHTOKENEXPIRE
          value: "1800"
        - name: TYK_GW_GLOBALSESSIONLIFETIME
          value: "3600"

APIDefinition:

    apiVersion: tyk.tyk.io/v1alpha1
    kind: ApiDefinition
    metadata:
      name: grpc-smoketest
      namespace: trip-api-gateway
    spec:
      active: true
      api_id: grpc-smoketest
      name: gRPC smoketest
      protocol: http
      proxy:
        enable_load_balancing: false
        listen_path: /moia.apigateway.SmoketestService/
        target_url: h2c://grpc-smoketest-headless.pe-tools.svc.cluster.local:50051
        transport: {}
      version_data:
        default_version: Default
        not_versioned: true
        versions:
          Default:
            name: Default
    # ...

Additional context

I found the documentation regarding gRPC Load Balancing, but it doesn’t seem to cover our specific case with a Headless Service. Is there a specific configuration required to force the Gateway to refresh its connection pool?
https://tyk.io/docs/key-concepts/grpc-proxy#grpc-load-balancing
I also noticed this configuration option: coprocess_options.grpc_round_robin_load_balancing (referenced here). Since we are not explicitly using the Coprocessor for this, is this flag necessary for standard gRPC proxying?
https://tyk.io/docs/tyk-oss-gateway/configuration#coprocess-options-grpc-round-robin-load-balancing
In the API Definition, proxy.enable_load_balancing is set to false. Should this be true even when using a K8s Headless service, or does Tyk handle gRPC balancing differently?
I would be happy to provide further details or logs if needed.


Hi @Dirk,

Thank you for your post and your patience.

The shared API definition has not been configured for load balancing. In particular, proxy.enable_load_balancing is set to false (it should be true), and proxy.target_list has not been set.
Please see the docs on Load Balancing.
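For illustration, the relevant proxy section would look roughly like this with static targets (the hostnames below are placeholders, not values taken from your cluster):

    proxy:
      enable_load_balancing: true
      listen_path: /moia.apigateway.SmoketestService/
      target_list:
        - h2c://grpc-backend-0.example:50051    # placeholder
        - h2c://grpc-backend-1.example:50051    # placeholder

With enable_load_balancing set to true, the Gateway distributes requests round-robin across the entries in target_list.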

However, it seems you’re looking to use the Service Discovery Feature. Please see the docs on this as well.
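As a rough sketch of what that looks like in an API definition, assuming a Consul catalog endpoint as the discovery source (the query_endpoint URL and the data_path/port_data_path values are illustrative and depend entirely on your discovery service’s response format):

    proxy:
      enable_load_balancing: true
      service_discovery:
        use_discovery_service: true
        query_endpoint: http://consul.example:8500/v1/catalog/service/grpc-smoketest
        data_path: Address
        port_data_path: ServicePort
        use_target_list: true
        cache_timeout: 10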

Since we are not explicitly using the Coprocessor for this, is this flag necessary for standard gRPC proxying?

No, it is not. That flag only applies to gRPC plugins (coprocess middleware), not to standard gRPC proxying.

Please let us know how you get on, or if you encounter issues configuring this.

Looking forward to your reply.

Hey Ubong,

thanks for the reply. So our approach with a headless Kubernetes Service was probably not the right way.
But is there any support for dynamically defining the target list based on the endpoints of a K8s Service? We just want the gateway to know all Pods of the backend service and to update them dynamically when Pods are replaced.
The service discovery examples that support such dynamic target lists all target specific service discovery systems. Kubernetes brings this “out of the box”, but it does not seem to be supported :thinking:

Hi @Dirk,

Yes, that is correct. Tyk does not natively integrate with Kubernetes Endpoints to dynamically populate proxy.target_list at present; it relies on the Service abstraction rather than directly consuming Endpoints or EndpointSlice resources.

If a headless Service must be retained for other requirements, the recommended approach would be to introduce a standard ClusterIP Service targeting the same Pods and have the Gateway proxy traffic through it.
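For example, a conventional ClusterIP Service selecting the same Pods might look like the following sketch (the selector label is assumed to match your backend Deployment); the API definition’s target_url would then point at grpc-smoketest.pe-tools.svc.cluster.local instead of the headless Service:

    apiVersion: v1
    kind: Service
    metadata:
      name: grpc-smoketest
      namespace: pe-tools
    spec:
      # no "clusterIP: None" here; kube-proxy balances connections across Pods
      selector:
        app: grpc-smoketest     # assumed label
      ports:
        - name: grpc
          port: 50051
          targetPort: 50051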

Alternatively, if dynamic Pod tracking is required at the Gateway layer, this would need to be implemented externally via automation that updates the API definition. But this would introduce operational complexity and would require frequent API definition updates, which may not be ideal.

Hope this helps. Please let us know how you get on.

Best regards