Problems installing Tyk Operator

I’ve successfully installed my Hybrid data plane into my AWS EKS cluster and I’ve followed these instructions to install the Tyk Operator: Installing Tyk Operator. Everything seems to have worked fine according to the instructions.

But when I attempt to apply a proof of concept ApiDefinition that targets http://httpbin.org I get a time out with this error:

Error from server (InternalError): error when creating "tyk/httpbin.yaml": Internal error occurred: failed calling webhook "mapidefinition.kb.io": failed to call webhook: Post "https://tyk-operator-webhook-service.tyk-operator-system.svc:443/mutate-tyk-tyk-io-v1alpha1-apidefinition?timeout=10s": context deadline exceeded

This error is mentioned at the bottom of the Operator instructions and suggests it’s related to the cert manager. I’ve followed the cert manager debugging guide but still haven’t been able to solve this. Any suggestions?

As indicated in Troubleshooting Tyk Operator, this typically happens when the webhook service cannot reach the operator manager service, usually because of a connectivity issue or because the manager is not up.

Could you share the timeout value and any inner errors that may have been concealed? That would show whether it is a connectivity issue (the SYN is dropped) or a webhook issue (for example, the TLS certificate is wrong, or the webhook is not returning any HTTP response).

If you are on GCP, you will need to configure firewall rules to allow traffic from the webhook to the pods running the manager.
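
For anyone following along, here is a rough sketch of how the "is the manager up and reachable" question could be checked with kubectl. The namespace and service name are taken from the webhook URL in the error message; the deployment name tyk-operator-controller-manager is an assumption based on a default install and may differ in your cluster.

```shell
NS=tyk-operator-system
SVC=tyk-operator-webhook-service

# Is the operator manager pod running and ready?
kubectl get pods -n "$NS" -o wide

# Does the webhook service have endpoints? An empty list means no ready pod backs it.
kubectl get endpoints "$SVC" -n "$NS"

# Which service port forwards to which container port?
kubectl get service "$SVC" -n "$NS" \
  -o jsonpath='{.spec.ports[*].port}{" -> "}{.spec.ports[*].targetPort}{"\n"}'

# Recent manager logs (deployment name is an assumption)
kubectl logs -n "$NS" deployment/tyk-operator-controller-manager --all-containers --tail=50
```

If the pod is ready and the service has endpoints, the problem is more likely on the network path between the API server and the pod than in the operator itself.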

I’ve been through the debugging guide here and I’ve extended the timeout to 30 seconds by running:

kubectl patch mutatingwebhookconfigurations,validatingwebhookconfigurations cert-manager-webhook \
  --type=json -p '[{"op": "replace", "path": "/webhooks/0/timeoutSeconds", "value": 30}]'

However, when I apply the following I still get the same error mentioned in my opening post with the 10 second timeout:

apiVersion: tyk.tyk.io/v1alpha1
kind: ApiDefinition
metadata:
  name: httpbin
spec:
  name: httpbin
  use_keyless: true
  protocol: http
  active: true
  proxy:
    target_url: http://httpbin.org
    listen_path: /httpbin
    strip_listen_path: true

I’m not sure where to look for any inner errors. All the checks suggest that the cert-manager is set up correctly. From inside the K8s network I’m able to curl “https://tyk-operator-webhook-service.tyk-operator-system.svc:443/mutate-tyk-tyk-io-v1alpha1-apidefinition?timeout=10s” successfully (although it complains about an insecure certificate - could this be the issue?).

Please let me know if there’s any other information that I can provide to help debug this.

I’ve increased the timeouts on the Tyk Operator webhooks and now I get this error:

Error from server (InternalError): error when creating "tyk/httpbin.yaml": Internal error occurred: failed calling webhook "mapidefinition.kb.io": failed to call webhook: Post "https://tyk-operator-webhook-service.tyk-operator-system.svc:443/mutate-tyk-tyk-io-v1alpha1-apidefinition?timeout=30s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
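
For reference, a sketch of how the timeout on the Tyk Operator webhooks (rather than the cert-manager ones) might be raised. The mutating configuration name is the one that appears in this thread; the validating configuration name is an assumption, so check `kubectl get validatingwebhookconfigurations` first. The patch only touches the first webhook entry in each configuration.

```shell
TIMEOUT=30   # Kubernetes allows at most 30 seconds for an admission webhook
PATCH="[{\"op\": \"replace\", \"path\": \"/webhooks/0/timeoutSeconds\", \"value\": ${TIMEOUT}}]"

# Mutating webhook configuration (name as seen in this thread)
kubectl patch mutatingwebhookconfiguration tyk-operator-mutating-webhook-configuration \
  --type=json -p "$PATCH"

# Validating webhook configuration (name is an assumption)
kubectl patch validatingwebhookconfiguration tyk-operator-validating-webhook-configuration \
  --type=json -p "$PATCH"

# List each webhook with its timeout to confirm what was changed
kubectl get mutatingwebhookconfiguration tyk-operator-mutating-webhook-configuration \
  -o jsonpath='{range .webhooks[*]}{.name}{"\t"}{.timeoutSeconds}{"\n"}{end}'
```

A longer timeout does not fix an unreachable webhook. It only changes how long the API server waits, which is why the error text changes from "context deadline exceeded" with timeout=10s to a client timeout with timeout=30s.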

The error from cert manager indicates it may be a webhook-side issue. Have you tried isolating the issue as indicated in the screenshot?

Yes, I’ve done that and I can get the happy output. I think the cause is EKS on a custom CNI, so I’ve set hostNetwork to true and the securePort to 10260 for the cert-manager pod. I’ve then added new security group rules for that port and I’m still getting the timeout error.

After some playing around I don’t think there’s an issue with the network in Kubernetes. I believe that tyk-operator-mutating-webhook-configuration is able to call https://tyk-operator-webhook-service.tyk-operator-system.svc:443/mutate-tyk-tyk-io-v1alpha1-apidefinition?timeout=30s, but the API call is timing out.

If I delete the Tyk Operator deployment but keep the CRDs in place I can successfully apply the httpbin example ApiDefinition. If I then install the Tyk Operator again I start seeing these errors in the logs for the operator pod:

{
  "level": "error",
  "ts": 1709891089.4584012,
  "logger": "controller-runtime.manager.controller.apidefinition",
  "msg": "Reconciler error",
  "reconciler group": "tyk.tyk.io",
  "reconciler kind": "ApiDefinition",
  "name": "httpbin",
  "namespace": "default",
  "error": "access denied: You do not have permission to access '/api/apis'. Status: Error HTTP 403: Failed api call",
  "errorVerbose": "Failed api call\naccess denied: You do not have permission to access '/api/apis'. Status: Error HTTP 403\ngithub.com/TykTechnologies/tyk-operator/pkg/client.Call\n\t/workspace/pkg/client/client.go:232\ngithub.com/TykTechnologies/tyk-operator/pkg/client.CallJSON\n\t/workspace/pkg/client/client.go:144\ngithub.com/TykTechnologies/tyk-operator/pkg/client.PostJSON\n\t/workspace/pkg/client/client.go:156\ngithub.com/TykTechnologies/tyk-operator/pkg/client/dashboard.Api.Create\n\t/workspace/pkg/client/dashboard/api.go:34\ngithub.com/TykTechnologies/tyk-operator/pkg/client/klient.Api.Create\n\t/workspace/pkg/client/klient/universal_client.go:49\ngithub.com/TykTechnologies/tyk-operator/controllers.(*ApiDefinitionReconciler).update\n\t/workspace/controllers/apidefinition_controller.go:485\ngithub.com/TykTechnologies/tyk-operator/controllers.(*ApiDefinitionReconciler).Reconcile.func1\n\t/workspace/controllers/apidefinition_controller.go:201\nsigs.k8s.io/controller-runtime/pkg/controller/controllerutil.mutate\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/controller/controllerutil/controllerutil.go:341\nsigs.k8s.io/controller-runtime/pkg/controller/controllerutil.CreateOrUpdate\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/controller/controllerutil/controllerutil.go:213\ngithub.com/TykTechnologies/tyk-operator/controllers.(*ApiDefinitionReconciler).Reconcile\n\t/workspace/controllers/apidefinition_controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:214\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650",
  "stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:214"
}

Is this error something you can help with? I believe that whatever fixes it could fix my original issue with the webhook.

So I’ve solved this now and it was a networking issue.

In an EKS cluster, the security group attached to the node(s) that the Tyk Operator runs on needs an inbound rule:

  • Protocol: TCP
  • Port range: 9443, or a range that covers 9443
  • Source: The ID of either the cluster security group, or one of your cluster’s additional security groups. You can find these IDs in the EKS console, under the Networking tab for your EKS cluster.

This security group rule allows the control plane (tyk-operator-mutating-webhook-configuration) to access the node and the downstream tyk-operator-controller-manager over port 9443.
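
As a sketch, the same rule could be added with the AWS CLI instead of the console. The cluster name and node security group ID below are hypothetical placeholders; substitute your own values. This assumes the cluster security group is the source you want, rather than one of the additional security groups.

```shell
CLUSTER_NAME=my-cluster            # hypothetical: your EKS cluster name
NODE_SG_ID=sg-0123456789abcdef0    # hypothetical: security group attached to the worker nodes
WEBHOOK_PORT=9443                  # port the operator's webhook server listens on

# Look up the cluster security group that the control plane uses
CLUSTER_SG_ID=$(aws eks describe-cluster --name "$CLUSTER_NAME" \
  --query 'cluster.resourcesVpcConfig.clusterSecurityGroupId' --output text)

# Allow inbound TCP 9443 on the node security group from the cluster security group
aws ec2 authorize-security-group-ingress \
  --group-id "$NODE_SG_ID" \
  --protocol tcp \
  --port "$WEBHOOK_PORT" \
  --source-group "$CLUSTER_SG_ID"
```

If the nodes are managed through Terraform, eksctl, or CloudFormation, it is probably better to add the rule there so it survives the next apply.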

However, I’m still experiencing the error I shared above:

{
  "level": "error",
  "ts": 1709921826.7631285,
  "logger": "controller-runtime.manager.controller.apidefinition",
  "msg": "Reconciler error",
  "reconciler group": "tyk.tyk.io",
  "reconciler kind": "ApiDefinition",
  "name": "httpbin",
  "namespace": "default",
  "error": "access denied: You do not have permission to access '/api/apis'. Status: Error HTTP 403: Failed api call",
  "errorVerbose": "Failed api call\naccess denied: You do not have permission to access '/api/apis'. Status: Error HTTP 403\ngithub.com/TykTechnologies/tyk-operator/pkg/client.Call\n\t/workspace/pkg/client/client.go:232\ngithub.com/TykTechnologies/tyk-operator/pkg/client.CallJSON\n\t/workspace/pkg/client/client.go:144\ngithub.com/TykTechnologies/tyk-operator/pkg/client.PostJSON\n\t/workspace/pkg/client/client.go:156\ngithub.com/TykTechnologies/tyk-operator/pkg/client/dashboard.Api.Create\n\t/workspace/pkg/client/dashboard/api.go:34\ngithub.com/TykTechnologies/tyk-operator/pkg/client/klient.Api.Create\n\t/workspace/pkg/client/klient/universal_client.go:49\ngithub.com/TykTechnologies/tyk-operator/controllers.(*ApiDefinitionReconciler).update\n\t/workspace/controllers/apidefinition_controller.go:485\ngithub.com/TykTechnologies/tyk-operator/controllers.(*ApiDefinitionReconciler).Reconcile.func1\n\t/workspace/controllers/apidefinition_controller.go:201\nsigs.k8s.io/controller-runtime/pkg/controller/controllerutil.mutate\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/controller/controllerutil/controllerutil.go:341\nsigs.k8s.io/controller-runtime/pkg/controller/controllerutil.CreateOrUpdate\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/controller/controllerutil/controllerutil.go:213\ngithub.com/TykTechnologies/tyk-operator/controllers.(*ApiDefinitionReconciler).Reconcile\n\t/workspace/controllers/apidefinition_controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:214\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650",
  "stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:214"
}

It looks like an authentication issue between the operator and the Tyk Cloud API. It may be because I’ve used the wrong values for my tyk-operator-system secret as documented here: Installing Tyk Operator.

@Olu Are you able to help with this?

Great to hear you’ve solved the timeout issue.

The error looks like a permission issue with the apis permission object. This would happen if that permission is set to deny. So yes, the Dashboard API credentials need the APIs permission for the operator to manage API definitions.
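
One way to confirm this outside the operator is to call the same endpoint with the credentials the operator is using. The sketch below assumes the secret is named tyk-operator-conf and holds TYK_URL and TYK_AUTH keys, as in the install instructions; adjust the names if your install differs.

```shell
NS=tyk-operator-system
SECRET=tyk-operator-conf   # assumed secret name from the install instructions

# Read one key from the secret and base64-decode it
get_key() {
  kubectl get secret "$SECRET" -n "$NS" -o jsonpath="{.data.$1}" | base64 -d
}

TYK_URL=$(get_key TYK_URL)    # Dashboard URL the operator talks to
TYK_AUTH=$(get_key TYK_AUTH)  # Dashboard API key of the operator's user

# Call the endpoint from the error message and print only the HTTP status code
curl -s -o /dev/null -w '%{http_code}\n' \
  -H "Authorization: $TYK_AUTH" \
  "$TYK_URL/api/apis"
```

A 403 here would reproduce the operator's error and point at the user's permissions in the Dashboard; a 200 would suggest the key itself is fine and the secret values the operator actually loaded are worth a second look.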