I’ve successfully installed my Hybrid data plane into my AWS EKS cluster and I’ve followed these instructions to install the Tyk Operator: Installing Tyk Operator. Everything seems to have worked fine according to the instructions.
But when I attempt to apply a proof-of-concept ApiDefinition that targets http://httpbin.org, I get a timeout with this error:
Error from server (InternalError): error when creating "tyk/httpbin.yaml": Internal error occurred: failed calling webhook "mapidefinition.kb.io": failed to call webhook: Post "https://tyk-operator-webhook-service.tyk-operator-system.svc:443/mutate-tyk-tyk-io-v1alpha1-apidefinition?timeout=10s": context deadline exceeded
This error is mentioned at the bottom of the Operator instructions and suggests it’s related to the cert manager. I’ve followed the cert manager debugging guide but still haven’t been able to solve this. Any suggestions?
As indicated in the Tyk Operator troubleshooting guide, this typically happens when the webhook service does not have access to the operator manager service. This is usually due to connectivity issues or because the manager is not up.
Could you share the timeout value and any inner errors that may have been concealed? That would show whether it is a connectivity issue (the SYN is dropped) or a webhook issue (for example, the TLS certificate is wrong, or the webhook is not returning any HTTP response).
If you are on GCP, you will need to configure firewall rules to allow traffic from the webhook to the pods running the manager.
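To tell a dropped connection apart from a webhook-side problem, a staged probe can help. The following is only a rough sketch, assuming it is run from a pod inside the cluster; the function names `classify` and `probe` are made up for illustration. It tries a TCP connect first and a TLS handshake second, then reports which stage failed. Note that a probe from a pod does not follow the same network path as the Kubernetes control plane, so a success here does not prove the API server can reach the webhook.

```python
import socket
import ssl


def classify(exc):
    """Map an exception from the probe to a coarse failure category."""
    if isinstance(exc, ssl.SSLError):
        return "tls error (handshake or certificate problem)"
    if isinstance(exc, socket.gaierror):
        return "dns error (service name did not resolve)"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout (packets are probably being dropped)"
    if isinstance(exc, ConnectionRefusedError):
        return "refused (nothing is listening on that port)"
    return "other: " + repr(exc)


def probe(host, port, timeout=5.0):
    """Try a TCP connect, then a TLS handshake, and report where it fails."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return "tcp stage: " + classify(exc)

    # Certificate verification is switched off on purpose: the goal is only
    # to see whether the server completes a TLS handshake at all, not to
    # validate the certificate chain.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with ctx.wrap_socket(sock, server_hostname=host):
            return "ok: TCP connect and TLS handshake both succeeded"
    except OSError as exc:
        return "tls stage: " + classify(exc)
    finally:
        sock.close()
```

For example, `probe("tyk-operator-webhook-service.tyk-operator-system.svc", 443)` uses the service name from the error message above. A "tcp stage: timeout" result points at security groups or firewall rules rather than at the certificate.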
I’ve increased the timeouts on the Tyk Operator webhooks and now I get this error:
Error from server (InternalError): error when creating "tyk/httpbin.yaml": Internal error occurred: failed calling webhook "mapidefinition.kb.io": failed to call webhook: Post "https://tyk-operator-webhook-service.tyk-operator-system.svc:443/mutate-tyk-tyk-io-v1alpha1-apidefinition?timeout=30s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Yes, I’ve done that and I can get the happy output. I think the cause is EKS on a custom CNI, so I’ve set hostNetwork to true and securePort to 10260 for the cert-manager pod. I then added new security group rules for that port, but I’m still getting the timeout error.
After some playing around, I don’t think there’s an issue with the network in Kubernetes. I believe that tyk-operator-mutating-webhook-configuration is able to call https://tyk-operator-webhook-service.tyk-operator-system.svc:443/mutate-tyk-tyk-io-v1alpha1-apidefinition?timeout=30s, but the API call is timing out.
If I delete the Tyk Operator deployment but keep the CRDs in place, I can successfully apply the httpbin example ApiDefinition. If I then install the Tyk Operator again, I start seeing these errors in the logs for the operator pod:
{
"level": "error",
"ts": 1709891089.4584012,
"logger": "controller-runtime.manager.controller.apidefinition",
"msg": "Reconciler error",
"reconciler group": "tyk.tyk.io",
"reconciler kind": "ApiDefinition",
"name": "httpbin",
"namespace": "default",
"error": "access denied: You do not have permission to access '/api/apis'. Status: Error HTTP 403: Failed api call",
"errorVerbose": "Failed api call\naccess denied: You do not have permission to access '/api/apis'. Status: Error HTTP 403\ngithub.com/TykTechnologies/tyk-operator/pkg/client.Call\n\t/workspace/pkg/client/client.go:232\ngithub.com/TykTechnologies/tyk-operator/pkg/client.CallJSON\n\t/workspace/pkg/client/client.go:144\ngithub.com/TykTechnologies/tyk-operator/pkg/client.PostJSON\n\t/workspace/pkg/client/client.go:156\ngithub.com/TykTechnologies/tyk-operator/pkg/client/dashboard.Api.Create\n\t/workspace/pkg/client/dashboard/api.go:34\ngithub.com/TykTechnologies/tyk-operator/pkg/client/klient.Api.Create\n\t/workspace/pkg/client/klient/universal_client.go:49\ngithub.com/TykTechnologies/tyk-operator/controllers.(*ApiDefinitionReconciler).update\n\t/workspace/controllers/apidefinition_controller.go:485\ngithub.com/TykTechnologies/tyk-operator/controllers.(*ApiDefinitionReconciler).Reconcile.func1\n\t/workspace/controllers/apidefinition_controller.go:201\nsigs.k8s.io/controller-runtime/pkg/controller/controllerutil.mutate\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/controller/controllerutil/controllerutil.go:341\nsigs.k8s.io/controller-runtime/pkg/controller/controllerutil.CreateOrUpdate\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/controller/controllerutil/controllerutil.go:213\ngithub.com/TykTechnologies/tyk-operator/controllers.(*ApiDefinitionReconciler).Reconcile\n\t/workspace/controllers/apidefinition_controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:214\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650",
"stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:214"
}
Is this error something you can help with? I believe that whatever fixes it could also fix my original issue with the webhook.
So I’ve solved this now and it was a networking issue.
In an EKS cluster, the node(s) that the Tyk Operator runs on need an inbound security group rule:
Protocol: TCP
Port range: 9443, or a range that covers 9443
Source: The ID of either the cluster security group, or one of your cluster’s additional security groups. You can find these IDs in the EKS console, under the Networking tab for your EKS cluster.
This security group rule allows the control plane (tyk-operator-mutating-webhook-configuration) to access the node and the downstream tyk-operator-controller-manager over port 9443.
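The rule described above could be created programmatically. This is a minimal sketch, assuming the third-party boto3 library is installed and AWS credentials are configured; the helper names `webhook_ingress_rule` and `apply_rule` are hypothetical, and the security group IDs have to come from your own cluster.

```python
def webhook_ingress_rule(node_sg_id, cluster_sg_id, port=9443):
    """Build the arguments for an inbound TCP rule on the node security group.

    node_sg_id:    security group attached to the nodes running Tyk Operator
    cluster_sg_id: the cluster security group (or an additional one) used as source
    """
    return {
        "GroupId": node_sg_id,
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": port,
                "ToPort": port,
                "UserIdGroupPairs": [
                    {
                        "GroupId": cluster_sg_id,
                        "Description": "Control plane to Tyk Operator webhook",
                    }
                ],
            }
        ],
    }


def apply_rule(params, region):
    """Send the rule to EC2. Imported lazily so the builder works without boto3."""
    import boto3  # third-party dependency, assumed to be installed

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.authorize_security_group_ingress(**params)
```

Keeping the rule builder separate from the API call makes it easy to print and review the parameters before changing anything in the account.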
However, I’m still experiencing the error I shared above:
{
"level": "error",
"ts": 1709921826.7631285,
"logger": "controller-runtime.manager.controller.apidefinition",
"msg": "Reconciler error",
"reconciler group": "tyk.tyk.io",
"reconciler kind": "ApiDefinition",
"name": "httpbin",
"namespace": "default",
"error": "access denied: You do not have permission to access '/api/apis'. Status: Error HTTP 403: Failed api call",
"errorVerbose": "Failed api call\naccess denied: You do not have permission to access '/api/apis'. Status: Error HTTP 403\ngithub.com/TykTechnologies/tyk-operator/pkg/client.Call\n\t/workspace/pkg/client/client.go:232\ngithub.com/TykTechnologies/tyk-operator/pkg/client.CallJSON\n\t/workspace/pkg/client/client.go:144\ngithub.com/TykTechnologies/tyk-operator/pkg/client.PostJSON\n\t/workspace/pkg/client/client.go:156\ngithub.com/TykTechnologies/tyk-operator/pkg/client/dashboard.Api.Create\n\t/workspace/pkg/client/dashboard/api.go:34\ngithub.com/TykTechnologies/tyk-operator/pkg/client/klient.Api.Create\n\t/workspace/pkg/client/klient/universal_client.go:49\ngithub.com/TykTechnologies/tyk-operator/controllers.(*ApiDefinitionReconciler).update\n\t/workspace/controllers/apidefinition_controller.go:485\ngithub.com/TykTechnologies/tyk-operator/controllers.(*ApiDefinitionReconciler).Reconcile.func1\n\t/workspace/controllers/apidefinition_controller.go:201\nsigs.k8s.io/controller-runtime/pkg/controller/controllerutil.mutate\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/controller/controllerutil/controllerutil.go:341\nsigs.k8s.io/controller-runtime/pkg/controller/controllerutil.CreateOrUpdate\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/controller/controllerutil/controllerutil.go:213\ngithub.com/TykTechnologies/tyk-operator/controllers.(*ApiDefinitionReconciler).Reconcile\n\t/workspace/controllers/apidefinition_controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:214\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650",
"stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:214"
}
It looks like an authentication issue between the webhook and the Tyk Cloud API. It may be because I’ve used the wrong values for my tyk-operator-system secret as documented here: Installing Tyk Operator.
The error looks like a permission issue with the apis permission object. This would happen if that permission is set to deny. So yes, the Dashboard API credentials need the APIs permission for the operator to manage API definitions.
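One way to check the 403 on '/api/apis' outside the operator is to call the same path with the same credentials. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the Dashboard accepts the user's API key in the `Authorization` header and that you pass in the same URL and key that are stored in the operator's secret.

```python
import urllib.error
import urllib.request


def build_apis_request(base_url, api_key):
    """Build a GET request for the Dashboard's /api/apis endpoint."""
    url = base_url.rstrip("/") + "/api/apis"
    return urllib.request.Request(
        url, headers={"Authorization": api_key}, method="GET"
    )


def apis_status(base_url, api_key, timeout=10.0):
    """Return the HTTP status code for GET /api/apis with the given key."""
    request = build_apis_request(base_url, api_key)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the status code is what matters here.
        return err.code
```

A 403 from `apis_status` would be consistent with the permission being denied for that user. A 200 only shows read access, though. The stack trace above fails in `Api.Create`, so the user also needs write access to APIs.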