Error with middleware in one gateway while other pods are working fine

We are running 3 gateway pods using the same volumes. The middleware path is '/mnt/tyk-gateway/middlewares/bundles'. Yesterday, after deploying a new middleware bundle, traffic through 2 pods was fine, while all incoming traffic to one particular pod started failing with 'error="PyObjectCallObject failed" mw=CoProcessMiddleware'. It's a simple Python plugin that implements host-IP-based rate limiting. Since it's an 'auth_check' hook, all the traffic through this pod started failing with 403.
There were no changes to the plugin at all, and the same plugin was working fine on the other pods.

I have created a bug report as well, and am now wondering whether I am missing some configuration.

Bug report: Error with middleware in one gateway while other pods are working fine · Issue #6003 · TykTechnologies/tyk · GitHub

The worst part about this is that we finally went to production with Tyk just 3 weeks ago, after a few weeks of testing, but had to roll it back because of this. I would appreciate any help with this.

The symptom does look like a configuration issue, especially since 2 out of 3 pods are working as expected.

Can you provide the gateway logs for the working and non-working pods on startup and when the API call is triggered?

You can also share their config/environment variables.

Here are the extra environment variables:

extraEnvs:
    - name: TYK_GW_STORAGE_SSLINSECURESKIPVERIFY
      value: "true"
    - name: TYK_GW_ENABLEHASHEDKEYSLISTING
      value: "true"
    - name: TYK_GW_LOGLEVEL
      value: "info"
    - name: "TYK_GW_CONTROLAPIHOSTNAME"
      value: "<>"
    - name: TYK_GW_ENABLEANALYTICS
      value: "true"
    - name: "VTX_ENVIRONMENT"
      value: "prod"
    - name: TYK_GW_ENABLEBUNDLEDOWNLOADER
      value: "true"
    - name: TYK_GW_BUNDLEBASEURL
      value: "<>"
    - name: "TYK_GW_BUNDLEINSECURESKIPVERIFY"
      value: "true"
    - name: "TYK_GW_COPROCESSOPTIONS_ENABLECOPROCESS"
      value: "true"
    - name: "TYK_GW_COPROCESSOPTIONS_PYTHONPATHPREFIX"
      value: "/opt/tyk-gateway"
    - name: "TYK_GW_HASHKEYS"
      value: "false"
    - name: "TYK_GW_ENABLEREDISROLLINGLIMITER"
      value: "false"
    - name: "TYK_GW_POLICIES_ALLOWEXPLICITPOLICYID"
      value: "true"
    - name: "TYK_GW_ENABLESENTINELRATELIMITER"
      value: "false"

My current analysis is that two pods started downloading the bundle at the same time and, since they share the same filesystem, some kind of race condition emerged.

Hi,

I think you're right about this. I'm only guessing, but I think there was a race condition between downloading, extracting and loading the script, which left one gateway with a file handle to either a partially written file or an empty one.
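
To make that guess concrete, here is a minimal Python sketch of the failure mode. It is not Tyk's code. It only shows that a reader opening a file while a writer is still part-way through it gets truncated content. The function name `read_during_write` and the `PLUGIN_SOURCE` stand-in are made up for the illustration.

```python
import os
import tempfile

# Hypothetical stand-in for a plugin file inside a bundle (not the real plugin).
PLUGIN_SOURCE = (
    b"def auth_check(request, session, metadata, spec):\n"
    b"    return request, session, metadata\n"
)


def read_during_write(path, payload):
    """Write payload in two halves and read the file back between them.

    Returns (bytes seen mid-write, bytes seen after the write finished).
    """
    half = len(payload) // 2
    with open(path, "wb") as writer:
        writer.write(payload[:half])
        writer.flush()  # first half is now visible to other readers
        with open(path, "rb") as reader:
            seen_midway = reader.read()  # what a second gateway could load
        writer.write(payload[half:])
    with open(path, "rb") as reader:
        seen_after = reader.read()
    return seen_midway, seen_after


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as shared_dir:
        target = os.path.join(shared_dir, "middleware.py")
        midway, after = read_during_write(target, PLUGIN_SOURCE)
        print("mid-write reader saw", len(midway), "of", len(PLUGIN_SOURCE), "bytes")
        print("later reader saw the full file:", after == PLUGIN_SOURCE)
```

A gateway that happened to load the half-written script would keep failing until the bundle is reloaded, even though the file on the shared volume looks complete afterwards. That would match two pods being fine and one being broken.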

The solution is either to start one gateway before the others, so that it completes the download and extraction and the shared directory is ready when the other gateways start, or to give each gateway its own dedicated middleware file system.
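
For the first option, the gateways that start later need some way to tell that the shared directory is ready. Below is a rough Python sketch of such a check, under the assumption that an extracted bundle directory contains a `manifest.json` with a `file_list` array naming the bundled files. The helper name `bundle_is_complete` is hypothetical, and this is not something the gateway provides. It would have to run from your own startup or init script.

```python
import json
import os


def bundle_is_complete(bundle_dir):
    """Return True if manifest.json parses and every listed file is non-empty.

    Assumes the manifest has a "file_list" array of paths relative to bundle_dir.
    """
    manifest_path = os.path.join(bundle_dir, "manifest.json")
    try:
        with open(manifest_path, "r", encoding="utf-8") as fh:
            manifest = json.load(fh)
    except (OSError, ValueError):
        # Missing, unreadable, or half-written manifest: not ready yet.
        return False
    if not isinstance(manifest, dict):
        return False
    files = manifest.get("file_list") or []
    if not files:
        return False
    for name in files:
        path = os.path.join(bundle_dir, name)
        # An absent or zero-byte file suggests extraction has not finished.
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            return False
    return True
```

A startup script could poll this for the expected bundle directory under '/mnt/tyk-gateway/middlewares/bundles' and only launch the gateway once it returns True. A non-empty file can still be partially written, so this narrows the window rather than closing it. Giving each gateway its own middleware file system avoids the problem altogether.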

Cheers,
Pete