What I’m about to describe sounds vague, but here is what I’m seeing.
After leaving the hybrid gateways up over the weekend with no activity, today, when loading new APIs in the Tyk admin cloud, those APIs never make it to the gateways. Nothing in the logs shows that a new update was received.
The only way to get the gateway to update is to restart it. Is this expected?
That’s interesting - are you actively using gRPC? If you disable coprocess, it may fix the issue (that loop may be piling up - we’ll investigate).
Are you using a custom tyk.conf or are you configuring with environment variables?
We run Hybrid Canary instances to monitor this kind of behaviour, but they are reloaded every so often in conjunction with test changes in our cloud, so they see regular activity. Thus far we have not seen this behaviour.
However, there are two things you can do to resolve this if it happens again:
1. Restart the gateway (as you’ve done). This isn’t ideal.
2. Hot reload the gateway with an API call. If you use the group reload, the update affects all gateways and will force-pull new configurations. It should also re-establish connectivity with our back-end, and reloads from the gateway should start working again.
We have seen this behaviour in the past, but not for a long time - so it is interesting that it has resurfaced. For now, I’d recommend using option 2.
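For reference, the hot reload mentioned above could be scripted roughly as follows. This is a minimal sketch, not an official client: it assumes the gateway's REST API exposes the group reload at /tyk/reload/group and authenticates with the x-tyk-authorization header carrying the secret from tyk.conf. The gateway URL and secret used with it are placeholders you would replace with your own values.

```python
import json
import urllib.request


def build_group_reload_request(gateway_url: str, secret: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for a group hot reload.

    Assumption: the endpoint path is /tyk/reload/group and the gateway
    secret is passed in the x-tyk-authorization header.
    """
    url = gateway_url.rstrip("/") + "/tyk/reload/group"
    return urllib.request.Request(
        url,
        headers={"x-tyk-authorization": secret},
        method="GET",
    )


def trigger_group_reload(gateway_url: str, secret: str, timeout: float = 10.0) -> dict:
    """Send the reload request and return the gateway's decoded JSON reply."""
    request = build_group_reload_request(gateway_url, secret)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling trigger_group_reload with the address of any one gateway in the cluster and its secret should be enough, since the group variant is the one that signals the other gateways.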
Are the gateways using a shared redis database? To synchronise reloads, a single gateway gets the notification from our hybrid back-end and then signals the entire cluster via redis to reload. If the gateways are not on the same redis DB, this will not happen and only one gateway will reload.
Now that we have been running hybrid for some time, we are starting to make more config changes, and we are noticing that changes in the cloud are still not making it to the gateways. We are having to manually restart the containers.
@bitsofinfo Do all of the containers fail to reload, or just some?
If only some, are they all connected to the same redis DB? In order for a cloud-triggered reload to affect a cluster, each container must either have a unique group ID (can be set in the tyk.conf slave options), or be connected to the same redis DB as all the others so that one gateway can receive the signal and trigger the reload for the group.
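One way to check this condition is to compare the relevant settings across each container's tyk.conf. The sketch below is illustrative only: it assumes a simple single-host setup where the redis target lives under storage.host, storage.port and storage.database, and the group ID under slave_options.group_id. A clustered redis setup may use different fields, so adjust accordingly. The helper names are made up for this example.

```python
def redis_target(conf: dict) -> tuple:
    """Return the (host, port, database) a parsed tyk.conf points at.

    Assumption: single-host redis settings under the "storage" key.
    """
    storage = conf.get("storage", {})
    return (storage.get("host"), storage.get("port"), storage.get("database", 0))


def reload_sync_report(confs: dict) -> dict:
    """Summarise whether the given gateways can share a reload signal.

    `confs` maps a gateway label (any name you like) to its parsed tyk.conf.
    """
    targets = {name: redis_target(conf) for name, conf in confs.items()}
    group_ids = {
        name: conf.get("slave_options", {}).get("group_id", "")
        for name, conf in confs.items()
    }
    return {
        # True only when every gateway points at the same redis DB.
        "shared_redis": len(set(targets.values())) == 1,
        "redis_targets": targets,
        "group_ids": group_ids,
    }
```

Each tyk.conf can be loaded with json.load and passed in under a label of your choosing. If shared_redis comes back False, the group_ids entry shows what each gateway would need to be signalled individually.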
This is more troubling, because it could mean your gateways are not pumping data to us. However, if you are having issues with redis, it could be that the redis DB is purging the data before it can be sent. If you are using the on-container redis DB, that is quite likely, because it is not tuned in any way. Are there any error logs?
We have 2 of them running, both connected to the same redis cluster (not the local embedded redis in the default hybrid container). We applied several updates repeatedly yesterday and never saw those changes reload on the gateways. It required a restart of the container.
Same with the lag, or complete lack, of stats updates to the cloud: after a restart they seemed to start flowing again.
The info I can provide is unfortunately limited. We were trying to get these API changes configured and didn’t look into it in detail, but this is what we noticed:
a) API updates were not getting to the gateways (we had to restart). I was trying to trap the change events on the gateways by watching docker logs -tf 65bcf9df43e8 2>&1 | grep Reload, but nothing ever appeared, nor did the API changes take effect
b) API statistics were hours old, despite requests going through actively
After restarting, this all appeared to clear up. No other gateways connected to the cloud were running.
Will monitor this more closely going forward
@Martin - is there anything in the cloud UI or on your end that could show a list of connected gateways, with a unique ID for each that we could trace back to our gateways? That would be useful in triaging this kind of thing.