Hybrid: updates in cloud not updating on gateway

bitsofinfo · August 28, 2017, 2:16pm

What I’m about to describe sounds vague, but here is what I’m seeing.

After leaving the hybrid gateways up over the weekend where there was no activity, today, when loading new APIs in tyk admin cloud… those apis never make it to the gateways. There is nothing in the logs showing the reception of a new update.

The only way to get the gateway to update is restart the gateway… expected?

leon · August 28, 2017, 3:06pm

This is indeed not expected, but having the logs would help a lot.

Leo, Tyk Team

bitsofinfo · August 28, 2017, 3:44pm

Logs don’t show anything over the past few days other than just this over and over

2017-08-28T14:26:28.331394806Z time="Aug 28 14:26:28" level=error msg="No gRPC URL is set!"
2017-08-28T14:26:28.331427308Z 2017/08/28 14:26:28 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: No gRPC URL is set!"; Reconnecting to { }

Martin · August 28, 2017, 8:19pm

Hi,

That’s interesting - are you actively using gRPC? If you disable coprocess, it may fix the issue (that loop may be piling up - we’ll investigate).

Are you using a custom tyk.conf or are you configuring with environment variables?

We run Hybrid Canary instances to monitor this kind of behaviour, but they get reloaded every so often in conjunction with test changes in our cloud, so they see regular activity. Thus far we have not seen this behaviour.

However there are two things you can do that can resolve this if it happens again:

1. Restart the gateway (as you’ve done); this isn’t ideal.
2. Hot reload the gateway with an API call. If you use the group reload, the update affects all gateways and will force-pull new configurations. It should also re-establish connectivity with our back-end, and reloads from the gateway should start working again.

We have seen this behaviour in the past, but not for a long time - so it is interesting that it has resurfaced. For now, I’d recommend using option 2.
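
A minimal sketch of what option 2 could look like from a script. It assumes the gateway's REST API exposes the group reload at /tyk/reload/group and authenticates with the x-tyk-authorization header carrying the gateway secret; please check both against the documentation for your gateway version. The gateway URL and secret values are placeholders, not values from this thread.

```python
import urllib.request


def build_group_reload_request(gateway_url: str, secret: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the group reload endpoint.

    Assumption: the endpoint path and header name below match your gateway version.
    """
    url = gateway_url.rstrip("/") + "/tyk/reload/group"
    return urllib.request.Request(
        url,
        headers={"x-tyk-authorization": secret},
        method="GET",
    )


def trigger_group_reload(gateway_url: str, secret: str, timeout: float = 10.0) -> int:
    """Send the group reload request and return the HTTP status code."""
    request = build_group_reload_request(gateway_url, secret)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling trigger_group_reload("http://localhost:8080", "<your gateway secret>") against any one gateway should be enough, since the group reload is meant to reach the rest of the cluster.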

Do you have SSL enabled for this gateway?

bitsofinfo · August 28, 2017, 8:22pm

I only listed the gRPC log entry to show what I am seeing in the logs, which contain no evidence of API updates. I reported the gRPC error earlier in the GitHub issue: really an error? "transport: No gRPC URL is set" (Issue #991, TykTechnologies/tyk).

yes, SSL to the cloud is enabled.

bitsofinfo · August 29, 2017, 2:59pm

Happening again today; I had to restart the gateway. This was in the logs:

2017-08-29T14:15:48.594951051Z time="Aug 29 14:15:48" level=warning msg="[RPC STORE] CheckReload: Not logged in"
2017-08-29T14:15:48.599345904Z time="Aug 29 14:15:48" level=warning msg="[RPC STORE] CheckReload: Not logged in"
2017-08-29T14:15:48.605258709Z time="Aug 29 14:15:48" level=warning msg="[RPC STORE] CheckReload: Not logged in"

ewah · August 29, 2017, 4:07pm

We are running tyk hybrid w/ two gateways.

Upon hitting “Update” in the API designer, only one (at random) of the gateways is updated w/

time="Aug 29 16:04:31" level=warning msg="[RPC STORE] Received Reload instruction!"
time="Aug 29 16:04:31" level=info msg="Reloaded URL Structure - Success"
time="Aug 29 16:04:31" level=info msg="Reloading endpoints"

but the other gateway is not updated.

Martin · August 29, 2017, 8:01pm

Are the gateways using a shared redis database? In order to synchronise reloads a single gateway gets the notification from our hybrid back-end and will then signal the entire cluster via redis to reload. If they are not on the same redis DB, then this will not happen and only one gateway will reload.
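
To illustrate why a shared redis DB matters, here is a toy in-memory simulation of the fan-out described above. It is not the gateway's actual code: FakeRedis, Gateway, and the "reload" channel name are all hypothetical stand-ins, and no real redis client is involved.

```python
from collections import defaultdict


class FakeRedis:
    """Tiny in-memory stand-in for a redis pub/sub channel (hypothetical)."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        # Deliver the message to every subscriber on this redis instance only.
        for callback in self.subscribers[channel]:
            callback(message)


class Gateway:
    """Toy gateway that reloads whenever a signal arrives on its redis."""

    def __init__(self, name, redis):
        self.name = name
        self.redis = redis
        self.reloads = 0
        redis.subscribe("reload", self._on_signal)

    def _on_signal(self, _message):
        self.reloads += 1

    def receive_cloud_notification(self):
        # The single gateway that hears from the back-end signals its cluster.
        self.redis.publish("reload", "reload")


# Shared redis: both gateways reload.
shared = FakeRedis()
a, b = Gateway("a", shared), Gateway("b", shared)
a.receive_cloud_notification()
print(a.reloads, b.reloads)  # 1 1

# Separate redis instances: only the notified gateway reloads.
c, d = Gateway("c", FakeRedis()), Gateway("d", FakeRedis())
c.receive_cloud_notification()
print(c.reloads, d.reloads)  # 1 0
```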

bitsofinfo · August 29, 2017, 8:27pm

Ok then that makes sense, we just have several devs each running their own gateway stack container and they are each on their own redis

Martin · August 29, 2017, 8:32pm

If that’s the case, then each hybrid container needs a unique group ID. You can do this by setting the env variable:

TYK_GW_SLAVEOPTIONS_GROUPID

Or editing the tyk.conf file if you are mounting it into the container and updating

slave_options.group_id

to a unique value for each user. This will shard reload signals per cluster; in this case, per developer.
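
A small sketch of how the per-developer value could be generated, covering both routes mentioned above. The "dev-<name>" naming scheme and the helper names are hypothetical; the only names taken from this thread are TYK_GW_SLAVEOPTIONS_GROUPID and slave_options.group_id. It also assumes tyk.conf is plain JSON.

```python
import json

ENV_VAR = "TYK_GW_SLAVEOPTIONS_GROUPID"


def group_id_env(developer: str) -> dict:
    """Return the env variable to pass to one developer's container.

    The "dev-" prefix is an arbitrary, hypothetical naming scheme.
    """
    return {ENV_VAR: f"dev-{developer}"}


def with_group_id(conf_text: str, group_id: str) -> str:
    """Return tyk.conf content with slave_options.group_id set.

    Other settings in the file are left untouched.
    """
    conf = json.loads(conf_text)
    slave_options = conf.get("slave_options") or {}
    slave_options["group_id"] = group_id
    conf["slave_options"] = slave_options
    return json.dumps(conf, indent=2)


print(group_id_env("alice"))
print(with_group_id('{"slave_options": {}}', "dev-alice"))
```

Either route should work; the env variable is probably the simpler one when every developer starts the same image.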

bitsofinfo · October 6, 2017, 8:24pm

@Martin

Now that we have been running hybrid for some time, we are starting to make more config changes, and we are noticing that changes in the cloud are still not making it to the gateways. We are having to manually restart the containers.

bitsofinfo · October 6, 2017, 10:10pm

Also, activity by key/API is not updating at all without a restart.

Martin · October 7, 2017, 9:33am

@bitsofinfo Do all the containers not reload or just some?

If only some, are they all connected to the same redis DB? In order for a cloud-triggered reload to affect a cluster, each container must either have a unique group ID (can be set in the tyk.conf slave options), or be connected to the same redis DB as all the others so that one gateway can receive the signal and trigger the reload for the group.

The missing activity updates are more troubling, because they could mean your gateways are not pumping data to us. However, if you are having issues with redis, it could be that the redis DB is purging the data before it can be sent. If you are using the on-container redis DB, that is quite likely, because it is not tuned in any way. Are there any error logs?

bitsofinfo · October 7, 2017, 9:33pm

We have 2 of them running both connected to same redis cluster (not the local embedded redis in the hybrid default container). We applied several updates yesterday repeatedly and never saw those changes reload on the gateways. Required a restart of the container.

Same w/ the lag or complete lack of stats updates to the cloud; after a restart they seemed to start flowing again.

Martin · October 8, 2017, 12:22am

So for an extended period of time there were no analytics at all? Or just for the apis that didn’t update?

Martin · October 8, 2017, 12:39am

Regarding the group updates - are any other users running hybrid using the same org credentials outside of these two gateways?

bitsofinfo · October 9, 2017, 3:14pm

The info I can provide is unfortunately limited. We were trying to get these API changes configured and didn’t look into it in detail, but here is what we noticed:

a) API updates were not getting to the gateways (we had to restart). I was trying to trap the change events on the gateways by watching docker logs -tf 65bcf9df43e8 2>&1 | grep Reload, but nothing ever appeared, nor did the API changes take effect.

b) API statistics were hours old, despite requests going through actively
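
For watching item a) more closely, a rough sketch of a log filter that counts the two messages seen earlier in this thread: the "Received Reload instruction" line from a working reload and the "CheckReload: Not logged in" warning. The function names are hypothetical, and matching on plain substrings is an assumption that the log wording stays as quoted here.

```python
from typing import Dict, Iterable, Optional

RELOAD_MARKER = "Received Reload instruction"
NOT_LOGGED_IN_MARKER = "CheckReload: Not logged in"


def classify(line: str) -> Optional[str]:
    """Label a gateway log line, or return None if it is not of interest."""
    if RELOAD_MARKER in line:
        return "reload"
    if NOT_LOGGED_IN_MARKER in line:
        return "not_logged_in"
    return None


def summarize(lines: Iterable[str]) -> Dict[str, int]:
    """Count reload instructions and 'Not logged in' warnings in the lines."""
    counts = {"reload": 0, "not_logged_in": 0}
    for line in lines:
        kind = classify(line)
        if kind is not None:
            counts[kind] += 1
    return counts
```

Passing sys.stdin to summarize while piping the docker logs output into the script would give a quick count; a growing not_logged_in count with no reload entries after pressing “Update” would match the symptom described here.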

After restarting, this all appeared to clear up. No other gateways connected to the cloud were running.

Will monitor this more closely going forward

@Martin - is there anything in the cloud UI or on your end that could show a list of connected gateways, with a unique ID for each that we could trace back to our gateways? It would be useful in triaging this kind of thing.

Martin · October 9, 2017, 10:01pm

Not yet, but it’s something we want to change.