Host manager dies with "Init update channel failed" error

MrImport · January 19, 2016, 9:08pm

Imported Google Group message. Original thread at: Google Groups. Import Date: 2016-01-19 21:08:43 +0000.
Sender:Tomislav Pasalic.
Date:Tuesday, 24 March 2015 19:43:07 UTC.

Hi,

I am having issues with tyk-host-manager. Initially it works correctly: it receives notifications from the Dashboard that an API has changed and successfully reloads the API definition in the local tyk instance.

However, after exactly 60 seconds the host manager dies and I see the following in its log file:
time="2015-03-24T19:24:07Z" level=error msg="Init update channel failed"
time="2015-03-24T19:24:07Z" level=error msg=EOF

I don’t get any exceptions in any of the logs (tyk, tyk-analytics or even tyk-host-manager), I just get this error logged.

I am running tyk on CentOS 7.0. I have 2 tyk instances on separate machines. They are connected to a clustered Redis instance (via haproxy) and, for now, share a non-clustered Mongo instance. I don’t have nginx configured yet.

While the host managers are up I can successfully reload API definitions on both tyk nodes, so I think my setup is OK. Still, it is very suspicious that the host managers die exactly 60 seconds after they were last contacted, so I may have misconfigured something.

I am attaching main config files and log from the tyk-host-manager. I would appreciate any help I can get on this.

Thanks,
Tomislav.

MrImport · January 19, 2016, 9:08pm

Imported Google Group message.
Sender:Martin Buhr.
Date:Tuesday, 24 March 2015 20:34:07 UTC.

Hi,

I took another look at the code and it looks like redis is sending an EOF down the pub/sub channel. The host manager just bubbles error cases up, which is what you are seeing.

The Init message is Tyk saying the connection failed, and EOF is what the redis driver says it received. It looks like redis dropped the connection after 60s and Tyk isn’t renewing it (something we need to fix in code btw - would be awesome if you could raise a ticket).
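As an illustration of the missing renewal, here is a minimal sketch of a reconnect loop with exponential backoff. It is not the host manager's actual code: `run_with_reconnect` and the `listen` callable are hypothetical names, and `listen` is assumed to open the pub/sub subscription, block while it is healthy, and raise `ConnectionError` or `EOFError` when the connection drops.

```python
import time


def run_with_reconnect(listen, max_attempts=None, base_delay=0.5,
                       max_delay=30.0, sleep=time.sleep):
    """Call listen() again after every dropped connection.

    listen is a hypothetical callable that subscribes and blocks; it is
    assumed to raise ConnectionError or EOFError when the link drops and
    to return normally on a clean shutdown.
    max_attempts=None retries forever. Returns the number of attempts made.
    """
    attempts = 0
    delay = base_delay
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            listen()
            return attempts  # clean shutdown, stop retrying
        except (ConnectionError, EOFError) as exc:
            print(f"subscription dropped ({exc!r}); retrying in {delay:.1f}s")
            sleep(delay)
            # Double the wait each time, up to max_delay.
            delay = min(delay * 2, max_delay)
    raise RuntimeError(f"gave up after {attempts} attempts")
```

The `sleep` parameter is injectable only so the loop can be exercised without real waiting. A production version would probably also reset the delay once a connection has stayed up for a while.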

However we’ve not seen this anywhere else so it may be specific to your redis configuration.

What version of redis are you running?
Thanks,
Martin

MrImport · January 19, 2016, 9:08pm

Imported Google Group message.
Sender:Tomislav Pasalic.
Date:Wednesday, 25 March 2015 10:58:56 UTC.

Hi Martin,

Redis version is 2.8.19 but I think the issue is not with Redis but rather with haproxy.

When I point the tyk host manager directly at the redis master node (gateway and dashboard remain connected via haproxy) I don’t have this issue any more, so it has to be something in my haproxy configuration.

Still, it is annoying that both the tyk gateway and the dashboard work perfectly fine through haproxy. I am not really an expert in haproxy settings, so I will have to investigate a bit more.

Just for reference, I will post my haproxy config file:
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    user haproxy
    group haproxy
    daemon
    stats socket /tmp/haproxy

defaults
    mode tcp
    timeout client 30s
    timeout connect 1s
    timeout server 30s
    option tcpka
    option clitcpka

# start cluster master_6379

frontend ft_master_6379
    bind *:7000
    default_backend bk_master_6379

backend bk_master_6379
    server R_master_6379_1 10.0.0.71:6379 maxconn 1024 check inter 100ms on-marked-down shutdown-sessions

# end cluster master_6379

I have tried changing the client and server timeouts, but I still hit the issue exactly 60 seconds after the host manager last used Redis.

Regards,
Tomislav.


MrImport · January 19, 2016, 9:08pm

Imported Google Group message.
Sender:Martin Buhr.
Date:Wednesday, 25 March 2015 11:38:33 UTC.

Hi Tomislav,

How odd - to be honest I’m not that familiar with HAProxy either. I assume you have redis load balanced behind haproxy so that, if the master node goes down, the slave can still push signals. This makes sense for the Tyk gateway nodes: they maintain a connection pool and try to reconnect to a downed Redis host, so they should recover quickly if the redis master goes down.

However, since the host manager isn’t reconnecting at the moment, its pubsub connection would be lost anyway (I believe haproxy would sever the connection on failure and force the client to reconnect). There’s an option redispatch, but I’m not sure it would re-create the tcp connection with a slave and renegotiate a pubsub channel on behalf of the client. So the host manager would fail on a failover even if the long-running connection worked.

Now the host_manager is nothing more than a pubsub / REST fanout: it receives the restart signal from Redis and then uses the REST api to enumerate the tyk nodes on the host (so it should be installed alongside your tyk gateway). The gateway will then pull new configurations from your DB, and its connection pool should refresh if the connection fails.

That means it would be quite safe to just restart the host_manager process on failure using a script until we’ve got some reconnection code in place.
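A minimal sketch of such a restart script, assuming the host manager is started as a plain foreground command. The `supervise` helper is a hypothetical name, and the command you pass it depends on how the host manager is installed on your machine.

```python
import subprocess
import sys
import time


def supervise(cmd, restart_delay=1.0, max_restarts=None, sleep=time.sleep):
    """Run cmd and start it again every time it exits.

    max_restarts=None restarts forever; otherwise the command is run once
    plus at most max_restarts more times. Returns the list of exit codes.
    """
    exit_codes = []
    while True:
        proc = subprocess.run(cmd)  # blocks until the process exits
        exit_codes.append(proc.returncode)
        print(f"{cmd[0]} exited with code {proc.returncode}", file=sys.stderr)
        if max_restarts is not None and len(exit_codes) > max_restarts:
            return exit_codes
        sleep(restart_delay)  # short pause to avoid a tight crash loop
```

Calling `supervise(["/path/to/tyk-host-manager"])` (the path is a placeholder) would keep the process running until the script itself is stopped. An init system or process supervisor that restarts services on exit would do the same job.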

Some assumptions there about your setup, but it might work. Could you raise an issue in our GH repo to cover host manager reconnects?

Thanks,
Martin

MrImport · January 19, 2016, 9:08pm

Imported Google Group message.
Sender:Tomislav Pasalic.
Date:Wednesday, 25 March 2015 12:22:14 UTC.

Hi Martin,

I have raised issue #51, "Host manager does not reconnect to Redis", in the TykTechnologies/tyk GitHub repository for this.

Just a remark that I am seeing these disconnects even without Redis doing any failover. Seems like just being connected via haproxy causes the issue.

With regard to my setup, I use HAProxy as a single point of contact for my Redis cluster. I have a separate process that monitors the Redis sentinels, and when they indicate that a new Redis master has been elected it updates the haproxy config file and instructs haproxy to reload it. When I test this with an actual failover, both gateway and dashboard recover nicely.
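The config-update step of such a monitor could look roughly like the sketch below. It is not the actual monitoring process from this setup: `render_master_backend` is a hypothetical helper that only renders the frontend/backend stanzas in the same shape as the config posted earlier, and the address in the usage note is made up. Detecting the new master via sentinel and reloading haproxy are left out.

```python
def render_master_backend(name, host, port, bind_port):
    """Return haproxy frontend/backend stanzas for the current Redis master.

    name is the cluster label (e.g. "master_6379"); host and port are the
    address of the newly elected master; bind_port is the port clients use.
    """
    server_opts = "maxconn 1024 check inter 100ms on-marked-down shutdown-sessions"
    return "\n".join([
        f"frontend ft_{name}",
        f"    bind *:{bind_port}",
        f"    default_backend bk_{name}",
        "",
        f"backend bk_{name}",
        f"    server R_{name}_1 {host}:{port} {server_opts}",
        "",
    ])
```

For example, `render_master_backend("master_6379", "10.0.0.72", 6379, 7000)` produces stanzas that route port 7000 to a master at the illustrative address 10.0.0.72. The monitor would write the result into the config file and then ask haproxy to reload.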

Thanks,
Tomislav.
