Hi,
Here’s what we’d recommend - this is after extensive work in AWS with Tyk:
For Redis, use an Amazon ElastiCache replication group with at least two clusters, enable Multi-AZ failover and make sure that you are connecting to the replication group’s primary (master) endpoint. This means failover to the next nearest replica is automatic and transparent to Tyk’s services. If you set up a good set of replication clusters then you don’t need to worry about persistence, since when a node fails the nearest replica will have near-live data - just don’t reboot them. Combined with daily snapshots, this is a really safe way to get going.
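As a rough sketch of what "connect to the primary endpoint" looks like on the Tyk side, the snippet below builds the Redis `storage` section of `tyk.conf`. The endpoint hostname is a made-up placeholder and the helper name `tyk_storage_config` is hypothetical; substitute the primary endpoint shown for your own replication group.

```python
import json

# Hypothetical placeholder: use the primary endpoint of your own replication group.
PRIMARY_ENDPOINT = "my-tyk-redis.example.cache.amazonaws.com"


def tyk_storage_config(host: str, port: int = 6379) -> dict:
    """Build the Redis 'storage' section of tyk.conf for a single endpoint."""
    return {
        "storage": {
            "type": "redis",
            "host": host,  # always the primary endpoint, never an individual node
            "port": port,
            "username": "",
            "password": "",
            "database": 0,
        }
    }


if __name__ == "__main__":
    # Print a fragment that can be merged into tyk.conf.
    print(json.dumps(tyk_storage_config(PRIMARY_ENDPOINT), indent=2))
```

Pointing at the primary endpoint rather than a node address is what lets a failover happen without touching this file.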
(If you want even more control, create your own CNAME for the Redis replication group endpoint. That way, if you need to boot a snapshot, you just update the CNAME to point at the cluster restored from your snapshot, and you can easily switch over manually without reconfiguring Tyk.)
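If the CNAME happens to live in Route 53 (an assumption, any DNS provider works), the manual switch could be sketched like this. The function only builds the change request; the helper name, the record name and the target hostname are illustrative, not anything Tyk ships.

```python
def cname_upsert(alias: str, target: str, ttl: int = 60) -> dict:
    """Build a Route 53 ChangeBatch that points `alias` at `target`.

    A short TTL keeps the manual switchover window small.
    """
    return {
        "Comment": "Repoint the Redis alias used by Tyk",
        "Changes": [
            {
                "Action": "UPSERT",  # create the record, or overwrite it if it exists
                "ResourceRecordSet": {
                    "Name": alias,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": target}],
                },
            }
        ],
    }


# Hypothetical names: Tyk is configured with the alias, DNS decides where it goes.
batch = cname_upsert(
    "redis.tyk.example.com.",
    "restored-from-snapshot.example.cache.amazonaws.com",
)
```

With boto3 installed, `batch` can be passed as the `ChangeBatch` argument of the Route 53 client's `change_resource_record_sets` call, together with your own `HostedZoneId`.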
For MongoDB, you could manage this yourself, but I would suggest using the MongoDB MMS service to manage, spin up and deploy your MongoDB replica set. Not only will it monitor everything and ensure the servers all work as expected, it also manages your topology, backups etc. Tyk’s MongoDB driver works transparently with replica sets, so if you distribute across AZs you should be safe - just make sure to size them properly and set up data expiry so you can manage the (fast-growing) analytics data. I can give you details on how to set that up.
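To give a flavour of the two MongoDB pieces mentioned here, this is a minimal sketch that builds a replica-set connection string and a TTL index command for data expiry. The collection name `tyk_analytics`, the `expireAt` field, the host names and both helper names are assumptions for illustration; check them against your own deployment before using anything like this.

```python
def replica_set_uri(hosts: list, database: str, replica_set: str) -> str:
    """Build a standard MongoDB connection string that lists every member."""
    return f"mongodb://{','.join(hosts)}/{database}?replicaSet={replica_set}"


def analytics_ttl_index_command(collection: str = "tyk_analytics",
                                field: str = "expireAt") -> dict:
    """Build a createIndexes command for a TTL index on a date field.

    expireAfterSeconds=0 tells MongoDB to remove each document once the
    date stored in `field` has passed.
    """
    return {
        "createIndexes": collection,  # the command name must be the first key
        "indexes": [
            {
                "key": {field: 1},
                "name": f"{field}_ttl",
                "expireAfterSeconds": 0,
            }
        ],
    }


# Hypothetical hosts, one per availability zone.
uri = replica_set_uri(
    ["mongo-a.example.com:27017", "mongo-b.example.com:27017", "mongo-c.example.com:27017"],
    "tyk_analytics",
    "rs0",
)
```

With pymongo, the command dict can be run via `Database.command(...)`. A TTL index only removes documents that actually carry the date field, so records written without it stay until deleted by hand.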
If you must have independent Redis instances for your Tyk Gateway clusters, then MDCB is indeed an option, but I would only call it an option if your tokens are short-lived OR geographically isolated (i.e. tokens from EU go to the EU DC and tokens from US East go to the US East DC). The way MDCB handles distributed key caches is that it stores a master copy of a token / session in your main DC / Redis keystore, and then back-fills it into the localised Redis DB when the token first validates against the local gateway. After that, all write-heavy operations such as rate limiting and quota counting are handled locally for maximum speed.
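To make the back-fill behaviour concrete, here is a toy in-memory model of it. This is not MDCB code: plain dicts stand in for the master keystore and the local Redis, and all class and field names are made up for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    token: str
    quota_max: int


@dataclass
class LocalGateway:
    master: dict                                # master keystore: token -> Session
    cache: dict = field(default_factory=dict)   # stands in for the local Redis
    used: dict = field(default_factory=dict)    # quota counters, local only
    master_lookups: int = 0

    def validate(self, token: str) -> bool:
        session = self.cache.get(token)
        if session is None:
            # First sight of this token in this DC: back-fill from the master copy.
            self.master_lookups += 1
            session = self.master.get(token)
            if session is None:
                return False
            self.cache[token] = session
        # Write-heavy counting never leaves the local DC.
        count = self.used.get(token, 0)
        if count >= session.quota_max:
            return False
        self.used[token] = count + 1
        return True


master = {"abc": Session("abc", quota_max=2)}
eu, us = LocalGateway(master), LocalGateway(master)
```

In this model a token that is used in both `eu` and `us` gets its full quota in each DC, because the counters are separate. That is the reason for the "short-lived OR geographically isolated" condition.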
With MDCB, cached tokens can be updated through a cache invalidation call via MDCB when a change event happens on a token, so you can force a re-copy of the token if you need to change things like ACL or quota limits (and this is near-instant across all zones). The cached copies themselves, however, do not synchronise with each other across DCs (for obvious reasons - it’s a fast-moving data structure, and keeping it in sync across multiple DCs effectively is complexity we would rather avoid).
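The invalidation flow can be pictured with another small in-memory model. Again this is only an illustration under my own assumptions, not the MDCB API: `update_token` plays the role of the change event, and the model simply leaves the local counters untouched when a copy is invalidated.

```python
class LocalCache:
    """Stands in for one DC's local Redis."""

    def __init__(self, master: dict):
        self.master = master    # shared master keystore: token -> session dict
        self.sessions = {}      # local copies of sessions
        self.counters = {}      # local-only request counters

    def get(self, token: str) -> dict:
        if token not in self.sessions:
            # Copy down from the master on first use, or after an invalidation.
            self.sessions[token] = dict(self.master[token])
        return self.sessions[token]

    def hit(self, token: str) -> None:
        self.get(token)
        self.counters[token] = self.counters.get(token, 0) + 1

    def invalidate(self, token: str) -> None:
        # Drop the local copy; the next request re-copies it from the master.
        self.sessions.pop(token, None)


def update_token(master: dict, caches: list, token: str, **changes) -> None:
    """Change the master copy, then invalidate every local copy."""
    master[token].update(changes)
    for cache in caches:
        cache.invalidate(token)


master = {"abc": {"quota_max": 10}}
eu, us = LocalCache(master), LocalCache(master)
eu.hit("abc")
eu.hit("abc")
us.hit("abc")
update_token(master, [eu, us], "abc", quota_max=50)
```

After the update both DCs see the new limit on their next lookup, while each DC still only knows about its own request count.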
The benefit of this is that you don’t need to worry about the ephemeral nature of Redis with MDCB. Since keys are copied down, the local Redis DB can be restarted, does not need backing up and can be pure in-memory, so long as the master Redis DB is built for fault tolerance.
As for Redis Cluster vs. ElastiCache vs. MDCB: Redis Cluster is a beast, and it’s more for ensuring data consistency / persistence against faults, not for multi-DC or master-master replication - it’s just not designed that way. A well-managed master-slave setup with failover is therefore preferable (which is why we recommend ElastiCache: it’s managed), but you can do the same with HAProxy, Sentinel and a few Redis servers. MDCB is a cure for this, but it also introduces a different set of constraints, so it depends on your token strategy.
Hope that helps!
M.