Tyk architecture in AWS or other cloud environments

jazzhands · June 15, 2016, 7:34pm

We’re looking to put Tyk into production on AWS. We will have app servers in 3 AZs along with a Tyk gateway in each AZ. We will have service discovery/registry monitoring all our instances/services.

We want to simplify Tyk’s dependencies. For example, unless needed, we would prefer to use separate, non-clustered Redis instances per gateway node. I believe we will have a replicated Mongo cluster. I expect that we will have one Tyk Dashboard instance.

What should we consider as we prove this out? What are the pros/cons of using a Redis cluster versus using MDCB (or both)? What can we expect in either scenario if we happen to lose the master Tyk node (or the entire AZ containing the master node)?

Does anyone have any overall suggestions for implementing a simple yet durable Tyk architecture in AWS?

Martin · June 16, 2016, 8:19am

Hi,

Here’s what we’d recommend - this is after extensive work in AWS with Tyk:

For Redis, use an Amazon ElastiCache replication group with at least two clusters, enable Multi-AZ failover, and make sure you connect to the replication group's primary (master) endpoint. That way, failover to the next nearest replica is automatic and transparent to Tyk's services. If you set up a good set of replicas, you don't need to worry about persistence, since when a node fails the nearest replica will have near-live data; just don't reboot them. Combined with daily snapshots, this is a really safe way to get going.

(If you want even more control, create your own CNAME for the Redis replication group endpoint. That way, if you need to restore from a snapshot, you just update the CNAME to point at the cluster restored from that snapshot, and you can switch over manually without reconfiguring Tyk.)
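To make the endpoint advice concrete, here is a minimal sketch that renders the Redis part of a Tyk Gateway config so that it only ever knows about one hostname: the ElastiCache primary endpoint, or a CNAME you control in front of it. The `storage` field names follow the gateway's `tyk.conf` as we understand it, so check them against the docs for your version; the hostname `redis.tyk.internal.example.com` and the helper `tyk_storage_config` are hypothetical placeholders.

```python
import json


def tyk_storage_config(redis_host: str, port: int = 6379, password: str = "") -> dict:
    """Build a (hypothetical) Tyk Gateway `storage` block for a single Redis endpoint.

    `redis_host` should be the replication group's primary endpoint, or a CNAME
    you own that points at it, never the address of an individual node.
    """
    return {
        "storage": {
            "type": "redis",
            "host": redis_host,
            "port": port,
            "password": password,
            "database": 0,
            # Non-cluster mode: one replication group behind one primary endpoint.
            "enable_cluster": False,
        }
    }


# Placeholder CNAME: repoint it at the primary endpoint, or at a cluster restored
# from a snapshot, without editing the gateway config.
config = tyk_storage_config("redis.tyk.internal.example.com")
print(json.dumps(config, indent=2))
```

Because failover (automatic, or manual via the CNAME) happens behind a single name, the gateway config never has to change when the primary moves.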

For MongoDB, you could manage it yourself, but I would suggest using the MongoDB MMS service to spin up, deploy and manage your MongoDB replica set. Not only will it monitor everything and ensure the servers all work as expected, it also manages your topology, backups, etc. Tyk's MongoDB driver works transparently with replica sets, so if you distribute the members across AZs you should be safe. Just make sure to size them properly and set up data expiry so you can manage the (fast-growing) analytics data; I can give you details on how to set that up.
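"Works transparently with replica sets" in practice means the connection string lists every member plus the replica set name, and the driver finds the current primary itself. A minimal sketch of building such a standard MongoDB URI (the kind of value the dashboard's `mongo_url` setting takes, assuming your Tyk version uses that key); the hostnames, the set name `rs0` and the helper `replica_set_url` are hypothetical.

```python
from urllib.parse import quote_plus, urlencode


def replica_set_url(hosts, replica_set, database, user=None, password=None, **options):
    """Build a MongoDB connection string that lists every replica-set member."""
    auth = ""
    if user is not None:
        # Credentials must be percent-encoded (e.g. '@' or '/' in a password).
        auth = f"{quote_plus(user)}:{quote_plus(password or '')}@"
    query = urlencode({"replicaSet": replica_set, **options})
    return f"mongodb://{auth}{','.join(hosts)}/{database}?{query}"


# Hypothetical members, one per AZ.
url = replica_set_url(
    ["mongo-a.internal:27017", "mongo-b.internal:27017", "mongo-c.internal:27017"],
    replica_set="rs0",
    database="tyk_analytics",
)
print(url)
```

Listing all members (rather than only the current primary) lets the driver reconnect to whichever member is elected primary after an AZ failure.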

If you must have independent Redis instances for your Tyk Gateway clusters, then MDCB is indeed an option, but I would only recommend it if your tokens are short-lived OR geographically isolated (i.e. tokens from the EU go to the EU DC and tokens from US East go to the US East DC). MDCB handles distributed key caches by storing a master copy of each token/session in your main DC's Redis keystore and back-filling it into the local Redis DB the first time it is validated against a local gateway. After that, all write-heavy operations such as rate limiting and quota counting are handled locally for maximum speed.
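The back-fill behaviour is easier to see in a toy model. The sketch below is not MDCB's implementation; it is a hypothetical in-memory simulation (plain dicts stand in for the master and local Redis stores, and `Session`/`EdgeGateway` are invented names) showing a session copied down on first use and quota counted only against the local copy.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    quota_max: int
    quota_used: int = 0


@dataclass
class EdgeGateway:
    """Toy model of one edge DC: a local cache in front of the master keystore."""

    master: dict  # token -> Session held in the main DC (only read here)
    local: dict = field(default_factory=dict)  # back-filled local copies

    def validate(self, token: str) -> bool:
        session = self.local.get(token)
        if session is None:
            master_copy = self.master.get(token)
            if master_copy is None:
                return False  # token unknown in the main DC
            # Back-fill: copy the session's limits down the first time it is seen here.
            session = Session(quota_max=master_copy.quota_max)
            self.local[token] = session
        # Write-heavy quota counting happens only against the local copy.
        if session.quota_used >= session.quota_max:
            return False
        session.quota_used += 1
        return True


master = {"abc123": Session(quota_max=2)}
eu = EdgeGateway(master)
us = EdgeGateway(master)
results = [eu.validate("abc123") for _ in range(3)]
print(results)                      # [True, True, False]
print(us.validate("abc123"))        # True: the US DC counts independently
print(master["abc123"].quota_used)  # 0: the master copy is never written
```

The last two lines show why the advice above matters: if the same token is used in two DCs, each DC enforces the quota on its own, so the effective limit is multiplied unless tokens stay in one region or expire quickly.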

With MDCB, cached tokens can be updated through a cache-invalidation call via MDCB when a change event happens on a token, so you can force a re-copy of the token if you need to change things like ACLs or quota limits (this is near instant across all zones). However, the local copies are not synchronised across DCs, for obvious reasons: a session is a fast-moving data structure, and keeping it in sync effectively across multiple DCs is complexity we would rather avoid.
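A hypothetical sketch of that invalidation flow, again as an in-memory toy rather than MDCB itself (`Zone` and `update_and_invalidate` are invented names): editing the master copy alone leaves every zone serving its stale local copy, while broadcasting an invalidation forces each zone to re-copy on its next request.

```python
import copy


class Zone:
    """Toy edge DC that caches sessions copied down from the master keystore."""

    def __init__(self, name: str, master: dict):
        self.name = name
        self.master = master
        self.cache: dict = {}

    def get_session(self, token: str) -> dict:
        if token not in self.cache:
            # Back-fill a private copy from the master on first use.
            self.cache[token] = copy.deepcopy(self.master[token])
        return self.cache[token]

    def invalidate(self, token: str) -> None:
        # Drop the local copy; the next request re-copies it from the master.
        self.cache.pop(token, None)


def update_and_invalidate(master: dict, zones: list, token: str, **changes) -> None:
    """Change the master session, then send an invalidation to every zone."""
    master[token].update(changes)
    for zone in zones:
        zone.invalidate(token)


master = {"abc123": {"rate": 100, "quota_max": 1000}}
zones = [Zone("eu-west", master), Zone("us-east", master)]
for zone in zones:
    zone.get_session("abc123")  # both zones now hold a local copy

master["abc123"]["rate"] = 50  # edited without invalidation: zones stay stale
stale = [zone.get_session("abc123")["rate"] for zone in zones]

update_and_invalidate(master, zones, "abc123", rate=10)
fresh = [zone.get_session("abc123")["rate"] for zone in zones]
print(stale, fresh)  # [100, 100] [10, 10]
```

Only the master copy is ever pushed outward; nothing here copies a zone's local state back or sideways, which matches the "no cross-DC sync" constraint described above.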

The benefit is that with MDCB you don't need to worry about the ephemeral nature of Redis: since keys are copied down, the local Redis DB can be restarted, does not need backing up, and can be purely in-memory, as long as the master Redis DB is built for fault tolerance.

As for Redis Cluster vs. ElastiCache vs. MDCB: Redis Cluster is a beast, and it is more about ensuring data consistency/persistence in the face of faults. It is not for multi-DC or master-master replication; it just isn't designed that way. A well-managed master-slave setup with failover is preferable, which is why we recommend ElastiCache (it's managed), but you can do the same with HAProxy, Sentinel and a few Redis servers. MDCB is a cure for this, but it also introduces a different set of constraints, so the choice depends on your token strategy.
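For the self-managed route, a minimal sketch of rendering a `sentinel.conf` for one monitored master, using standard Redis Sentinel directives. The master name `tyk-redis`, the IP address and the timeouts are hypothetical placeholders; HAProxy (not shown) would sit in front and route clients to whichever server Sentinel has promoted.

```python
def sentinel_conf(master_name: str, master_ip: str, master_port: int, num_sentinels: int,
                  down_after_ms: int = 5000, failover_timeout_ms: int = 60000) -> str:
    """Render a minimal sentinel.conf for one monitored Redis master."""
    # Quorum = how many Sentinels must agree the master is down. A majority of
    # the Sentinels is a common choice; the failover itself is also authorised
    # by a majority.
    quorum = num_sentinels // 2 + 1
    return "\n".join([
        "port 26379",
        f"sentinel monitor {master_name} {master_ip} {master_port} {quorum}",
        f"sentinel down-after-milliseconds {master_name} {down_after_ms}",
        f"sentinel failover-timeout {master_name} {failover_timeout_ms}",
        f"sentinel parallel-syncs {master_name} 1",
    ])


# Hypothetical: three Sentinels, one per AZ, watching a master at 10.0.1.10.
conf = sentinel_conf("tyk-redis", "10.0.1.10", 6379, num_sentinels=3)
print(conf)
```

Running one Sentinel per AZ means losing a single AZ still leaves a majority able to agree on a failover.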

Hope that helps!

M.

reidca · September 26, 2016, 5:51pm

Can you elaborate on, or provide any links for, a recommended setup of a MongoDB cluster in AWS? You mention using the Mongo MMS service, which (having Googled!) seems to be a management service for an existing Mongo setup. I am not sure how this works: is it agent-based, or does it require the Mongo cluster to be public-facing?

We have very little experience setting up or using Mongo, so an idiot's guide to the recommended architecture in AWS (managed service preferred!) would be very useful.

Thanks
Carl

Martin · September 26, 2016, 7:40pm

Yes, MMS is a management service, and it works with both existing and new cloud deployments. The service had (at least the last time we used it) the ability to launch and manage a MongoDB cluster for you, so you didn't need to configure and launch your own. All that was required was a valid AWS API credential.

reidca · September 27, 2016, 10:02am

It seems that MMS has been deprecated and replaced by a cloud product called Atlas. Any links to MMS now redirect to the MongoDB Cloud page.

Unfortunately it appears this is no longer free.

I did find a useful document about deploying Mongo on AWS: https://d0.awsstatic.com/whitepapers/AWS_NoSQL_MongoDB.pdf

I would appreciate any guidance on what you would recommend for getting us up and running on Tyk with managed Mongo in a production environment. We do not use Mongo currently so any advice is useful whilst we are looking into this.

Martin · September 27, 2016, 2:47pm

Ah yes, that is what I meant. No, it is not free; you pay per server. The thing with Atlas (Cloud) is:

They deploy the DB for you into your AWS
You get all the metrics
If necessary, you also get all the automatic updates and upgrades

We use Atlas to monitor our cluster.

There is a CloudFormation script that can get you started with Mongo on AWS, but it may need modification:

reidca · November 24, 2016, 12:12pm

Thanks for the information in this post; it is quite insightful.

Can you please elaborate on a few points for us:

We set up our AWS-based Tyk production environment entirely with CloudFormation, including our ElastiCache cluster. If we delete our current stack and relaunch it, the ElastiCache cluster is also destroyed and recreated, which of course purges anything stored in it. I had understood that this was not a problem, since the stored information was only temporary and would be rebuilt from persistent storage. However, after reading this it seems that the data stored in Redis may need to persist. Can you please clarify how Redis is used and whether its data needs to be persistent?
You mention the need for an archiving approach for the ever-growing analytics data. Could you please provide documentation or example scripts showing how to purge this data, and guidance on when it is appropriate to do so?
Our Tyk configuration files for the dashboard and the gateway are managed by Puppet, which is set up to use a static configuration file. If Puppet sees run-time changes to the files, it replaces them with the stored version. We have noticed, for example, that updating the license key through the UI changes the configuration file, which Puppet then "corrects". The license key is no longer a problem, since we now update it in Puppet as well. My question is this: do the dashboard or gateway processes (or their user interfaces) make any other run-time changes to the configuration files? If so, we will have to look at a different way of managing the configuration files in Puppet.

Thanks for your help

(Incidentally, it would be great to get this type of information into the documentation, perhaps in the "Moving to production" section.)

Carl

Martin · November 25, 2016, 5:12pm

Tyk stores its API tokens and OAuth clients in Redis, so you must ensure that Redis is persisted, or at least in a configuration where it is easy to restore or fail over. For example, with ElastiCache, having several read replicas and regular snapshots can ensure that your data survives a failure. Re-creating Redis from nothing will destroy your tokens, so it is not advisable to treat it as ephemeral, unless you are using the Multi Data Center Bridge (MDCB), in which case you only need to ensure that your master Redis is persisted.

There's no documentation for this, as it depends on what you are looking to achieve. If you have Tyk Pump set up with the aggregate pump as well as the regular Mongo pump, you can make the tyk_analytics collection in Mongo a capped collection, which guarantees that the analytics data rolls over within a fixed size limit.
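A capped collection's limit is set in bytes, so the main decision is how much raw analytics to keep. A minimal sketch, assuming you size the cap from an estimated average record size (for example the `avgObjSize` reported by `db.tyk_analytics.stats()`) times the number of records you want to retain; `capped_collection_command` and the 2 KB / 5 million figures are hypothetical. It builds a document for MongoDB's `convertToCapped` command, which could then be run with a driver (e.g. pymongo's `db.command(cmd)`, not imported here). Converting a large live collection can be slow, so test this outside production first.

```python
def capped_collection_command(collection: str, avg_record_bytes: int, max_records: int) -> dict:
    """Build a MongoDB `convertToCapped` command document sized for N records."""
    size_bytes = avg_record_bytes * max_records
    # The command name must be the first key; Python dicts keep insertion order.
    return {"convertToCapped": collection, "size": size_bytes}


# Hypothetical sizing: ~2 KB per analytics record, keep roughly the latest 5 million.
cmd = capped_collection_command("tyk_analytics", avg_record_bytes=2048, max_records=5_000_000)
print(cmd)  # {'convertToCapped': 'tyk_analytics', 'size': 10240000000}
```

Once capped, MongoDB overwrites the oldest records as new ones arrive, which is why this only makes sense when the aggregate pump is already keeping the long-term summaries.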

Tyk and Tyk Analytics (the dashboard) will write a default config to disk if no config file can be found, and the UI will add the license to the config file if you set it that way. Those are the only cases in v2.2. In v2.3 it is possible to remotely change a gateway's configuration via the dashboard (this can be disabled), but that would be the only other time Tyk writes to its config.
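If you want to confirm which keys actually changed before Puppet reverts a file, a small diff of the Puppet-managed config against the on-disk JSON is enough. This is a hypothetical helper (`find_drift` is not part of Tyk or Puppet), and the sample keys and values are illustrative only.

```python
import json

_MISSING = object()  # marks a key present in only one of the two configs


def find_drift(managed: dict, on_disk: dict, prefix: str = "") -> list:
    """Return dotted key paths whose values differ between two config dicts."""
    drift = []
    for key in sorted(set(managed) | set(on_disk)):
        path = f"{prefix}{key}"
        a = managed.get(key, _MISSING)
        b = on_disk.get(key, _MISSING)
        if isinstance(a, dict) and isinstance(b, dict):
            drift.extend(find_drift(a, b, prefix=path + "."))  # recurse into nested sections
        elif a != b:
            drift.append(path)
    return drift


# Illustrative: the UI wrote a license key into the on-disk file.
managed = json.loads('{"listen_port": 3000, "license_key": ""}')
on_disk = json.loads('{"listen_port": 3000, "license_key": "abc"}')
print(find_drift(managed, on_disk))  # ['license_key']
```

Run periodically (or as a Puppet pre-check), an empty result means Tyk has not written anything beyond what Puppet manages.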

We’re constantly working on our documentation and adding / editing and updating it as feedback comes in - thanks for the suggestion!

Hope that helps.

Cheers,
Martin

david · November 28, 2016, 12:54pm

The next release of our documentation (due very soon) will include information related to these questions.

muthusk · December 1, 2016, 3:35am

@reidca I manage a Tyk environment for my organisation on AWS. ElastiCache is clustered and keys are hashed. The Tyk Gateway and Tyk Dashboard are separate Elastic Beanstalk environments. The analytics are pushed to AWS Elasticsearch and archived as CSV to S3 (extended from the tyk-pump Elasticsearch pump).

I'd be interested to know why you are considering CloudFormation (CFN).

reidca · December 2, 2016, 4:48pm

We use AWS CloudFormation for all our infrastructure. Personally, I have never been a fan of Elastic Beanstalk, and we follow infrastructure-as-code practices here, so CloudFormation suits us well.

Our Tyk gateways, dashboards and pumps, along with their respective ELBs, Auto Scaling groups, IAM policies, security groups, etc., are all created within a single stack, and Puppet then installs and configures Tyk for each of the components.

The Redis cluster is also built by CloudFormation, as is the Mongo replica set, but these are done in separate stacks.