Results from load test

Imported Google Group message. Import date: 2016-01-19 21:04:12 +0000.
Sender: Christian Amann.
Date: Tuesday, 20 January 2015 16:01:16 UTC.

Hi everyone,

I thought that this might be interesting: I conducted a load test for Tyk using Gatling (http://gatling.io/).

Setup:

  • Virtual machine: 1 core (2 virtual cores with hyper-threading), 2.4 GHz Intel i7, 1GB RAM, SSD
  • Upstream: Nginx delivering a static page. At up to 1000 req/s the average request duration is <3ms, so Nginx does not add meaningful latency and we are actually measuring the latency generated by Tyk.
  • Redis and MongoDB deployed in the same VM. CPU load on them is moderate (max 30%).

Results

  • Tyk handles 400 req/s with avg. 150ms/req handling time. At that point CPU load is at 100%
  • Tyk handles 300 req/s with avg. 90ms/req
  • Tyk handles 200 req/s with avg. 5ms/req. CPU at 50%

The detailed Gatling results can be found at http://s000.tinyupload.com/?file_id=82152762366877490894 . They contain the load test for Nginx alone (loadtestnginx-) and for Tyk with upstream Nginx (loadtestproxy-). The NNNu suffix indicates how many threads are generating requests in parallel; this only corresponds to N requests/sec when response times are low (e.g. at 200 req/s).
The gatling scripts I used can be downloaded from http://s000.tinyupload.com/?file_id=50833794296332799765 .

My methodology is not perfect and Redis is not tuned, but I think this gives a hint at how well Tyk performs.

Cheers,
Chris

Imported Google Group message.
Sender: Martin Buhr.
Date: Tuesday, 20 January 2015 16:32:18 UTC.

Hi Chris,

Thanks for this :slight_smile: We did some load testing ourselves using Load Impact on a 2GB / 2-core Digital Ocean VM, to test our new JavaScript engine middleware. We tested 1000 concurrent users for 3 minutes, and baselined NginX pushing a single page (very similar to your test).
Baseline NginX was ~21ms response time
Tyk with no JS middleware and local Redis: ~25ms
Tyk with JS Middleware: ~27ms
Here’s the performance graph for the soak test:

Performance is the blue line; it holds steady once the ramp-up is completed. This is running one node, and CPU utilisation never got near 100%… We run these tests periodically, especially with larger new features.

Cheers,
Martin

Imported Google Group message.
Sender: Christian Amann.
Date: Tuesday, 20 January 2015 17:07:56 UTC.

I obviously didn’t configure something correctly in this case since my performance is much worse than yours. How many requests per second do you have? (may not be the number of users unless every user executes exactly one request per second)

Imported Google Group message.
Sender: Martin Buhr.
Date: Tuesday, 20 January 2015 17:29:30 UTC.

Hi Chris,

Ah, I just re-ran the test and it's generating far lower rps than expected; it's maxing out at 30-40 rps, which explains the limp load. I might need to switch providers here and try again.

Is your test system a VM running locally?

Thanks,
Martin

Imported Google Group message.
Sender: Christian Amann.
Date: Tuesday, 20 January 2015 17:45:05 UTC.

Yes, it’s a VM with Tyk, Redis and MongoDB. The load testing tool runs from outside the VM but both are on the same physical hardware.

Imported Google Group message.
Sender: Martin Buhr.
Date: Wednesday, 21 January 2015 10:52:28 UTC.

Hi Chris,

We just ran a more thorough load test with JMeter against our main test machine. The setup of the test is the same as yours; it was run from an 8GB MBP Core i7, which was benchmarked to make sure it had the capacity to run the tests, and each test ran for 3 minutes with a 1-minute ramp-up. The raw results are below, and I've also put in a brief summary for those not wishing to trawl through and compare the figures.

Summary of results

NginX performance hums along (similar to your test) at a stable average of ~80ms throughout all three scenarios, and CPU usage never gets above 10% (average across both cores). So this is our baseline.

With Tyk, we saw it add 10ms to the request time at 200rps and 49ms at 400rps, and it became unstable at 800rps, with response times going up to more than 2 seconds.

We noticed that the application was only using 1 CPU. With a Go app, if the task is CPU-bound, as a web server is, it is possible to make the application use more cores by setting the GOMAXPROCS environment variable, which we did before re-running the 400rps and 800rps tests.
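(As a minimal illustration only, not Tyk's own code: GOMAXPROCS can either be exported in the environment before launching the binary, or set from Go at startup.)

package main

import (
    "fmt"
    "runtime"
)

func main() {
    // Equivalent to `export GOMAXPROCS=2` before starting the process:
    // allow the scheduler to run goroutines on every available core.
    previous := runtime.GOMAXPROCS(runtime.NumCPU())
    fmt.Printf("GOMAXPROCS raised from %d to %d\n", previous, runtime.NumCPU())
}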

With GOMAXPROCS set, Tyk added 12ms at 400rps and 421ms at 800rps; CPU load at the highest level was now maxing out at a 94% average across both cores.

This indicates that more cores do improve performance, but only with GOMAXPROCS set properly, and that Tyk performs pretty well at high loads, even on an untuned single server.

We have noticed in separate tests that a Redis server that is far away, i.e. not in the same network or not physically close to the main machines, can introduce the largest increase in response time. Naturally a local Redis server is not a likely production scenario, but it's worth keeping in mind for anyone implementing Tyk.

Test Setup and Results

Target machine:
  • Digital Ocean VM
  • 2 CPUs
  • 2GB RAM
  • Ubuntu 64-bit
The software setup:
  • Tyk running: analytics data collection enabled, no additional middleware, standard token-based auth, version 1.3
  • Tyk analytics (v0.9) running a vanilla configuration
  • Redis installed locally
  • ulimit set to 99999, as otherwise Tyk runs out of file descriptors - this is the only tuning we did
  • Mongo installed locally (latest version, 2.6.6)
  • NginX running locally, serving a single JSON file from root
The test setup:
  • Two test groups, one testing NginX on its own, the other testing Tyk (port 80 vs. port 5000)
  • Three scenarios: 200 rps, 400 rps and 800 rps for both groups
  • Two additional scenarios for the Tyk group: setting GOMAXPROCS to the number of CPUs for 400rps and 800rps
The results:

Test Group 1: NGinX Baseline

200 RPS (12,000 rpm):
Throughput: ~200 rps
Average: 82ms
Median: 77ms
CPU AVG: 4% (global system utilisation)

400 RPS (24,000 rpm):
Throughput: ~404 rps
Average: 79ms
Median: 77ms
CPU AVG: 6-7% (global system utilisation)

800 RPS (48,000 rpm):
Throughput: ~810 rps
Average: 81ms
Median: 77ms
CPU AVG: 10% (global system utilisation)

Test Group 2: Tyk Simple Auth - 1 Key

200 RPS (12,000 rpm):
Throughput: ~209 rps
Average: 92ms
Median: 80ms
CPU AVG: 30-40% (global system utilisation) - mellowed to 12%

400 RPS (24,000 rpm):
Throughput: ~412 rps
Average: 128ms
Median: 88ms
CPU AVG: 50-60% (global system utilisation)

400 RPS (24,000 rpm) with GOMAXPROCS set to num CPUs (2):
Throughput: ~420 rps
Average: 91ms
Median: 81ms
CPU AVG: 60-66% (global system utilisation)

800 RPS (48,000 rpm):
Throughput: ~668 rps
Average: 2949ms (high error rate, above 1%)
Median: 3278ms
CPU AVG: 76% (global system utilisation) (CPU1 MAXED)

800 RPS (48,000 rpm) Test 2 - GOMAXPROCS set to num CPUs (2):
Throughput: ~795 rps
Average: 502ms (high error rate, above 1%)
Median: 401ms
CPU AVG: 80-100% (global system utilisation) (CPU1 MAXED)

Thanks for taking the time to test it out; we really want to make sure Tyk is performant, and having the community contribute is so valuable :slight_smile: Really appreciate it.

Cheers,
Martin

Imported Google Group message.
Sender: Martin Buhr.
Date: Tuesday, 27 January 2015 13:23:34 UTC.

Just a quick note for those interested - we tried this on a beefier server machine:

4 Core
8GB RAM
UK-based server (we are based in London)

Running an optimised version of Tyk (a small tweak, which we will commit to the master branch soon), with GOMAXPROCS set to 4, we saw sub-5ms latency added by Tyk, and never more than 50% CPU usage:

NGINX:
Throughput: ~430 rps
Avg: 8ms
Median: 7ms

Tyk (1.4):
Throughput: ~430 rps
Avg: 15ms
Median: 10ms

Tyk (1.4 Optimised):
Throughput: 430 rps
Avg: 12ms
Median: 9ms

Interestingly, we also found that on the more constrained 2GB/2 Core server, the optimised version suffered a little and performed worse. We will most likely need to make the optimisation configurable.

Cheers,
Martin

Imported Google Group message.
Sender: Christian Amann.
Date: Tuesday, 27 January 2015 15:29:59 UTC.

Hi everyone,

I just profiled Tyk a bit. For profiling I used the Go profiler from Go 1.1.1. Sadly, the profiler is broken in Go 1.2.1 (see the Stack Overflow question "Golang: What is etext?"), but installing 1.1.1 is simple. Just use GVM (github.com/moovweb/gvm):

sudo apt-get install -y golang mercurial git bzr graphviz ghostscript # Install dependencies
bash < <(curl -s -S -L https://raw.githubusercontent.com/moovweb/gvm/master/binscripts/gvm-installer)
gvm install go1.1.1
gvm use go1.1.1

Afterwards log in again and run “go version”. It should yield “go version go1.1.1 linux/amd64” or similar. Then install all dependencies of Tyk:

go get (Google Code package - original import path lost in the import)
go get github.com/RangelReale/osin
go get github.com/sirupsen/logrus
go get github.com/docopt/docopt.go
go get github.com/garyburd/redigo/redis
go get github.com/gorilla/context
go get github.com/justinas/alice
go get github.com/mitchellh/mapstructure
go get github.com/nu7hatch/gouuid
go get github.com/rcrowley/goagain
go get github.com/robertkrimen/otto
go get github.com/robertkrimen/otto/underscore
go get gopkg.in/vmihailenco/msgpack.v2
go get labix.org/v2/mgo
go get labix.org/v2/mgo/bson
go get github.com/franela/goreq

Now download Tyk and Tykcommon to GVM dir:

mkdir -p ~/.gvm/pkgsets/go1.1.1/global/src/github.com/lonelycode/tyk/
git clone https://github.com/lonelycode/tyk.git ~/.gvm/pkgsets/go1.1.1/global/src/github.com/lonelycode/tyk/
mkdir -p ~/.gvm/pkgsets/go1.1.1/global/src/github.com/lonelycode/tykcommon/
git clone https://github.com/lonelycode/tykcommon.git ~/.gvm/pkgsets/go1.1.1/global/src/github.com/lonelycode/tykcommon/

Build Tykcommon:

go build github.com/lonelycode/tykcommon

And build Tyk (using the modified main.go file that is attached. It enables CPU profiling):

cd ~/.gvm/pkgsets/go1.1.1/global/src/github.com/lonelycode/tyk/
make build/tyk

Now go to build dir and run Tyk with profiling:

cd build
./tyk --cpuprofile

Generate some load on Tyk, otherwise you won't measure anything, then kill Tyk (Ctrl-C). The file tyk.prof will have been generated. Generate a profiling graph using the Go profiling tool (graphviz and ghostscript have to be installed):

go tool pprof tyk tyk.prof -pdf > graph.pdf

I attached a graph that I generated for the current master branch of Tyk.
I just thought that sharing this information might be useful. Enjoy :slight_smile:

@Martin: It would be good to put the information on how to build Tyk on the GitHub page. The cpuprofile flag should also be added to the Tyk master branch. The memprofile flag currently does nothing except create the output file; it should be activated with pprof.WriteHeapProfile(f).
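(For reference, a rough sketch of how such profiling flags are commonly wired up in a Go main(); this is illustrative only and not the exact code from the attached main.go - the boolean flag style and the tyk.mprof output file are assumptions.)

package main

import (
    "flag"
    "log"
    "os"
    "runtime/pprof"
)

var (
    cpuProfile = flag.Bool("cpuprofile", false, "write a CPU profile to tyk.prof")
    memProfile = flag.Bool("memprofile", false, "write a heap profile to tyk.mprof")
)

func main() {
    flag.Parse()

    if *cpuProfile {
        f, err := os.Create("tyk.prof")
        if err != nil {
            log.Fatal(err)
        }
        pprof.StartCPUProfile(f)
        // Note: a Ctrl-C kill skips deferred calls, so real code should
        // also stop the profile from a signal handler.
        defer pprof.StopCPUProfile()
    }

    // ... start the gateway and serve traffic here ...

    if *memProfile {
        f, err := os.Create("tyk.mprof")
        if err != nil {
            log.Fatal(err)
        }
        pprof.WriteHeapProfile(f) // the call suggested above for the memprofile flag
        f.Close()
    }
}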

PS: The main thing that I got from this is that Tyk spends a lot of time in "redis.Dial", which is unexpected, since I would expect the connection pool not to redial that often.
PPS: A large disadvantage of using the Go profiler is that it doesn't track CPU time but absolute time (including time when the process is in a blocked state).

Imported Google Group message.
Sender: Martin Buhr.
Date: Tuesday, 27 January 2015 15:39:26 UTC.

Hi Christian,

Lovely - thanks for this :slight_smile:

Getting the dependencies for Tyk is actually super simple if you have a Go workspace set up. Simply "go get github.com/lonelycode/tyk", then cd into the directory and run "go get ./..." and all dependencies (and their dependencies) will be downloaded and put in the workspace; the project should then compile (this is what our TravisCI script does) :slight_smile:

I’ll add it to the readme in the repo though.

Interesting note about the dialler; I'll take a look, as it may be another spot where more efficiency can be squeezed out.

Cheers,
Martin

Imported Google Group message.
Sender: Martin Buhr.
Date: Tuesday, 27 January 2015 16:24:50 UTC.

You won’t believe this - but I think the problem was the maxIdle pool size: it was set to 3, which is a pittance; I must have overlooked it during the initial implementation. Making it configurable and setting it to 30 by default, as well as moving session writes to a goroutine (they no longer need to be locked with a Redis counter), brings the averages down:

On our 2GB/2 Core box:

Test 1:
Baseline: 420 rps: Average: 18ms, Median: 8ms
Tyk (master): 400 rps: Average: 25ms, Median: 9ms
Avg latency increase: 7ms

Test 2
Baseline: 420 rps: Average: 9ms, Median: 7ms
Tyk (master): 420rps: Average: 17ms, Median: 9ms
Avg latency increase: 8ms

Compared to the last set of results with 1.4 on the same box (12ms), we’ve shaved some time off and are below 10ms of added latency on some pretty heavy traffic.

At 800rps we’re seeing much lower CPU load and approximately 26ms of added latency. Not so good, but again a big improvement on the earlier 421ms!
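(For anyone wanting to see what the pool change looks like, here is a minimal redigo sketch; it is illustrative only - the MaxIdle value of 30 is the new default mentioned above, while the address and IdleTimeout are assumptions.)

package main

import (
    "time"

    "github.com/garyburd/redigo/redis"
)

// newPool builds a connection pool with a larger idle set, so the gateway
// reuses connections instead of calling Dial() on every request.
func newPool(addr string) *redis.Pool {
    return &redis.Pool{
        MaxIdle:     30,                // was 3; now configurable with a default of 30
        IdleTimeout: 240 * time.Second, // assumed value, not taken from the thread
        Dial: func() (redis.Conn, error) {
            return redis.Dial("tcp", addr)
        },
    }
}

func main() {
    pool := newPool("localhost:6379")
    conn := pool.Get() // borrows an idle connection when one is available
    defer conn.Close() // returns it to the pool rather than closing the socket
    conn.Do("PING")
}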

It would be awesome if you could run the profiler again and post the results, to see if the amount of time spent in Dial() has dropped.

Cheers,
Martin

Imported Google Group message.
Sender: Christian Amann.
Date: Tuesday, 27 January 2015 17:50:01 UTC.

Actually, since we want to operate Tyk at high loads, I think maxIdle should be set to 100 by default, since many people will not bother configuring it.
Running with profiling is actually pretty simple (as described above), and I would really suggest adding the cpuprofile flag (and implementing the memprofile flag).

I attached the result from my last run with the current version (200 req/sec). The calls to Dial() have disappeared completely from the profile :slight_smile: This doesn’t mean that Dial() isn’t being called, just that the time spent there is marginal.
The next hotspot is garbage collection in runtime.mallocgc. You can tune the GC, see http://golang.org/pkg/runtime/. I compared GOGC=1000 and GOGC=100 (the default); the attached results show that much less time is spent in GC because it runs less often. I think tuning this needs to be done carefully; a better approach would be to actually reduce the number of allocations wherever possible.
Just have a look at the profile; it’s quite revealing.
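(As a pointer only: GOGC=1000 is just the value compared above, not a recommendation. The same setting can be exported in the environment before launching Tyk, or changed from Go code, roughly like this.)

package main

import (
    "fmt"
    "runtime/debug"
)

func main() {
    // Equivalent to running the process with GOGC=1000: the heap may grow to
    // roughly 10x the live data before a collection is triggered, so the GC
    // runs less often at the cost of higher memory usage.
    previous := debug.SetGCPercent(1000)
    fmt.Printf("GC percent changed from %d to 1000\n", previous)
}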

Imported Google Group message.
Sender: Martin Buhr.
Date: Tuesday, 27 January 2015 18:17:16 UTC.

Hi Christian,

Will do - we only did memory profiling initially to check for memory leaks.

Will leave GC tuning to integrators, as that’s probably more effective.

Thanks again for all the testing :slight_smile:

Cheers,
Martin
