Hi Tyk Community!
We’re running a Tyk OSS Gateway with a gRPC co-processor middleware, both deployed in a single Kubernetes pod as separate containers. During performance testing, we’ve observed a significant performance drop when co-processing is enabled.
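For context, co-processing is wired up in the gateway config roughly like this (the port is illustrative; both containers share the pod's network namespace, so the gateway reaches the plugin over localhost):

```json
{
  "coprocess_options": {
    "enable_coprocess": true,
    "coprocess_grpc_server": "tcp://localhost:5555"
  }
}
```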
Here are the details:
- Without co-processing: One Tyk instance can handle ~30k RPS.
- With co-processing enabled: The performance caps at ~10k RPS.
Interestingly, even with significantly over-provisioned resources, throughput consistently plateaus at around 10k RPS, and the gRPC server's resource usage does not increase proportionally with the load.
For reference:
- The gRPC server is implemented in Java.
- Middleware logic is minimal, only adding a header (no heavy computation or I/O); see the dispatcher sketch after this list.
- OpenTelemetry tracing shows a healthy request flow under low to moderate load versus clear degradation at higher loads.
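The dispatcher is essentially equivalent to the sketch below (class and package names come from the Java stubs we generated from Tyk's coprocess proto files; exact generated names and the header value are illustrative):

```java
import io.grpc.stub.StreamObserver;
import coprocess.CoprocessObject;
import coprocess.DispatcherGrpc;

// Sketch of our co-process middleware: generated class names may differ in
// other builds of the Tyk coprocess protos.
public class HeaderDispatcher extends DispatcherGrpc.DispatcherImplBase {

    @Override
    public void dispatch(CoprocessObject.Object request,
                         StreamObserver<CoprocessObject.Object> responseObserver) {
        // The only per-call work: add one header via set_headers on the
        // MiniRequestObject, then return the otherwise unchanged object.
        CoprocessObject.Object modified = request.toBuilder()
                .setRequest(request.getRequest().toBuilder()
                        .putSetHeaders("X-Example-Header", "value"))
                .build();
        responseObserver.onNext(modified);
        responseObserver.onCompleted();
    }
}
```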
We’ve tried the following optimizations:
- Configuring larger or dynamically sized thread pools in the gRPC server (see the server sketch after this list).
- Increasing HTTP connection limits on the gRPC server side.
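The thread-pool change amounted to swapping the default grpc-java executor for a larger dedicated pool, along these lines (port and pool size are illustrative):

```java
import io.grpc.Server;
import io.grpc.ServerBuilder;
import java.util.concurrent.Executors;

public class CoprocessServer {
    public static void main(String[] args) throws Exception {
        // Replace the default executor with a dedicated fixed-size pool so
        // dispatch calls are not queued behind a small shared executor.
        Server server = ServerBuilder.forPort(5555)
                .executor(Executors.newFixedThreadPool(64))
                .addService(new HeaderDispatcher())
                .build()
                .start();
        server.awaitTermination();
    }
}
```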
Unfortunately, none of these efforts resolved the bottleneck. Given the low resource usage on the gRPC side, we suspect the issue lies within Tyk itself or the interaction between Tyk and the gRPC co-processor.
Has anyone experienced similar performance issues, or does anyone have suggestions on where to investigate further? Could there be a bottleneck in Tyk's handling of gRPC co-process calls under high load?
We’d appreciate any insights or guidance. Thank you!