Tyk cache is mixing up my GraphQL answers (security problem)

Branch/Environment/Version

  • Branch/Version: tyk-gateway:v4.3.2
  • Environment: On-prem

Describe the bug
I have a GraphQL API that can receive the same query several times but returns different info based on the received bearer token.
I have configured my API as a GraphQL one, and I’m using Federation, so I have a single entry point with a different subgraph per API.

The problem is that if I perform two requests (almost at the same time) with different bearer tokens within a short period (less than one second), one of them does not reach the API; it is answered from Tyk’s cache, mixing up the info and showing one user the other user’s info (because of the bearer token), which is a big security problem for us.

I do not have any cache options in my API definition. I added one for testing, to try to avoid this behavior, and adding the cache option with a header specification does not work either.

Reproduction steps

  1. You need a GraphQL API that returns some basic info extracted from the bearer token received in the request.
  2. Set up your API definition JSON as a GraphQL API using a subgraph.
  3. Perform several requests (almost simultaneously) using the same query but with different bearer tokens (see the sketch after this list).
  4. One of the received answers will be mixed up with another one.
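
For reference, this is roughly how we fire the near-simultaneous requests. It is only a minimal sketch: the gateway URL, listen path, query, and tokens are placeholders for our real values.

package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"sync"
)

func main() {
	const url = "http://localhost:8080/my-graphql-api/" // placeholder gateway listen path
	const query = `{"query":"{ me { id name } }"}`      // placeholder query returning token-derived info

	// Placeholder tokens; each should resolve to a different user upstream.
	tokens := []string{"BEARER_TOKEN_USER_A", "BEARER_TOKEN_USER_B"}

	var wg sync.WaitGroup
	for _, tok := range tokens {
		wg.Add(1)
		go func(token string) {
			defer wg.Done()
			req, err := http.NewRequest(http.MethodPost, url, strings.NewReader(query))
			if err != nil {
				fmt.Println("build request:", err)
				return
			}
			req.Header.Set("Content-Type", "application/json")
			req.Header.Set("Authorization", "Bearer "+token)
			resp, err := http.DefaultClient.Do(req)
			if err != nil {
				fmt.Println("request:", err)
				return
			}
			defer resp.Body.Close()
			body, _ := io.ReadAll(resp.Body)
			// When the bug triggers, both tokens print the same body.
			fmt.Printf("token=%s cached=%q body=%s\n",
				token, resp.Header.Get("X-Tyk-Cached-Response"), body)
		}(tok)
	}
	wg.Wait()
}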

This is my current API config.

Actual behavior
One of the performed requests received info belonging to another request.

Expected behavior
Every request should receive the answer according to its bearer token.

Screenshots/Video

Configuration (tyk config file):

Additional context
I’ve tried using specific cache options like this in my API definition, but it did not work.
"cache_options": {
  "cache_timeout": 60,
  "enable_cache": true,
  "cache_all_safe_requests": false,
  "cache_response_codes": [200],
  "enable_upstream_cache_control": false,
  "cache_control_ttl_header": "",
  "cache_by_headers": ["Authorization"]
},
I tried recent versions of Tyk, and I got the same behavior.

I need to know if it is a bug; if not, I need to know how to avoid it using another configuration.

I was going to suggest you try selective caching by header, but I see that you may have already tried this.

"cache_options": {
  "cache_timeout": 60,
  "enable_cache”: true,
  "cache_all_safe_requests": false,
  "cache_response_codes": [200],
  "enable_upstream_cache_control": false,
  "cache_control_ttl_header": "",
  "cache_by_headers": ["Authorization"]
}

It might take a while to replicate this. The only thing I can suggest in the meantime is to check the basic_auth object in your API definition. Caching there is enabled by default, since disable_caching is set to false:

"basic_auth": {
      "disable_caching": true,
      "cache_ttl": 0,
      "extract_from_body": false,
      "body_user_regexp": "",
      "body_password_regexp": ""
    }

You might want to disable it and confirm if the same behaviour occurs.

Another thing is that your definition seems to have keyless auth set instead of basic_auth. I assume this is for the sake of testing.

Hello Olu, Thanks for your answer.

Yes, I already tested the cache options without any difference; we had the same behavior.

Following your second suggestion, to set basic_auth, do I need to set use_basic_auth to true? Right now it is false in my config.

Yes, I have keyless auth set, because in my prod environment this API definition has internal set to true. The only way to access it is through an API federation, and that one uses OpenID.

I set the internal option to false to see if it was a problem with the federation, but it is not, so I’m testing this API definition directly.

I’ll test your second suggestion, and I’ll let you know if it works.

Hello Olu.

I tested this option

"basic_auth": {
      "disable_caching": true,
      "cache_ttl": 0,
      "extract_from_body": false,
      "body_user_regexp": "",
      "body_password_regexp": ""
    }

And I got the same behavior reported in this ticket.

That’s weird. It sounds like a bug. Just before we conclude, could you check your upstream for caching?

You can strip out Tyk as the proxy and trigger the same operations. That should be enough for testing.

If it’s a caching issue from Tyk, the gateway should return the cached-response header:

X-Tyk-Cached-Response

If you don’t see the response header but still observe the same behavior, can you share your upstream, or a dummy upstream, for replication?

I tried replicating it with a simple HTTP API but could not. I assumed that if it’s a Tyk issue, it would be an issue with the proxy in general and not specific to GraphQL, but replication was unsuccessful.

Hi Olu.

No, I checked that, and I do not see the X-Tyk-Cached-Response header in my response.
I did not mention it in the issue description, but I noticed it initially: we do not have that key in the headers, yet Tyk was answering from a cache, because that request did not reach the API. For example, we saw 3 requests in Tyk’s log but just two in the API’s log; the missing one was resolved by an in-memory cache, mixing the information.

We performed some tests to determine that Tyk is the problem. If we send these requests directly to the API, without Tyk, we do not have this problem.

Let me prepare a dummy configuration replicating this error and share the API definition and the folder with you.
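
Roughly, the dummy upstream will look like this minimal sketch (the port and path are placeholders): it just echoes info derived from the bearer token, so any mixed-up response is immediately visible.

package main

import (
	"encoding/json"
	"log"
	"net/http"
	"strings"
	"time"
)

func main() {
	http.HandleFunc("/graphql", func(w http.ResponseWriter, r *http.Request) {
		token := strings.TrimPrefix(r.Header.Get("Authorization"), "Bearer ")
		time.Sleep(50 * time.Millisecond) // simulate realistic backend latency
		w.Header().Set("Content-Type", "application/json")
		// GraphQL-shaped body whose data depends only on the caller's token.
		json.NewEncoder(w).Encode(map[string]interface{}{
			"data": map[string]string{"me": token},
		})
	})
	log.Fatal(http.ListenAndServe(":4000", nil)) // placeholder port
}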

Regards.

Hello Olu.

I have a dummy API to help you replicate the problem.

Here is a video with the explanation. Video.

Here is a folder with the docker-compose file and the files needed to start the services and replicate the problem. Folder

Let me know if you have any other doubts. We can have a video call if you want.

Regards.

Thanks for sharing the environment. I can replicate the issue.

At first, I had doubts, since I could see different request IDs and unique response headers, but I then checked the analytics record, and the latency value was too small.

The lowest latency for a call to the backend was 44ms. However, the request with the issue showed 6ms, which indicates the request was either lightning-fast or a cached response body was returned.

I am not sure where the issue is, but I guess it’s somewhere between the GraphQL middleware and the internal proxy request.

I’ll raise an internal ticket for proper investigation.

Hello Olu, Thanks for your help.

Regards

Hello Olu.

Do you have anything new about this issue?

Regards.

@Jose_Morales This is still in the backlog and being investigated by the team. The internal ticket number for reference is TT-10962.

The latest info I can see is that the team is taking a deeper look at how singleflight requests operate underneath. But that’s as far as I know.
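
For context, and just as my own illustration rather than the actual gateway code: Go’s singleflight package collapses concurrent calls that share a key into a single execution. If such a key were built from the request without including the Authorization header, two users issuing the same query at the same time could end up sharing one upstream response, which matches what you observed. A minimal sketch, with an assumed key shape:

package main

import (
	"fmt"
	"sync"

	"golang.org/x/sync/singleflight"
)

func main() {
	var g singleflight.Group
	var wg sync.WaitGroup

	// Two "users" send the same query concurrently. Because the key
	// below does not include the Authorization header, both calls
	// collapse into one execution and share its result.
	for _, token := range []string{"token-user-a", "token-user-b"} {
		wg.Add(1)
		go func(token string) {
			defer wg.Done()
			key := `POST /graphql {"query":"{ me { id } }"}` // assumed key shape
			v, _, shared := g.Do(key, func() (interface{}, error) {
				// Simulated upstream call: the body depends on whichever
				// goroutine's token won the race.
				return "response for " + token, nil
			})
			fmt.Printf("token=%s shared=%v got=%v\n", token, shared, v)
		}(token)
	}
	wg.Wait()
}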

I’ll update this thread once there is an update about a timeline for the fix.

Thanks Olu.

Regards.

I have an update on this topic. A fix has been slated for v5.3.1 and LTS version 5.0.12. Kindly look out for these versions containing the fix soon.

In the meantime, since our gateway is open source, you could build a custom version of the gateway that includes the fix.

@Jose_Morales Tyk v5.3.1 has just been released. Can you confirm that the issue is resolved?