Tyk Gateway error - Failed to decode body: json: cannot unmarshal string into Go value of type []main.DBPolicy

Hi,

I have an XML API that returns URLs with characters like &. I am trying to convert it into JSON using Golang templates. But it always fails with the below error in Gateway service:

ERROR Failed to decode body: json: cannot unmarshal string into Go value of type []main.DBPolicy

Below error in dashboard service (test from Endpoint Designer):

ERROR Error unmarshalling XML: xml.Decoder.Token() - XML syntax error on line 8: invalid character entity &fmt (no semicolon)

When I test removing ‘&’ character, it works perfectly (test in Endpoint Designer). How to workaround this? I need this to be converted to JSON.

API Response:

<?xml version='1.0' encoding='UTF-8'?>
<sitemapindex xmlns='http://www.sitemaps.org/schemas/sitemap/0.9'>
    <listing>
        <merchant>macy</merchant>
        <url>http://upcdeal.us/go.php?url=UPC_039372614225-aHR0cDovL2NsaWNrLmxpbmtzeW5lcmd5LmNvbS9saW5rP2lkPTVSdGxEanYvQkU4Jm9mZmVyaWQ9MjA2OTU5LjM5MzcyNjE0MjI1JnR5cGU9MTUmbXVybD1odHRwOi8vd3d3MS5tYWN5cy5jb20vc2hvcC9wcm9kdWN0L3ZpbmNlLWNhbXV0by1zbGVldmVsZXNzLXN0cmlwZWQtbWF4aS1kcmVzcz9JRD0yNzYwNTQ4JlBhcnRuZXJJRD1MSU5LU0hBUkUmY21fbW1jPUxJTktTSEFSRS1fLTItXy03OS1fLU1QMjc5Kioqc2tkbGpl</url>
        <price>$149.00</price>
        <name> Vince Camuto Sleeveless Striped Maxi Dress</name>
        <imageurl>http://slimages.macys.com/is/image/MCY/products/6/optimized/3599616_fpx.tif?wid=300&fmt=jpeg&qlt=100</imageurl>
        <shipping>$0.00</shipping>
        <condition>New</condition>
    </listing>
</sitemapindex>

Thanks in advance.

Prasad

This error has nothing to do with your XML issue, it is related to your gateway not bootstrapping properly on first start. You may need to turn the gateway service off for 20s and then bring it up again (it’s worth checking your policy connections string too).

This is invalid XML, so it will not work, there’s no real work around except putting your URLs into a CDATA section.

Sorry to hijack this thread but I’m also seeing the error:

ERROR Failed to decode body: json: cannot unmarshal string into Go value of type []main.DBPolicy

This happens whenever the gateway is started and is stopping policies being applied.

There’s also a corresponding error in the tyk-dashboard:

Nonce failed! Node:c091d6a3-19ee-41cf-4c0f-1c2bdc7ca1b2 Nonce was:YzA5MWQ2YTMtMTllZS00MWNmLTRjMGYtMWMyYmRjN2NhMWIyNDcyN2E2NmZmNTY5NDdjYzRlZjVkNjQ5MTE3ODgwNWQ= Should be: YzA5MWQ2YTMtMTllZS00MWNmLTRjMGYtMWMyYmRjN2NhMWIyN2M3OTFlMGI1ZGEyNDhiMzZlYTZhMGQ4Zjc1MzI4MzM=

So it appears the issue is with the nonces! There’s nothing special about the gateway configuration, we’re connecting to the dashboard over https so I don’t know if that makes a difference?

Any help you can provide would be great as this is a bit of a showstopper as part of our evaluation (we’re trying to get policies to apply based on JWT tokens - but the debug logs show there are no policies being loaded from the dashboard to use!)

Thanks,
Richard

Here’s the full gateway startup log by the way:

[Aug 25 09:23:22]  INFO dashboard: Node registered id=c091d6a3-19ee-41cf-4c0f-1c2bdc7ca1b2
[Aug 25 09:23:22]  INFO main: Starting heartbeat.
[Aug 25 09:23:22]  INFO main: Detected 2 APIs
[Aug 25 09:23:22]  INFO main: Loading API configurations.
[Aug 25 09:23:22]  INFO main: --> Loading API: Service Information API
[Aug 25 09:23:22]  INFO main: ----> Tracking: (no host)
[Aug 25 09:23:22]  INFO main: ----> Multi target enabled
[Aug 25 09:23:22]  INFO multi-target: ----> Version Default has no override target
[Aug 25 09:23:22]  INFO multi-target: ----> Version testing1 has 'XXXX' for override target
[Aug 25 09:23:22]  INFO main: ----> Checking security policy: JWT
[Aug 25 09:23:22]  INFO main: ----> Setting Listen Path: /service-information-api/
[Aug 25 09:23:22]  INFO main: --> Loading API: Services API
[Aug 25 09:23:22]  INFO main: ----> Tracking: (no host)
[Aug 25 09:23:22]  INFO main: ----> Multi target enabled
[Aug 25 09:23:22]  INFO multi-target: ----> Version testing1 has 'XXXX' for override target
[Aug 25 09:23:22]  INFO multi-target: ----> Version v1 has no override target
[Aug 25 09:23:22]  INFO main: ----> Checking security policy: OpenID
[Aug 25 09:23:22]  INFO main: ----> Setting Listen Path: /services-api/
[Aug 25 09:23:22]  INFO host-check-mgr: Loading uptime tests...
[Aug 25 09:23:22]  INFO main: Initialised API Definitions
[Aug 25 09:23:22] ERROR Failed to decode body: json: cannot unmarshal string into Go value of type []main.DBPolicy
[Aug 25 09:23:22]  INFO main: Gateway started (v2.2.0.4)
[Aug 25 09:23:22]  INFO main: --> Listening on address:
[Aug 25 09:23:22]  INFO main: --> Listening on port: 8080

So it’s correctly retrieving API definitions but failing when retrieving policies.

Okay I’ve been doing some further debugging on this. Here’s the gateway debug log:

[Aug 25 10:24:45] DEBUG Calling: https://XXXX:3000/register/node
[Aug 25 10:24:45]  INFO dashboard: Node registered id=c091d6a3-19ee-41cf-4c0f-1c2bdc7ca1b2
[Aug 25 10:24:45] DEBUG Registration Finished: Nonce Set: YzA5MWQ2YTMtMTllZS00MWNmLTRjMGYtMWMyYmRjN2NhMWIyZDRiNzA3MGRjOGIyNGE2OTRmYTliNWFlZThjZTliMzA=
[Aug 25 10:24:45]  INFO main: Starting heartbeat.
[Aug 25 10:24:45] DEBUG Calling: https://XXXX:3000/system/apis
[Aug 25 10:24:45] DEBUG Using: NodeID: c091d6a3-19ee-41cf-4c0f-1c2bdc7ca1b2
[Aug 25 10:24:45] DEBUG Calling: https://XXXX:3000/register/ping
[Aug 25 10:24:45] DEBUG Sending Heartbeat as: c091d6a3-19ee-41cf-4c0f-1c2bdc7ca1b2
[Aug 25 10:24:45] DEBUG Hearbeat Finished: Nonce Set: YzA5MWQ2YTMtMTllZS00MWNmLTRjMGYtMWMyYmRjN2NhMWIyNWE2NDdjYjc0N2M4NDhhYTRkNzVhNzg1ZDVhZWY3Njk=
[Aug 25 10:24:45] DEBUG INITIALISING EVENT HANDLERS
[Aug 25 10:24:45] DEBUG Checking for transform paths...
[Aug 25 10:24:45] DEBUG Checking for transform paths...
[Aug 25 10:24:45] DEBUG Checking for transform paths...
[Aug 25 10:24:45] DEBUG Checking for transform paths...
[Aug 25 10:24:45] DEBUG INITIALISING EVENT HANDLERS
[Aug 25 10:24:45] DEBUG Checking for transform paths...
[Aug 25 10:24:45] DEBUG Checking for transform paths...
[Aug 25 10:24:45] DEBUG Checking for transform paths...
[Aug 25 10:24:45] DEBUG Checking for transform paths...
[Aug 25 10:24:45] DEBUG Loading APIS Finished: Nonce Set: YzA5MWQ2YTMtMTllZS00MWNmLTRjMGYtMWMyYmRjN2NhMWIyZjc5MDBkNDAxNmY5NDc3MzY3NWRlZGI2MGJhY2QxYjA=
[Aug 25 10:24:45] DEBUG main: Using App Configuration from Dashboard Service
[Aug 25 10:24:45]  INFO main: Detected 2 APIs
[Aug 25 10:24:45]  INFO main: Loading API configurations.
[Aug 25 10:24:45] DEBUG Connecting to redis cluster
[Aug 25 10:24:45] DEBUG Redis pool already INITIALISED
[Aug 25 10:24:45]  INFO main: --> Loading API: Service Information API
[Aug 25 10:24:45]  INFO main: ----> Tracking: (no host)
[Aug 25 10:24:45] DEBUG Storage Engine already initialised...
[Aug 25 10:24:45] DEBUG Redis handles: 1
[Aug 25 10:24:45] DEBUG Storage Engine already initialised...
[Aug 25 10:24:45] DEBUG Redis handles: 1
[Aug 25 10:24:45] DEBUG Connecting to redis cluster
[Aug 25 10:24:45] DEBUG Redis pool already INITIALISED
[Aug 25 10:24:45] DEBUG Connecting to redis cluster
[Aug 25 10:24:45] DEBUG Redis pool already INITIALISED
[Aug 25 10:24:45] DEBUG main: ----> Loading Middleware
[Aug 25 10:24:45]  INFO main: ----> Multi target enabled
[Aug 25 10:24:45]  INFO multi-target: ----> Version Default has no override target
[Aug 25 10:24:45]  INFO multi-target: ----> Version testing1 has 'XXXX' for override target
[Aug 25 10:24:45] DEBUG multi-target: Multi-target URL: XXXX
[Aug 25 10:24:45] DEBUG multi-target: Multi-target URL (obj): XXXX
[Aug 25 10:24:45] DEBUG Connecting to redis cluster
[Aug 25 10:24:45] DEBUG Redis pool already INITIALISED
[Aug 25 10:24:45]  INFO main: ----> Checking security policy: JWT
[Aug 25 10:24:45] DEBUG Chain array start
[Aug 25 10:24:45] DEBUG URL Rewrite enabled
[Aug 25 10:24:45] DEBUG Chain array end
[Aug 25 10:24:45] DEBUG main: ----> Custom middleware processed
[Aug 25 10:24:45] DEBUG Chain completed
[Aug 25 10:24:45] DEBUG main: ----> Rate limits available at: /service-information-api/tyk/rate-limits/
[Aug 25 10:24:45]  INFO main: ----> Setting Listen Path: /service-information-api/
[Aug 25 10:24:45] DEBUG Subrouter done
[Aug 25 10:24:45] DEBUG tempSpecRegister done
[Aug 25 10:24:45]  INFO main: --> Loading API: Services API
[Aug 25 10:24:45]  INFO main: ----> Tracking: (no host)
[Aug 25 10:24:45] DEBUG Storage Engine already initialised...
[Aug 25 10:24:45] DEBUG Redis handles: 1
[Aug 25 10:24:45] DEBUG Storage Engine already initialised...
[Aug 25 10:24:45] DEBUG Redis handles: 1
[Aug 25 10:24:45] DEBUG Storage Engine already initialised...
[Aug 25 10:24:45] DEBUG Redis handles: 1
[Aug 25 10:24:45] DEBUG Storage Engine already initialised...
[Aug 25 10:24:45] DEBUG Redis handles: 1
[Aug 25 10:24:45] DEBUG main: ----> Loading Middleware
[Aug 25 10:24:45]  INFO main: ----> Multi target enabled
[Aug 25 10:24:45]  INFO multi-target: ----> Version testing1 has 'XXXX' for override target
[Aug 25 10:24:45] DEBUG multi-target: Multi-target URL: XXXX
[Aug 25 10:24:45] DEBUG multi-target: Multi-target URL (obj): XXXX
[Aug 25 10:24:45]  INFO multi-target: ----> Version v1 has no override target
[Aug 25 10:24:45] DEBUG Connecting to redis cluster
[Aug 25 10:24:45] DEBUG Redis pool already INITIALISED
[Aug 25 10:24:45]  INFO main: ----> Checking security policy: OpenID
[Aug 25 10:24:45] DEBUG Chain array start
[Aug 25 10:24:45] DEBUG URL Rewrite enabled
[Aug 25 10:24:45] DEBUG Chain array end
[Aug 25 10:24:45] DEBUG main: ----> Custom middleware processed
[Aug 25 10:24:45] DEBUG Chain completed
[Aug 25 10:24:45] DEBUG main: ----> Rate limits available at: /services-api/tyk/rate-limits/
[Aug 25 10:24:45]  INFO main: ----> Setting Listen Path: /services-api/
[Aug 25 10:24:45] DEBUG Subrouter done
[Aug 25 10:24:45] DEBUG tempSpecRegister done
[Aug 25 10:24:45] DEBUG Checker host list
[Aug 25 10:24:45]  INFO host-check-mgr: Loading uptime tests...
[Aug 25 10:24:45] DEBUG host-check-mgr: --- Setting tracking list up
[Aug 25 10:24:45] DEBUG host-check-mgr: Reset initiated
[Aug 25 10:24:45] DEBUG [HOST CHECKER] Checker reset queued!
[Aug 25 10:24:45] DEBUG Checker host Done
[Aug 25 10:24:45]  INFO main: Initialised API Definitions
[Aug 25 10:24:45] DEBUG main: Loading policies
[Aug 25 10:24:45] DEBUG main: Using Policies from Dashboard Service
[Aug 25 10:24:45] DEBUG Calling: https://XXXX:3000/system/policies
[Aug 25 10:24:45] ERROR Failed to decode body: json: cannot unmarshal string into Go value of type []main.DBPolicy
[Aug 25 10:24:45]  INFO main: Gateway started (v2.2.0.4)
[Aug 25 10:24:45]  INFO main: --> Listening on address:
[Aug 25 10:24:45]  INFO main: --> Listening on port: 8080
[Aug 25 10:24:50] DEBUG Calling: https://XXXX:3000/register/ping
[Aug 25 10:24:50] DEBUG Sending Heartbeat as: c091d6a3-19ee-41cf-4c0f-1c2bdc7ca1b2
[Aug 25 10:24:50] DEBUG Hearbeat Finished: Nonce Set: YzA5MWQ2YTMtMTllZS00MWNmLTRjMGYtMWMyYmRjN2NhMWIyYmM3NTY3NzQzMjJjNDM3ZjdhMjM1NWQzZTQ2YzQ4MDU=

And here’s the error from the dashboard:

[Aug 25 10:24:45] ERROR Nonce failed! Node:c091d6a3-19ee-41cf-4c0f-1c2bdc7ca1b2 Nonce was:YzA5MWQ2YTMtMTllZS00MWNmLTRjMGYtMWMyYmRjN2NhMWIyZjc5MDBkNDAxNmY5NDc3MzY3NWRlZGI2MGJhY2QxYjA= Should be: YzA5MWQ2YTMtMTllZS00MWNmLTRjMGYtMWMyYmRjN2NhMWIyNWE2NDdjYjc0N2M4NDhhYTRkNzVhNzg1ZDVhZWY3Njk=

So you can see the dashboard is expecting nonce YzA5MWQ2YTMtMTllZS00MWNmLTRjMGYtMWMyYmRjN2NhMWIyNWE2NDdjYjc0N2M4NDhhYTRkNzVhNzg1ZDVhZWY3Njk= however YzA5MWQ2YTMtMTllZS00MWNmLTRjMGYtMWMyYmRjN2NhMWIyZjc5MDBkNDAxNmY5NDc3MzY3NWRlZGI2MGJhY2QxYjA= is sent.

Looking at the debug log you can see that:

[Aug 25 10:24:45] DEBUG Hearbeat Finished: Nonce Set: YzA5MWQ2YTMtMTllZS00MWNmLTRjMGYtMWMyYmRjN2NhMWIyNWE2NDdjYjc0N2M4NDhhYTRkNzVhNzg1ZDVhZWY3Njk=

Appears before:

[Aug 25 10:24:45] DEBUG Loading APIS Finished: Nonce Set: YzA5MWQ2YTMtMTllZS00MWNmLTRjMGYtMWMyYmRjN2NhMWIyZjc5MDBkNDAxNmY5NDc3MzY3NWRlZGI2MGJhY2QxYjA=

Could this be a race condition between the heartbeat and the loading of the APIs?

Okay so now I’ve been looking at the code on GitHub and I have a theory. Now i’ve not used Go before so I’m unaware of any subtleties around it’s threading but:

Wherever you deal with the nonce you’re using ServiceNonceMutex.Lock() and ServiceNonceMutex.UnLock()

However during a HTTP request you’re doing it twice - once to read the nonce and once to set it. Here’s an abbreviated exert from:

ServiceNonceMutex.Lock()
newRequest.Header.Add("x-tyk-nonce", ServiceNonce)
ServiceNonceMutex.Unlock()

....

ServiceNonceMutex.Lock()
ServiceNonce = thisVal.Nonce
log.Debug("Hearbeat Finished: Nonce Set: ", ServiceNonce)
ServiceNonceMutex.Unlock()

However at the same time you’re also doing this in the policy load:

Exert:

ServiceNonceMutex.Lock()
newRequest.Header.Add("x-tyk-nonce", ServiceNonce)
ServiceNonceMutex.Unlock()

....

ServiceNonceMutex.Lock()
ServiceNonce = thisList.Nonce
log.Debug("Loading Policies Finished: Nonce Set: ", ServiceNonce)
ServiceNonceMutex.Unlock()

So talking about Threads we have 2, one running the heartbeat and the other loading the policies:

Thread 2, Lock mutex
Thread 2, Retreive NONCE-1
Thread 2, Unlock mutex
Thread 1, Lock mutex.
Thread 1, Retreive NONCE-1
Thread 1, Unlock mutex
Thread 1, HTTP CALL
Thread 1, Lock Mutex
Thread 1, Set NONCE-2
Thread 1, Unlock Mutex
Thread 2, HTTP CALL
Thread 2, ERROR

The solution I think here, although it has other potential knock on issues, is to lock the mutex for the entirety of the HTTP call. This would ensure that the nonce is used and updated in one transaction if it is indeed updated after every http transaction.

EDIT. I think the actual threading issue is probably between the API Loading and the Heartbeat which then has a knock on effect on the policy rather than it being the policy loading thread at fault. I can’t seem to see how, looking at the code, the Heartbeat and API Loading could operate with the mutex to produce the log output above.

Thoughts from the Tyk development team?

Cheers,
Richard

Thank Richard,

It could indeed be an issue with the mutex, but a mutex works across threads / goroutines, so each time a read or a write occurs to the nonce it is locked, this will also lock any threads trying to access the resource. Mind you, because the other side will be iterating the nonce on the heartbeat, there’s a chance it changes dashboard side, even though it is correct gateway-side.

In fact, the simplest thing to do would actually be to move the heartbeat to after the initial load has occurs, since this causes the most nonce changes, it’s a tricky one.

M.

Moving the heartbeat to after the initial load was my initial thought too but I don’t know the subtitles around the length of time the API loading takes on production servers vs the timeout of the node registration.

Regarding mutex while I concur that it does block across threads; however there is a time during the http connection where the mutex is unlocked, this could lead to concurrency issues where two threads are issuing a http request at exactly the same moment - they’ll be race condition and whichever completes first will succeed (as the nonce will/should only be valid once).

Agreed that there may be something dashboard side although no code for me to look through on that. I’ll try running dashboard in debug mode (if that mode exists?) and see if I can see anything.

Richard

Hi Richard,

We’ve just pushed a patch to our develop branch which makes the mutexes work across the lifetime of the transaction, this should preent other processes from hijacking the nonce and changing it, since it’s only heartbeat and loaders, there’s no knock-ons.

Thanks for the tip!

M.

Thanks Martin.

Any recommendation on installing the develop branch? I’ll give it a test.

Richard

develop is built nightly (well as soon as its pushed), so you can always just pick up a tarball from our s3:

http://tyk-nightlies.s3-website-us-east-1.amazonaws.com/

1 Like

Just to confirm I’m now running the nightly and I can now see the policies being loaded correctly! API Loading also feels to be much faster, however I’ve nothing to quantify this. I suppose something could have been blocking!

Richard

Interestingly reload of APIs doesn’t seem to be working. So when I add or edit an API in the dashboad, I see this error in the gateway logs:

[Aug 25 14:47:02] ERROR pub-sub: Payload signature is invalid!

Restarting the gateway results in the new or edited APIs being loaded.

Not sure if this is a bug as a result of the mutex locking changes?

That’s the rub with nightlies :wink:

That’s a new feature: the pub/sub payload that causes the cluster reload is now cryptographically signed with an assymetric key pair, you can disable this check in the tyk.conf by setting allow_insecure_configs: true

This is because we’ve extended the command set that can be sent to an instance (such as reconfiguring the instance remotely and hot-reloading the process without traffic loss… but we needed to secure those payloads.

1 Like

That did the trick. Cheers. Looking forward to the next proper release.

Richard

1 Like