Problems with tyk-pump's CSV output

Hi,

Pump’s CSV output doesn’t handle the contents of the AnalyticsRecord very well. In particular the analytics struct has moved on since GetLineValues() and GetFieldNames() were written and they no longer return the full AnalyticsRecord.

For example GetLineValues returns the GeoData, Latency and NetworStats structures like this rather than containing their data:
<analytics.GeoData Value>,<analytics.NetworkStats Value>,<analytics.Latency Value>
Similarly it can’t handle booleans and returns this <bool Value>

I’ve forked tyk-pump and fixed it from the tag 0.8.5.1, which is the version we use at work.
The fixed, but not very thoroughly tested branch, is at https://github.com/ps258/tyk-pump/tree/csv-fix-reflect but I don’t think that using reflection is a good idea in this case.

The reasons I think reflection is bad are:

  • It’s hard.
  • Its going to be hard to maintain.
  • The ordering seems fragile. As long as columns and fields line up, the order shouldn’t matter but I wasn’t able to see a guarantee that reflect.TypeOf(v).NumField() would always keep the same order.
  • Any time you add a built-in data structure to any of the stucts in AnalyticsRecord it will have to be deal with correctly in the reflection code which will be hard to maintain.
  • I was not able to keep the reflection code within analtyics.go because of the complexity of writing of the AnalyticsRecord structure with its sub structures, particularly since it contains anonymous structs within Geo. Instead I wrote/copied more general reflection tools and integrated them into pumps/csv.go. That’s probably just my inexperience with go.

I tried to find an opensource struct to csv library to use. GitHub - dnlo/struct2csv: Create CSV from struct or a slice of structs and GitHub - mohae/struct2csv: Create CSV from struct or a slice of structs were tried, but it’s fundamentally a bad idea to try and convert structs to CSV programmatically. For example, what do you do with a time.Time entry? The struct contains all sorts of unwanted fields, all you’re really interested in is a timestamp, so you have to code for that directly. Other built in structures must be the same. This could be an issue that someone more experienced in golang could easily solve but it was beyond my recreational level experience.

After getting it to work with reflection I decided try again using direct access to the structs. My implementation is at https://github.com/ps258/tyk-pump/tree/csv-fix-direct and is very much simpler. There would still be work to do when AnalyticsRecord is updated, but it’s pretty clear how to do it and doesn’t require magical reflection knowledge. Similarly golang types can be quickly and directly dealt with.

I haven’t looked at merging this with later versions of tyk-pump but the recent addition of the pump filter pushes me much more strongly in the direction of the direct style implementation as it will be easy to implement.

All this effort for pump csv output! The reason it’s important is that the charge for a splunk HEC is significant but we can get splunk to pick up a CSV for free so this is an important feature to us and one which many internal customers are clamoring for.

Cheers,
Pete

5 Likes

@tombuchaillot mind having a look at this one?

Hey Pete!

First of all, thanks for this!

I totally agree with you on removing the reflection here, the analytic record struct is not changing enough to have a dynamic method to obtain its values. And also it makes it way easier to test it :slight_smile:

Would you mind sending a PR with those changes? It doesn’t seem to have a conflict with pump master so it should be pretty easy.

Best,
Tom

Hi Tom,

The pull request is created. I was concerned that it might be incompatible with the new Analytics Filters features but it looks like it’s not.

Cheers,
Pete

2 Likes