SSL Let's Encrypt: certificate does not survive tyk gateway restarts

ahouben · June 19, 2017, 5:26pm

Hi,

We have successfully set up Tyk with Let's Encrypt (LE) SSL support as described in the docs: //tyk.io/docs/basic-config-and-security/security/tls-and-ssl/. The LE certificate is issued when the first SSL request hits the Tyk Gateway, which produces the following log output:

2017/06/19 16:24:43 [INFO][<hostname>] acme: Obtaining bundled SAN certificate
2017/06/19 16:24:43 [INFO][<hostname>] acme: Trying to solve TLS-SNI-01
2017/06/19 16:24:45 [INFO][<hostname>] The server validated our request
2017/06/19 16:24:46 [INFO][<hostname>] acme: Validations succeeded; requesting certificates
2017/06/19 16:24:46 [INFO] acme: Requesting issuer cert from https://acme-v01.api.letsencrypt.org/acme/issuer-cert
2017/06/19 16:24:46 [INFO][<hostname>] Server responded with a certificate.
time="Jun 19 16:24:46" level=info msg="[SSL] State change detected, storing"

So far, so good. We run everything in Docker containers (including the Tyk Dashboard, Gateway and Pump) and often start and stop them during development. It seems the Tyk Gateway does not remember the previously issued certificate and asks LE for a new one every time it starts. This eventually triggers LE's rate limiting (see "Rate Limits" in the Let's Encrypt docs), which refuses to issue further certificates for up to one week once 5 certificates have been issued.
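One way to check whether the gateway really obtains a new certificate after a restart is to compare the fingerprint of the certificate it serves before and after restarting it. The sketch below assumes openssl is installed where it runs and that the gateway terminates TLS on port 443 for cs-test.greenliff.com; the port, the docker-compose service name and both helper names are hypothetical, not part of the original setup.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: print the SHA-256 fingerprint of a TLS certificate.

# Read a PEM certificate on stdin and print only the hex fingerprint
# (openssl prints "SHA256 Fingerprint=AB:CD:..."; keep the part after "=").
cert_fingerprint() {
    openssl x509 -noout -fingerprint -sha256 | cut -d= -f2
}

# Fetch the certificate served for host:port (SNI set to host) and fingerprint it.
# openssl x509 skips the s_client chatter before the first PEM block.
served_cert_fingerprint() {
    local host=$1 port=${2:-443}
    openssl s_client -connect "${host}:${port}" -servername "$host" </dev/null 2>/dev/null \
        | cert_fingerprint
}

# Usage (host, port and service name are assumptions):
#   before=$(served_cert_fingerprint cs-test.greenliff.com 443)
#   docker-compose restart tyk_gateway
#   after=$(served_cert_fingerprint cs-test.greenliff.com 443)
#   [ "$before" = "$after" ] && echo "same certificate" || echo "certificate was reissued"
```

If the fingerprints differ after every restart, each restart is consuming one certificate from the LE rate limit.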

This behavior is mentioned in //tyk.io/docs/basic-config-and-security/security/tls-and-ssl/ where it says:

Certificates are generated by one Gateway and then shared, via an encrypted Redis key, with other Tyk nodes. Tyk with LE support is limited by LE’s rate limits, so while certificates are backed up and generated and can be re-used, over-use of the feature can cause the service to stop working.

We are using persistent Redis with the appendonly yes option, which produces an appendonly.aof file. However, on startup the Tyk Gateway reports the following and does not seem to read the certificate from the Redis store:

time="Jun 19 15:59:45" level=info msg="Control API hostname set: cs-test.greenliff.com" 
time="Jun 19 15:59:45" level=info msg="Initialising Tyk REST API Endpoints" 
time="Jun 19 15:59:45" level=error msg="Could not EXPIRE key: LOADING Redis is loading the dataset in memory" 
time="Jun 19 15:59:45" level=info msg="Starting Poller" 
time="Jun 19 15:59:45" level=info msg="--> Using SSL LE (https)" 
time="Jun 19 15:59:45" level=warning msg="[SSL] --> No SSL backup: Key not found" 
time="Jun 19 15:59:45" level=info msg="Setting up Server" 
time="Jun 19 15:59:45" level=info msg="Registering node."
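The "LOADING Redis is loading the dataset in memory" error above is what Redis returns while it is still replaying its persistence file. Redis reports both whether AOF persistence is enabled and whether it is still loading in the output of INFO persistence (fields aof_enabled and loading). A small sketch for reading those fields from the shell; the helper name info_field and the host name redis are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract one "key:value" field from Redis INFO output.
# INFO lines end in CRLF, so strip the carriage returns before parsing.
info_field() {
    local info=$1 key=$2
    printf '%s\n' "$info" | tr -d '\r' | awk -F: -v k="$key" '$1 == k { print $2; exit }'
}

# Usage (host "redis" is an assumption):
#   info=$(redis-cli -h redis -p 6379 INFO persistence)
#   info_field "$info" aof_enabled   # 1 when appendonly is on
#   info_field "$info" loading       # 1 while the AOF file is still being replayed
```

A client that connects while loading is 1 sees the same LOADING errors as in the gateway log.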

We can’t find a way to persist certificates for re-use across Tyk Gateway restarts. Is this possible, and if so, how?

Thank you in advance,
Alexander Houben

Kos · June 20, 2017, 9:14am

Hi Alexander,

Can you share your tyk.conf?

Thanks,
Kos @ Tyk Support Team

ahouben · June 20, 2017, 9:42am

Sure, here’s the tyk.conf:

{
    "listen_port": 8080,
    "secret": "<secret>",
    "node_secret": "<node_secret>",
    "template_path": "/opt/tyk-gateway/templates",
    "tyk_js_path": "/opt/tyk-gateway/js/tyk.js",
    "middleware_path": "/opt/tyk-gateway/middleware",
    "use_db_app_configs": true,
    "db_app_conf_options": {
        "connection_string": "http://tyk_dashboard:3000",
        "node_is_segmented": false,
        "tags": ["test2"]
    },
    "app_path": "/opt/tyk-gateway/apps/",
    "storage": {
        "type": "redis",
        "host": "redis",
        "port": 6379,
        "username": "",
        "password": "",
        "database": 0,
        "optimisation_max_idle": 100
    },
    "enable_analytics": true,
    "analytics_config": {
        "type": "mongo",
        "csv_dir": "/tmp",
        "mongo_url": "",
        "mongo_db_name": "",
        "mongo_collection": "",
        "purge_delay": -1,
        "ignored_ips": []
    },
    "health_check": {
        "enable_health_checks": true,
        "health_check_value_timeouts": 60
    },
    "optimisations_use_async_session_write": true,
    "enable_non_transactional_rate_limiter": true,
    "enable_sentinel_rate_limiter": false,
    "allow_master_keys": false,
    "policies": {
        "policy_source": "service",
        "policy_connection_string": "http://tyk_dashboard:3000",
        "policy_record_name": "tyk_policies"
    },
    "hash_keys": true,
    "close_connections": true,
    "allow_insecure_configs": true,
    "coprocess_options": {
        "enable_coprocess": false,
        "coprocess_grpc_server": ""
    },
    "enable_bundle_downloader": true,
    "bundle_base_url": "",
    "global_session_lifetime": 100,
    "force_global_session_lifetime": false,
    "max_idle_connections_per_host": 100,
    "http_server_options": {
        "use_ssl_le": true,
        "server_name": "cs-test.greenliff.com"
    },
    "hostname": "cs-test.greenliff.com"
}

Martin · June 20, 2017, 10:44am

This looks like a Redis config issue to me; if the driver is spitting out errors like that, something may be wrong.

Have you tried a managed Redis install such as Redis Labs, just to rule out a Redis DB config issue?

ahouben · June 30, 2017, 7:21pm

Hi @Martin, hi @Kos,

It took me some time to investigate. You were right: the issue comes from the interplay between tyk_redis and tyk_gateway.

TL;DR: The issue appears to be caused by a race condition when the Docker containers start up. Ensuring that tyk_redis is fully up and running before starting the tyk_gateway service solves it; we have not seen the problem since.

Details
Since we run tyk_redis with persistence enabled (appendonly yes), the appendonly.aof file on disk can grow rather large (70-80MB), and tyk_redis may then need around 10 seconds or more to start:

1:M 30 Jun 10:53:45.099 * DB loaded from append only file: 9.951 seconds
1:M 30 Jun 10:53:45.099 * The server is now ready to accept connections on port 6379

The tyk_gateway may start earlier and, if tyk_redis is not ready yet, produces the logs from the first post:

time="Jun 19 15:59:45" level=info msg="Control API hostname set: cs-test.greenliff.com" 
time="Jun 19 15:59:45" level=info msg="Initialising Tyk REST API Endpoints" 
time="Jun 19 15:59:45" level=error msg="Could not EXPIRE key: LOADING Redis is loading the dataset in memory" 
time="Jun 19 15:59:45" level=info msg="Starting Poller" 
time="Jun 19 15:59:45" level=info msg="--> Using SSL LE (https)" 
time="Jun 19 15:59:45" level=warning msg="[SSL] --> No SSL backup: Key not found" 
time="Jun 19 15:59:45" level=info msg="Setting up Server" 
time="Jun 19 15:59:45" level=info msg="Registering node."

Once this happens, the certificate in Redis is somehow overwritten and lost, no matter how quickly Redis starts up during subsequent docker-compose down / docker-compose up -d cycles.

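To solve the problem, we replaced the default tyk_gateway entrypoint with a wait-for-redis.sh script that polls tyk_redis until it answers PING with PONG and only then hands over to the original ./entrypoint.sh: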

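#!/usr/bin/env bash
# Wait until tyk_redis answers PING with PONG, then start the gateway.
# While Redis is still loading its AOF file it replies with a
# "-LOADING Redis is loading the dataset in memory" error instead.
REDIS_HOST=tyk_redis
REDIS_PORT=6379

while true
do
    echo "waiting for ${REDIS_HOST}:${REDIS_PORT}..."
    # Send PING, expect PONG; -w 2 makes nc give up after 2 s of inactivity
    PONG="$(printf 'PING\r\n' | nc -w 2 "$REDIS_HOST" "$REDIS_PORT")"
    if [[ "$PONG" == *"PONG"* ]]; then break; fi
    sleep 2
done

echo "Found $PONG"
sleep 2
exec "./entrypoint.sh"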

With this setup, we have not seen the LE certificate get lost again.
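A possible refinement of the wait-for-redis.sh script above, not something tested in this thread: stop after a bounded number of attempts, so that a Redis instance that never becomes ready makes the container exit visibly instead of looping forever. In this sketch the probe is isolated in its own function so it can be swapped out; the function names, the REDIS_HOST/REDIS_PORT defaults and the retry budget are hypothetical, and the probe reuses the same nc-based PING as the original script.

```shell
#!/usr/bin/env bash
# Hypothetical variant of wait-for-redis.sh with a retry limit.

REDIS_HOST=${REDIS_HOST:-tyk_redis}
REDIS_PORT=${REDIS_PORT:-6379}

# Probe: send an inline PING. A ready Redis answers "+PONG"; one that is
# still replaying its AOF answers "-LOADING Redis is loading the dataset in memory".
redis_ping() {
    printf 'PING\r\n' | nc -w 2 "$REDIS_HOST" "$REDIS_PORT" 2>/dev/null
}

# Retry the probe up to $1 times, sleeping $2 seconds between attempts.
# Returns 0 as soon as PONG is seen, 1 if the limit is reached.
wait_for_redis() {
    local max_tries=$1 delay=$2 i reply
    for ((i = 1; i <= max_tries; i++)); do
        reply=$(redis_ping)
        if [[ "$reply" == *PONG* ]]; then
            echo "Redis ready after $i attempt(s)" >&2
            return 0
        fi
        echo "attempt $i/$max_tries: ${reply:-no reply}" >&2
        sleep "$delay"
    done
    return 1
}

# Entrypoint usage (60 tries x 2 s is an arbitrary example budget):
#   wait_for_redis 60 2 || { echo "Redis not ready, giving up" >&2; exit 1; }
#   exec ./entrypoint.sh
```

Exiting with a non-zero status lets Docker's restart policy or docker-compose ps surface the problem, rather than leaving a gateway container that silently never starts.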