Somewhere right now, a production service is down because a certificate expired and nobody noticed. It happens to the biggest names: Microsoft Teams went down in 2020 because of an expired certificate. Equifax missed the signs of its 2017 breach because an expired certificate had disabled its SSL inspection tool. In 2021, the expiry of the DST Root CA X3 root that Let’s Encrypt certificates chained to broke connections on millions of older devices. Certificates are invisible infrastructure until they break, and then they’re the only thing anyone can see.
TLS certificates are the foundation of trust on the internet. They authenticate servers, encrypt connections, and establish that you’re talking to who you think you’re talking to. Managing them well is boring, unglamorous work. Managing them poorly is how you end up on the front page. Here’s how to get the boring part right.
DO / DON’T
DO:
- Automate certificate renewal — Human-dependent renewal processes fail. Use the ACME protocol with Let’s Encrypt or your CA of choice.
- Maintain a certificate inventory — Every cert, every domain, every expiration date. You can’t manage what you can’t see.
- Monitor expiration dates — Automated alerts at 60, 30, 14, and 7 days before expiry. Multiple notification channels.
- Use short-lived certificates — 90-day certs (Let’s Encrypt default) force automation and limit exposure from compromised keys.
- Enforce TLS 1.2 minimum, prefer TLS 1.3 — Disable TLS 1.0 and 1.1 everywhere.
- Check Certificate Transparency logs — Monitor CT logs for unauthorized certificates issued for your domains.
DON’T:
- Don’t let certificates expire in production — This is preventable. If a cert expires and causes an outage, the process failed, not the technology.
- Don’t use self-signed certificates in production — They train people to click through security warnings, and they break the chain of trust.
- Don’t share private keys across services — Compromise of one service shouldn’t compromise all of them.
- Don’t store private keys in plaintext — Use hardware security modules (HSMs), secrets managers, or encrypted key stores.
- Don’t pin certificates unless you have a robust rotation plan — Certificate pinning without automated rotation is a self-inflicted outage waiting to happen.
- Don’t use wildcard certificates everywhere — They’re convenient but expand the blast radius of a key compromise.
Certificate Lifecycle Management
Step 1: Build Your Certificate Inventory
Before you can manage certificates, you need to know what you have. Scan your environment:
- External-facing: Use tools like SSL Labs, Censys, or crt.sh to discover certificates issued for your domains.
- Internal: Scan your internal networks for services presenting certificates. Nmap with the ssl-cert script, or a dedicated certificate management platform, can identify internal certs.
- Code signing and client certs: Don’t forget non-TLS certificates — code signing, S/MIME, client authentication, and VPN certificates all have lifecycles too.
For each certificate, record: domain/CN, issuer (CA), issuance date, expiration date, key algorithm and size, hosting location/service, responsible team, and renewal method (manual or automated).
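The fields listed above map naturally onto a small record type. The following is a minimal sketch, not a recommended schema: the class name CertRecord, its field names, the helper functions, and the sample entries are all hypothetical and only illustrate how an inventory could be sorted and filtered.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CertRecord:
    """One inventory row. Field names are illustrative, not a standard schema."""
    common_name: str
    issuer: str
    issued: date
    expires: date
    key_algorithm: str   # e.g. "RSA-2048" or "ECDSA-P256"
    location: str        # host or service presenting the certificate
    owner_team: str
    renewal: str         # "manual" or "automated"


def expiring_first(records):
    """Return the records ordered by expiry date, soonest first."""
    return sorted(records, key=lambda r: r.expires)


def manual_renewals(records):
    """Return the records that still depend on a human to renew them."""
    return [r for r in records if r.renewal == "manual"]


# Hypothetical sample data for illustration only.
INVENTORY = [
    CertRecord("www.example.com", "Let's Encrypt", date(2025, 1, 10),
               date(2025, 4, 10), "ECDSA-P256", "web-01", "platform", "automated"),
    CertRecord("api.example.com", "Example Internal CA", date(2024, 6, 1),
               date(2025, 3, 1), "RSA-2048", "api-gateway", "backend", "manual"),
]
```

Sorting by expiry and listing manual renewals are the two views that tend to matter first: what breaks soonest, and what will not fix itself.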
Step 2: Automate Renewal with ACME
The ACME protocol (Automated Certificate Management Environment) is the standard for automated certificate issuance and renewal. Let’s Encrypt is the most widely used ACME CA, issuing free, domain-validated certificates with 90-day lifetimes.
Certbot is the reference ACME client:
# Install certbot
sudo apt install certbot python3-certbot-nginx
# Obtain and install a certificate for Nginx
sudo certbot --nginx -d yourdomain.com -d www.yourdomain.com
# Auto-renewal is configured automatically via systemd timer
sudo systemctl status certbot.timer
Other ACME clients:
- acme.sh — Shell script, no dependencies, supports multiple DNS providers for wildcard certs
- Caddy — Web server with built-in automatic HTTPS via ACME
- Traefik — Reverse proxy with native ACME support
- cert-manager — Kubernetes-native certificate management with ACME and other issuers
For internal certificates where Let’s Encrypt isn’t appropriate, tools like step-ca provide a private ACME CA for internal PKI.
Step 3: Monitor Expiration
Automation handles renewals, but monitoring catches failures. Layer your monitoring:
- Certificate monitoring services — Updown.io, Keychest, or Cert Spotter check your endpoints and alert on approaching expiration.
- Infrastructure monitoring — Prometheus with the blackbox_exporter or Nagios with check_ssl_cert can monitor certificate expiry as part of your existing monitoring stack.
- Cron-based checks — At minimum, a simple script that checks expiry dates:
echo | openssl s_client -connect yourdomain.com:443 -servername yourdomain.com 2>/dev/null | openssl x509 -noout -dates
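Alert thresholds: 60 days (informational), 30 days (warning), 14 days (critical), 7 days (emergency). These suit certificates valid for a year or so. For 90-day certificates, which Certbot renews once fewer than 30 days remain, treat the 14-day alert as the first real sign of trouble. If you’re getting 7-day alerts, your automation is broken.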
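The thresholds above can be turned into a small classifier. This is a sketch under assumptions: the function names and level strings are made up for illustration, and the input is the date portion of openssl’s notAfter line (the text after "notAfter="), parsed with the standard library’s ssl.cert_time_to_seconds.

```python
import ssl
from datetime import datetime, timezone

# Thresholds from the text, in days remaining, most urgent first.
THRESHOLDS = [(7, "emergency"), (14, "critical"), (30, "warning"), (60, "informational")]


def days_remaining(not_after: str, now: datetime) -> int:
    """Whole days from `now` until a notAfter string like 'Jan  5 09:34:43 2018 GMT'."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).days


def alert_level(days: int) -> str:
    """Map days remaining onto the alert levels described above."""
    if days < 0:
        return "expired"
    for limit, level in THRESHOLDS:
        if days <= limit:
            return level
    return "ok"
```

A cron job could feed the output of the openssl command into days_remaining with datetime.now(timezone.utc) and page only on "critical" or worse.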
Step 4: Certificate Transparency Monitoring
Certificate Transparency (CT) is a system of public, append-only logs that record publicly trusted certificates. Chrome and Safari reject certificates that haven’t been logged, so publicly trusted CAs log effectively everything they issue. This means you can monitor for unauthorized certificates issued for your domains.
Why this matters: if an attacker compromises a CA or tricks one into issuing a certificate for your domain, the CT log will show it. Without monitoring, you’d never know until the certificate was used in a man-in-the-middle attack.
How to monitor:
- crt.sh — Free CT log search. Query your domains periodically.
- Facebook CT Monitoring — Alerts on new certificates for your domains.
- Cert Spotter — Automated CT monitoring with email alerts.
- Google Certificate Transparency — The project behind the standard, with documentation and tools.
Set up monitoring for every domain you own. When a certificate appears that you didn’t request, investigate immediately — it could be a subdomain takeover, a compromised CA account, or a misissuance event.
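Checking CT results for certificates you didn’t request can start as a comparison against the issuers you actually use. The sketch below assumes entries shaped like rows of crt.sh’s JSON output, with issuer_name and name_value keys; verify those field names against real output before relying on them. The function name and the sample entries are hypothetical.

```python
def unexpected_certificates(entries, expected_issuers):
    """Return CT entries whose issuer matches none of the expected issuer substrings."""
    flagged = []
    for entry in entries:
        issuer = entry.get("issuer_name", "")
        if not any(expected in issuer for expected in expected_issuers):
            flagged.append(entry)
    return flagged


# Hypothetical entries for illustration; real data would come from a CT search.
SAMPLE_ENTRIES = [
    {"issuer_name": "C=US, O=Let's Encrypt, CN=R11", "name_value": "www.example.com"},
    {"issuer_name": "C=XX, O=Unknown CA, CN=Unknown Issuing CA", "name_value": "login.example.com"},
]
EXPECTED_ISSUERS = ["O=Let's Encrypt"]
```

Anything this returns is a prompt to investigate, not proof of compromise: a team may simply have used a CA that isn’t on the list yet.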
TLS Configuration
Protocol Versions
- TLS 1.3 — Preferred. Faster handshake (one round trip vs. two), mandatory forward secrecy, simplified cipher suite negotiation, and removal of legacy algorithms. NIST SP 800-52 Rev. 2 requires U.S. government TLS servers and clients to support TLS 1.2 and to add TLS 1.3 support by January 1, 2024.
- TLS 1.2 — Minimum acceptable. Still secure when configured correctly (ECDHE key exchange, AEAD ciphers). Required for compatibility with older clients.
- TLS 1.1 and below — Deprecated. Disable everywhere. IETF RFC 8996 formally deprecated TLS 1.0 and 1.1 in 2021.
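Web servers set the protocol floor in their own configuration, but the same policy applies to any application that opens TLS connections itself. As a minimal sketch using Python’s standard ssl module (the function name make_context is hypothetical), a TLS 1.2 minimum looks like this:

```python
import ssl


def make_context(server: bool = False) -> ssl.SSLContext:
    """Build a TLS context that refuses anything older than TLS 1.2."""
    purpose = ssl.Purpose.CLIENT_AUTH if server else ssl.Purpose.SERVER_AUTH
    ctx = ssl.create_default_context(purpose)
    # Reject TLS 1.0 and 1.1. The maximum is left at its default, so TLS 1.3
    # is negotiated whenever both sides support it.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

A server context still needs ctx.load_cert_chain(...) with its certificate and key before it can accept connections.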
Cipher Suite Configuration
Use Mozilla’s SSL Configuration Generator to generate tested configurations for your web server. Choose the “Intermediate” profile for broad compatibility or “Modern” for TLS 1.3-only.
Key principles:
- Forward secrecy required — ECDHE key exchange ensures that compromising the server’s long-term key doesn’t decrypt past sessions.
- AEAD ciphers only — AES-GCM, ChaCha20-Poly1305. No CBC mode with TLS 1.2 if avoidable.
- No RSA key exchange — RSA key exchange doesn’t provide forward secrecy. Use ECDHE-RSA or ECDHE-ECDSA.
- 256-bit keys for high-value targets — AES-256-GCM for anything handling sensitive data.
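The principles above can be applied mechanically to a list of cipher suite names, for example when auditing a server’s configuration. This sketch is a rough name-based filter, not a substitute for Mozilla’s generator: the function name is hypothetical, and it follows the text strictly, so it also rejects DHE suites and AEAD modes other than GCM and ChaCha20.

```python
AEAD_MARKERS = ("GCM", "CHACHA20")


def is_acceptable_suite(name: str) -> bool:
    """Check a cipher suite name against the forward-secrecy and AEAD rules above."""
    aead = any(marker in name for marker in AEAD_MARKERS)
    if name.startswith("TLS_") and "_WITH_" not in name:
        # TLS 1.3 suite (e.g. TLS_AES_256_GCM_SHA384): key exchange is always
        # ephemeral, so only the AEAD check applies.
        return aead
    # TLS 1.2 suite in OpenSSL style (ECDHE-...) or IANA style (TLS_ECDHE_..._WITH_...).
    uses_ecdhe = name.startswith("ECDHE-") or name.startswith("TLS_ECDHE_")
    return uses_ecdhe and aead
```

For instance, ECDHE-RSA-AES256-GCM-SHA384 passes, while AES256-GCM-SHA384 (RSA key exchange) and ECDHE-RSA-AES256-SHA384 (CBC mode) do not.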
Security Headers
Beyond the TLS configuration itself, deploy these headers:
- HSTS (HTTP Strict Transport Security) — Tells browsers to always use HTTPS. Include max-age=31536000; includeSubDomains; preload. Submit to the HSTS preload list for maximum protection.
- Expect-CT — Deprecated. It instructed browsers to require Certificate Transparency for your domain, but browsers now enforce CT on their own and Chromium-based browsers ignore the header, so new deployments can skip it.
Test your configuration with SSL Labs Server Test. Aim for an A+ rating. A grade below A usually means deprecated protocols, weak ciphers, or a certificate problem that needs fixing.
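The HSTS value recommended above is easy to get subtly wrong, for example by dropping a directive during a config change. A minimal sketch of a check follows, assuming you already have the raw Strict-Transport-Security header value as a string. The function names are hypothetical, and the one-year minimum mirrors the max-age given in the text.

```python
def parse_hsts(value: str) -> dict:
    """Split a Strict-Transport-Security value into lowercase directive -> argument."""
    directives = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"')
    return directives


def meets_preload_policy(value: str, min_age: int = 31536000) -> bool:
    """True if max-age is at least min_age and both includeSubDomains and preload are set."""
    directives = parse_hsts(value)
    try:
        max_age = int(directives.get("max-age", "0"))
    except ValueError:
        return False
    return (max_age >= min_age
            and "includesubdomains" in directives
            and "preload" in directives)
```

Running this against the header your production endpoint actually returns catches cases where a proxy or CDN rewrites or strips it.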
Key Management
Private Key Protection
- Generate keys on the system that will use them — Don’t generate private keys on one machine and transfer them to another if avoidable. Each transfer is an exposure risk.
- Use strong key sizes — RSA 2048-bit minimum (4096 preferred), ECDSA P-256 or P-384. NIST SP 800-57 provides key management recommendations.
- Store keys in HSMs or secrets managers — AWS Certificate Manager, Azure Key Vault, GCP Certificate Manager, or HashiCorp Vault for automated deployments. On-premises, use FIPS 140-2 validated HSMs for high-value keys.
- Set restrictive file permissions — Private key files should be readable only by the service that uses them: chmod 600 and ownership by the service account.
- Never commit private keys to version control — Scan repos with tools like truffleHog or GitLeaks. A key committed to Git is a key shared with everyone who has repo access, plus everyone who ever will.
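The file-permission rule above is simple to audit automatically. The sketch below assumes a POSIX system; the function names are hypothetical, and the demo uses a throwaway temporary file rather than a real key.

```python
import os
import stat
import tempfile


def key_file_is_private(path: str) -> bool:
    """True if no group or other permission bits are set (e.g. mode 600 or 400)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def demo():
    """Check a temporary file at mode 600, then at mode 644."""
    with tempfile.NamedTemporaryFile() as handle:
        os.chmod(handle.name, 0o600)
        strict = key_file_is_private(handle.name)
        os.chmod(handle.name, 0o644)
        loose = key_file_is_private(handle.name)
    return strict, loose
```

A fuller audit would also compare the file’s owner (st_uid) against the service account, which this sketch leaves out.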
Key Rotation
Rotate private keys with every certificate renewal, not just the certificate itself. A renewed certificate with the same private key means a previously compromised key is still in play.
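For ACME-managed certificates, check whether your client generates a new key pair at each renewal. Certbot does by default unless --reuse-key is set, but some clients keep the existing key unless configured otherwise. Verify this behavior in your client’s configuration.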
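One way to verify that a renewal really rotated the key is to compare fingerprints of the public key before and after. This sketch assumes you have already extracted each certificate’s DER-encoded public key, for example with openssl x509 -in cert.pem -noout -pubkey | openssl pkey -pubin -outform DER. The function names are hypothetical.

```python
import hashlib


def spki_fingerprint(public_key_der: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded public key, as a hex string."""
    return hashlib.sha256(public_key_der).hexdigest()


def key_was_rotated(old_public_key_der: bytes, new_public_key_der: bytes) -> bool:
    """True if the renewed certificate carries a different public key."""
    return spki_fingerprint(old_public_key_der) != spki_fingerprint(new_public_key_der)
```

Storing the fingerprint alongside each inventory entry makes a reused key visible at the next renewal.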
Revocation
When a private key is compromised, the certificate must be revoked immediately. Don’t wait for expiry.
Revocation Methods
- CRL (Certificate Revocation List) — The CA publishes a list of revoked certificate serial numbers. Clients download and check the list. Scales poorly, often cached, and many clients don’t check CRLs at all.
- OCSP (Online Certificate Status Protocol) — Real-time revocation checking. The client queries the CA’s OCSP responder for the status of a specific certificate. Better than CRL but adds latency and creates a privacy concern (the CA sees which sites you visit).
- OCSP Stapling — The server queries the OCSP responder and includes (staples) the signed response in the TLS handshake. The client gets revocation status without contacting the CA. Enable OCSP stapling on your servers — it’s faster and more private.
How to Revoke
- Let’s Encrypt: certbot revoke --cert-path /etc/letsencrypt/live/yourdomain.com/cert.pem
- Commercial CAs: Log into your CA’s portal and initiate revocation. Have the certificate serial number ready.
- Internal CAs: Use your CA’s revocation mechanism and publish an updated CRL.
After revocation, reissue the certificate with a new key pair. Revocation without reissuance leaves the service down.
If It Already Happened
If a certificate has already expired and caused an outage:
- Renew or reissue the certificate immediately. For ACME: certbot renew --force-renewal. For commercial CAs: reissue through the portal.
- Restart the affected services after installing the new certificate.
- Investigate why monitoring and automation failed. Fix the root cause before closing the incident.
- Check for secondary damage — did the outage mask a security event? Were fallback connections downgraded to unencrypted?
If a private key was compromised:
- Revoke the certificate immediately.
- Generate a new key pair and reissue the certificate.
- Investigate how the key was exposed. Check version control history, configuration management systems, and access logs.
- Report through CISA if the compromise affected critical infrastructure or public-facing services.
Certificates are infrastructure. Treat them like any other critical system: inventory them, monitor them, automate their lifecycle, and have a plan for when things go wrong. Start with your certificate inventory. Find every cert in your environment, record its expiry, and set up monitoring today. Then automate renewal with ACME. The outage you prevent is the one nobody notices.