The DoD Can't Keep Its Certificates Current
That's the Point
April 7, 2026 · [cyphrs] Team · 10 min read
1. The Agency That Writes the Rules
DISA, the Defense Information Systems Agency, publishes the STIGs, the Security Technical Implementation Guides. If you've worked in federal IT or been a contractor in the DoD ecosystem, you know these documents. They're the canonical reference for how systems should be configured, hardened, and maintained. They cover everything from password policy to, yes, certificate management. And one of the rules they're very clear about: never bypass a browser certificate warning.
Last week, DISA's own cybersecurity portal, cyber.mil, was caught serving file downloads over an expired TLS certificate. The cert expired on March 20, 2026. It stayed expired for days. When users encountered the browser warning that DISA's own standards say should never be bypassed, DISA's response was to publish instructions telling users to click through it.
The Hacker News thread pulled 165 points and 163 comments. People were amused, frustrated, unsurprised. But the discussion that unfolded underneath the jokes is more interesting than the incident itself, because it maps out exactly why this keeps happening and why it's about to get much worse.
2. What 163 Comments Tell You About PKI in Practice
The HN thread is worth reading in full, because the people commenting aren't armchair critics. A lot of them have worked in DoD IT. They've been the person responsible for that cert, or someone like it. And their explanations converge on a few themes that apply far beyond the federal government.
Contractor turnover breaks institutional knowledge
The person who provisioned the cert left. The person who replaced them didn't know the cert existed. The handoff documentation was either incomplete or nonexistent. This is the single most common cause of certificate expiry in large organizations, and it has nothing to do with automation tooling.
Split ownership across teams
One team manages the web servers. Another team manages certificates. A third team handles DNS. Nobody owns the full lifecycle from issuance to deployment to renewal. The cert lives in a gap between three different responsibility boundaries.
Fundamental terminology confusion
Multiple commenters flagged that DISA's internal ticketing referenced "TSSL Certification renewal," a term that doesn't exist. Not a typo. An indicator that the operators managing these systems don't have a working mental model of what they're managing. You can't automate what you can't name.
Legacy infrastructure resists automation
ACME and Certbot were repeatedly cited as obvious solutions. But the DoD's infrastructure includes air-gapped networks, CAC authentication layers, and systems that predate modern certificate management. "Just use Certbot" is not a serious answer when your infrastructure was designed before Certbot existed.
None of these are unique to the DoD. Contractor turnover maps to employee turnover in any organization. Split ownership maps to the DevOps/security/infrastructure silos that exist everywhere. Terminology confusion maps to the reality that most teams managing certificates have other primary responsibilities. And legacy infrastructure? That's practically the definition of enterprise IT.
3. This Isn't an Automation Problem
The instinctive response to a certificate expiry is "automate it." Set up ACME. Deploy cert-manager. Buy a CLM platform. And yes, automation prevents the specific failure of a cert expiring because someone forgot to renew it. But the DISA incident reveals something that automation alone doesn't fix.
Nobody was watching. Not "nobody renewed it." Nobody was watching it. There was no system that knew this certificate existed, knew when it expired, knew who owned it, and knew what would break when it lapsed. The certificate existed in an organizational blind spot. And you can't automate a process that nobody knows needs to happen.
CyberArk published research showing that 67% of organizations experience certificate-related outages monthly. Monthly. Under the current 398-day lifetime regime. And the most common cause isn't "we tried to renew and it failed." It's "we didn't know this certificate existed until it expired and something broke."
Scenario one: someone forgot to renew a certificate. The fix is to set up automated renewal so it doesn't happen again. Problem solved.
Scenario two: nobody knew the certificate existed. No inventory tracked it. No monitoring watched it. No ownership model assigned it. Automation would have helped, but only if someone had known to configure it for this specific cert on this specific system.
This distinction matters because it changes what you invest in. If the problem is "forgot to renew," you buy an automation tool. If the problem is "didn't know it existed," you need visibility first. You need discovery. You need an inventory that can tell you what certificates are deployed, where, by whom, and when they expire, before you can automate any of it.
4. Now Multiply by the SC-081 Timeline
The DISA cert expired after roughly a year. Under the 398-day lifetime regime that's been standard for years, you get one chance per year to catch these blind spots. The cert either gets renewed or it doesn't. Once a year, the system is tested.
SC-081 changes that math. As of March 15, 2026, maximum TLS certificate lifetimes dropped to 200 days. That means two renewal cycles per year instead of one. By March 2027, it drops to 100 days: roughly four cycles. By March 2029, 47 days: about eight cycles per year, per certificate.
| Regime | Max Lifetime | Cycles/Year | Blind Spot Impact |
|---|---|---|---|
| Pre-March 2026 | 398 days | ~1 | Annual embarrassment |
| Now (March 2026) | 200 days | ~2 | Biannual outage risk |
| March 2027 | 100 days | ~4 | Quarterly fire drills |
| March 2029 | 47 days | ~8 | Continuous exposure |
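The multiplier in that table is simple arithmetic, but it compounds with estate size. A sketch of the math, where the 500-certificate estate is an illustrative number, not a figure from the incident:

```python
# Maximum certificate lifetimes by regime, in days (from the table above).
REGIMES = {
    "pre-March 2026": 398,
    "March 2026": 200,
    "March 2027": 100,
    "March 2029": 47,
}

def renewals_per_year(max_lifetime_days: int, cert_count: int = 1) -> float:
    """Annual renewal events, assuming every cert runs its full lifetime."""
    return 365 / max_lifetime_days * cert_count

for regime, days in REGIMES.items():
    print(f"{regime}: {renewals_per_year(days, 500):.0f} renewals/year "
          f"for a 500-cert estate")
```

At 398 days a 500-cert estate handles roughly one renewal every workday; at 47 days it handles roughly ten. An ad hoc process that survives the first rate rarely survives the second.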
Each row in that table is a multiplier on the DISA problem. Every unknown certificate, every unmonitored endpoint, every ownership gap gets tested more frequently. The organizations that are already failing at 398 days won't gradually degrade. They'll hit a wall where the volume of expiring certificates overwhelms whatever ad hoc process was barely holding things together.
Sectigo's Chief Compliance Officer wrote in TechRadar that October 1, 2026 could be "the day SSL/TLS certificates break the internet." He's talking about the first wave of 200-day certificates all expiring simultaneously. But the underlying cause isn't the shorter lifetimes. It's the blind spots that shorter lifetimes expose. DISA is just the preview.
5. The Question That Should Come First
When a certificate expires and causes an outage, the post-mortem always starts with "how do we prevent this from happening again?" Automation. Monitoring. Alerts. Runbooks. All good answers. All the wrong starting point.
The question that should come first: "How many other certificates do we have that we're also not watching?"
Most organizations can't answer that. They can tell you about the certificates they manage intentionally. The ones in their CLM platform, or their Let's Encrypt automation, or their cloud provider's certificate manager. But the certificates that cause outages are almost never the ones you're managing intentionally. They're the ones someone provisioned for a load balancer three years ago, or the wildcard cert that got copied to six servers by hand, or the internal service that's using a public certificate because that's what the original developer knew how to request.
Discovery is the prerequisite for everything else. Not automation, not lifecycle management, not policy. Discovery. You have to know what you have before you can manage it. And my guess is that the majority of organizations running production infrastructure right now couldn't produce a complete certificate inventory if their uptime depended on it. Which, increasingly, it does.
6. Not Every Certificate Needs This Treadmill
Here's the part that gets lost in the "automate everything" response. The SC-081 compression applies to public certificates. Certificates issued by publicly trusted CAs that browsers recognize. And that makes sense for public-facing services. Shorter lifetimes reduce the window of exposure if a private key is compromised. The security rationale is sound.
But a significant portion of the certificates in any organization's infrastructure aren't serving public traffic. They're securing internal APIs. Enabling service-to-service communication. Authenticating microservices. Backing load balancer connections between systems that no browser will ever touch. And these certificates are on the same 200-day (soon 47-day) treadmill because someone requested them from a public CA, probably because that was the only process the organization had for getting a certificate.
Private CA certificates don't follow the SC-081 timeline. They follow whatever policy you set. A private certificate can be valid for a year, or a week, or an hour, based on your security requirements and operational capacity. Not based on a browser vendor's compliance calendar.
Does this certificate need to be trusted by browsers? If yes, it's a public certificate and the SC-081 timeline applies. If no, it probably shouldn't be a public certificate at all. And migrating it to a private CA removes it from the compression treadmill permanently.
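That question can be applied mechanically across an inventory. A sketch, where `CertRecord` and the `browser_facing` flag are illustrative names rather than fields from any particular CLM product:

```python
from dataclasses import dataclass

@dataclass
class CertRecord:
    name: str
    browser_facing: bool  # must a public browser trust this endpoint?

def triage(inventory: list[CertRecord]) -> tuple[list[str], list[str]]:
    """Partition certs into public-CA (SC-081 timeline) and private-CA candidates."""
    public = [c.name for c in inventory if c.browser_facing]
    private = [c.name for c in inventory if not c.browser_facing]
    return public, private
```

The judgment call is in populating `browser_facing` honestly; the internal API that one vendor dashboard happens to load in a browser is exactly the edge case that keeps certs on the treadmill.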
This is the insight that turns DISA's embarrassment into an architectural lesson. The answer to "how do we handle 8x more renewals per year" isn't always "automate 8x harder." Sometimes it's "stop putting certificates on the public treadmill that don't need to be there." Reduce the surface area. Move internal certificates to private trust. Then automate what's left.
7. The Pattern That Keeps Repeating
The DISA incident isn't new. Microsoft let a certificate expire and took down Teams and Outlook for millions of users. Equifax attributed part of its breach response failure to an expired certificate on a security monitoring tool. Ericsson's expired cert knocked out mobile data for millions of O2 customers in the UK. Each time, the post-mortem identifies the same causes: unknown certificate, no ownership, no monitoring.
And each time, the industry's response is the same: "they should have automated." Which is true, but incomplete. They should have known what they had first. Automation without discovery is just faster execution of an incomplete process. You automate the certificates you know about, and the ones you don't know about sit there ticking toward expiry in the exact same blind spot they've always occupied.
What DISA makes vivid, because of who they are, is that this problem doesn't discriminate by competence or budget. The DoD has more security expertise, more compliance requirements, and more audit pressure than practically any organization on earth. And they still can't keep their certs current. That's not a staffing failure or a tooling gap. That's a structural problem with how we think about trust infrastructure.
8. What Would Have to Be True
For the DISA failure not to recur (at DISA, and everywhere else), a few things would have to be true. Not aspirational things. Baseline architectural requirements that most organizations haven't met yet.
Every certificate is inventoried continuously, not once.
Point-in-time scans miss ephemeral certificates and newly provisioned endpoints. Discovery has to be continuous, covering every network segment and service that might have a certificate attached.
Every certificate has an owner, not just a team.
Team ownership degrades when people leave. Certificate ownership needs to be tied to a role and a system, not an individual. When the contractor who deployed it moves on, the ownership record stays.
Internal and external certificates are treated as different problems.
A certificate securing an internal API and a certificate securing a public website have different threat models, different compliance requirements, and different lifecycle needs. Managing them the same way (same CA, same tools, same rotation schedule) creates unnecessary operational load and obscures the actual risk profile.
Monitoring fires before users do.
The browser warning is the worst possible alerting mechanism. If the first signal that a certificate has expired is a user seeing a security warning, everything upstream has failed. Monitoring should fire at 30 days, 14 days, 7 days, and escalate if nobody acts.
None of this is technically hard. All of it is organizationally hard. And the SC-081 timeline is about to make "organizationally hard" mean "operationally catastrophic" for every organization that hasn't addressed it.
DISA showed us what failure looks like at 398-day lifetimes. October 2026 will show us what it looks like at 200. The organizations that use this window to get visibility into what they actually have, which certificates are public, which should be private, who owns what, will be the ones that don't end up as the next Hacker News thread.