
Nobody Knows What Certificates They Have

The Pentagon just proved it. Microsoft proved it last year. Your org is probably next.

March 31, 2026 · [cyphrs] Team · 9 min read

The Pentagon runs the most sophisticated PKI on earth. It still missed one.

This week, cyber.mil hit the Hacker News front page for all the wrong reasons. The U.S. Department of Defense's Cyber Exchange, operated by DISA (the Defense Information Systems Agency, the organization that literally runs the DoD PKI), was serving file downloads over a TLS certificate that had been expired for three days.

Read that again. The org that runs the federal government's certificate infrastructure let a public-facing cert lapse on its own download portal. The site that distributes STIGs and security hardening tools. Their documentation tells users to install DoD root certificates to avoid browser warnings, and the certificate on the download page itself was expired.

The HN thread was predictably brutal. But the interesting part wasn't the mockery. It was the resignation. Nobody was surprised. The top comments weren't "how could this happen?" They were "yeah, this is normal." And that's the part worth sitting with. We've written a longer piece on the DISA pattern; the short version is that the inventory problem scales with the org, not with the maturity of the security team.

Microsoft has one that's been expired for eleven months

If the DISA story seems like a one-off, consider what's been happening over on r/sysadmin. A thread from March 25, 2026 documents that Microsoft's location.microsoft.com has been serving a certificate that expired on April 30, 2025. That's not a typo. The cert expired nearly a year ago.

The downstream effect: every Windows domain client relying on the location service defaults to Seattle as its timezone. Reproducible over 5G, outside corporate networks. Eleven comments on the thread, sysadmins actively troubleshooting what's happening to their fleet. No obvious Microsoft acknowledgment at the time of writing.

Microsoft presumably manages more certificates than almost any organization on earth. They have entire teams dedicated to this. And somewhere in their infrastructure, a certificate expired in April 2025 and nobody noticed for eleven months. That's not a process failure. That's an inventory failure. You can't renew what you don't know exists.

Apple locked out every iOS developer for four hours in March

March 10, 2026. Apple's provisioning profile server, ppq.apple.com, started returning an invalid certificate chain. ERR_CERT_AUTHORITY_INVALID. Every iOS app install, every enterprise MDM distribution, every provisioning profile download: broken. For about four hours, you couldn't ship an iOS app if you tried.

The HN thread pulled 110 points and 39 comments. Apple's status page stayed green for 3.5 of those 4 hours. Developers burned time in circles trying to figure out if it was their code, their provisioning profile, their Xcode install, or something else entirely. The answer was a bad cert in Apple's chain. But nobody knew that until well after the fact.

Three companies. Three of the most technically sophisticated organizations on the planet. All caught by the same basic problem: a certificate they didn't know was about to expire, or didn't know was misconfigured, or didn't know existed in the first place.

This is a pattern, not a coincidence

CyberArk published a number earlier this year that's worth internalizing: 67% of organizations experience certificate-related outages every single month. Not annually. Monthly. And that stat is from before SC-081 Phase 1 enforcement went live on March 15, before the 200-day maximum certificate lifetime started compressing renewal cycles across the industry.

An r/devops thread from this week captures the other end of the spectrum. A DevOps engineer asking, in 2026, with 200-day enforcement already live: "Do you monitor SSL certificate expiry dates?" The responses split between "we use cert-manager" and "we found out when things broke." In March 2026. With the October cliff, when the first 200-day certificates issued under Phase 1 start expiring, six months away.

The Common Thread

Every one of these incidents, from the Pentagon to Microsoft to Apple to the DevOps engineer on Reddit, shares the same root cause. Not a failure of automation. Not a failure of renewal tooling. A failure of inventory. They didn't know what they had, where it was deployed, or when it was going to expire. The certificate existed. The knowledge of the certificate did not.

Automation without inventory is just failing faster

The instinctive industry response to certificate outages has been "automate renewals." And that's correct, as far as it goes. ACME clients, cert-manager, Venafi, Keyfactor: all of these tools solve the renewal problem for certificates they know about. The word "know" is doing a lot of work in that sentence.

Consider what happens in a typical mid-market organization over five years. A team deploys a service with a certificate from Let's Encrypt. Another team provisions an internal service with a cert from the company's ADCS instance. A contractor sets up a VPN concentrator with a self-signed cert. Someone spins up a dev environment with certs from a test CA that was supposed to be temporary. An acquisition brings in an entirely separate PKI with its own root of trust.

None of these certificates are in the same system. Some of them are in no system at all. The ACME bot renews the Let's Encrypt certs. Maybe. If the DNS challenge still works, and the domain hasn't moved registrars, and the server the bot runs on hasn't been decommissioned. The ADCS certs autoenroll. Maybe. If the template permissions are right and the SAN attributes in AD are populated correctly (as we've documented before, that's a big "if"). The self-signed cert? Nobody remembers it's there until something breaks.

Automation covers the certificates you've pointed it at. Discovery finds the ones you forgot about, or never knew existed, or inherited from someone who left the company in 2022.

How big is the gap, really?

There's no clean industry number for "percentage of certificates that are unmanaged," because the whole point is that nobody's counted them. But the proxy metrics are telling. An r/sysadmin thread from last month describes managing 120+ SaaS apps with SSO certificates, averaging 3 to 4 renewals per month, tracked by a custom 90-day expiry script. That's one person's partial view of one layer of their organization's certificate surface. Just SaaS SSO. Not infrastructure. Not service mesh. Not IoT. Not the certs that development teams provisioned for staging environments that became production environments.

What CLM tools see

• Certificates enrolled through the tool
• Certificates from integrated CAs
• Certificates in pre-configured scan targets
• Certificates that someone remembered to import

What actually exists

• Self-signed certs on forgotten dev boxes
• Certs from an acquisition's separate PKI
• Certs on network devices nobody's audited
• Certs from CAs the team no longer uses
• Certs issued by someone who left in 2022

The gap between these two columns is where outages live. Every expired cert that causes an incident was, at some point, a certificate that somebody provisioned on purpose. It had a reason to exist. The problem is that the knowledge of its existence lived in one person's head, or in a spreadsheet that stopped being updated, or in an automation system that was decommissioned when the team migrated to a different tool.
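In code, the gap between those two columns is a set difference. Here's a minimal sketch, assuming you already have two inputs: certificates observed by a network scan, and the fingerprints your CLM tool manages. Certificates are keyed by the SHA-256 fingerprint of their DER encoding; the `SeenCert` record, its fields, and the `unmanaged()` function are hypothetical, not any vendor's API.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SeenCert:
    """One certificate observed by a network scan (hypothetical record shape)."""
    der: bytes        # raw DER bytes as served by the endpoint
    host: str
    issuer: str
    not_after: date

    @property
    def fingerprint(self) -> str:
        # SHA-256 over the DER encoding: the same cert has the same fingerprint
        # no matter which host serves it or which tool imported it.
        return hashlib.sha256(self.der).hexdigest()


def unmanaged(seen: list[SeenCert], managed: set[str]) -> list[SeenCert]:
    """Certs on the wire that the CLM tool has never heard of, soonest expiry first."""
    gap = [c for c in seen if c.fingerprint not in managed]
    return sorted(gap, key=lambda c: c.not_after)
```

Sorting by `not_after` puts the certificate most likely to cause the next incident at the top of the list.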

200-day lifetimes just made this urgent

Before SC-081, a forgotten certificate had roughly 13 months before it caused problems. That was enough time for someone to stumble across it during a routine audit, or for a quarterly review to catch it, or even for the person who provisioned it to remember it existed. Thirteen months is a generous buffer for institutional memory.

At 200 days, the buffer is roughly six months. At 100 days (March 2027), it's three. At 47 days (March 2029), it's about six weeks. The window between "a certificate was provisioned and nobody documented it" and "that certificate just took down a production service" is compressing on a fixed schedule.

SC-081 Phase	Max Lifetime	Time to discover before outage
Pre-SC-081	398 days	~13 months
Phase 1 (now)	200 days	~6 months
Phase 2 (Mar 2027)	100 days	~3 months
Phase 3 (Mar 2029)	47 days	~6 weeks
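The table's arithmetic is easy to reproduce. A minimal sketch, assuming the phase start dates and lifetimes listed above and counting lifetime as whole days from issuance (real notBefore/notAfter handling is more precise than this); the constant and function names are illustrative:

```python
from datetime import date, timedelta

# Maximum TLS certificate lifetime by issuance date, per the SC-081 schedule
# in the table above. Newest phase first, so the first match wins.
SC081_PHASES = [
    (date(2029, 3, 15), 47),
    (date(2027, 3, 15), 100),
    (date(2026, 3, 15), 200),
]
PRE_SC081_MAX_DAYS = 398


def max_lifetime_days(issued: date) -> int:
    """Longest validity a publicly trusted TLS cert issued on `issued` may have."""
    for phase_start, days in SC081_PHASES:
        if issued >= phase_start:
            return days
    return PRE_SC081_MAX_DAYS


def latest_possible_expiry(issued: date) -> date:
    """Latest date a cert issued on `issued` could still be valid until."""
    return issued + timedelta(days=max_lifetime_days(issued))
```

Under these assumptions, a certificate issued on March 15, 2026 at the full 200-day lifetime expires on October 1, 2026, while one issued a day earlier under the old 398-day cap could run until April 16, 2027.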

At 47 days, there's no margin for institutional memory. If a certificate isn't in your inventory the moment it's issued, you're racing a clock you don't know is ticking. And the thing about clocks you don't know about is that you tend to lose those races.

Discovery has to come first

There's a reason the entire CLM (Certificate Lifecycle Management) industry has converged on "automate renewal" as the headline pitch. Renewal is measurable. You can count how many certificates were renewed successfully. You can show a dashboard with green checkmarks. It makes for good slide decks.

Discovery is harder to sell because the value proposition is uncomfortable. You're essentially saying: "Let me show you all the things you didn't know were broken." Nobody wants that email on a Tuesday morning. But it's the only honest starting point.

My guess is that most organizations, if they ran a proper network-wide certificate scan tomorrow, would find somewhere between 2x and 10x more certificates than they think they have. Some of those certs are fine. Some are expired. Some are using deprecated algorithms. Some are self-signed and sitting on production infrastructure because a temporary workaround from 2021 became permanent when the person who set it up moved to a different company.
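Those categories map onto a handful of checks. Here's a sketch, assuming a hypothetical `FoundCert` record that your scanner has already populated for each certificate. The deprecated-algorithm set and the 2048-bit RSA floor are policy choices made for illustration, not a standard you have to adopt.

```python
from dataclasses import dataclass
from datetime import date

# Illustrative policy, not a standard: adjust to your own requirements.
DEPRECATED_SIG_ALGS = {"md5WithRSAEncryption", "sha1WithRSAEncryption"}
MIN_RSA_BITS = 2048


@dataclass(frozen=True)
class FoundCert:
    """One certificate as parsed by a scanner (hypothetical record shape)."""
    host: str
    subject: str
    issuer: str
    not_after: date
    sig_alg: str    # OpenSSL-style signature algorithm name
    key_type: str   # e.g. "RSA" or "EC"
    key_bits: int


def findings(cert: FoundCert, today: date) -> list[str]:
    """Problems a scan would flag for one certificate; an empty list means none found."""
    problems = []
    if cert.not_after < today:
        problems.append("expired")
    if cert.subject == cert.issuer:
        # Cheap proxy for self-signed; a strict check would verify the
        # signature against the certificate's own public key.
        problems.append("self-signed")
    if cert.sig_alg in DEPRECATED_SIG_ALGS:
        problems.append("deprecated signature algorithm")
    if cert.key_type == "RSA" and cert.key_bits < MIN_RSA_BITS:
        problems.append("weak RSA key")
    return problems
```

Running `findings()` over every discovered certificate and counting the non-empty results replaces "some are expired, some are self-signed" with actual numbers.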

And here's the uncomfortable corollary: if you don't know how many certificates you have, you definitely don't know which ones need to be on public trust versus private trust. The trust bifurcation that SC-081 is forcing only works if you can actually classify your certificate inventory. You can't migrate internal services to a private CA if you don't know which services are internal. You can't move off public PKI if you don't know what's on public PKI.
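Classification can start crude. Here's a minimal sketch, assuming your discovery records include the IP addresses each hostname resolved to (the `Endpoint` record and bucket names are hypothetical). Endpoints that only ever resolve to private addresses are candidates for a private CA; anything with a public address stays on public trust. Split-horizon DNS and internet-facing services behind NAT will fool it, so treat the output as a review queue, not a migration plan.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """A discovered TLS endpoint (hypothetical record shape)."""
    hostname: str
    addresses: tuple[str, ...]  # IPs the name resolved to when the scan ran


def trust_bucket(ep: Endpoint) -> str:
    """Rough first-pass split between public trust and private-CA candidates.

    Heuristic only: an endpoint whose every address is private (RFC 1918,
    loopback, link-local, ...) isn't directly reachable from the internet,
    so it is a candidate for a private CA.
    """
    if not ep.addresses:
        return "unknown"  # never resolved; a human has to look at it
    ips = [ipaddress.ip_address(a) for a in ep.addresses]
    if all(ip.is_private for ip in ips):
        return "private-ca-candidate"
    return "needs-public-trust"


def bucket_inventory(endpoints) -> dict[str, list[str]]:
    """Group hostnames by trust bucket, preserving scan order within each bucket."""
    buckets: dict[str, list[str]] = {}
    for ep in endpoints:
        buckets.setdefault(trust_bucket(ep), []).append(ep.hostname)
    return buckets
```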

What does a real inventory actually require?

Not a spreadsheet. That's the first thing to get clear on. The reason DISA, Microsoft, and Apple all missed their expired or misconfigured certs isn't that they don't have spreadsheets. They probably have excellent spreadsheets. The problem is that spreadsheets are point-in-time snapshots of a continuously changing system. A cert gets issued today, gets deployed to three servers tomorrow, gets copied to a fourth server by a different team next month, and by the time the spreadsheet is reviewed it's describing infrastructure that no longer matches reality.
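The way out of the snapshot problem is to compare scans against each other rather than against a spreadsheet. Here's a minimal sketch, assuming each scan yields `(fingerprint, host, port)` tuples; the function names are hypothetical:

```python
from collections import defaultdict


def deployments(observations) -> dict[str, set[str]]:
    """Map each certificate fingerprint to every "host:port" serving it in one scan.

    `observations` is an iterable of (fingerprint, host, port) tuples.
    """
    where = defaultdict(set)
    for fingerprint, host, port in observations:
        where[fingerprint].add(f"{host}:{port}")
    return dict(where)


def new_deployments(previous: dict[str, set[str]],
                    current: dict[str, set[str]]) -> dict[str, set[str]]:
    """Endpoints that started serving a certificate since the previous scan.

    This is the "copied to a fourth server by a different team" case: the cert
    itself isn't new, so a renewal tool won't notice, but its footprint grew.
    """
    added = {}
    for fingerprint, hosts in current.items():
        fresh = hosts - previous.get(fingerprint, set())
        if fresh:
            added[fingerprint] = fresh
    return added
```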

A real certificate inventory has to be continuous and automated. It has to scan your network, your endpoints, your cloud infrastructure, on a schedule measured in hours, not quarters. It has to find certificates regardless of which CA issued them (because the whole problem is that you have certificates from CAs you forgot you were using). And it has to tell you, in plain terms, what's expiring, what's misconfigured, and what's at risk.
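One pass of that kind of scan can be sketched with Python's standard library alone. This is a minimal sketch, not a product: it handshakes with each target using the system trust store, reports days until `notAfter` for certificates that verify, and records OpenSSL's reason (expired, self-signed, unknown issuer, hostname mismatch) for those that don't. The target list, the 30-day warning threshold, and the function names are assumptions. For endpoints on a private PKI you'd load your internal roots with `ctx.load_verify_locations(cafile=...)` so they aren't all reported as invalid.

```python
import socket
import ssl
import time
from dataclasses import dataclass
from typing import Optional

SECONDS_PER_DAY = 86400


@dataclass
class ProbeResult:
    host: str
    port: int
    status: str                 # "ok", "expiring", "expired", "invalid", or "unreachable"
    days_left: Optional[float]  # None when no verified certificate was obtained
    detail: str


def classify(host: str, port: int, not_after: str, now: float,
             warn_days: int = 30) -> ProbeResult:
    """Turn a notAfter string (OpenSSL text format, e.g. "Apr 30 00:00:00 2025 GMT") into a status."""
    days_left = (ssl.cert_time_to_seconds(not_after) - now) / SECONDS_PER_DAY
    if days_left <= 0:
        status = "expired"
    elif days_left < warn_days:
        status = "expiring"
    else:
        status = "ok"
    return ProbeResult(host, port, status, days_left, f"notAfter={not_after}")


def probe(host: str, port: int = 443, timeout: float = 5.0,
          warn_days: int = 30) -> ProbeResult:
    """Handshake with one endpoint using the system trust store and report what we find."""
    ctx = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
    except ssl.SSLCertVerificationError as err:
        # Verification failed; OpenSSL's message says why (expired, self-signed, ...).
        return ProbeResult(host, port, "invalid", None, err.verify_message)
    except OSError as err:
        # DNS failure, refused connection, timeout, or another TLS error.
        return ProbeResult(host, port, "unreachable", None, str(err))
    return classify(host, port, cert["notAfter"], time.time(), warn_days)


def scan(targets, warn_days: int = 30) -> list[ProbeResult]:
    """One pass over (host, port) targets; run it from a scheduler every few hours."""
    return [probe(host, port, warn_days=warn_days) for host, port in targets]
```

Scheduling `scan()` every few hours (cron, a systemd timer, whatever you already run) and keeping the results is what turns a spot check into an inventory. A verification failure hides the certificate's dates in this sketch; recording those too would need a DER parser such as the third-party `cryptography` package.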

That's the operational foundation that everything else sits on. Automation, renewal, migration to private trust, SC-081 compliance: all of these require an accurate, current, complete picture of what you actually have deployed. Without that, you're optimizing a system you can't see.

DISA couldn't see their expired cert on cyber.mil. Microsoft couldn't see their expired cert on location.microsoft.com for eleven months. Apple couldn't see the bad chain on ppq.apple.com until developers started filing bug reports. The question for your organization isn't whether you have certificates you don't know about. You do. The question is whether you'll find them before your users do.
