May 20, 2026 · Sai · 4 min read

Kerberos got token lifetimes right in 1988

The case for short-lived credentials isn't new. What changed isn't the math — it's that the engineering objections that kept lifetimes long are no longer load-bearing.

The original Kerberos paper (Steiner, Neuman, and Schiller, 1988) argued that authentication credentials should be short-lived. The default ticket lifetime in Athena's deployment was 8 hours, and the intent was lower; the limit was hardware. By the late 1990s, public-sector Kerberos deployments routinely issued tickets with 8–10 hour lifetimes that could be renewed for longer windows. The thinking was already there: credentials should be cheap to issue and cheap to expire.

What happened between then and now is a story about engineering trade-offs that have since flipped.

Why credentials got long

Three reasons, all of them practical:

Mint was expensive. Signing a credential required a key operation on the issuer, network round-trips, and protocol negotiation. Doing this on every request was infeasible; doing it once a day was cheap.
Caches were leaky. The token had to live somewhere, and long-lived OS keychains and disk caches were the deployment reality. The longer the credential lived, the fewer times it had to be written to those caches, and the fewer chances a stale cache had to expose it.
Distribution was sequential. When the issuer is one machine and the consumers are thousands, every mint hits the bottleneck. Long lifetimes flatten the load.

All three constraints have eased. Modern signing is fast (Ed25519 on a laptop CPU manages tens of thousands of signatures per second per core). Memory-only caches with explicit lifetimes are the norm. Issuance is horizontally scalable.

The economics flipped. The practice didn't.

The "what if work outlives the token?" objection

The most common pushback when someone proposes 5-minute TTLs: "what if my batch job runs for an hour?"

The answer is: it should be re-binding every five minutes anyway. Long-running work is a workflow signal, not a credential signal. Break the work into stages. Re-mint at each boundary. Make the boundaries observable — they're where audit shows up anyway. A long-running agent that re-binds every five minutes is producing the audit trail you want; the same agent holding a 24-hour token is producing the audit trail you'll regret.
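
Re-minting at stage boundaries can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `Credential`, `mint`, and `run_staged` are hypothetical names, the token is an opaque random string rather than a signed credential, and the clock is injectable only to make the sketch easy to test.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

TTL_SECONDS = 300.0  # the five-minute default discussed in this post


@dataclass(frozen=True)
class Credential:
    token: str
    scope: str
    expires_at: float

    def is_valid(self, now: float) -> bool:
        return now < self.expires_at


def mint(scope: str, now: float, ttl: float = TTL_SECONDS) -> Credential:
    """Hypothetical issuer call: an opaque token bound to a single scope."""
    return Credential(secrets.token_urlsafe(16), scope, now + ttl)


def run_staged(
    stages: Iterable[Tuple[str, Callable[[Credential], None]]],
    clock: Callable[[], float] = time.time,
) -> List[Tuple[str, float]]:
    """Run each stage under its own freshly minted credential.

    Returns one (stage name, expiry) audit entry per mint, so every
    stage boundary leaves a record.
    """
    audit: List[Tuple[str, float]] = []
    for name, work in stages:
        cred = mint(scope=name, now=clock())
        audit.append((name, cred.expires_at))
        work(cred)
    return audit
```

Each stage sees only a credential scoped to that stage, and the returned list is the per-boundary audit trail the paragraph above describes.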

The pattern is well-trodden in other contexts. AWS STS AssumeRole credentials default to 1 hour, and the recommended pattern for long-running batch work is to wrap it in a credential-refresh loop. Kubernetes projected service account tokens are short-lived by default, and the kubelet refreshes them for you. The infrastructure is already there.
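
A credential-refresh loop of this kind usually reduces to a small holder that re-mints shortly before expiry. The sketch below assumes a caller-supplied `mint` function standing in for whatever issuer call you use; `RefreshingCredential`, `margin`, and the injectable `clock` are illustrative names, not part of any real SDK.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Credential:
    token: str
    expires_at: float


class RefreshingCredential:
    """Hands out a usable credential, re-minting shortly before expiry."""

    def __init__(
        self,
        mint: Callable[[], Credential],  # hypothetical issuer call
        margin: float = 30.0,            # refresh this many seconds early
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._mint = mint
        self._margin = margin
        self._clock = clock
        self._current: Optional[Credential] = None

    def get(self) -> Credential:
        cred = self._current
        # Re-mint on first use, or when the remaining lifetime is within the margin.
        if cred is None or cred.expires_at - self._clock() <= self._margin:
            cred = self._mint()
            self._current = cred
        return cred
```

A long-running job would call `get()` before each unit of work instead of holding one token for the whole run, so the credential in hand is never older than one TTL.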

The "but what about offline" objection

This one's harder. If your workload genuinely runs disconnected — a build agent in an air-gapped lab, a device on a yacht — you can't refresh mid-run. The honest answer is: the credential lifetime is bounded by the maximum offline window. If you need 8 hours of offline operation, the TTL is at least 8 hours.

That's fine. The win isn't to drive every credential to 5 minutes; it's to size each credential to its actual work. Most credentials in most environments live online and could be 5 minutes. Treat 5 minutes as the default, longer lifetimes as the exception with a documented reason.
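
One way to encode "5 minutes by default, longer only with a documented reason" is a small policy function. This is a sketch under assumptions: `TtlRequest`, `ttl_for`, and the field names are hypothetical, and the rule simply sizes the TTL to the declared offline window.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_TTL_SECONDS = 5 * 60


@dataclass(frozen=True)
class TtlRequest:
    workload: str
    max_offline_seconds: int = 0   # 0 means the workload is always online
    reason: Optional[str] = None   # required for anything above the default


def ttl_for(req: TtlRequest) -> int:
    """Return the default TTL unless the offline window demands a longer one."""
    if req.max_offline_seconds <= DEFAULT_TTL_SECONDS:
        return DEFAULT_TTL_SECONDS
    if not req.reason:
        raise ValueError(
            f"{req.workload}: a TTL above the default needs a documented reason"
        )
    # Size the credential to the longest stretch the workload runs disconnected.
    return req.max_offline_seconds
```

For example, `ttl_for(TtlRequest("lab-build-agent", 8 * 3600, "air-gapped lab"))` yields an 8-hour TTL, while the same request without a reason is rejected.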

Receipts from organizations that did this

Public examples of teams that moved aggressively toward short-lived credentials:

Netflix documented its Metatron and related work pushing toward ephemeral credentials for inter-service auth.
HashiCorp Vault's dynamic secrets framing is, fundamentally, an argument that credentials should be issued at the moment of need and expire at the moment of completion. Vault has been making this argument since 2015.
Google's BeyondCorp posture, documented across several papers, treats every request as if the network is hostile and the credential is the only authority. That model is much harder when credentials are long-lived — the original paper doesn't focus on TTL specifically, but the subsequent operational work did.

None of these are new. The Kerberos argument from 1988 was right. What changed is that the engineering reasons to ignore it are gone.

The actual hard part

The thing that's actually hard about short-lived credentials isn't the mint path or the cache invalidation. It's agreeing on what "the task" is.

If you want a credential to last exactly as long as a task, you have to know when the task starts and when it ends. For a request — easy, the task is the request. For a build — moderate, the task is the build job. For an agent — sometimes hard. Agents loop, retry, escalate. "When does the task end" can be ambiguous.

The pragmatic answer for agent systems is to define the task at the plan level: when the agent starts a plan, mint a credential good for the expected duration of that plan plus a small grace window. If the plan runs over, re-bind. If the plan completes early, the credential expires unused.
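
Plan-level minting can be sketched as below. The names (`PlanCredential`, `mint_for_plan`, `ensure_valid`) and the 30-second grace window are assumptions for illustration; a real system would also have to pass the re-bind through the issuer's own authorization checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanCredential:
    plan_id: str
    expires_at: float


def mint_for_plan(
    plan_id: str,
    expected_seconds: float,
    now: float,
    grace_seconds: float = 30.0,
) -> PlanCredential:
    """Mint a credential good for the plan's expected duration plus a grace window."""
    return PlanCredential(plan_id, now + expected_seconds + grace_seconds)


def ensure_valid(
    cred: PlanCredential,
    remaining_estimate: float,
    now: float,
    grace_seconds: float = 30.0,
) -> PlanCredential:
    """Return the current credential, or re-bind if the plan has run over."""
    if now < cred.expires_at:
        return cred
    # The plan overran its estimate: mint again, sized to the work that is left.
    return mint_for_plan(cred.plan_id, remaining_estimate, now, grace_seconds)
```

An agent would call `ensure_valid` before each step of the plan. A plan that finishes early never calls it again, and the credential simply lapses.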

This is harder than "give the agent a long-lived key." It's also the only way to bring the blast radius down.

Kerberos got the math right almost forty years ago. We just needed the rest of the stack to catch up.
