May 23, 2026 · Sai · 3 min read

Policy as an output, not an input

Authoring policy from imagination doesn't scale to agent fleets. Drafting policy from observed behavior does.

The classic authorization workflow is straight out of the 1990s: someone writes a policy, the platform enforces it, and you find out at runtime which cases the author didn't think of. The policy is the input; the system's behavior is the output.

That works fine when there are forty service accounts and three platform engineers. It does not work when there are four hundred agents and a different engineer wrote each one.

What changes with agents

Agents declare what they need at runtime. They pick tools. They form arguments. They chain calls together in ways their author may not have anticipated, let alone the security team. The thing being secured is no longer a known surface — it's a behavior that becomes visible only when you watch it run.

If the policy is an input, you're guessing. If the policy is an output — drafted from what the agent actually does and then approved — you're describing reality.

Those are very different epistemic positions.

Drafted policy

The pattern: an agent runs in shadow mode. The control plane watches every call the agent makes, groups the calls by tool and argument shape, and proposes a baseline policy that would have allowed every call it actually made. A human reviews the proposal, tightens it or rejects parts of it, and promotes the result.
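A minimal sketch of the drafting step. It assumes observed calls arrive as dicts with hypothetical `agent`, `tool`, and `args` fields. It also assumes "argument shape" means the key names and value types of the arguments, with the values themselves stripped. The draft is one allow rule per distinct (agent, tool, shape) triple, so by construction it would have allowed every call it was built from.

```python
from collections import Counter

def arg_shape(value):
    """Reduce an argument value to its structure: key names and value types, never contents."""
    if isinstance(value, dict):
        inner = ",".join(f"{k}:{arg_shape(v)}" for k, v in sorted(value.items()))
        return "{" + inner + "}"
    if isinstance(value, list):
        # A list's shape is the set of distinct element shapes it contains.
        inner = "|".join(sorted({arg_shape(v) for v in value}))
        return "[" + inner + "]"
    return type(value).__name__

def rule_key(call):
    """The unit a drafted rule allows: (agent, tool, argument shape)."""
    return (call["agent"], call["tool"], arg_shape(call["args"]))

def draft_policy(calls):
    """Propose one allow rule per distinct rule key, with how many observed calls support it."""
    return dict(Counter(rule_key(c) for c in calls))

def allows(draft, call):
    return rule_key(call) in draft

# Hypothetical shadow-mode traffic from one agent.
observed = [
    {"agent": "billing-bot", "tool": "search_invoices", "args": {"customer_id": "c_123", "limit": 50}},
    {"agent": "billing-bot", "tool": "search_invoices", "args": {"customer_id": "c_456", "limit": 10}},
    {"agent": "billing-bot", "tool": "send_email", "args": {"to": "ops@example.com", "body": "weekly summary"}},
]
draft = draft_policy(observed)
for rule, support in sorted(draft.items()):
    print(rule, support)
```

Because values are stripped, two invoice searches for different customers collapse into one rule, while a call with a new shape (say, `limit` passed as a string) falls outside the draft and would surface for review.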

This is not "let the agent self-define its scope." That would be a security disaster. The proposal is always a draft, and always requires a human to ship it. But the draft starts from data, not imagination — which means the human is reviewing a real document, not authoring one from a blank page.
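The "always a draft, always a human" rule can be enforced in code rather than by convention. A minimal sketch, assuming a hypothetical `DraftPolicy` object that holds (agent, tool) rules and a status. A reviewer can drop rules only while the policy is a draft. `promote` refuses to hand anything to enforcement until a named reviewer has approved it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Status(Enum):
    DRAFT = "draft"
    APPROVED = "approved"

@dataclass
class DraftPolicy:
    rules: set  # (agent, tool) pairs proposed from observed traffic
    status: Status = Status.DRAFT
    approved_by: Optional[str] = None

    def tighten(self, drop: set) -> None:
        """Reviewer removes rules they are not willing to ship; only drafts are editable."""
        if self.status is not Status.DRAFT:
            raise ValueError("only a draft can be edited")
        self.rules = self.rules - drop

    def approve(self, reviewer: str) -> None:
        """Record a named human approval; the rule set is frozen from here on."""
        if self.status is not Status.DRAFT:
            raise ValueError("only a draft can be approved")
        if not reviewer:
            raise ValueError("approval needs a named reviewer")
        self.status = Status.APPROVED
        self.approved_by = reviewer

def promote(policy: DraftPolicy) -> frozenset:
    """Hand the rule set to enforcement, but never without a recorded human approval."""
    if policy.status is not Status.APPROVED or not policy.approved_by:
        raise PermissionError("drafted policy needs human approval before promotion")
    return frozenset(policy.rules)
```

In use, the control plane would construct a `DraftPolicy` from observation, a reviewer would call `tighten` and then `approve`, and only then would `promote` succeed.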

The pattern has older relatives. OPA's opa eval --explain can show you which rules would have matched given a real input — useful for understanding existing policy. The shift here is using the same idea to author new policy, by aggregating across a window of observed traffic.
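An explain-style query looks at one decision for one input. Drafting instead works over many inputs at once. A minimal sketch of that aggregation step, assuming observations are dicts with hypothetical `agent`, `tool`, and timezone-aware `at` fields. It counts each (agent, tool) pair seen inside the window and lists the pairs with little support so a reviewer can read those first. The `min_support` threshold is an illustrative choice, not something the pattern prescribes.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

def aggregate_window(observations, now, window):
    """Count each (agent, tool) pair observed within [now - window, now]."""
    start = now - window
    return Counter(
        (obs["agent"], obs["tool"])
        for obs in observations
        if start <= obs["at"] <= now
    )

def low_support(counts, min_support):
    """Pairs seen fewer than min_support times; a reviewer may want to read these first."""
    return sorted(pair for pair, n in counts.items() if n < min_support)

now = datetime(2026, 5, 23, tzinfo=timezone.utc)
observations = [
    {"agent": "billing-bot", "tool": "search_invoices", "at": now - timedelta(days=1)},
    {"agent": "billing-bot", "tool": "search_invoices", "at": now - timedelta(days=2)},
    {"agent": "billing-bot", "tool": "refund", "at": now - timedelta(days=3)},
    # Falls outside a 7-day window, so it contributes nothing to this draft.
    {"agent": "billing-bot", "tool": "delete_customer", "at": now - timedelta(days=30)},
]
counts = aggregate_window(observations, now, timedelta(days=7))
print(counts)
print(low_support(counts, min_support=2))
```

The window length is a real design decision. Behavior that recurs less often than the window (a monthly job, for example) will be absent from the draft.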

Why this works for security teams

Security teams have always had a bad option here: be the bottleneck, or be the rubber stamp. Either everyone waits for you, or you approve things you don't fully understand.

Drafted policy is a third option. The system does the work of articulating what the agent does. You do the work of deciding whether that's acceptable. You're not in the critical path of every change — you're in the critical path of every promotion.

That shifts the team from gatekeeping individual requests to governing the promotion flow. It's a more strategic posture, and it's the one that scales as the number of agents grows.

Where the prior art lives

Three reference points worth knowing:

Google's Zanzibar (paper, 2019) is the canonical example of relationship-based authorization at scale. It's not "drafted from observation," but its insight — that authorization is a system with consistency properties, not a config file — is the foundation any drafted-policy approach builds on.
AWS IAM Access Analyzer (docs) does something analogous in spirit: it generates least-privilege policies based on observed CloudTrail activity. It's a one-shot tool, not a continuous loop, but the shape is the same.
Cedar (open source) is the policy language AWS authored and open-sourced. It's a clean reference for what an authored policy should look like; the question of where the policy comes from is left open.

These all point in the same direction: the next iteration of authorization is one where the human is reviewing, not authoring.
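A drafted rule can also be emitted in an authored-policy language like Cedar, so that reviewers read the same format they would otherwise have written by hand. A minimal sketch, assuming a hypothetical schema with `Agent` principals, `Tool` resources, and a single `invoke` action. A real deployment's schema would differ, and the rendered text is still only a draft awaiting review.

```python
def cedar_str(s: str) -> str:
    """Quote a string as a Cedar string literal, escaping backslashes and double quotes."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

def to_cedar(agent: str, tool: str, support: int) -> str:
    """Render one drafted (agent, tool) allow rule as a Cedar permit policy."""
    return "\n".join([
        f"// drafted from {support} observed call(s); pending human review",
        "permit (",
        f"  principal == Agent::{cedar_str(agent)},",
        '  action == Action::"invoke",',
        f"  resource == Tool::{cedar_str(tool)}",
        ");",
    ])

print(to_cedar("billing-bot", "search_invoices", 2))
```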

What gets harder

Two things get harder, honestly.

One: the system has to be observant before the policy exists. That means letting the agent operate in a shadow mode where its calls are logged and evaluated but not blocked, with a clear understanding that this is a pre-production posture. Teams have to be comfortable with that gap.
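At the enforcement point, shadow mode can be a small change. The decision is computed and recorded the same way in both modes, and only the enforcing mode acts on a deny. A minimal sketch, with a hypothetical `authorize` function, an (agent, tool) allow set, and an in-memory `audit` list standing in for a real decision log.

```python
from enum import Enum

class Mode(Enum):
    SHADOW = "shadow"    # evaluate and record, never block
    ENFORCE = "enforce"  # evaluate, record, and block denials

class Denied(Exception):
    """Raised only in ENFORCE mode when a call is not covered by the allow set."""

def authorize(call: dict, allowed: set, mode: Mode, audit: list) -> str:
    """Make the same decision in both modes; only ENFORCE turns a deny into a block."""
    decision = "allow" if (call["agent"], call["tool"]) in allowed else "deny"
    # Every decision is recorded, so shadow mode yields the traffic a draft is built from.
    audit.append({**call, "decision": decision, "mode": mode.value})
    if decision == "deny" and mode is Mode.ENFORCE:
        raise Denied(f"{call['agent']} may not call {call['tool']}")
    # In SHADOW mode a "deny" is returned for the record, but the call is not stopped.
    return decision
```

Before any policy exists, the allow set is empty and every call is logged as a would-be deny. That record is exactly the gap teams need to be comfortable with.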

Two: the policies you end up with don't always look like the policies you would have written. Drafted policy is often messier — it has special cases the author would have collapsed into a general rule. That messiness is actually the point: it captures behavior the imagined policy would have missed. But it requires reviewers willing to ship rules that don't fit a neat taxonomy.
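One way to make this trade-off concrete is to replay the observed calls against both policies. A minimal sketch with hypothetical tool names. The hand-written policy is a tidy prefix rule, the drafted one is a flat set of exact pairs, and the replay shows which observed calls the tidy rule would have denied.

```python
def imagined(call: dict) -> bool:
    """Hand-written rule: billing-bot may use any tool named search_*."""
    return call["agent"] == "billing-bot" and call["tool"].startswith("search_")

def drafted_from(observed: list):
    """Drafted rule set: one exact (agent, tool) pair per observed call, special cases kept."""
    pairs = {(c["agent"], c["tool"]) for c in observed}
    return lambda call: (call["agent"], call["tool"]) in pairs

def missed(policy, observed: list) -> list:
    """Observed calls the policy would have denied."""
    return [c for c in observed if not policy(c)]

observed = [
    {"agent": "billing-bot", "tool": "search_invoices"},
    {"agent": "billing-bot", "tool": "search_customers"},
    {"agent": "billing-bot", "tool": "refund"},  # the special case nobody imagined
]
print(missed(imagined, observed))
print(missed(drafted_from(observed), observed))
```

The trade also runs the other way: the prefix rule would allow any search_* tool the agent never called, while the drafted set allows only what was seen.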

The reframe

Stop thinking of policy as an artifact you produce and start thinking of it as a description of what the system does. The artifact becomes a byproduct of running the system honestly.

That sounds like a small reframe, but it changes how the security team spends its time — and that's the thing scaling agent fleets actually require.
