Tool runtime config: policy, hooks, and context budgets
I shipped a chunk of infrastructure in Zeus this week that I have wanted for a long time: a runtime config layer for tools.
If you have ever built an agent loop, you know the failure mode: you add tools, you add providers, you add evals, and pretty soon the tool layer is a hairball of one-off flags.
This diff is the opposite move. It is a single place to answer:
- Which tools are allowed for this run? (globally, and per provider)
- How big are tool results allowed to be? (and how do we cap them)
- Can I inject small bits of behavior around model selection, prompt building, or tool execution?
This post is a diff tour of what changed and why.
The problem: tool sprawl without a runtime contract
In early versions of Zeus, tools were mostly static. The loop handed the model a fixed list of tool schemas and hoped for the best.
That breaks down once you need any of these:
- a “minimal” profile for constrained runs (readonly-ish debugging)
- a provider-specific tool subset (some models are better with fewer tools)
- a way to turn on hooks for experiments without rebuilding the binary
- predictable budgets for tool outputs so the prompt does not explode
The diff adds three files that together act like a runtime contract for tools:
- llm/tool_runtime_config.py
- llm/tool_policy_pipeline.py
- llm/tool_hook_pipeline.py
And then it wires them into the loop and tool dispatcher.
1) Tool runtime config: defaults + normalization
The core is tool_runtime_config.py. It defines a default config and a normalizer that is deliberately strict. If the config is missing or malformed, it snaps back to defaults.
A few things I like about this design:
A. There is a default per-tool envelope
A tool like shell has a natural set of runtime tunables:
"shell": {
"max_result_chars": 40_000,
"timeout_sec": 20,
"security": "full",
"ask": "off",
"yield_ms": 10_000,
"safe_bins": [],
}
Same for web_fetch (timeouts, redirect limits, cache TTL, readability) and for read and apply_patch (input limits).
The important bit is not the exact numbers. It is that every tool now has a place where its runtime knobs live.
B. Normalization is copy-on-write, with clamping
The normalizer does a deep copy of defaults, then merges user-supplied overrides, but clamps obvious footguns:
- integer fields have min/max bounds
- string lists are de-duplicated
- hooks are normalized into dict entries
This sounds boring, but it is the difference between a “config file” and a config system you can actually ship.
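To make the merge-and-clamp idea concrete, here is a minimal sketch of what such a normalizer can look like. The function name, the `INT_BOUNDS` table, and the trimmed-down defaults are my assumptions for illustration, not the actual Zeus code:

```python
import copy

# Trimmed-down defaults for illustration (the real envelope has more keys).
DEFAULTS = {
    "shell": {"max_result_chars": 40_000, "timeout_sec": 20, "safe_bins": []},
}

# Hypothetical (min, max) bounds for integer fields.
INT_BOUNDS = {
    "max_result_chars": (1_000, 200_000),
    "timeout_sec": (1, 300),
}

def normalize(user_config):
    # Snap back to defaults if the config is missing or malformed.
    if not isinstance(user_config, dict):
        user_config = {}
    config = copy.deepcopy(DEFAULTS)  # copy-on-write: never mutate defaults
    for tool, overrides in user_config.items():
        if tool not in config or not isinstance(overrides, dict):
            continue  # unknown tool or malformed block: ignore it
        for key, value in overrides.items():
            if key not in config[tool]:
                continue  # unknown knob: ignore it
            if key in INT_BOUNDS and isinstance(value, int):
                lo, hi = INT_BOUNDS[key]
                config[tool][key] = max(lo, min(hi, value))  # clamp footguns
            elif isinstance(config[tool][key], list) and isinstance(value, list):
                # De-duplicate string lists while preserving order.
                config[tool][key] = list(dict.fromkeys(value))
            elif isinstance(value, type(config[tool][key])):
                config[tool][key] = value
    return config
```

The strictness is the point: a typo'd key or an out-of-range value degrades to a sane default instead of propagating into the loop.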
2) Tool policy pipeline: filtering the tool set
tool_policy_pipeline.py implements a small but crucial feature: take a list of tool schemas, and filter it down according to a policy block.
There are two layers of policy:
- Global policy
- Provider override (e.g. "grok gets minimal tools, claude gets full tools")
The code resolves an available tool name set from the schemas, applies:
- profile (e.g. minimal)
- allow (hard set)
- also_allow (union)
- deny (subtract)
Then returns a filtered schema list plus some debug metadata.
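The resolution order above can be sketched in a few lines. This is an illustrative reimplementation under assumed names (`apply_tool_policy`, the `PROFILES` table, the debug dict shape), not the actual pipeline code:

```python
# Predefined toolsets; mirrors the "minimal" profile from the post.
PROFILES = {
    "minimal": {"read", "file_search", "web_fetch", "web_search",
                "memory", "knowledge", "ask_user"},
}

def apply_tool_policy(schemas, policy, provider=None):
    # Provider override wins over the global policy.
    merged = dict(policy)
    merged.update(policy.get("providers", {}).get(provider, {}))

    available = {s["name"] for s in schemas}
    if "allow" in merged:                                  # hard set
        allowed = set(merged["allow"]) & available
    elif merged.get("profile") in PROFILES:                # named profile
        allowed = PROFILES[merged["profile"]] & available
    else:
        allowed = set(available)
    allowed |= set(merged.get("also_allow", ())) & available  # union
    allowed -= set(merged.get("deny", ()))                    # subtract

    filtered = [s for s in schemas if s["name"] in allowed]
    debug = {"provider": provider, "allowed": sorted(allowed)}
    return filtered, debug
```

Filtering against the schema list (rather than a hard-coded tool registry) means the policy layer never needs to know which tools exist ahead of time.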
The “minimal” profile is explicit
Right now the predefined toolsets are intentionally small:
"minimal": {
"read",
"file_search",
"web_fetch",
"web_search",
"memory",
"knowledge",
"ask_user",
}
This is not security. It is operational: sometimes you want a run that cannot mutate anything. (And it is a clean way to reproduce failures in evals.)
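For context, a policy block with a provider override might look like the following. The exact shape is my guess; only the key names come from the post:

```python
# Illustrative policy block: a global profile plus a per-provider override
# that narrows grok to the minimal toolset, with one tool unioned back in.
tool_policy = {
    "profile": "full",
    "deny": ["apply_patch"],          # subtract globally
    "providers": {
        "grok": {
            "profile": "minimal",     # override replaces the global profile
            "also_allow": ["shell"],  # union one extra tool back in
        },
    },
}
```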
3) Hook pipeline: experiment without forking the loop
The other half of “runtime control” is hooks.
tool_hook_pipeline.py adds four phases:
- before_model_resolve (provider or model selection)
- before_prompt_build (system prompt patching)
- before_tool_call (block tool, merge input, inject context)
- after_tool_call (append a note, set a field, truncate)
Hooks are intentionally non-fatal. If a hook throws, Zeus records a warning and keeps going.
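The non-fatal contract is easy to sketch. Here is a hypothetical hook runner (function name, hook-dict shape, and logger name are all assumptions) that runs every hook for a phase, merges returned patches, and downgrades exceptions to warnings:

```python
import logging

log = logging.getLogger("zeus.hooks")

def run_hooks(hooks, phase, payload):
    # Each hook is a dict entry (see the normalizer) with a phase and a
    # callable; a hook may return a patch dict to merge into the payload.
    for hook in hooks:
        if hook.get("phase") != phase:
            continue
        try:
            patch = hook["fn"](payload)
            if patch:
                payload.update(patch)
        except Exception as exc:  # hooks are intentionally non-fatal
            log.warning("hook %s failed in %s: %s", hook.get("name"), phase, exc)
    return payload
```

A broken experimental hook then costs you a log line, not a dead agent loop.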
A couple of patterns that already feel useful:
A. Provider forcing
You can force a provider/model from config without touching code:
- set_provider
- set_model
- set_provider_and_model
This is the kind of thing you want for eval harnesses and A/B testing.
B. Tool blocking at runtime
In before_tool_call, you can block a specific tool (or all tools) with a message:
block_tool
This gives you a way to temporarily “pull the plug” on a tool in production while keeping the rest of the agent usable.
C. Light input patching
merge_input is a small hammer that goes a long way. You can inject a timeout, change a parameter, or nudge a tool into a safer mode.
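Putting the two before_tool_call actions together, here is a sketch of how declarative hook entries and a tiny dispatcher could interact. The config shape and the `apply_before_tool_call` helper are illustrative assumptions:

```python
# Illustrative hook entries using the actions named in the post.
hooks = [
    # Pull the plug on shell in production, with a user-visible message.
    {"phase": "before_tool_call", "tool": "shell",
     "action": "block_tool",
     "message": "shell is temporarily disabled"},
    # Nudge web_fetch into a safer mode by merging input fields.
    {"phase": "before_tool_call", "tool": "web_fetch",
     "action": "merge_input",
     "input": {"timeout_sec": 5, "max_redirects": 2}},
]

def apply_before_tool_call(hooks, tool_name, tool_input):
    """Return (possibly patched input, None) or (None, block message)."""
    for hook in hooks:
        if hook.get("phase") != "before_tool_call":
            continue
        if hook.get("tool") not in (tool_name, "*"):  # "*" = all tools
            continue
        if hook.get("action") == "block_tool":
            return None, hook.get("message", "tool blocked")
        if hook.get("action") == "merge_input":
            tool_input = {**tool_input, **hook.get("input", {})}
    return tool_input, None
```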
4) Tool result budgeting: context share ratio
The agent loop also got a concrete improvement: tool result sizes are now capped by a mix of:
- a per-tool max_result_chars
- a global result_context_share_ratio
The logic (in loop.py) is basically:
- take the tool’s hard cap
- compute a share of the model’s context window
- clamp the share between a minimum and the hard cap
This is the difference between “the agent sometimes goes off the rails when rg returns 200k chars” and “tool outputs are predictable”.
I defaulted the ratio to ~0.22 and the minimum to 1200 chars. The specific numbers are easy to tune once you have the mechanism.
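The clamp reduces to a few lines. This sketch uses the post's defaults; the function name and the chars-per-token conversion factor are my assumptions:

```python
def effective_result_cap(max_result_chars, context_window_tokens,
                         ratio=0.22, min_chars=1200, chars_per_token=4):
    # Convert the model's context window to a rough character budget
    # (chars_per_token is an assumed heuristic), take the configured
    # share, then clamp between the floor and the tool's hard cap.
    context_chars = context_window_tokens * chars_per_token
    share = int(context_chars * ratio)
    return max(min_chars, min(share, max_result_chars))
```

So a tool on a small-context model gets a proportionally smaller cap, a huge-context model never exceeds the per-tool hard cap, and no tool is ever squeezed below the floor.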
5) Why this matters
This diff is not flashy. But it unlocks a bunch of engineering moves:
- ship new tools without adding ad-hoc config flags
- run evals under strict tool profiles
- do provider-specific toolsets without branching schemas
- patch system prompts in a controlled way
- keep the loop stable under large outputs
It is also a step toward a clean separation:
- the loop is the loop
- the tools are the tools
- the runtime config defines the envelope they run inside
That separation is what makes an agent system maintainable.
Appendix: the three new files
- apps/client-mac/zeus/Resources/server/llm/tool_runtime_config.py
- apps/client-mac/zeus/Resources/server/llm/tool_policy_pipeline.py
- apps/client-mac/zeus/Resources/server/llm/tool_hook_pipeline.py
If you want to play with this yourself, the implementation is small enough to read end-to-end.