Agent Determinism Upgrade

A controlled, git-tracked hardening of the control plane: turn the rules
from “the model should remember” into deterministic checks, give every agent a
real output contract and eval, and add a small, security-vetted set of skills.
The doctrine is unchanged: “the model is probabilistic; the controls cannot be.”

What shipped

Area	Change
Safety net	`git init` baseline in `platform/`; rollback per stage
Enforcement	7 stdlib scripts in `platform/bin/` — all `jarvise-scan` SAFE
Docs	normalized agent-registry (display-name + emoji + FROZEN banner), `tool-contracts`, policies (new §10), workflow
Evals	one [[evals
Configs	12 agents got real `TOOLS.md` + uniform `AGENTS.md` blocks (SOP / Output contract / Failure modes / Integrations / DON’Ts); 3 Hebrew SOULs → English
Skills	+9 vetted skills (6 SAFE auto + 3 Yossef-approved); `skill-lint --all` clean
Fix	Phoenix junk dir cleaned up; `generate_voice.sh` quoting bug fixed

Enforcement scripts (`platform/bin/`)

Script	Checks
`registry-lint.py`	every `openclaw.json` agent id has a complete card → PASS 12/12
`task-card-lint.py`	the 5 Task-Card fields + a valid APPROVAL gate
`skill-lint.py --all`	SKILL.md frontmatter, name == dirname → PASS
`validate-output.py`	analyzer schema + the חלקה (parcel) rule; hawkeye/shield verdicts (fail-closed); worker-sync → PASS
`gate-runner.sh`	the code gates incl. the `tsc -p tsconfig.app.json` trap
`preflight-env.sh`	ffmpeg / rclone / whisper / fonts / venv
`eval-run.sh`	the only `runs.log` writer + eval-set `--check`
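The “fail-closed” note on `validate-output.py` means a verdict counts as passing only when it is well-formed and explicitly passing. A malformed, missing, or unrecognized verdict blocks instead of slipping through. Here is a minimal sketch of that rule. It assumes, hypothetically, that a Hawkeye/Shield verdict arrives as a JSON object with a `verdict` field whose only passing value is `"PASS"`. `verdict_passes` and this schema are illustrative, not the real script.

```python
import json

# Hypothetical: the only verdict value treated as a pass.
PASSING = {"PASS"}


def verdict_passes(raw: str) -> bool:
    """Fail-closed check: True only for a well-formed, explicit pass.

    Parse errors, non-object payloads, a missing or non-string verdict,
    and unknown verdict values all count as failures.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False  # unparseable output fails closed
    if not isinstance(data, dict):
        return False  # a bare string or list is not a verdict object
    verdict = data.get("verdict")
    if not isinstance(verdict, str):
        return False  # missing or non-string verdict fails closed
    return verdict in PASSING  # exact match: no case folding, no substrings


if __name__ == "__main__":
    samples = [
        '{"verdict": "PASS"}',
        '{"verdict": "FAIL", "reason": "RLS policy missing"}',
        '{"verdict": "LGTM"}',  # unrecognized value
        '{"result": "PASS"}',   # wrong field name
        "PASS",                 # not JSON at all
    ]
    for s in samples:
        print(verdict_passes(s), s)
```

Only the first sample prints `True`. Matching the pass value exactly, rather than case-insensitively or by substring, is the conservative choice. An agent that answers “LGTM” or a lowercase “pass” fails the check.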

New skills (Stage 6–8)

Vetted from obra/superpowers (MIT) and alirezarezvani/claude-skills (MIT);
each scanned SAFE by `jarvise-scan`, and each one that shipped scripts gated on Yossef's approval.

Skill	Owner	Source
writing-plans, subagent-driven-development	Vision	superpowers
test-driven-development, systematic-debugging	Forge	superpowers
verification-before-completion	Hawkeye	superpowers
requesting-code-review, adversarial-reviewer	Shield	superpowers / claude-skills
dispatching-parallel-agents	Jarvis	superpowers
finance-skills	Fury	claude-skills

Rejected: brainstorming (ships a local web server, 7 HIGH findings).
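The vetting rule (scan SAFE, plus Yossef's sign-off whenever a skill ships scripts) reduces to a three-way decision. This sketch assumes that any HIGH finding is disqualifying, as it was for brainstorming. The `Candidate` fields, the `"SAFE"` string comparison, and the example skill names are hypothetical stand-ins, not the real `jarvise-scan` output format.

```python
from dataclasses import dataclass
from enum import Enum


class Admission(Enum):
    AUTO = "auto"                # scanned SAFE, no bundled scripts
    NEEDS_APPROVAL = "approval"  # scanned SAFE but ships scripts: human sign-off
    REJECT = "reject"            # not SAFE, or any HIGH finding


@dataclass(frozen=True)
class Candidate:
    name: str
    scan_verdict: str       # hypothetical: "SAFE" or any other label
    ships_scripts: bool     # hypothetical: skill bundles executable scripts
    high_findings: int = 0  # hypothetical: count of HIGH-severity findings


def admit(c: Candidate) -> Admission:
    # The scan is checked first, so approval can never override a failed scan.
    if c.scan_verdict != "SAFE" or c.high_findings > 0:
        return Admission.REJECT
    if c.ships_scripts:
        return Admission.NEEDS_APPROVAL
    return Admission.AUTO


if __name__ == "__main__":
    for cand in [
        Candidate("docs-only-skill", "SAFE", ships_scripts=False),
        Candidate("skill-with-scripts", "SAFE", ships_scripts=True),
        Candidate("skill-with-server", "UNSAFE", ships_scripts=True, high_findings=7),
    ]:
        print(cand.name, admit(cand).value)
```

In this sketch, the ordering of the checks is what makes the gate fail-closed. No approval path can admit a skill that did not scan clean.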

Evidence (before / after)

Baselines for the 8 active agents are logged in `evals/runs.log` (notes:
`baseline-pre-upgrade`), all PASS. Example delta for Vision planning:

before: Task Card PASS 5/5, correct split.
after: Task Card PASS 5/5, now also explicitly covering RLS, authorization, and migration idempotency (richer SPEC + EVAL), credited to the upgraded SOP and the writing-plans skill.

Marketing (Jameson / Parker / Stark / Phoenix) stays FROZEN; its AFTER
baseline is measured on the unfreeze day. The full AFTER sweep runs at the next
Fury weekly cycle.

Per-team

Command — Jarvis: route-not-execute SOP + dispatching-parallel-agents. Fury: audit SOP calls the lints; finance-skills for the CFO role.
Dev — Vision/Friday/Forge/Hawkeye/Shield: gate-runner wired, shared dev skills expanded (TDD, debugging, verification, code-review, adversarial-reviewer).
Marketing (frozen) — Jameson/Parker/Stark SOULs rewritten to English; uniform SOP/contracts; Phoenix bug fixed.
Product — Analyzer: SOP + the חלקה (parcel) rule encoded in `validate-output.py`; `AGENTS.md` ↔ worker kept in sync by worker-sync.
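The job of worker-sync is to keep the Analyzer's `AGENTS.md` and its worker from drifting apart. It can be pictured as a fingerprint comparison of the shared block. This minimal sketch assumes hypothetical `SYNC:BEGIN`/`SYNC:END` markers around the shared text in both files. The real check may locate and compare the content differently.

```python
import hashlib

# Hypothetical markers around the shared block in both files.
BEGIN = "<!-- SYNC:BEGIN -->"
END = "<!-- SYNC:END -->"


def extract_block(text: str) -> str:
    """Return the text between the sync markers; raise if either is missing."""
    start = text.find(BEGIN)
    if start == -1:
        raise ValueError("BEGIN marker missing")
    body_start = start + len(BEGIN)
    end = text.find(END, body_start)
    if end == -1:
        raise ValueError("END marker missing")
    return text[body_start:end]


def fingerprint(block: str) -> str:
    """SHA-256 of the block after normalizing line endings and trailing spaces."""
    normalized = block.replace("\r\n", "\n").strip()
    lines = [line.rstrip() for line in normalized.split("\n")]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def in_sync(agents_md: str, worker_src: str) -> bool:
    """True only if both files carry the block and the fingerprints match."""
    try:
        return fingerprint(extract_block(agents_md)) == fingerprint(
            extract_block(worker_src)
        )
    except ValueError:
        return False  # a missing block is drift, not a pass


if __name__ == "__main__":
    doc = f"# Analyzer\n{BEGIN}\nRule A\nRule B  \n{END}\n"
    worker = f"// generated\n{BEGIN}\r\nRule A\r\nRule B\r\n{END}"
    print(in_sync(doc, worker))
```

The example prints `True`. Line endings and trailing whitespace are normalized before hashing, so a CRLF checkout or an editor's whitespace trim does not count as drift. Any change to the rule text itself does.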

Agent Determinism Upgrade — June 2026
