A controlled, git-tracked hardening of the control plane: turn the rules
from “the model should remember” into deterministic checks, give every agent a
real output contract and eval, and add a small, security-vetted set of skills.
Doctrine unchanged — “the model is probabilistic; the controls cannot be.”

What shipped

| Area | Change |
| --- | --- |
| Safety net | git init baseline in platform/; rollback per stage |
| Enforcement | 7 stdlib scripts in platform/bin/; all jarvise-scan SAFE |
| Docs | normalized agent-registry (display-name + emoji + FROZEN banner), tool-contracts, policies (new §10), workflow |
| Evals | one [[evals |
| Configs | 12 agents got real TOOLS.md + uniform AGENTS.md blocks (SOP / Output contract / Failure modes / Integrations / DON’Ts); 3 Hebrew SOULs → English |
| Skills | +9 vetted skills (6 SAFE auto + 3 Yossef-approved); skill-lint --all clean |
| Fix | Phoenix junk dir + the generate_voice.sh quoting bug |

Enforcement scripts (platform/bin/)

| Script | Checks |
| --- | --- |
| registry-lint.py | every openclaw.json agent id has a complete card → PASS 12/12 |
| task-card-lint.py | the 5 Task-Card fields + a valid APPROVAL gate |
| skill-lint.py --all | SKILL.md frontmatter, name == dirname → PASS |
| validate-output.py | analyzer schema + the חלקה-rule; hawkeye/shield verdicts (fail-closed); worker-sync → PASS |
| gate-runner.sh | the code gates incl. the tsc -p tsconfig.app.json trap |
| preflight-env.sh | ffmpeg / rclone / whisper / fonts / venv |
| eval-run.sh | the only runs.log writer + eval-set --check |
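
To illustrate what "fail-closed" means for the hawkeye/shield verdicts checked by validate-output.py, here is a minimal sketch. It is not the shipped script: the JSON shape, the `verdict` field name, and the function name `check_verdict` are all assumptions made for the example. The point it shows is that anything other than an explicit, well-formed PASS is treated as a failure.

```python
import json


def check_verdict(raw: str) -> bool:
    """Return True only for a well-formed, explicit PASS verdict.

    Fail-closed: malformed JSON, a non-object payload, a missing field,
    or any value other than the exact string "PASS" counts as a failure.
    The "verdict" field name is hypothetical, chosen for this sketch.
    """
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return False  # unparseable output never passes
    if not isinstance(data, dict):
        return False  # e.g. a bare list or string
    return data.get("verdict") == "PASS"


if __name__ == "__main__":
    for sample in ('{"verdict": "PASS"}', '{"verdict": "pass"}', "oops", "{}"):
        print(repr(sample), "->", check_verdict(sample))
```

The design choice is that the check has a single accepting path and no default-allow branch, so a model that emits a truncated or creatively formatted verdict is blocked rather than waved through.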

New skills (Stage 6–8)

Vetted from obra/superpowers (MIT) and alirezarezvani/claude-skills (MIT).
Each skill scanned SAFE under jarvise-scan, and any skill that shipped scripts
also went through Yossef's approval.

| Skill | Owner | Source |
| --- | --- | --- |
| writing-plans, subagent-driven-development | Vision | superpowers |
| test-driven-development, systematic-debugging | Forge | superpowers |
| verification-before-completion | Hawkeye | superpowers |
| requesting-code-review, adversarial-reviewer | Shield | superpowers / claude-skills |
| dispatching-parallel-agents | Jarvis | superpowers |
| finance-skills | Fury | claude-skills |

Rejected: brainstorming (ships a local web server, 7 HIGH findings).
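
The "SKILL.md frontmatter, name == dirname" check that skill-lint.py applies to these skills could look roughly like the sketch below. This is an illustration under assumptions, not the shipped script: it assumes the frontmatter is a leading `---` delimited block of simple `key: value` lines, and the helper names `read_frontmatter` and `lint_skill` are hypothetical. It sticks to the standard library, in keeping with the stdlib-only scripts in platform/bin/.

```python
from pathlib import Path


def read_frontmatter(text: str) -> dict:
    """Parse a leading '---' delimited block of simple 'key: value' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields  # closing delimiter reached
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat as missing frontmatter


def lint_skill(skill_dir: Path) -> list:
    """Return a list of problems for one skill directory (empty means PASS)."""
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.is_file():
        return [f"{skill_dir.name}: SKILL.md missing"]
    fields = read_frontmatter(skill_md.read_text(encoding="utf-8"))
    if not fields:
        return [f"{skill_dir.name}: frontmatter missing, empty, or unterminated"]
    if fields.get("name") != skill_dir.name:
        return [f"{skill_dir.name}: name {fields.get('name')!r} != dirname"]
    return []
```

A caller could run `lint_skill` over every subdirectory of a skills folder and exit non-zero if any list is non-empty, which would give the deterministic pass/fail behaviour the doctrine asks for.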

Evidence (before / after)

Baselines for the 8 active agents are logged in evals/runs.log (notes:
baseline-pre-upgrade), all PASS. Example delta — Vision planning:

  • before: Task Card PASS 5/5, correct split.
  • after: Task Card PASS 5/5, now explicitly covering RLS, authorization, and migration idempotency (richer SPEC+EVAL); the improvement is attributed to the upgraded SOP and the writing-plans skill.

Marketing (Jameson / Parker / Stark / Phoenix) stays FROZEN; its AFTER
baseline will be measured on the day it is unfrozen. The full AFTER sweep runs
at the next Fury weekly cycle.

Per-team

  • Command. Jarvis: route-not-execute SOP + dispatching-parallel-agents. Fury: audit SOP calls the lints; finance-skills for the CFO role.
  • Dev. Vision/Friday/Forge/Hawkeye/Shield: gate-runner wired, shared dev skills expanded (TDD, debugging, verification, code-review, adversarial-reviewer).
  • Marketing (frozen). Jameson/Parker/Stark SOULs rewritten to English; uniform SOP/contracts; Phoenix bug fixed.
  • Product. Analyzer: SOP + the חלקה-rule encoded in validate-output.py; AGENTS.md ↔ worker kept in sync by worker-sync.