A controlled, git-tracked hardening of the control plane: turn the rules
from “the model should remember” into deterministic checks, give every agent a
real output contract and eval, and add a small, security-vetted set of skills.
Doctrine unchanged — “the model is probabilistic; the controls cannot be.”
What shipped
| Area | Change |
|---|---|
| Safety net | git init baseline in platform/; rollback per stage |
| Enforcement | 7 stdlib scripts in platform/bin/ — all jarvise-scan SAFE |
| Docs | normalized agent-registry (display-name + emoji + FROZEN banner), tool-contracts, policies (new §10), workflow |
| Evals | one [[evals |
| Configs | 12 agents got real TOOLS.md + uniform AGENTS.md blocks (SOP / Output contract / Failure modes / Integrations / DON’Ts); 3 Hebrew SOULs → English |
| Skills | +9 vetted skills (6 SAFE auto + 3 Yossef-approved); skill-lint --all clean |
| Fix | Phoenix junk dir + the generate_voice.sh quoting bug |
Enforcement scripts (platform/bin/)
| Script | Checks |
|---|---|
| registry-lint.py | every openclaw.json agent id has a complete card → PASS 12/12 |
| task-card-lint.py | the 5 Task-Card fields + a valid APPROVAL gate |
| skill-lint.py --all | SKILL.md frontmatter, name == dirname → PASS |
| validate-output.py | analyzer schema + the חלקה-rule; hawkeye/shield verdicts (fail-closed); worker-sync → PASS |
| gate-runner.sh | the code gates incl. the tsc -p tsconfig.app.json trap |
| preflight-env.sh | ffmpeg / rclone / whisper / fonts / venv |
| eval-run.sh | the only runs.log writer + eval-set --check |
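
To make "fail-closed" in the validate-output.py row concrete, here is a minimal stdlib-only sketch of the idea. It is not the shipped script: the function name `verdict_passes`, the `verdict` key, and the PASS/FAIL vocabulary are assumptions made for illustration, since the real hawkeye/shield schema is not shown here.

```python
import json

# Hypothetical verdict vocabulary (assumption; the real schema is not shown).
ALLOWED_VERDICTS = {"PASS", "FAIL"}


def verdict_passes(raw):
    """Return True only for a well-formed, explicit PASS verdict.

    Fail-closed: unparseable JSON, a non-object payload, a missing key,
    or an unknown value all count as failure instead of being let through.
    """
    try:
        payload = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return False
    if not isinstance(payload, dict):
        return False
    verdict = payload.get("verdict")  # "verdict" key is an assumed field name
    if not isinstance(verdict, str) or verdict not in ALLOWED_VERDICTS:
        return False
    return verdict == "PASS"
```

The point of the design is that the only path to `True` is the explicit, exact match. Every other input, including a lowercase "pass" or an empty object, lands on `False`, so a malformed agent reply can never be read as approval.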
New skills (Stage 6–8)
Vetted from obra/superpowers (MIT) and alirezarezvani/claude-skills (MIT);
each jarvise-scan SAFE, each Yossef-gated where it shipped scripts.
| skill | owner | source |
|---|---|---|
| writing-plans, subagent-driven-development | Vision | superpowers |
| test-driven-development, systematic-debugging | Forge | superpowers |
| verification-before-completion | Hawkeye | superpowers |
| requesting-code-review, adversarial-reviewer | Shield | superpowers / claude-skills |
| dispatching-parallel-agents | Jarvis | superpowers |
| finance-skills | Fury | claude-skills |
Rejected: brainstorming (ships a local web server, 7 HIGH findings).
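
The skill-lint check listed earlier (SKILL.md frontmatter, name == dirname) could look roughly like the sketch below. It is an illustration under stated assumptions, not the actual skill-lint.py: the helper names `read_frontmatter` and `lint_skill` are hypothetical, and the frontmatter is assumed to be a flat `key: value` block between `---` lines, parsed with the stdlib only rather than a YAML library.

```python
from pathlib import Path


def read_frontmatter(text):
    """Parse a leading '---' delimited block of flat 'key: value' lines.

    Stdlib-only and deliberately simple; this is not a full YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip("\"'")
    return {}  # no closing delimiter: treat as missing frontmatter


def lint_skill(skill_dir):
    """Return a list of problems for one skill directory; empty means clean."""
    skill_file = skill_dir / "SKILL.md"
    if not skill_file.is_file():
        return [f"{skill_dir.name}: SKILL.md missing"]
    fields = read_frontmatter(skill_file.read_text(encoding="utf-8"))
    if not fields:
        return [f"{skill_dir.name}: no frontmatter"]
    name = fields.get("name")
    if name != skill_dir.name:
        return [f"{skill_dir.name}: name {name!r} != dirname"]
    return []
```

Calling `lint_skill(Path("skills/writing-plans"))` (a hypothetical path) returns an empty list when the directory holds a SKILL.md whose `name` field equals `writing-plans`, and a one-item problem list otherwise.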
Evidence (before / after)
Baselines for the 8 active agents are logged in evals/runs.log (notes:
baseline-pre-upgrade), all PASS. Example delta — Vision planning:
- before: Task Card PASS 5/5, correct split.
- after: Task Card PASS 5/5, now explicitly covering RLS / authorization / migration idempotency (richer SPEC+EVAL), credited to the upgraded SOP + the writing-plans skill.
Marketing (Jameson / Parker / Stark / Phoenix) stays FROZEN; its AFTER
baseline is measured on the unfreeze day. The full AFTER sweep runs at the next
Fury weekly cycle.
Per-team
- Command (Jarvis, Fury): Jarvis gets the route-not-execute SOP + dispatching-parallel-agents. Fury's audit SOP calls the lints; finance-skills for the CFO role.
- Dev (Vision/Friday/Forge/Hawkeye/Shield): gate-runner wired, shared dev skills expanded (TDD, debugging, verification, code-review, adversarial-reviewer).
- Marketing (frozen): Jameson/Parker/Stark SOULs rewritten to English; uniform SOP/contracts; Phoenix bug fixed.
- Product (Analyzer): SOP + the חלקה-rule encoded in validate-output.py; AGENTS.md ↔ worker kept in sync by worker-sync.