Grotto Reference — Pantheon TG Group Lifecycle & Watchdogs

Single source of truth for how GoldNet's agent fleet works in Telegram: how grottos are born, who may post what, how dispatches run, and the six watchdog layers that keep them alive. Mirrors canon:grotto-lifecycle-sot-2026-06-05.

v1.0 · 2026-06-05 · Sabour-approved structure · auto-watchdog registration LIVE · L6 silence watchdog LIVE

1 · Actors & identities

| Actor | Telegram | UID | Container / seat | Role |
| --- | --- | --- | --- | --- |
| HEPHAESTUS | @HermesGoldNet_bot | 7845027880 | hephaestus | PM: dispatch, lanes, tracker, deadlines |
| PROMETHEUS | @PrometheusGoldNet_bot | 8834452529 | prometheus | Backend / data / engineering (credential surfaces frozen per canon) |
| ATHENA | @AthenaGoldNet_bot | 8997694628 | athena | QA, ratification, blind-protocol audits |
| APOLLO | @ApolloGoldNet_bot | 8569500490 | apollo | Frontend / UX / sites |
| THEO | @TheoGoldNet_bot | 8823863666 | theo | Observability, outside-voice sanity watch |
| Cowork-Claude | @gaios_cowork_bot | 8628974026 | desktop orchestrator | Posts everything: briefs, roll-calls, dispatches, watchdog escalations |
| Conductor | @gaios_conductor | 1700284889 | MTProto userbot | Creates groups + invites/promotes bots. Nothing else. |
| Sabour | @SabAust | 117517459 | human principal | Word overrides everything; auto-invited to every grotto |

2 · Grotto birth — the only sanctioned path

/opt/gaios/bin/grotto-suite.py create \
    --title "Project Name" \
    --ministers theo,hephaestus,athena,apollo,prometheus \
    --extras @SabAust \
    --brief "pinned brief text"
What the tool does, in order (v2 + 2026-06-05 additions):
  1. Conductor creates the basic group, invites the minister bots, the cowork bot, and Sabour, then promotes all bots to admin (single Telethon connection, 20s per-op watchdogs, 1.5s propagation sleeps).
  2. Patches every minister's channels.telegram.groups.<chat_id>.allowFrom=["*"] via openclaw config patch and restarts each container. This is the slow phase: budget ≥900s.
  3. HARD GATE: the brief posts only after every patch has succeeded, and it posts via the cowork bot (then pinned), never the conductor.
  4. NEW: watchdog auto-registration. The chat_id is appended to /var/lib/fleet-l5/groups.json, so L5 + L6 cover the grotto from their next tick. No grotto is born unwatched.
  5. Journal events are written at every step (group_created, allowlists_patched, brief_posted, watchdog_registered, create_complete).
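
The auto-registration step can be illustrated with a minimal sketch. It is not the code in grotto-suite.py: the function name register_group is hypothetical, and the sketch assumes groups.json holds a flat JSON list of chat IDs, which this reference does not specify. It shows one way to make the append idempotent and safe against a watchdog reading the file mid-write.

```python
import json
import os
import tempfile
from pathlib import Path


def register_group(chat_id: int, path: Path) -> bool:
    """Add chat_id to the coverage file; return True if it was newly added.

    Assumption: the file is a flat JSON list of chat IDs (hypothetical layout).
    """
    try:
        groups = json.loads(path.read_text())
    except FileNotFoundError:
        groups = []
    if chat_id in groups:
        return False  # already covered, so re-running a create is harmless
    groups.append(chat_id)
    # Write to a temp file in the same directory, then rename it into place,
    # so a watchdog tick never sees a half-written file.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(groups, fh)
    os.replace(tmp, path)
    return True
```

Calling register_group twice with the same chat ID leaves the file unchanged the second time, which matters for roll-forward recovery where a step may be repeated.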

Partial-create recovery (roll-forward only — NEVER recreate)

3 · Comms rules (hard, canon-bound)

4 · Dispatch lifecycle & the exact phrases L6 watches

pinned BRIEF → 🟢 ROLL-CALL (≥4/5 READY) → "TASK DISPATCH" to PM
  → PM: "DISPATCH CONFIRMED" + numbered lanes, each owner @-mentioned, each with deadline
  → hourly tracker posts while lanes open
  → render-gate v2 BOTH tiers on any web deliverable → gpt-5.5 critic gate
  → ATHENA: "RATIFIED" (8-field template) → dispatch closed
L6 marker contract (PMs MUST use these exact phrases):
OPEN markers: TASK DISPATCH · DISPATCH CONFIRMED
CLOSE markers: RATIFIED · BENCH COMPLETE · ALL LANES COMPLETE · PROJECT CLOSED · DISPATCH CLOSED
The open-flag is sticky across scans until a CLOSE marker appears. While open, silence is escalated (see L6).
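
A minimal sketch of the sticky open-flag, assuming messages arrive as plain strings in oldest-first order and that markers are matched as case-sensitive substrings. The function name update_open_flag and the choice to let a CLOSE marker win when one message carries both kinds are assumptions, not confirmed behaviour of fleet_l6_silence_watchdog.py.

```python
OPEN_MARKERS = ("TASK DISPATCH", "DISPATCH CONFIRMED")
CLOSE_MARKERS = (
    "RATIFIED",
    "BENCH COMPLETE",
    "ALL LANES COMPLETE",
    "PROJECT CLOSED",
    "DISPATCH CLOSED",
)


def update_open_flag(was_open: bool, messages: list) -> bool:
    """Fold one scan's messages (oldest first) into the sticky open flag."""
    is_open = was_open
    for text in messages:
        if any(marker in text for marker in CLOSE_MARKERS):
            is_open = False  # assumption: CLOSE wins within a single message
        elif any(marker in text for marker in OPEN_MARKERS):
            is_open = True
        # Messages with no marker leave the flag untouched: that is the
        # "sticky" part.
    return is_open
```

Because the flag carries over between scans, a scan that sees only tracker chatter keeps the dispatch open. Substring matching is deliberately naive here; a paraphrase such as "dispatch is confirmed" would not open the flag, which is why the exact phrases matter.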

5 · Watchdog stack L1–L6 (all ACTIVE)

| Layer | Failure mode caught | Mechanism | Thresholds / action |
| --- | --- | --- | --- |
| L1 | Container down | docker healthcheck | continuous |
| L2 | openclaw unresponsive | gateway poll | continuous |
| L3 | TG ingress hung | spool-handler health monitor | continuous |
| L4 | text-without-tool-use (silent reply failure) | inbound parser + auto-recover | per message |
| L5 | @-mention with no ack | fleet_l5_watchdog.py · 2-min timer · conductor MTProto scan (80 msgs/group) | ping @5m → openclaw doctor @10m → Sabour @20m |
| L5.5 | run stalled/aborted mid-flight, incl. spool handler timed out + surface_error failover (widened 2026-06-05) | fleet_l55_stall_watchdog.py · scans container logs | sessions cleanup → report to the affected lane's group · per-session cooldown 60 min (2026-06-05, was 10 min) |
| L6 | open dispatch + group silence (ack-then-dormant, dropped updates, dead PMs) | fleet_l6_silence_watchdog.py · 2-min timer · reads groups.json | tickle PM @30m → doctor + Sabour page @60m (cooldowns 30/60m) |
Coverage: L5 + L6 read /var/lib/fleet-l5/groups.json each tick — auto-maintained by grotto-suite, so coverage = every grotto ever created from 2026-06-05 onward. L5.5 is container-wide (group-agnostic detection, lane-aware reporting).
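
The L5 and L6 escalation thresholds can be read as ladders keyed on elapsed minutes. The sketch below is illustrative only: the names L5_LADDER, L6_LADDER and due_action are hypothetical, treating each threshold as inclusive is an assumption, and the cooldowns are not modelled.

```python
from typing import Optional, Sequence, Tuple

# (threshold in minutes, action) pairs, most severe first.
L5_LADDER = ((20, "page Sabour"), (10, "openclaw doctor"), (5, "ping"))
L6_LADDER = ((60, "doctor + page Sabour"), (30, "tickle PM"))


def due_action(
    minutes_elapsed: float, ladder: Sequence[Tuple[int, str]]
) -> Optional[str]:
    """Return the most severe action whose threshold has been reached."""
    for threshold, action in ladder:
        if minutes_elapsed >= threshold:
            return action
    return None  # still inside the grace period


print(due_action(12, L5_LADDER))  # openclaw doctor
print(due_action(45, L6_LADDER))  # tickle PM
```

With a 2-minute timer, a watchdog built this way would first notice a 5-minute threshold on the tick at or after minute 5, so real escalation times can lag the nominal thresholds by up to one tick.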

History that built this: the DCA Bench stall (2026-06-05) went 4.8h unseen because the GROUPS list was hardcoded to two rooms, L5.5 grepped the wrong signature, and no silence layer existed. All three holes are closed; the incident is canon.

6 · Named failure modes (history-proven)

| Mode | What it looks like | Caught by / cure |
| --- | --- | --- |
| Ack-then-dormant PM | PM confirms dispatch, never posts the promised tracker | L6 silence escalation; hourly-tracker rule |
| Spool-timeout update drop | OpenClaw kills a 25-min handler, marks update FAILED, never redelivers; minister falls silent mid-task | L5.5 (widened signature) reports in-group; orchestrator re-tickles with fresh deadlines |
| Conductor-posted content | Fleet ignores briefs/dispatches posted by the userbot | Forbidden by canon; grotto-suite posts brief via cowork bot |
| Bare-name tag | "Heph please…": nobody wakes | @bot-handle rule; L5 only tracks real mention entities |
| Unwatched grotto | New group, zero watchdog coverage | CLOSED: auto-registration at birth |
| HQ multi-step max-turns | Long fleet ops die at 30 agent turns / task timeout | Split atomic; grotto create budget ≥900s; prefer gaios_exec for VPS work |
| L5.5 ack-loop (false stall) | Long legit model call (bench/batch worker) trips OpenClaw's "stalled session" diagnostic every tick; minister has nothing to add, so the room fills with watchdog ping → ack → ping. PROM hit 5 cycles in DCOA before Rule 7 self-STOP. | CLOSED: 60-min per-session cooldown in L5.5. Triage rule: if workspace artefacts are growing (e.g. results .jsonl row count rising), the session is BUSY, not stalled; never abort on diagnostic say-so alone. |
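
The BUSY-versus-stalled triage rule can be checked mechanically by sampling a results file twice. This is a sketch under assumptions: the helper names count_rows and classify_session are hypothetical, and the caller decides which .jsonl file to watch and how far apart to take the two samples.

```python
from pathlib import Path


def count_rows(path: Path) -> int:
    """Count non-empty lines in a .jsonl file; a missing file counts as 0."""
    try:
        with path.open() as fh:
            return sum(1 for line in fh if line.strip())
    except FileNotFoundError:
        return 0


def classify_session(rows_before: int, rows_now: int) -> str:
    """Growth between two samples means the worker is still producing output."""
    return "BUSY" if rows_now > rows_before else "POSSIBLY_STALLED"
```

A flat row count only yields POSSIBLY_STALLED, not a verdict: a single long model call can legitimately produce no rows between two samples, so the result is a prompt for a human or orchestrator look, not grounds for an abort.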

7 · Ops cheatsheet

# create a grotto (auto-registers watchdogs)
/opt/gaios/bin/grotto-suite.py create --title "X" --ministers theo,hephaestus,athena,apollo,prometheus --extras @SabAust --brief "..."

# repair / extend an existing grotto (also auto-registers)
/opt/gaios/bin/grotto-suite.py whitelist-only --chat-id -494... --ministers ...

# watchdog state at a glance
cat /var/lib/fleet-l5/groups.json          # coverage
cat /var/lib/fleet-l5/l6_state.json        # per-room silence / open-dispatch flags
tail -20 /var/log/fleet-l5/watchdog.log    # L5 actions
tail -20 /var/lib/fleet-l5/l6_silence.log  # L6 actions
systemctl list-timers | grep fleet         # all watchdog timers

# manual PM tickle (cowork bot)
TOKEN=$(vault cowork-bot-token)
curl -s "https://api.telegram.org/bot$TOKEN/sendMessage" -d chat_id=-494... \
     --data-urlencode "text=@HermesGoldNet_bot status update now: all lanes report one line."

# read a grotto transcript (conductor)
# Telethon: session /opt/gaios/secrets/gaios_conductor.session, env tg_conductor.env

8 · Files & configs

| Path | What |
| --- | --- |
| /opt/gaios/bin/grotto-suite.py | Grotto lifecycle tool (create / whitelist-only / resume / manifest) + watchdog auto-registration |
| /opt/gaios/fleet/MINISTERS.json | Fleet topology SSOT (handles, tokens, containers, config paths) |
| /var/lib/fleet-l5/groups.json | Watchdog group coverage (auto-maintained) |
| /opt/gaios/bin/fleet_l5_watchdog.py + .timer | L5 mention→ack escalation |
| /opt/gaios/bin/fleet_l55_stall_watchdog.py + .timer | L5.5 stalled/aborted-run detector |
| /opt/gaios/bin/fleet_l6_silence_watchdog.py + .timer | L6 open-dispatch silence detector |
| /opt/gaios/secrets/gaios_conductor.session · tg_conductor.env | Conductor MTProto identity |
| vault cowork-bot-token | Cowork bot Bot-API token (vault alias) |

9 · Canon index (authoritative cards)