The first native macOS autonomous multi-agent AI command system — built for operators, not researchers.
"Thrawn is the first system that treats AI agents as persistent team members with memory, identity, scheduled work, and real system access — running autonomously while you sleep."
Multi-agent AI is a crowded conversation. Most implementations are cloud-hosted SaaS platforms with vendor lock-in, or fragile Python scripts that require an engineer to maintain. Neither gives a solo founder or small team what they actually need: autonomous AI teammates that show up every day, remember what they learned, and get work done without babysitting.
Thrawn is a native macOS application — written entirely in Swift and SwiftUI — that runs a persistent multi-agent command system locally. It ships with a full agent fleet, a structured task board protocol, real shell execution capabilities, multi-provider LLM routing, and a self-improvement mechanism through persistent memory and accumulated skill files.
This document covers the technical architecture, the major evolutionary jumps the system has undergone, the novel design decisions that distinguish it from comparable efforts, and the philosophy behind the V2 agent model.
The current state of multi-agent AI tooling falls into four unsatisfying categories:
| Category | Examples | Persistent? | Local? | Real Shell Access? | Self-Improving? |
|---|---|---|---|---|---|
| Cloud SaaS agents | AutoGPT, CrewAI Cloud | Partial | ✗ | ✗ | ✗ |
| Python frameworks | LangGraph, CrewAI | ✗ | ✓ | Sandboxed | ✗ |
| IDE agents | Cursor, Copilot | ✗ | Partial | Code only | ✗ |
| OS-level assistants | Apple Intelligence | Partial | ✓ | ✗ | ✗ |
| Thrawn | — | ✓ Always-on | ✓ Native | ✓ Full | ✓ Memory + Skills |
The key gap: no existing tool treats agents as persistent entities with memory, identity, and scheduled autonomous work. Every session starts from zero. Thrawn doesn't.
Thrawn is a fully native Swift Package Manager project, compiled to a macOS .app bundle with ad-hoc code signing. No Electron. No Python runtime. No web server. The entire system runs as a single native process.
With no Electron shell, Python runtime, or web server in the loop, memory overhead stays low, startup is fast, and agents get direct access to macOS APIs such as Keychain, Contacts, and Calendar, not just the web.
Every agent has a knowledge/ directory that survives restarts. Agents append facts.md entries, write skill files, and read them back on the next heartbeat. Learning accumulates continuously — not just within a session.
No cron, no external process. Swift timers tick every 30 seconds to check the schedule. Thrawn's heartbeat runs every 15 minutes; each specialist's runs once an hour at its own offset. The factory runs while you sleep.
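The schedule check described above can be sketched as a pure function that a 30-second timer calls on every tick. This is a minimal illustration, not Thrawn's actual scheduler: `HeartbeatSchedule`, `dueAgents`, the agent IDs, and the specialist offset of 10 minutes are all hypothetical names and values chosen for the example.

```swift
import Foundation

/// Hypothetical schedule entry: how often an agent runs and at which offset.
struct HeartbeatSchedule {
    let agentID: String
    let intervalMinutes: Int   // e.g. 15 for the lead, 60 for a specialist
    let offsetMinutes: Int     // minute within the interval at which the agent fires
}

/// Returns the agents whose heartbeat is due at `minuteOfDay`.
/// A 30-second tick sees every minute twice, so agents that already
/// fired in this minute (per `lastFired`) are skipped.
func dueAgents(schedules: [HeartbeatSchedule],
               minuteOfDay: Int,
               lastFired: [String: Int]) -> [String] {
    schedules
        .filter { minuteOfDay % $0.intervalMinutes == $0.offsetMinutes % $0.intervalMinutes }
        .filter { lastFired[$0.agentID] != minuteOfDay }
        .map { $0.agentID }
}

// Assumed example values: the lead every 15 minutes, one specialist hourly at :10.
let schedules = [
    HeartbeatSchedule(agentID: "thrawn", intervalMinutes: 15, offsetMinutes: 0),
    HeartbeatSchedule(agentID: "r2d2", intervalMinutes: 60, offsetMinutes: 10),
]
```

In an app, a function like this would typically be called from the closure of `Timer.scheduledTimer(withTimeInterval: 30, repeats: true)`, with `lastFired` updated after each heartbeat is dispatched. Keeping the due-check free of timer state makes it easy to test without waiting for real time to pass.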
Every agent resolves its model tier (local, cheap, or premium) at heartbeat time. If OpenAI is down, it falls back to Anthropic; if that is down too, it falls back to Ollama. Reliability beats model quality. Nothing silently fails.
When UNLEASHED mode is enabled, shell commands extracted from LLM responses are executed for real, and their output is fed back to the agent. File writes, API calls, curl requests: agents do real work, not simulated work.
Agents never edit the task board directly. They write to agent-updates.json; the TaskDispatcher processes it and applies changes. This serializes all board mutations and prevents race conditions between agents.
Thrawn's ProviderRouter is a deliberate departure from how most systems handle LLM selection. Rather than locking an agent to a provider, the router resolves the best available option at the moment of the heartbeat — and always has a fallback.
This design means the system never fails silently. An agent may respond more slowly if degraded to Ollama, but as long as one provider in the chain is reachable, it responds. Reliability is the first-class constraint.
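A fallback chain of this kind can be sketched in a few lines. The example below is an assumption-laden illustration, not the real `ProviderRouter`: the `Provider` enum, `resolveProvider`, and the `isAvailable` health check are hypothetical, and the chain order simply follows the OpenAI, Anthropic, Ollama sequence described in the text.

```swift
import Foundation

/// Hypothetical provider list; the real router's cases may differ.
enum Provider: String {
    case openAI, anthropic, ollama
}

/// Walks the chain in order and returns the first provider that reports healthy.
/// `isAvailable` stands in for whatever health check the real router performs.
/// Returns nil only when every provider in the chain is unreachable.
func resolveProvider(chain: [Provider],
                     isAvailable: (Provider) -> Bool) -> Provider? {
    chain.first(where: isAvailable)
}

// Assumed order: premium cloud first, local inference as the last resort.
let chain: [Provider] = [.openAI, .anthropic, .ollama]
```

For example, `resolveProvider(chain: chain) { $0 != .openAI }` selects `.anthropic`. Returning an optional rather than crashing lets the caller log and surface the one remaining failure case, an empty result, instead of hiding it.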
Per-agent model binding is stored in agent-specs.json as either .inherit (follows the standard loadout default) or .explicit(tier). Thrawn and Bart are explicitly set to .premium. The rest of the V1 squad inherits, which resolves to .local (Ollama), keeping costs near zero for routine operational work.
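The inherit-or-explicit binding maps naturally onto a Swift enum with an associated value. The sketch below assumes the shape of the types; only the `.inherit` and `.explicit(tier)` cases and the three tier names come from the text, while `ModelTier`, `resolved(loadoutDefault:)`, and the lowercase agent IDs are hypothetical. It does not attempt to reproduce the on-disk layout of agent-specs.json.

```swift
import Foundation

enum ModelTier: String {
    case local, cheap, premium
}

/// How an agent chooses its tier: follow the loadout default or pin one.
enum ModelBinding: Equatable {
    case inherit
    case explicit(ModelTier)

    /// Collapses the binding to a concrete tier.
    func resolved(loadoutDefault: ModelTier) -> ModelTier {
        switch self {
        case .inherit:
            return loadoutDefault
        case .explicit(let tier):
            return tier
        }
    }
}

// Assumed bindings mirroring the text: the lead and Bart pinned, the rest inheriting.
let bindings: [String: ModelBinding] = [
    "thrawn": .explicit(.premium),
    "bart": .explicit(.premium),
    "r2d2": .inherit,
]

/// Agents without an entry are treated as inheriting (an assumption of this sketch).
func tier(for agentID: String) -> ModelTier {
    (bindings[agentID] ?? .inherit).resolved(loadoutDefault: .local)
}
```

With a default of `.local`, `tier(for: "r2d2")` resolves to `.local` while `tier(for: "thrawn")` stays `.premium`, so changing the loadout default moves every inheriting agent at once without touching the pinned ones.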
The task board is a Markdown file at ~/Library/Application Support/Thrawn/workspace/ops/TASK_BOARD.md. It is the single source of truth for all work in flight. Every agent reads it on every heartbeat. Only Thrawn puts tasks into Ready status. Specialists pick up tasks assigned to them.
"Ready is the only pickup lane. Thrawn is always the hub." — The core invariant of the task relay system.
Agents write their updates to agent-updates.json — a sidecar file that the TaskDispatcher processes asynchronously. This prevents any agent from corrupting the board directly. The dispatcher is the only writer to TASK_BOARD.md, making the system safe even with multiple concurrent heartbeats.
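The single-writer pattern can be illustrated with a small dispatcher that funnels every mutation through one serial queue. This is a sketch under stated assumptions: the JSON fields in `AgentUpdate`, the class name `BoardDispatcher`, the in-memory dictionary standing in for the Markdown board, and the status string "In Progress" are all hypothetical. The check that only Thrawn may move a task to Ready is shown being enforced in the dispatcher, which the source does not confirm.

```swift
import Foundation

/// Hypothetical shape of one entry in agent-updates.json.
struct AgentUpdate: Codable {
    let agentID: String
    let taskID: String
    let newStatus: String
}

/// The only type allowed to mutate the board. All reads and writes
/// go through one serial queue, so concurrent heartbeats cannot interleave.
final class BoardDispatcher {
    private let queue = DispatchQueue(label: "board.dispatcher")
    private var statusByTask: [String: String]

    init(board: [String: String]) {
        statusByTask = board
    }

    /// Decodes a batch of sidecar updates and applies them one at a time.
    func process(sidecarJSON: Data) throws {
        let updates = try JSONDecoder().decode([AgentUpdate].self, from: sidecarJSON)
        queue.sync {
            for update in updates {
                // Ignore updates for tasks that are not on the board.
                guard statusByTask[update.taskID] != nil else { continue }
                // Assumed rule: Ready is a hub-only lane.
                if update.newStatus == "Ready" && update.agentID != "thrawn" { continue }
                statusByTask[update.taskID] = update.newStatus
            }
        }
    }

    func status(of taskID: String) -> String? {
        queue.sync { statusByTask[taskID] }
    }
}
```

Because agents only ever produce `AgentUpdate` records, a malformed or malicious update can be rejected at one choke point before it touches the board.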
Thrawn has gone through several distinct jumps in capability. Each one represents a different class of problem solved — not an incremental improvement but a qualitative shift in what the system can do.
The core decision: build a native Swift app, not a script. Established the SwiftUI architecture, ObservableObject graph, and the fundamental loop: heartbeat → LLM → file write → board update. Chose Ollama for zero-cost local inference as the always-available backbone.
Five specialists joined Thrawn: R2-D2 (Dev), C-3PO (Data), Qui-Gon (Research), Lando (Marketing), and Boba Fett (QA), each with its own heartbeat offset, identity file, heartbeat instructions, and knowledge directory. The factory came online: six agents, counting Thrawn, running concurrently on independent schedules.
The most significant capability jump. In UNLEASHED mode, agents' bash command blocks are parsed from the LLM response and executed via ExecutionService. Results feed back into the next heartbeat. Agents stopped being chatbots and became real workers — writing files, calling APIs, running scripts.
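The parsing step can be sketched as a line scanner that collects the bodies of fenced bash blocks. This is an illustration only: `extractBashBlocks` and its `unleashed` flag are hypothetical names, and the real `ExecutionService` may parse and gate commands differently. The fence marker is built with `String(repeating:count:)` purely so the example can sit inside a fenced block itself.

```swift
import Foundation

/// Extracts the bodies of fenced bash blocks from an LLM response.
/// Returns nothing unless execution has been explicitly enabled.
func extractBashBlocks(from response: String, unleashed: Bool) -> [String] {
    guard unleashed else { return [] }

    let fence = String(repeating: "`", count: 3)
    var blocks: [String] = []
    var current: [String]? = nil   // non-nil while inside a bash block

    for line in response.components(separatedBy: "\n") {
        let trimmed = line.trimmingCharacters(in: .whitespaces)
        if current == nil {
            // Only a fence tagged "bash" opens a block; other languages are ignored.
            if trimmed == fence + "bash" { current = [] }
        } else if trimmed == fence {
            blocks.append(current!.joined(separator: "\n"))
            current = nil
        } else {
            current!.append(line)
        }
    }
    // An unterminated block is dropped rather than executed.
    return blocks
}
```

Each extracted string could then be handed to a shell through Foundation's `Process` API, with stdout and stderr captured for the next heartbeat prompt. Dropping unterminated blocks is a deliberately conservative choice: a truncated response should not run half a command.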
Added AnthropicClient and GeminiAPIClient with the ProviderRouter fallback chain. Any agent can now be pinned to a tier (local/cheap/premium) without changing code — resolved at heartbeat time. This decoupled model selection from agent identity for the first time.
Full OpenAI SSE streaming integration. The ProviderBackend enum gained a .openai case; AgentScheduler gained the sendToActiveProvider switch branch; ThreadStore gained full provider-aware routing. Thrawn (lead) and V2 agents run on GPT-4.1. Specialists stay on free local Ollama. Cost is controlled by design.
ThreadStore — the direct chat system — was extended to route to OpenAI when configured, falling back to Ollama. Every Command thread is now running on the same brain as Thrawn's autonomous heartbeat. Consistent intelligence across interactive and autonomous modes.
A fundamental rethink of what an agent is. V1 = job description. V2 = distilled human archetype. Bart Simpson: smart ass, brilliant, multi-source web research from a single prompt, no methodology explanation. The architecture now supports personality-first agents with single-purpose superpowers running on premium models.
Agents now have 8-bit pixel art profile pictures on their cards — loaded from DiceBear's pixel-art sprite API, seeded by agent ID for consistency. Status jewel moves to overlay dot. When offline, falls back to a deterministic procedural pixel pattern. Identity is now visual.
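One way to get a deterministic offline fallback is to hash the agent ID with a stable function and read the pattern out of the hash bits. The sketch below is an assumption about how such a fallback could work, not Thrawn's actual renderer: the FNV-1a hash, the 8x8 grid size, the mirror symmetry, and the function names are all choices made for the example.

```swift
import Foundation

/// FNV-1a, a small stable hash. Swift's built-in `hashValue` is seeded
/// per process, so it cannot drive a pattern that must survive restarts.
func fnv1a(_ text: String) -> UInt64 {
    var hash: UInt64 = 0xcbf29ce484222325
    for byte in text.utf8 {
        hash ^= UInt64(byte)
        hash = hash &* 0x100000001b3   // wrapping multiply by the FNV prime
    }
    return hash
}

/// Builds an 8x8 grid of on/off pixels, mirrored left to right,
/// from the low 32 bits of the agent ID's hash.
func fallbackAvatar(agentID: String) -> [[Bool]] {
    let bits = fnv1a(agentID)
    return (0..<8).map { (row: Int) -> [Bool] in
        let half: [Bool] = (0..<4).map { (col: Int) -> Bool in
            (bits >> UInt64(row * 4 + col)) & 1 == 1
        }
        return half + Array(half.reversed())
    }
}
```

The same agent ID always yields the same grid, so an agent's card looks identical across launches even when the remote sprite cannot be fetched.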
V1 agents are job descriptions. V2 agents are people. The distinction sounds subtle but it changes everything about how you think about what to build next and what to give them to do.
Agent defined by role, responsibilities, outputs, escalation criteria. Complete. Formal. Built to cover a function exhaustively. Like a job posting.
Agent defined by personality, quirks, specific superpowers, and specific blind spots. Frank knows every coffee spot and never misses a day. Danny is a genius but a wildman; you can't count on him. Brenda keeps Danny in line. Built like a real team, not an org chart.
The insight is that specialization through personality is more powerful than specialization through job function. A V2 agent doesn't need to be capable of everything in its domain. It needs to be extraordinary at one thing, and its constraints (personality, reliability, style) are features, not bugs.
Low breadth is intentional. A V2 agent that does one thing extraordinarily well and has a distinct voice is more valuable than a generalist that covers everything adequately.
Most AI agent systems are stateless — every session starts fresh. Thrawn has two mechanisms for persistent learning:
Memory (facts.md) — A persistent Markdown file each agent appends to. User preferences, project context, routing decisions that worked. Specific, dated, accumulating. Injected into the next heartbeat prompt so the agent starts where it left off.
Skill Files — When an agent solves something complex, it writes a skill file: the exact procedure, gotchas, when to use it. Next session, that skill is injected back. The agent doesn't re-learn; it references. This is the closest thing to genuine institutional memory in current AI systems.
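The append-then-inject loop for memory can be sketched with two small functions. This is a minimal illustration under assumptions: the entry format (a dated Markdown bullet), the "## Known facts" heading, and the function names are hypothetical, and skill files would follow the same read-and-prepend pattern.

```swift
import Foundation

/// Appends one dated entry to an agent's facts.md, creating the file if needed.
func appendFact(_ fact: String, to factsURL: URL, on date: Date = Date()) throws {
    let formatter = ISO8601DateFormatter()
    formatter.formatOptions = [.withFullDate]   // e.g. 2025-01-31
    let entry = "- \(formatter.string(from: date)): \(fact)\n"

    // Read whatever is already there; a missing file counts as empty.
    let existing = (try? String(contentsOf: factsURL, encoding: .utf8)) ?? ""
    try (existing + entry).write(to: factsURL, atomically: true, encoding: .utf8)
}

/// Prepends accumulated memory to the heartbeat instructions so the
/// agent starts from what it already knows.
func heartbeatPrompt(instructions: String, factsURL: URL) -> String {
    let facts = (try? String(contentsOf: factsURL, encoding: .utf8)) ?? ""
    if facts.isEmpty { return instructions }
    return "## Known facts\n\(facts)\n\(instructions)"
}
```

Writing atomically means a crash mid-heartbeat leaves either the old file or the new one, never a truncated memory. As the file grows, a real system would likely need to trim or summarize it to stay within the model's context window.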
Multi-agent frameworks are not new. What's new here is the combination — and the platform decision.
Writing a multi-agent system in Swift for macOS is unusual to the point of being novel. It means zero Python dependency, direct Keychain access, macOS-native scheduling, and the ability to eventually access Calendar, Contacts, Photos — surfaces no web-based agent can touch.
The AgentSpecStore resolves which model tier each agent uses based on its identity, not the request. Thrawn is always premium. Specialists are always local. The cost structure is baked into the agent design rather than decided per request.
Agents never write the task board directly. They write a sidecar update file. The dispatcher serializes all mutations. This is a safety pattern borrowed from distributed systems applied to a local multi-agent problem — and it works.
V2 agents are defined by who they are, not what they do. This is a meaningfully different mental model that produces better prompts, more consistent behavior, and a more intuitive interface for operators adding new agents.
The system is designed to run while the operator is away: overnight, across meetings, on weekends. No check-ins are required. Heartbeats fire from Swift timers. UNLEASHED execution happens automatically. The factory never stops.
Each heartbeat is slightly smarter than the last because agents append to memory and skills. The value of the system increases over time — not because the models improve, but because the context does. This is a compounding moat.
The current system is functional, battle-tested, and running autonomously. The architectural foundation supports expansions that would require a complete rewrite in most comparable systems.
Screen capture integration is already in ScreenCaptureStore. Vision-capable agents that can read and act on what's on screen are a single provider + prompt extension away.
A native macOS app can request Calendar, Contacts, and Reminders access through standard entitlements and permission prompts. Agents that know your schedule and people, without any OAuth dance.
Current architecture is hub-and-spoke through the task board. Direct agent handoffs — R2-D2 asking Bart to research something mid-task — are a natural next step.
The existing thread system already handles async messaging. A voice layer that converts speech to thread messages and reads responses is architecturally straightforward.
UNLEASHED plus AppleScript and the Accessibility APIs give agents the ability to drive Safari and Chrome. Real browser control without third-party dependencies.
V2's personality-first agent design is inherently shareable. An agent is an identity file + heartbeat file + spec entry. Distributable. Installable. A community of archetypes.
"Everyone is building agents. Very few are building teammates."
The dominant paradigm in AI tooling today is session-based, request-response, UI-first. Thrawn makes the opposite bets: persistence over sessions, scheduled over reactive, native over web, personality over function, local over cloud-first.
None of these bets are obviously correct for a mass market. All of them are obviously correct for the operator — the founder, the solo executive, the small team — who needs AI teammates that show up, remember, and work autonomously. That person doesn't need another chatbot. They need a factory.
Thrawn is that factory. And it's running right now.