Thrawn

01 · Executive Summary

An AI workforce. Not a chatbot.

"Thrawn is the first system that treats AI agents as persistent team members with memory, identity, scheduled work, and real system access — running autonomously while you sleep."

Multi-agent AI is a crowded conversation. Most implementations are cloud-hosted SaaS platforms with vendor lock-in, or fragile Python scripts that require an engineer to maintain. Neither gives a solo founder or small team what they actually need: autonomous AI teammates that show up every day, remember what they learned, and get work done without babysitting.

Thrawn is a native macOS application — written entirely in Swift and SwiftUI — that runs a persistent multi-agent command system locally. It ships with a full agent fleet, a structured task board protocol, real shell execution capabilities, multi-provider LLM routing, and a self-improvement mechanism through persistent memory and accumulated skill files.

This document covers the technical architecture, the major evolutionary jumps the system has undergone, the novel design decisions that distinguish it from comparable efforts, and the philosophy behind the V2 agent model.

7 Active Agents · 15m Lead Heartbeat · 3 LLM Providers · ∞ Runs Unattended

02 · The Problem

Why existing tools don't cut it

The current state of multi-agent AI tooling falls into four unsatisfying categories:

Category	Examples	Persistent?	Local?	Real Shell Access?	Self-Improving?
Cloud SaaS agents	AutoGPT, CrewAI Cloud	Partial	✗	✗	✗
Python frameworks	LangGraph, CrewAI	✗	✓	Sandboxed	✗
IDE agents	Cursor, Copilot	✗	Partial	Code only	✗
OS-level assistants	Apple Intelligence	Partial	✓	✗	✗
Thrawn	—	✓ Always-on	✓ Native	✓ Full	✓ Memory + Skills

The key gap: no existing tool treats agents as persistent entities with memory, identity, and scheduled autonomous work. Every session starts from zero. Thrawn doesn't.

03 · Architecture

How it's built

Thrawn is a fully native Swift Package Manager project, compiled to a macOS .app bundle with ad-hoc code signing. No Electron. No Python runtime. No web server. The entire system runs as a single native process.

System Architecture — Layer View

Key Architectural Decisions

🍎

100% Native Swift

No Electron, no Python runtime, no web server. Swift Package Manager compiled to a signed macOS .app. This means low memory overhead, fast startup, and direct access to macOS APIs — Keychain, Contacts, Calendar — not just the web.

💾

Persistent File-Based Memory

Every agent has a knowledge/ directory that survives restarts. Agents append facts.md entries, write skill files, and read them back on the next heartbeat. Learning accumulates continuously — not just within a session.

⏰

Timer-Driven Heartbeats

No cron, no external process. Swift timers fire every 30 seconds to check the schedule. Thrawn fires every 15 minutes; specialists every hour at their own offsets. The factory runs while you sleep.

🔀

Provider Router with Fallback Chain

Every agent resolves its model at heartbeat time from three tiers: local, cheap, and premium. If OpenAI is down, it falls back to Anthropic; if that's down, Ollama. Reliability beats model quality. Nothing silently fails.

🔓

UNLEASHED Mode

When enabled, shell commands extracted from LLM responses are executed for real, and the bash output is fed back to the agent. File writes, API calls, curl requests: agents do real work, not simulated work.

📋

Structured Task Board Protocol

Agents never edit the task board directly. They write to agent-updates.json; the TaskDispatcher processes it and applies changes. This serializes all board mutations and prevents race conditions between agents.
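The UNLEASHED flow described above (parse bash blocks from the LLM response, execute them, feed the output back) can be sketched roughly as follows. This is a minimal illustration, not Thrawn's ExecutionService: the function names `extractBashBlocks`, `runShell`, and `executeCommands` are hypothetical, and the sketch assumes commands arrive in fenced blocks tagged `bash`.

```swift
import Foundation

/// Returns the body of every fenced bash block found in an LLM response.
/// Hypothetical helper: the parser inside Thrawn's ExecutionService is not shown in this document.
func extractBashBlocks(from response: String) -> [String] {
    let fence = String(repeating: "`", count: 3)   // three backticks
    var blocks: [String] = []
    var current: [String]? = nil                   // nil while outside a bash fence

    for line in response.components(separatedBy: "\n") {
        let trimmed = line.trimmingCharacters(in: .whitespaces)
        if var body = current {
            if trimmed == fence {                  // closing fence: emit the block
                blocks.append(body.joined(separator: "\n"))
                current = nil
            } else {
                body.append(line)
                current = body
            }
        } else if trimmed == fence + "bash" {      // opening fence
            current = []
        }
    }
    return blocks                                  // an unterminated fence is dropped
}

/// Runs one command through /bin/bash and captures stdout and stderr together.
func runShell(_ command: String) throws -> (status: Int32, output: String) {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: "/bin/bash")
    process.arguments = ["-c", command]
    let pipe = Pipe()
    process.standardOutput = pipe
    process.standardError = pipe
    try process.run()
    // Drain the pipe before waiting so a chatty command cannot block on a full buffer.
    let data = pipe.fileHandleForReading.readDataToEndOfFile()
    process.waitUntilExit()
    return (process.terminationStatus, String(decoding: data, as: UTF8.self))
}

/// Executes extracted commands only when UNLEASHED is on.
/// The returned strings are what would be injected into the next heartbeat prompt.
func executeCommands(in response: String, unleashed: Bool) -> [String] {
    guard unleashed else { return [] }
    var results: [String] = []
    for command in extractBashBlocks(from: response) {
        do {
            let result = try runShell(command)
            results.append("$ \(command)\n[exit \(result.status)]\n\(result.output)")
        } catch {
            results.append("$ \(command)\n[failed to launch: \(error)]")
        }
    }
    return results
}
```

Keeping extraction separate from execution means the parsing step can be tested without running anything, and the `unleashed` flag gates the only code path that touches the shell.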

04 · Provider Routing

The model doesn't matter. Availability does.

Thrawn's ProviderRouter is a deliberate departure from how most systems handle LLM selection. Rather than locking an agent to a provider, the router resolves the best available option at the moment of the heartbeat — and always has a fallback.

Provider Resolution — Decision Flow

This design means the system never fails silently. An agent might respond slower if degraded to Ollama, but it always responds. Reliability is the first-class constraint.
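A minimal sketch of the fallback idea, assuming the OpenAI → Anthropic → Ollama order mentioned earlier. The `Backend` enum, `firstAvailable`, and the `isUp` health check are illustrative names, not the actual ProviderRouter API.

```swift
import Foundation

/// Hypothetical provider list; the app's own ProviderBackend enum may differ.
enum Backend: String {
    case openai, anthropic, ollama
}

/// Thrown when every provider in the chain is down, so the failure is never silent.
struct NoProviderAvailable: Error {
    let tried: [Backend]
}

/// Walks the chain in order and returns the first provider that passes the health check.
/// `isUp` is an assumed callback; a real check might ping the provider's endpoint.
func firstAvailable(in chain: [Backend], isUp: (Backend) -> Bool) throws -> Backend {
    for backend in chain where isUp(backend) {
        return backend
    }
    throw NoProviderAvailable(tried: chain)
}
```

With a chain of `[.openai, .anthropic, .ollama]`, an OpenAI outage resolves to `.anthropic`, and an outage of both cloud providers resolves to `.ollama`. Only when all three are down does the call throw, which surfaces the failure to the caller rather than hiding it.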

Per-agent model binding is stored in agent-specs.json as either .inherit (follows the standard loadout default) or .explicit(tier). Thrawn and Bart are explicitly set to .premium. The V1 specialists inherit, which resolves to .local (Ollama), keeping costs near zero for routine operational work.
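The binding rule can be illustrated with a small enum. This is a sketch under assumptions: the type names `ModelTier` and `ModelBinding`, the agent keys, and the in-memory dictionary are hypothetical, and the on-disk layout of agent-specs.json is not reproduced here.

```swift
import Foundation

/// The three tiers named in this document.
enum ModelTier: String {
    case local, cheap, premium
}

/// Either follow the loadout default or pin a tier explicitly.
enum ModelBinding: Equatable {
    case inherit
    case explicit(ModelTier)

    /// Resolves the binding against the standard loadout default.
    func resolved(loadoutDefault: ModelTier) -> ModelTier {
        switch self {
        case .inherit:
            return loadoutDefault
        case .explicit(let tier):
            return tier
        }
    }
}

// Illustrative bindings matching the text: lead and Bart pinned, a specialist inheriting.
let bindings: [String: ModelBinding] = [
    "thrawn": .explicit(.premium),
    "bart": .explicit(.premium),
    "r2d2": .inherit,
]
```

Resolving `bindings["r2d2"]` with a loadout default of `.local` yields `.local`, while the two pinned agents stay on `.premium` regardless of the default.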

05 · The Task Board Protocol

The factory floor

The task board is a Markdown file at ~/Library/Application Support/Thrawn/workspace/ops/TASK_BOARD.md. It is the single source of truth for all work in flight. Every agent reads it on every heartbeat. Only Thrawn puts tasks into Ready status. Specialists pick up tasks assigned to them.

Task Lifecycle — Owner/Status State Machine

"Ready is the only pickup lane. Thrawn is always the hub." — The core invariant of the task relay system.

Agents write their updates to agent-updates.json — a sidecar file that the TaskDispatcher processes asynchronously. This prevents any agent from corrupting the board directly. The dispatcher is the only writer to TASK_BOARD.md, making the system safe even with multiple concurrent heartbeats.
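As a rough sketch of the single-writer pattern: agents emit update records, and one dispatcher validates and applies them in order. Everything below is an assumption made for illustration, including the `AgentUpdate` fields, the status names other than Ready, and the `SketchDispatcher` type. The real agent-updates.json schema and TaskDispatcher are not shown in this document.

```swift
import Foundation

/// Assumed status set; only "Ready" is named in the text.
enum TaskStatus: String, Codable {
    case backlog, ready, inProgress, done
}

/// One record in the sidecar file (hypothetical schema).
struct AgentUpdate: Codable {
    let taskID: String
    let agent: String
    let newStatus: TaskStatus
}

struct BoardTask {
    var owner: String
    var status: TaskStatus
}

/// The only component allowed to mutate the board.
struct SketchDispatcher {
    var board: [String: BoardTask]
    let lead: String

    init(board: [String: BoardTask], lead: String = "thrawn") {
        self.board = board
        self.lead = lead
    }

    /// Enforces the two invariants from the text: only the lead moves a task to Ready,
    /// and a specialist may pick up only a Ready task assigned to it.
    func isAllowed(_ update: AgentUpdate, on task: BoardTask) -> Bool {
        switch update.newStatus {
        case .ready:
            return update.agent == lead
        case .inProgress:
            return task.status == .ready && task.owner == update.agent
        default:
            return update.agent == task.owner || update.agent == lead
        }
    }

    /// Decodes a batch of updates and applies them one at a time.
    /// Returns the updates that were rejected so they can be logged.
    mutating func process(_ json: Data) throws -> [AgentUpdate] {
        let updates = try JSONDecoder().decode([AgentUpdate].self, from: json)
        var rejected: [AgentUpdate] = []
        for update in updates {
            guard var task = board[update.taskID], isAllowed(update, on: task) else {
                rejected.append(update)
                continue
            }
            task.status = update.newStatus
            board[update.taskID] = task
        }
        return rejected
    }
}
```

Because every mutation passes through `process`, two agents that finish heartbeats at the same moment cannot interleave partial writes to the board; their updates are simply applied one after the other.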

06 · Evolution

How we got here

Thrawn has gone through several distinct jumps in capability. Each one represents a different class of problem solved — not an incremental improvement but a qualitative shift in what the system can do.

Origin

Architecture

Native macOS App · Ollama-Only

The core decision: build a native Swift app, not a script. Established the SwiftUI architecture, ObservableObject graph, and the fundamental loop: heartbeat → LLM → file write → board update. Chose Ollama for zero-cost local inference as the always-available backbone.

V1 Squad

Agents

Full Dev-Ops Squad Deployed

R2-D2 (Dev), C-3PO (Data), Qui-Gon (Research), Lando (Marketing), Boba Fett (QA). Each with its own heartbeat offset, identity file, heartbeat instructions, and knowledge directory. The factory came online: counting Thrawn as lead, six agents running concurrently on independent schedules.

Milestone

Security

UNLEASHED Mode · Real Shell Execution

The most significant capability jump. In UNLEASHED mode, agents' bash command blocks are parsed from the LLM response and executed via ExecutionService. Results feed back into the next heartbeat. Agents stopped being chatbots and became real workers — writing files, calling APIs, running scripts.

Routing

Model

Multi-Provider Router · Anthropic + Gemini

Added AnthropicClient and GeminiAPIClient with the ProviderRouter fallback chain. Any agent can now be pinned to a tier (local/cheap/premium) without changing code — resolved at heartbeat time. This decoupled model selection from agent identity for the first time.

Premium

Model

OpenAI GPT-4.1 · Thrawn Goes Premium

Full OpenAI SSE streaming integration. The ProviderBackend enum gained a .openai case; AgentScheduler gained the sendToActiveProvider switch branch; ThreadStore gained full provider-aware routing. Thrawn (lead) and V2 agents run on GPT-4.1. Specialists stay on free local Ollama. Cost is controlled by design.

Thread

UX

Command Tab Routes to GPT-4.1

ThreadStore — the direct chat system — was extended to route to OpenAI when configured, falling back to Ollama. Every Command thread is now running on the same brain as Thrawn's autonomous heartbeat. Consistent intelligence across interactive and autonomous modes.

V2

Agents

V2 Agent Philosophy · Bart Deploys

A fundamental rethink of what an agent is. V1 = job description. V2 = distilled human archetype. Bart Simpson: a brilliant smart-ass who delivers multi-source web research from a single prompt and never explains his methodology. The architecture now supports personality-first agents with single-purpose superpowers running on premium models.

Identity

UX

Pixel Art Agent Profiles

Agents now have 8-bit pixel art profile pictures on their cards — loaded from DiceBear's pixel-art sprite API, seeded by agent ID for consistency. Status jewel moves to overlay dot. When offline, falls back to a deterministic procedural pixel pattern. Identity is now visual.
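The offline fallback mentioned in the last entry, a deterministic procedural pixel pattern, could look something like the sketch below. This is one possible approach, not the app's actual renderer: it hashes the agent ID with FNV-1a, drives an xorshift generator from that hash, and mirrors each row so the avatar is symmetric. The function names and the 8×8 size are assumptions.

```swift
import Foundation

/// 64-bit FNV-1a hash, used here so the same agent ID always yields the same seed.
func fnv1a(_ text: String) -> UInt64 {
    var hash: UInt64 = 0xcbf29ce484222325
    for byte in text.utf8 {
        hash ^= UInt64(byte)
        hash = hash &* 0x100000001b3
    }
    return hash
}

/// Builds a left-right symmetric grid of on/off pixels from the seed.
func pixelPattern(seed: String, size: Int = 8) -> [[Bool]] {
    var state = fnv1a(seed) | 1            // xorshift needs a nonzero state
    var grid: [[Bool]] = []
    for _ in 0..<size {
        var left: [Bool] = []
        for _ in 0..<(size / 2) {
            // One xorshift64 step per pixel.
            state ^= state << 13
            state ^= state >> 7
            state ^= state << 17
            left.append(state & 1 == 1)
        }
        grid.append(left + Array(left.reversed()))
    }
    return grid
}

/// Renders the grid as text for quick inspection.
func render(_ grid: [[Bool]]) -> String {
    grid.map { row in row.map { $0 ? "#" : "." }.joined() }.joined(separator: "\n")
}
```

Calling `render(pixelPattern(seed: "r2d2"))` prints the same 8×8 block every time, which is the property that matters for identity: an agent keeps its face across restarts even without network access.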

07 · The Fleet

Who's running

Heartbeat Schedule — One Hour Window

🎯

Thrawn

Lead · Premium · :00/:15/:30/:45

⚙️

R2-D2

Dev · Local · :10

🗂️

C-3PO

Data & API · Local · :20

🔍

Qui-Gon

Research · Local · :30

✍️

Lando Calrissian

Marketing · Local · :40

🎯

Boba Fett

QA & Recon · Local · :50

😈

Bart ★

Research & Intel · GPT-4.1 · :15
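The schedule above can be expressed as a table of minute offsets checked by a periodic tick. The sketch below assumes the 30-second check interval mentioned earlier, which means the same minute is seen twice, so each agent's last fired slot is remembered to avoid double firing. `HeartbeatScheduler` and `fleetOffsets` are hypothetical names, not the app's AgentScheduler.

```swift
import Foundation

/// Minute-of-hour offsets taken from the fleet list above.
let fleetOffsets: [String: Set<Int>] = [
    "Thrawn": [0, 15, 30, 45],
    "R2-D2": [10],
    "C-3PO": [20],
    "Qui-Gon": [30],
    "Lando Calrissian": [40],
    "Boba Fett": [50],
    "Bart": [15],
]

struct HeartbeatScheduler {
    let offsets: [String: Set<Int>]
    private var lastFiredSlot: [String: Int] = [:]   // agent -> hour * 60 + minute

    init(offsets: [String: Set<Int>]) {
        self.offsets = offsets
    }

    /// Called on every timer tick with the current wall-clock hour and minute.
    /// Returns the agents whose heartbeat should fire now, each at most once per slot.
    mutating func tick(hour: Int, minute: Int) -> [String] {
        let slot = hour * 60 + minute
        var due: [String] = []
        for (agent, minutes) in offsets where minutes.contains(minute) {
            if lastFiredSlot[agent] != slot {
                lastFiredSlot[agent] = slot
                due.append(agent)
            }
        }
        return due.sorted()
    }
}
```

In an app, `tick` would be driven by a repeating timer that reads the hour and minute from the current date. Passing the time in as plain integers keeps the scheduling rule testable without waiting on a clock.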

08 · V2 Agent Philosophy

Real people, distilled

V1 agents are job descriptions. V2 agents are people. The distinction sounds subtle but it changes everything about how you think about what to build next and what to give them to do.

V1 — Job Description Model

Agent defined by role, responsibilities, outputs, escalation criteria. Complete. Formal. Built to cover a function exhaustively. Like a job posting.

V2 — Human Archetype Model

Agent defined by personality, quirks, specific superpowers, and specific blind spots. Frank knows every coffee spot and never misses a day. Danny is a genius but a wildman; you can't count on him. Brenda keeps Danny in line. Built like a real team, not an org chart.

The insight is that specialization through personality is more powerful than specialization through job function. A V2 agent doesn't need to be capable of everything in its domain. It needs to be extraordinary at one thing, and its constraints (personality, reliability, style) are features, not bugs.

V2 Agent Characteristics Spectrum

Reliability 95% · Specialization 88% · Personality Depth 92% · Breadth of Role 40%

Low breadth is intentional. A V2 agent that does one thing extraordinarily well and has a distinct voice is more valuable than a generalist that covers everything adequately.

09 · Self-Improvement

Agents that get smarter over time

Most AI agent systems are stateless — every session starts fresh. Thrawn has two mechanisms for persistent learning:

Continuous Learning Loop

Memory (facts.md) — A persistent Markdown file each agent appends to. User preferences, project context, routing decisions that worked. Specific, dated, accumulating. Injected into the next heartbeat prompt so the agent starts where it left off.

Skill Files — When an agent solves something complex, it writes a skill file: the exact procedure, gotchas, when to use it. Next session, that skill is injected back. The agent doesn't re-learn; it references. This is the closest thing to genuine institutional memory in current AI systems.
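A minimal sketch of the memory half of this loop: append a dated fact, then inject the accumulated file into the next heartbeat prompt. The `AgentMemory` type, the entry format, and the prompt layout are assumptions for illustration. Only the facts.md file name and the knowledge directory come from the text above.

```swift
import Foundation

struct AgentMemory {
    /// The agent's knowledge/ directory (location is supplied by the caller).
    let knowledgeDir: URL

    var factsURL: URL {
        knowledgeDir.appendingPathComponent("facts.md")
    }

    /// Appends one dated entry to facts.md, creating the directory and file if needed.
    func appendFact(_ fact: String, date: String) throws {
        try FileManager.default.createDirectory(at: knowledgeDir,
                                                withIntermediateDirectories: true)
        let existing = (try? String(contentsOf: factsURL, encoding: .utf8)) ?? ""
        let updated = existing + "- [\(date)] \(fact)\n"
        try updated.write(to: factsURL, atomically: true, encoding: .utf8)
    }

    /// Builds the next heartbeat prompt with everything learned so far placed first.
    func heartbeatPrompt(instructions: String) -> String {
        let facts = (try? String(contentsOf: factsURL, encoding: .utf8)) ?? ""
        return "## Memory\n\(facts)\n## Instructions\n\(instructions)"
    }
}
```

Skill files could follow the same pattern with one file per procedure. The point of the sketch is that the learning lives on disk, so a restart loses nothing, and the prompt grows with the agent's history rather than with the model.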

10 · What's Novel

Why this is different

Multi-agent frameworks are not new. What's new here is the combination — and the platform decision.

🏠

Native First, Always

Writing a multi-agent system in Swift for macOS is unusual to the point of being novel. It means zero Python dependency, direct Keychain access, macOS-native scheduling, and the ability to eventually access Calendar, Contacts, Photos — surfaces no web-based agent can touch.

🧠

Identity-Based Routing

The AgentSpecStore resolves which model tier each agent uses based on its identity, not the request. Thrawn is always premium. V1 specialists default to local. The cost structure is baked into the agent design, not decided at call time.

🔄

Separation of Board and Mutation

Agents never write the task board directly. They write a sidecar update file. The dispatcher serializes all mutations. This is a safety pattern borrowed from distributed systems applied to a local multi-agent problem — and it works.

👥

Personality-First Agent Design

V2 agents are defined by who they are, not what they do. This is a meaningfully different mental model that produces better prompts, more consistent behavior, and a more intuitive interface for operators adding new agents.

💤

Unattended Operation

The system is designed to run while the operator is away — overnight, across meetings, on weekends. No polling required. Heartbeats fire from Swift timers. UNLEASHED execution happens automatically. The factory never stops.

📈

Accumulating Intelligence

Each heartbeat is slightly smarter than the last because agents append to memory and skills. The value of the system increases over time — not because the models improve, but because the context does. This is a compounding moat.

11 · Horizon

Where this goes

The current system is functional, battle-tested, and running autonomously. The architectural foundation supports expansions that would require a complete rewrite in most comparable systems.

👁️

Vision Agents

Screen capture integration is already in ScreenCaptureStore. Vision-capable agents that can read and act on what's on screen are a single provider + prompt extension away.

📅

Calendar / Contacts Integration

Native macOS app means direct entitlement access to Calendar, Contacts, Reminders. Agents that know your schedule and people — without any OAuth dance.

🤝

Agent-to-Agent Communication

Current architecture is hub-and-spoke through the task board. Direct agent handoffs — R2-D2 asking Bart to research something mid-task — are a natural next step.

🎙️

Voice Interface

The existing thread system already handles async messaging. A voice layer that converts speech to thread messages and reads responses is architecturally straightforward.

🌐

Browser Automation

UNLEASHED combined with the AppleScript and Accessibility APIs gives agents the ability to drive Safari and Chrome. Real browser control without third-party dependencies.

🏪

Agent Marketplace

V2's personality-first agent design is inherently shareable. An agent is an identity file + heartbeat file + spec entry. Distributable. Installable. A community of archetypes.

12 · Conclusion

A different bet

"Everyone is building agents. Very few are building teammates."

The dominant paradigm in AI tooling today is session-based, request-response, UI-first. Thrawn makes the opposite bets: persistence over sessions, scheduled over reactive, native over web, personality over function, local over cloud-first.

None of these bets are obviously correct for a mass market. All of them are obviously correct for the operator — the founder, the solo executive, the small team — who needs AI teammates that show up, remember, and work autonomously. That person doesn't need another chatbot. They need a factory.

Thrawn is that factory. And it's running right now.

✓ Always Running · ∞ Sessions of Memory · V2 Agent Model Live · 0 External Dependencies
