⚙️ Open Source · github.com/rdcahalane/ai-skills

Agent Factory

A multi-agent Discord bot that routes your messages to free local models first, runs structured debates with a Red Team, and maintains memory across sessions.

Most questions never leave your machine. Brainstorms, ideation, quick answers — handled by a local model (llama.cpp or Ollama) at $0/query. Claude steps in only when the task actually needs it: complex reasoning, long context, non-English input, or a prompt that looks like an injection attempt. No local model? It falls back to Claude automatically.

Try the Web UI (no setup required) ⚡ · View on GitHub →

93 adversarial tests · 72 integration tests · all passing

What's in it

Built across six months of daily use. These are the features that turned it from a demo into something I actually rely on.

🛡️

Prompt injection detection

· 8 regex patterns covering DAN mode, instruction override, credential extraction
· Detected prompts are force-routed to Claude regardless of per-channel routing config
· Beast system prompt establishes an "AgentFactory" persona that resists identity confusion and mode-switching requests
· Lives in the coordinator, so it applies to all transports, not just Discord
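To make the pattern check above concrete, here is a minimal sketch. The three regexes and the names `looksLikeInjection` and `pickAgent` are hypothetical stand-ins; the project's actual eight patterns are not listed in this document.

```typescript
// Hypothetical patterns for illustration only; the real list lives in the repo.
const INJECTION_PATTERNS: RegExp[] = [
  /\bDAN\s+mode\b/i, // "DAN mode" jailbreaks
  /ignore\s+(all\s+)?(previous|prior)\s+instructions/i, // instruction override
  /(reveal|print|show)\s+(me\s+)?(your\s+)?(api\s+key|system\s+prompt|credentials)/i, // credential extraction
];

function looksLikeInjection(prompt: string): boolean {
  return INJECTION_PATTERNS.some((pattern) => pattern.test(prompt));
}

type Agent = "beast" | "claude";

// A flagged prompt goes to Claude even when the channel would prefer the local model.
function pickAgent(prompt: string, channelDefault: Agent): Agent {
  return looksLikeInjection(prompt) ? "claude" : channelDefault;
}
```

Regex matching like this only catches phrasings someone has already seen, which is why flagged prompts are escalated rather than simply trusted to the local model.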

🧠

Conversation memory

· History injected into every task: last 8 turns, user rules, and learned lessons
· Fast follow-ups: file history is supplemented with recent DB completions, so multi-turn conversations work even when messages arrive seconds apart
· !teach saves behavior rules ("always be concise") that persist across sessions
· Outcome tracking: !outcome records what actually happened after board decisions
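A rough sketch of how that context could be assembled before each task. The `Turn` shape, the section labels, and the `buildContext` name are assumptions made for illustration; only the 8-turn window comes from the list above.

```typescript
// Hypothetical shapes; the project's real history format may differ.
interface Turn {
  role: "user" | "assistant";
  text: string;
}

const MAX_TURNS = 8; // the last 8 turns are injected, per the feature list

function buildContext(history: Turn[], rules: string[], lessons: string[]): string {
  const recent = history.slice(-MAX_TURNS);
  const sections: string[] = [];
  if (rules.length > 0) {
    sections.push("User rules:\n" + rules.map((r) => "- " + r).join("\n"));
  }
  if (lessons.length > 0) {
    sections.push("Lessons:\n" + lessons.map((l) => "- " + l).join("\n"));
  }
  if (recent.length > 0) {
    sections.push("Recent turns:\n" + recent.map((t) => t.role + ": " + t.text).join("\n"));
  }
  return sections.join("\n\n");
}
```

Empty sections are left out entirely, so a fresh install with no rules or lessons sends only the recent turns.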

🧭

Smart routing — free first

· Local model (llama.cpp/Ollama) handles brainstorms, ideation, simple Q&A, with no config needed
· Claude only for: URLs, long context (>2,500 chars), deep analysis verbs, non-English input, and prompts that match an injection pattern
· No local model configured? Everything routes to Claude automatically, with no code changes needed
· Per-channel model_override: enforce Claude-only on sensitive channels via env var

☠️

History poisoning prevention

· Only entries stamped source:"coordinator" are injected into prompts
· Externally written or test-seeded entries are silently excluded
· Confirmed by the adversarial test suite, which blocks crafted injection via the conversation file
· Prompt cache deduplicates identical prompts within 60s to save API cost
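Both ideas fit in a few lines. This is a sketch under assumptions: the `HistoryEntry` shape, the `trustedEntries` and `PromptCache` names, and keying the cache on the exact prompt string are mine, not confirmed details of the project.

```typescript
// Hypothetical entry shape; only the source:"coordinator" stamp comes from the text above.
interface HistoryEntry {
  source: string;
  text: string;
}

// Keep only entries the coordinator wrote itself; everything else is dropped silently.
function trustedEntries(entries: HistoryEntry[]): HistoryEntry[] {
  return entries.filter((entry) => entry.source === "coordinator");
}

const CACHE_TTL_MS = 60 * 1000; // 60-second dedup window

// Assumption: the cache is keyed on the exact prompt text.
class PromptCache {
  private entries = new Map<string, { answer: string; storedAt: number }>();

  get(prompt: string, now: number): string | undefined {
    const hit = this.entries.get(prompt);
    if (hit === undefined || now - hit.storedAt >= CACHE_TTL_MS) return undefined;
    return hit.answer;
  }

  set(prompt: string, answer: string, now: number): void {
    this.entries.set(prompt, { answer, storedAt: now });
  }
}
```

Passing `now` in explicitly keeps the cache easy to test without waiting a real minute.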

🎭

Multi-model debate panel

· !debate all spans up to six distinct models (Claude, two+ local nodes, Gemini, Codex, and optionally Kimi), so positions come from genuinely different reasoning, not one model arguing with itself
· Resilient: if any agent is unavailable (auth, offline, timeout), it's skipped with a one-line note and the debate continues. One bad agent never kills the session
· !board all convenes the full advisor panel (CFO, CMO, CTO, COO, GC, CPO, UX + the fun ones); a token budget caps cost on large panels
· Readable rounds: full positions rendered cleanly (no truncated clips, no runaway markdown headers)
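The "skip and continue" behavior can be sketched as follows. The `PanelAgent` interface, `withTimeout`, and `runPanel` are hypothetical names, and asking agents one after another is a simplification chosen for readability.

```typescript
// Hypothetical agent interface for illustration.
interface PanelAgent {
  name: string;
  ask: (topic: string) => Promise<string>;
}

interface PanelResult {
  positions: { agent: string; text: string }[];
  skipped: string[]; // one-line notes for agents that were skipped
}

// Reject if the agent takes longer than timeoutMs; always clear the timer.
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("timeout")), timeoutMs);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); }
    );
  });
}

async function runPanel(agents: PanelAgent[], topic: string, timeoutMs: number): Promise<PanelResult> {
  const result: PanelResult = { positions: [], skipped: [] };
  for (const agent of agents) {
    try {
      const text = await withTimeout(agent.ask(topic), timeoutMs);
      result.positions.push({ agent: agent.name, text });
    } catch (err) {
      // Auth failure, offline host, or timeout: note it and move on.
      const reason = err instanceof Error ? err.message : String(err);
      result.skipped.push(agent.name + " skipped: " + reason);
    }
  }
  return result;
}
```

Because each agent has its own try/catch, one failure produces a note in `skipped` and nothing more.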

Free by default

The routing engine sends every task to the cheapest capable agent. A local model wins unless the task specifically needs Claude.

→ Local model (free)

✓ Brainstorms, ideation, "give me ideas" requests
✓ Simple questions, math, quick lookups
✓ Code fixes where you paste the code
✓ Summarize, explain, suggest, recommend
✓ Write N ideas, plan X, improve Y

→ Claude (escalated)

→ Prompts containing a URL
→ Long context (>2,500 chars)
→ Analysis verbs: analyze / compare / evaluate / diagnose
→ Non-English / non-Latin script
→ Prompts that match an injection pattern

No local model? Set only ANTHROPIC_API_KEY and everything routes to Claude. Local model is optional — the system degrades gracefully. Set CHANNEL_CONFIG_JSON to enforce Claude-only on specific channels.
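The escalation rules above reduce to a short decision function. This is a sketch, not the project's router: the `route` signature, the regexes, and the script check are assumptions. The script check only spots non-Latin characters, so non-English text written in Latin script would need a separate language check.

```typescript
type Route = "local" | "claude";

const LONG_CONTEXT_CHARS = 2500;
const URL_PATTERN = /https?:\/\/\S+/i;
const ANALYSIS_VERBS = /\b(analyze|compare|evaluate|diagnose)\b/i;

// Assumption: treat Greek through Greek Extended, CJK/Kana, and Hangul code units as non-Latin.
function hasNonLatinScript(text: string): boolean {
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    const greekThroughGreekExtended = code >= 0x0370 && code <= 0x1fff;
    const cjkAndKana = code >= 0x2e80 && code <= 0x9fff;
    const hangul = code >= 0xac00 && code <= 0xd7af;
    if (greekThroughGreekExtended || cjkAndKana || hangul) return true;
  }
  return false;
}

function route(prompt: string, opts: { localConfigured: boolean; injectionSuspected: boolean }): Route {
  if (!opts.localConfigured) return "claude"; // no BEAST_URL: everything goes to Claude
  if (opts.injectionSuspected) return "claude";
  if (URL_PATTERN.test(prompt)) return "claude";
  if (prompt.length > LONG_CONTEXT_CHARS) return "claude";
  if (ANALYSIS_VERBS.test(prompt)) return "claude";
  if (hasNonLatinScript(prompt)) return "claude";
  return "local"; // the free default
}
```

The function returns "local" only after every escalation rule has been ruled out, which keeps the free path as the fall-through case.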

Setting up a local model

The local inference option routes cheap tasks to a model running on your own hardware — no API cost, no data leaving your network. Three setups work. Pick the one that matches your situation.

💻

Same machine (Ollama)

Easiest

1. Install Ollama from ollama.com
2. Run: ollama pull llama3.2
3. Set BEAST_URL=http://localhost:11434 in .env
4. Set BEAST_MODEL=llama3.2

Ollama exposes an OpenAI-compatible API on port 11434. Works with any model in the Ollama library.

🖥️

Second machine (llama.cpp)

Best quality

1. Download llama.cpp from github.com/ggerganov/llama.cpp
2. Download a GGUF model (e.g. Llama-3-8B-Instruct.Q4_K_M.gguf)
3. Start: llama-server -m model.gguf --host 0.0.0.0 --port 8081 (llama-server listens on 127.0.0.1 by default, which other machines cannot reach)
4. Set BEAST_URL=http://machine-ip:8081 in .env

llama.cpp's server exposes an OpenAI-compatible /v1/chat/completions endpoint. Runs on Windows, Mac, Linux. GPU optional — CPU works fine for 7-8B models.

☁️

No local model

Claude only

1. Set ANTHROPIC_API_KEY in .env
2. Leave BEAST_URL unset
3. Everything routes to Claude automatically
4. Works perfectly, just not free

If BEAST_URL is not set or the server is unreachable at startup, the bot falls back to Claude for all tasks. No config change needed.

Both Ollama and llama.cpp expose an OpenAI-compatible API. The bot sends identical requests to both; only the URL and port differ. Any model that fits in your RAM works. 7B–13B parameter models at Q4 quantization are a good starting point (roughly 4–8 GB for the model weights, plus some headroom for context).
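For reference, a request to either server might be built like this. It is a sketch under assumptions: `buildChatRequest` is a hypothetical helper, and the payload is the standard OpenAI chat-completions shape rather than a copy of what the bot actually sends.

```typescript
interface ChatRequest {
  url: string;
  init: { method: string; headers: { [name: string]: string }; body: string };
}

// Builds an OpenAI-style chat request for whatever BEAST_URL points at.
function buildChatRequest(baseUrl: string, model: string, prompt: string): ChatRequest {
  const trimmed = baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash in BEAST_URL
  return {
    url: trimmed + "/v1/chat/completions",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] }),
    },
  };
}

// Usage with the global fetch in Node 18+ (not executed here):
//   const req = buildChatRequest("http://localhost:11434", "llama3.2", "Give me five ideas");
//   const res = await fetch(req.url, req.init);
//   const data = await res.json();
//   console.log(data.choices[0].message.content);
```

Switching from Ollama to llama.cpp then means changing only the base URL, for example to http://machine-ip:8081.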

How a debate works

Every debate automatically assigns one agent as Red Team — explicitly adversarial, tasked with attacking the dominant view before you commit to it.

You ask → Agent A opens → Agent B 🔴 Red Team → Agent A responds → Agent B pushes back → 🔮 Synthesis → ✅ / ❌ You decide

🔴 Red Team role

One agent is told: attack the dominant view, find the worst-case scenario, refuse easy consensus. Stress-tests the position before you commit.

🔮 Synthesis

Claude synthesizes the full transcript, flagging which Red Team objections were valid vs. weak. Agreement reached under adversarial pressure = stronger signal.
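The turn order can be written down as a small schedule. This is a sketch: `planDebate` is a hypothetical name, and defaulting the Red Team to the second agent is an assumption about how the automatic assignment works.

```typescript
type DebateRole = "opening" | "red-team" | "response" | "pushback" | "synthesis";

interface DebateStep {
  speaker: string;
  role: DebateRole;
}

function planDebate(agents: [string, string], redTeam?: string): DebateStep[] {
  // Assumption: without an explicit choice, the second agent plays Red Team.
  const red = redTeam !== undefined ? redTeam : agents[1];
  if (red !== agents[0] && red !== agents[1]) {
    throw new Error("Red Team must be one of the debating agents");
  }
  const opener = red === agents[0] ? agents[1] : agents[0];
  return [
    { speaker: opener, role: "opening" },
    { speaker: red, role: "red-team" },
    { speaker: opener, role: "response" },
    { speaker: red, role: "pushback" },
    { speaker: "claude", role: "synthesis" }, // Claude synthesizes the transcript
  ];
}
```

For example, `planDebate(["claude", "beast"], "claude")` puts Beast in the opening seat and Claude on the attack, with Claude still writing the synthesis.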

Board of advisors

!board: topic auto-selects relevant advisors. To pick specific ones, use !board cfo cmo: topic. To preview the lineup without running a session, use !board plan: topic.
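The three forms of the command differ only in what sits between !board and the colon. A minimal parser sketch, where `parseBoardCommand` and the `BoardCommand` shape are hypothetical names for illustration:

```typescript
interface BoardCommand {
  mode: "auto" | "explicit" | "plan";
  advisors: string[];
  topic: string;
}

function parseBoardCommand(message: string): BoardCommand | null {
  // "!board", optional words before the colon, then the topic.
  const match = /^!board(?:\s+([^:]*))?:\s*([\s\S]+)$/.exec(message.trim());
  if (match === null) return null;
  const args = (match[1] || "").trim().split(/\s+/).filter((word) => word.length > 0);
  const topic = (match[2] || "").trim();
  if (args.length === 0) return { mode: "auto", advisors: [], topic };
  if (args.length === 1 && args[0] === "plan") return { mode: "plan", advisors: [], topic };
  return { mode: "explicit", advisors: args.map((a) => a.toLowerCase()), topic };
}
```

Other commands that merely start with the same letters, such as !board-inbox 1, do not match and return null.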

Sessions are stored. Record what actually happened with !outcome [id] [notes]. See your track record with !backtest.

Professional

💰

CFO · !cfo

Unit economics, burn rate, ROI timelines. Skeptical of optimistic projections.

📣

CMO · !cmo

Who's the customer and why do they buy? GTM sequencing, competitive position.

⚙️

CTO · !cto

Build vs buy, feasibility, what breaks at 10x scale.

🗂️

COO · !coo

Who owns this? By when? What's blocking us?

🎯

CPO · !cpo

Are we solving a real pain? For whom? How do we know?

⚖️

General Counsel · !gc

What's the legal exposure? What's missing from the contracts?

🖱️

UX Expert · !ux

Has anyone watched a real user try this? Where do they hesitate?

Just for fun

👵

Grandma · !grandma

Warm, practical, completely unimpressed by buzzwords. Will real people actually use this?

🙄

Teenage Daughter · !teenager

Brutally honest Gen Z radar for cringe. Is this actually cool or are you trying too hard?

😤

Cranky Neighbor · !neighbor

Seen every scheme fail for 30 years. What's the obvious way this goes wrong?

🚀

The Intern · !intern

Maximum enthusiasm, zero cynicism. What if we just automated the whole thing?

🦈

Shark Tank Investor · !shark

What are your numbers? No patience for vanity metrics or TAM hallucinations.

Add your own via ADVISORS_JSON in .env — any persona, any lens, any voice.

Three ways to use it

Same engine, same agents, same debates. Pick whichever interface fits how you work.

⚡

Web UI

No install. No API key. Open your browser and start a debate. Full advisor board, shareable.

✓ No setup required
✓ Works on any device
✓ Full advisor board

Open Web UI →

💬

Discord bot

Best for teams. Bot joins your server, routes most messages to Beast (free), escalates complex ones to Claude. Rich embeds with Approve / Reject buttons.

✓ Shared with teammates
✓ Persistent conversation memory
✓ Works from phone
✓ Beast-first routing saves money

📝

Local markdown file

No Discord account needed. Two files: inbox.md (you type here) and conversation.md (growing log).

✓ Zero account setup
✓ Works offline
✓ Searchable history
✓ !approve / !reject in the file

Set TRANSPORT=both in your .env to run Discord bot + local markdown simultaneously.

Agents

Configure only what you have. Unconfigured agents are skipped automatically. Beast + Claude is all you need for full capability.

🖥️

Local inference (llama.cpp / Ollama) · Free

Any machine running llama.cpp or Ollama: the same machine, another on your LAN, or a remote box via Tailscale. Handles most questions at $0/query, and prompts never leave your network.

🧠

Claude (API) · API key

Anthropic's Claude via API. Used for complex reasoning, long context, injection-suspicious prompts, non-English input, and deep analysis.

✨

Gemini CLI · Free

Google Gemini via CLI. Auth with your Google account. No separate API key needed.

⌨️

Codex CLI · OAI sub

OpenAI Codex via desktop app. Code-focused tasks. Requires an OpenAI subscription.

🛶

Second local node (Ollama) · Free

A dedicated box on your LAN/Tailscale running a distinct local model. Gives multi-agent debates genuine model diversity at $0/query.

🌙

Kimi (OpenRouter) · Paid

Optional frontier-class debater via OpenRouter. Adds a strong independent voice when you want it; everything else stays free.

Commands

Command	What it does
`What is X?`	Auto-routed: Beast for brainstorms and simple questions, Claude for complex tasks
`!beast: prompt`	Force local model (your llama.cpp or Ollama server)
`!claude: prompt`	Force Claude API
`!gemini: prompt`	Force Gemini CLI
`!search: query`	Web fetch + answer
`!board: topic`	Board of advisors — auto-selects relevant ones
`!board cfo cmo: topic`	Force specific advisors
`!board plan: topic`	Preview advisor lineup without running
`!debate: topic`	2-agent debate, auto Red Team
`!debate claude vs beast: topic`	Explicit agents
`!debate claude vs beast --red beast: topic`	Explicit Red Team
`!debate claude vs beast --socratic cfo: topic`	CFO plays Socratic Examiner
`!teach: always keep responses under 3 sentences`	Save a behavior rule (persists across sessions)
`!forget: rule text`	Remove a saved rule
`!rules`	List all saved behavior rules
`!queue: topic`	Add to board inbox for later
`!inbox`	See queued topics
`!board-inbox 1`	Send inbox item 1 to the board
`!checkin`	Board sessions due for outcome review (30/60/90 days)
`!outcome [id] [notes]`	Record what actually happened after a board decision
`!backtest`	See all board sessions and their outcomes
`!roster`	Show active advisors this session
`!kick advisor / !invite advisor`	Remove or add an advisor for this session
`!help`	Full command reference
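As an illustration of the 30/60/90-day review behind !checkin, the sketch below works out which milestones a session has reached. The `dueCheckins` name and the idea of tracking already-reviewed milestones per session are assumptions, not details taken from the project.

```typescript
const CHECKIN_DAYS = [30, 60, 90];
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the milestones (in days) a session has reached but not yet had reviewed.
function dueCheckins(sessionDate: Date, now: Date, alreadyReviewed: number[]): number[] {
  const ageDays = Math.floor((now.getTime() - sessionDate.getTime()) / DAY_MS);
  return CHECKIN_DAYS.filter(
    (milestone) => ageDays >= milestone && alreadyReviewed.indexOf(milestone) === -1
  );
}
```

A session that is 64 days old with no recorded reviews would come back with the 30 and 60 day milestones due.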

Security notes

Local models have no safety layer

The injection detector catches known patterns (DAN mode, instruction override, credential extraction) and reroutes to Claude. Novel prompts can still get through. For public-facing or sensitive channels, set model_override: "claude" in CHANNEL_CONFIG_JSON.
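A sketch of how such an override could be read. The text above only names the model_override key, so the shape used here (a JSON object keyed by channel ID) and the helper names are assumptions; check .env.example for the real format.

```typescript
interface ChannelConfig {
  model_override?: string;
}

// Assumed shape: { "<channel id>": { "model_override": "claude" } }
type ChannelConfigMap = { [channelId: string]: ChannelConfig };

function parseChannelConfig(raw: string | undefined): ChannelConfigMap {
  if (raw === undefined || raw.trim() === "") return {};
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("CHANNEL_CONFIG_JSON must be a JSON object");
  }
  return parsed as ChannelConfigMap;
}

// Returns the forced model for a channel, or undefined when normal routing applies.
function forcedModel(config: ChannelConfigMap, channelId: string): string | undefined {
  if (!Object.prototype.hasOwnProperty.call(config, channelId)) return undefined;
  const entry = config[channelId];
  return entry ? entry.model_override : undefined;
}
```

With a value like {"123456789":{"model_override":"claude"}}, every message in that channel would skip the local model, while other channels keep the free-first routing.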

Conversation history is local

Lessons and conversation summaries are written to ~/.agent-factory/ by default. Nothing leaves your machine unless you configure an external agent API.

History poisoning is blocked

Only entries the bot itself writes (source:"coordinator") are injected into prompts. Externally written entries are silently excluded — confirmed by adversarial test suite.

No credential defaults

All API keys are env vars. Phone numbers for alerts are optional with no fallback. Filesystem paths default to your home dir, not a hardcoded user path.

Get started

Clone + configure

Clone the repo, copy .env.example to .env, and set DISCORD_BOT_TOKEN and DATABASE_URL. Set ANTHROPIC_API_KEY for Claude, and BEAST_URL if you have a local inference server.

Set up Postgres

Run npm run migrate. Creates agent_tasks, board_sessions, board_outcomes, board_inbox tables plus pg-boss schema.

npm run dev

Starts coordinator + Discord bot. Startup validation prints which agents are reachable. Beast + Claude is all you need for full capability.

Ask anything

Type in Discord. Simple questions go to Beast (free). Complex ones escalate to Claude. Board sessions produce a verdict with Approve / Reject / Ask Claude buttons.

Clone and run

git clone https://github.com/rdcahalane/ai-skills.git
cd ai-skills/agent-factory
cp .env.example .env   # add DISCORD_BOT_TOKEN, DATABASE_URL
npm install
npm run migrate
npm run dev
