The Workshop
Where we prototype before we promise. Agents, harnesses, multi-vendor pipelines, observability — moving fast at the frontier so client work ships proven. What survives here ships in Client Work. What doesn't gets cut.
TaskFox Factory
TaskFox Factory is our AI workshop — a place where small specialized agents handle research, content, voice, and code on demand. Behind a single API, multi-vendor pipelines (Claude, Gemini, OpenAI, ElevenLabs) coordinate so each task gets the right tool without integration friction.
Built on the Claude Agent SDK with 60 skills, 16 rules, and 34 hooks wired through .claude/. New capabilities slot in fast — a skill we build today is available to every future agent session, on every machine, in every client engagement.
Our Process
Designed an extensible skill and agent library so new capabilities slot in fast (60 skills as of 2026-04-27)
Built multi-vendor pipelines — Claude, Gemini, OpenAI, ElevenLabs — behind one unified API
Wired 34 lifecycle hooks for everything from auto-formatting to verification gates to long-silence pings
Maintained 16 rule files that keep style, testing, git workflow, and skill-development conventions consistent across all agents
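The single-API idea above can be sketched as a dispatch table that maps each task kind to a vendor pipeline. The task kinds and vendor assignments below are illustrative assumptions, not TaskFox's actual routing:

```python
# Hypothetical sketch of multi-vendor routing behind one API surface.
from dataclasses import dataclass
from typing import Dict

@dataclass
class Task:
    kind: str      # e.g. "research", "voice", "code"
    payload: str

# Illustrative routing table; real vendor clients are assumed, not shown.
ROUTES: Dict[str, str] = {
    "research": "gemini",
    "voice": "elevenlabs",
    "code": "claude",
    "image": "openai",
}

def route(task: Task) -> str:
    """Pick the vendor pipeline for a task, defaulting to Claude."""
    return ROUTES.get(task.kind, "claude")
```

The point of the pattern is that callers only ever see one API; which vendor handles the task is an internal detail that can change without touching integrations.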
CLAW
CLAW (Claude Long-running Always-on Workflow) is how we work with AI now. We steer a persistent Claude Code session from iMessage while it executes inside the real codebase — running multi-agent reviews, fixing bugs, opening PRs — and pings back when something needs attention.
Built entirely on Claude Code's native primitives so the lifecycle stays inside the tool we already trust. Hooks, MCP channels, scheduled routines, durable memory layers — no out-of-process daemons we have to operate ourselves.
Our Process
Native iMessage channel armed at session launch; reply tool routes through the plugin MCP
Foxwatch progress pings via in-session hook plus an optional launchd bridge; silence + cooldown thresholds prevent spam
Five compounding memory layers: instructions, session learnings, skill patterns, subagent patterns, retro scratch
Scheduled routines for proactive contact, weekly retrospective, nightly repo health, daily newsletter
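The silence-plus-cooldown gating can be sketched as a simple predicate over timestamps. The thresholds below are illustrative placeholders, not Foxwatch's real values:

```python
# Minimal sketch of silence + cooldown gating for progress pings.
# Threshold values are illustrative assumptions, not Foxwatch's actual config.
SILENCE_THRESHOLD = 300   # seconds of no agent output before a ping is warranted
COOLDOWN = 600            # minimum seconds between consecutive pings

def should_ping(last_output: float, last_ping: float, now: float) -> bool:
    """Ping only when the session has gone quiet AND we haven't pinged recently."""
    silent_long_enough = (now - last_output) >= SILENCE_THRESHOLD
    cooled_down = (now - last_ping) >= COOLDOWN
    return silent_long_enough and cooled_down
```

Requiring both conditions is what prevents spam: a chatty session never pings, and a quiet one pings at most once per cooldown window.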
Agent Skills System
Built a library of self-contained agent skills that load on-demand instead of bloating every session. Each skill bundles its own instructions, scripts, and reference docs — agents pull them in when the prompt actually calls for that capability, not before.
Across 60 skills covering research orchestration, design system enforcement, dev-loop verification, and more, this is the layer that lets us spin up specialized capabilities in minutes, not days. Backed by multi-session orchestration so a single complex project can span multiple agent sessions without losing state. We've used this pattern to build complete applications — backend, frontend, deployment — entirely through coordinated agent sessions in sandboxed cloud environments.
Our Process
Skills load on-demand via progressive disclosure (instructions inline, references separate)
Multi-session orchestration with TaskList state and hook-driven verification gates
Sandboxed cloud execution via Daytona for safe parallel worker fan-out
Compounds across sessions — every skill we build adds capability to every future agent
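The on-demand loading pattern can be sketched as a trigger match that pulls in a skill's inline instructions while leaving its reference docs on disk until needed. The field names and keyword matching below are simplifying assumptions, not the actual skill format:

```python
# Sketch of on-demand skill selection with progressive disclosure.
# Field names and substring-based trigger matching are illustrative assumptions.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Skill:
    name: str
    triggers: List[str]          # phrases that should activate the skill
    instructions: str            # small, loaded inline when the skill fires
    reference_paths: List[str] = field(default_factory=list)  # loaded only on demand

def select_skills(prompt: str, skills: List[Skill]) -> List[Skill]:
    """Return only the skills whose triggers appear in the prompt."""
    lowered = prompt.lower()
    return [s for s in skills if any(t in lowered for t in s.triggers)]
```

The context saving comes from the split: every session pays only for the trigger descriptions, and a skill's full instructions and references load only when a prompt actually calls for them.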
Agent Harnesses
Three distinct harnesses for putting agents to work, each tuned to a different shape of problem. The same orchestration patterns ship into client engagements when a project's structure matches.
Agent Team handles parallel disjoint-file feature work with parent-owned scope fences. Task Loop sequences multi-feature backlogs with hook-driven verification gates. RLM (Recursive Language Model) decomposes tree-structured problems into sandboxed Daytona workers. Pick the harness that fits the shape of the work.
Our Process
Agent Team — parallel disjoint-file feature work in worktrees with parent-owned scope fences
Task Loop — sequenced multi-feature backlogs with hook-driven verification gates and TaskList state
RLM — tree-structured problem decomposition with sandboxed Daytona workers
Daytona integration — safe concurrent worker fan-out with full filesystem isolation per worker
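The "pick the harness that fits the shape of the work" rule can be sketched as a decision over problem shape. The predicates below are illustrative; in practice this is a judgment call, not a function:

```python
# Illustrative sketch of harness selection by problem shape.
# The boolean predicates are assumptions standing in for human judgment.
def pick_harness(parallel_disjoint_files: bool,
                 sequential_backlog: bool,
                 tree_structured: bool) -> str:
    """Map a problem's shape onto one of the three harnesses."""
    if tree_structured:
        return "RLM"             # decompose into sandboxed Daytona workers
    if parallel_disjoint_files:
        return "Agent Team"      # worktrees with parent-owned scope fences
    if sequential_backlog:
        return "Task Loop"       # ordered backlog with verification gates
    return "single session"      # no harness needed
```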
Skill Evals (Harbor)
Skills only earn the context they take if they fire on the right prompts. Harbor runs deterministic offline-first eval suites against every skill's trigger description, scoring precision and recall before users see drift.
Paired with a DSPy optimization loop (BootstrapFewShot, MIPRO) that tunes the trigger descriptions themselves — data-driven iteration on the prompts that decide when a skill activates. Multi-run variance analysis surfaces unstable triggers before they ship.
Our Process
Deterministic Harbor suites: baselines, datasets, jobs, tasks, and templates checked into version control
Harbor evaluations specialist skill with Oracle+Claude default execution and offline-first reproducibility
DSPy optimization loop — BootstrapFewShot and MIPRO for trigger description tuning
Multi-run variance analysis surfaces unstable triggers before shipping
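The precision/recall scoring can be sketched as a scan over a labeled prompt set. The dataset shape below is an assumption, not Harbor's actual suite schema:

```python
# Sketch of trigger-firing precision/recall over a labeled prompt set.
# The (prompt, should_fire) dataset shape is an illustrative assumption.
from typing import Callable, List, Tuple

def score_trigger(fires: Callable[[str], bool],
                  dataset: List[Tuple[str, bool]]) -> Tuple[float, float]:
    """Return (precision, recall) for a trigger predicate against labeled prompts."""
    tp = fp = fn = 0
    for prompt, should_fire in dataset:
        fired = fires(prompt)
        if fired and should_fire:
            tp += 1        # fired when it should: true positive
        elif fired and not should_fire:
            fp += 1        # fired on the wrong prompt: false positive
        elif not fired and should_fire:
            fn += 1        # missed a prompt it should catch: false negative
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall
```

Low precision means the skill burns context on prompts it shouldn't touch; low recall means it silently never loads. Scoring both is what catches drift before users do.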
TaskFox Newsletter
Every weekday a four-model curation pipeline produces TaskFox Dev News — a high-signal brief on what changed in AI yesterday. No press releases, no hype, every link earned. Designed for builders who need signal density.
Trusted-first sourcing pulls from frontier model updates, BlueSky developer voices, and curated engineering blogs. Action Queue items become GitHub follow-ups the same day through the action-to-issue skill, so what's worth doing actually gets filed.
Our Process
Trusted-first sourcing — frontier model updates, BlueSky developer voices, curated engineering blogs
Visual HTML edition with TaskFox-branded layout for paid subscribers
Daily routine fires via the newsletter-daily scheduled task
Action-to-issue conversion turns newsletter Action Queue items into GitHub follow-ups same-day
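The action-to-issue conversion can be sketched as a mapping from an Action Queue item onto the GitHub issues payload shape. The item fields and labels below are hypothetical, not the skill's real schema:

```python
# Sketch of turning a newsletter Action Queue item into a GitHub issue payload.
# Input field names and labels are illustrative assumptions.
from typing import Dict

def to_issue(item: Dict[str, str]) -> Dict[str, object]:
    """Map an action item onto the title/body/labels shape the GitHub API expects."""
    return {
        "title": f"[dev-news] {item['action']}",
        "body": f"{item['why']}\n\nSource: {item['link']}",
        "labels": ["dev-news", "action-queue"],
    }
```

The payload would then be POSTed to the repository's issues endpoint; the filing step itself is omitted here.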
TaskFox Studio
Lightweight frontend for the TaskFox Factory API. Single-page app, real-time chat streaming, voice in/out, session management, browseable interfaces for skills, agents, and commands. Built to be fast to ship to and fast to use.
Zero build tooling. No npm client dependencies. Just modular ES6 and Alpine reactivity. Features include model selection, research mode toggle, image generation, and an events timeline. This is our in-house R&D playground — when we want to test a new TaskFox capability before it ships into client work, it lands here first.
Our Process
Real-time SSE streaming for chat and tool activity
Browse and invoke skills, agents, commands directly from the UI
OAuth flow with secure session management
Zero build pipeline — edit a file, refresh, done
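The streaming side can be sketched with a minimal Server-Sent Events parser. This follows the standard `event:`/`data:` framing from the SSE spec; TaskFox's actual event names are not shown:

```python
# Minimal sketch of parsing a Server-Sent Events stream into (event, data) pairs.
# Follows standard SSE framing: blank line dispatches the accumulated event.
from typing import Iterable, Iterator, List, Tuple

def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    event = "message"
    data: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line == "":                            # blank line: dispatch and reset
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
```

On the client side this is all a zero-dependency UI needs: consume the pairs and patch the DOM as chat tokens and tool-activity events arrive.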
AWS Bedrock AgentCore Runtime
Built a Claude-backed Bedrock AgentCore runtime — our standard production infrastructure for shipping conversational AI agents on AWS. Memory across containers, identity management via Cognito, code execution, browser automation, all behind the Claude Agent SDK.
Comprehensive test coverage. Least-privilege IAM. Observability dashboards. The kind of infrastructure that means client deployments (like our Private Claude Workspace) inherit enterprise-grade auth, persistence, and tooling on day one.
Our Process
Claude Agent SDK integrated into AgentCore lifecycle with streaming responses
Bedrock Memory for conversation persistence across containers
Code execution + browser automation with session management
Cognito-backed auth + S3 + CloudFront, deployable in any AWS region
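The cross-container persistence idea can be sketched with a session-keyed store. An in-memory dict stands in for Bedrock Memory, which is a managed service, so this is purely illustrative of the shape, not the AWS API:

```python
# Sketch of conversation persistence across stateless containers, keyed by session.
# A local dict stands in for the durable managed store; nothing here is the AWS API.
from typing import Dict, List

class MemoryStore:
    """Stand-in for a durable memory service shared by all containers."""
    def __init__(self) -> None:
        self._turns: Dict[str, List[str]] = {}

    def append(self, session_id: str, turn: str) -> None:
        self._turns.setdefault(session_id, []).append(turn)

    def history(self, session_id: str) -> List[str]:
        return list(self._turns.get(session_id, []))
```

Because every container reads and writes through the shared store, a conversation survives container restarts and scale-out: any instance can pick up the session's history and continue.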
ThinkFox Monitor
A local CLI dashboard that turns Claude Code and Codex session data into something you can actually look at. Reads from ~/.claude/ and ~/.codex/, generates self-contained HTML, opens in any browser. Stdlib-only Python — no external runtime dependencies.
Built because we run multiple agentic harnesses across both vendors and needed eyes on it. Stays local — no telemetry leaves the laptop unless you explicitly enable persistence. The default redaction policy strips prompt text, tool args, and account IDs before any snapshot exposure.
Our Process
Five tabbed views: searchable session list, three-column task kanban, team timeline, live Codex monitor, performance dashboard
Built-in OTLP receiver — POST /v1/metrics and POST /v1/logs for local telemetry ingestion
Hooks monitor with installable hook scripts for live Claude Code event capture
minimal_redacted_v1 policy by default — prompts, tool args, output bodies, account identifiers never collected
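The redaction policy can be sketched as a field filter applied before any snapshot is written. The exact field names below are assumptions drawn from the policy description above:

```python
# Sketch of a minimal_redacted_v1-style filter: drop sensitive fields pre-snapshot.
# The field names are illustrative assumptions based on the stated policy.
from typing import Any, Dict

REDACTED_KEYS = {"prompt", "tool_args", "output_body", "account_id"}

def redact(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a session record with sensitive fields removed."""
    return {k: v for k, v in record.items() if k not in REDACTED_KEYS}
```

Filtering at ingestion rather than at display is the safer default: sensitive values are never collected, so there is nothing to leak even if a snapshot is shared.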
What proves itself here ships in Client Work →