The Workshop
Where we prototype before we promise. Agents, harnesses, multi-vendor pipelines, observability — moving fast at the frontier so client work ships proven. What survives here ships in Client Work. What doesn't gets cut.
TaskFox Factory
TaskFox Factory is our AI workshop — a place where small specialized agents handle research, content, voice, and code on demand. Behind a single API, multi-vendor pipelines (Claude, Gemini, OpenAI, ElevenLabs) coordinate so each task gets the right tool without integration friction.
Built on the Claude Agent SDK with 60 skills, 16 rules, and 34 hooks wired through .claude/. New capabilities slot in fast — a skill we build today is available to every future agent session, on every machine, in every client engagement.
Our Process
Designed an extensible skill and agent library so new capabilities slot in fast (60 skills as of 2026-04-27)
Built multi-vendor pipelines — Claude, Gemini, OpenAI, ElevenLabs — behind one unified API
Wired 34 lifecycle hooks for everything from auto-formatting to verification gates to long-silence pings
Maintained 16 rule files that keep style, testing, git workflow, and skill-development conventions consistent across all agents
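The single-API idea above can be sketched as a dispatch table that maps each task kind to a vendor pipeline. The task kinds and vendor assignments below are illustrative assumptions, not TaskFox's actual routing:

```python
# Hypothetical sketch of multi-vendor routing behind one API surface.
from dataclasses import dataclass
from typing import Dict

@dataclass
class Task:
    kind: str      # e.g. "research", "voice", "code"
    payload: str

# Illustrative routing table; real vendor clients are assumed, not shown.
ROUTES: Dict[str, str] = {
    "research": "gemini",
    "voice": "elevenlabs",
    "code": "claude",
    "image": "openai",
}

def route(task: Task) -> str:
    """Pick the vendor pipeline for a task, defaulting to Claude."""
    return ROUTES.get(task.kind, "claude")
```

The point of the pattern is that callers only ever see one API; which vendor handles the task is an internal detail that can change without touching integrations.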
CLAW
CLAW (Claude Long-running Always-on Workflow) is how we work with AI now. We steer a persistent Claude Code session from iMessage while it executes inside the real codebase — running multi-agent reviews, fixing bugs, opening PRs — and pings back when something needs attention.
Built entirely on Claude Code's native primitives so the lifecycle stays inside the tool we already trust. Hooks, MCP channels, scheduled routines, durable memory layers — no out-of-process daemons we have to operate ourselves.
Our Process
Native iMessage channel armed at session launch; reply tool routes through the plugin MCP
Foxwatch progress pings via in-session hook plus an optional launchd bridge; silence + cooldown thresholds prevent spam
Five compounding memory layers: instructions, session learnings, skill patterns, subagent patterns, retro scratch
Scheduled routines for proactive contact, weekly retrospective, nightly repo health, daily newsletter
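The silence-plus-cooldown gating can be sketched as a simple predicate over timestamps. The thresholds below are illustrative placeholders, not Foxwatch's real values:

```python
# Minimal sketch of silence + cooldown gating for progress pings.
# Threshold values are illustrative assumptions, not Foxwatch's actual config.
SILENCE_THRESHOLD = 300   # seconds of no agent output before a ping is warranted
COOLDOWN = 600            # minimum seconds between consecutive pings

def should_ping(last_output: float, last_ping: float, now: float) -> bool:
    """Ping only when the session has gone quiet AND we haven't pinged recently."""
    silent_long_enough = (now - last_output) >= SILENCE_THRESHOLD
    cooled_down = (now - last_ping) >= COOLDOWN
    return silent_long_enough and cooled_down
```

Requiring both conditions is what prevents spam: a chatty session never pings, and a quiet one pings at most once per cooldown window.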
Agent Skills System
Built a library of self-contained agent skills that load on-demand instead of bloating every session. Each skill bundles its own instructions, scripts, and reference docs — agents pull them in when the prompt actually calls for that capability, not before.
Across 60 skills covering research orchestration, design system enforcement, dev-loop verification, and more, this is the layer that lets us spin up specialized capabilities in minutes, not days. Backed by multi-session orchestration so a single complex project can span multiple agent sessions without losing state. We've used this pattern to build complete applications — backend, frontend, deployment — entirely through coordinated agent sessions in sandboxed cloud environments.
Our Process
Skills load on-demand via progressive disclosure (instructions inline, references separate)
Multi-session orchestration with TaskList state and hook-driven verification gates
Sandboxed cloud execution via Daytona for safe parallel worker fan-out
Compounds across sessions — every skill we build adds capability to every future agent
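The on-demand loading pattern can be sketched as a trigger match that pulls in a skill's inline instructions while leaving its reference docs on disk until needed. The field names and keyword matching below are simplifying assumptions, not the actual skill format:

```python
# Sketch of on-demand skill selection with progressive disclosure.
# Field names and substring-based trigger matching are illustrative assumptions.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Skill:
    name: str
    triggers: List[str]          # phrases that should activate the skill
    instructions: str            # small, loaded inline when the skill fires
    reference_paths: List[str] = field(default_factory=list)  # loaded only on demand

def select_skills(prompt: str, skills: List[Skill]) -> List[Skill]:
    """Return only the skills whose triggers appear in the prompt."""
    lowered = prompt.lower()
    return [s for s in skills if any(t in lowered for t in s.triggers)]
```

The context saving comes from the split: every session pays only for the trigger descriptions, and a skill's full instructions and references load only when a prompt actually calls for them.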
Agent Harnesses
Three distinct harnesses for putting agents to work, each tuned to a different shape of problem. The same orchestration patterns ship into client engagements when a project's structure matches.
Agent Team handles parallel disjoint-file feature work with parent-owned scope fences. Task Loop sequences multi-feature backlogs with hook-driven verification gates. RLM (Recursive Language Model) decomposes tree-structured problems into sandboxed Daytona workers. Pick the harness that fits the shape of the work.
Our Process
Agent Team — parallel disjoint-file feature work in worktrees with parent-owned scope fences
Task Loop — sequenced multi-feature backlogs with hook-driven verification gates and TaskList state
RLM — tree-structured problem decomposition with sandboxed Daytona workers
Daytona integration — safe concurrent worker fan-out with full filesystem isolation per worker
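The "pick the harness that fits the shape of the work" rule can be sketched as a decision over problem shape. The predicates below are illustrative; in practice this is a judgment call, not a function:

```python
# Illustrative sketch of harness selection by problem shape.
# The boolean predicates are assumptions standing in for human judgment.
def pick_harness(parallel_disjoint_files: bool,
                 sequential_backlog: bool,
                 tree_structured: bool) -> str:
    """Map a problem's shape onto one of the three harnesses."""
    if tree_structured:
        return "RLM"             # decompose into sandboxed Daytona workers
    if parallel_disjoint_files:
        return "Agent Team"      # worktrees with parent-owned scope fences
    if sequential_backlog:
        return "Task Loop"       # ordered backlog with verification gates
    return "single session"      # no harness needed
```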
Skill Evals (Harbor)
Skills only earn the context they take if they fire on the right prompts. Harbor runs deterministic offline-first eval suites against every skill's trigger description, scoring precision and recall before users see drift.
Paired with a DSPy optimization loop (BootstrapFewShot, MIPRO) that tunes the trigger descriptions themselves — data-driven iteration on the prompts that decide when a skill activates. Multi-run variance analysis surfaces unstable triggers before they ship.
Our Process
Deterministic Harbor suites: baselines, datasets, jobs, tasks, and templates checked into version control
Harbor evaluations specialist skill with Oracle+Claude default execution and offline-first reproducibility
DSPy optimization loop — BootstrapFewShot and MIPRO for trigger description tuning
Multi-run variance analysis surfaces unstable triggers before shipping
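The precision/recall scoring can be sketched as a scan over a labeled prompt set. The dataset shape below is an assumption, not Harbor's actual suite schema:

```python
# Sketch of trigger-firing precision/recall over a labeled prompt set.
# The (prompt, should_fire) dataset shape is an illustrative assumption.
from typing import Callable, List, Tuple

def score_trigger(fires: Callable[[str], bool],
                  dataset: List[Tuple[str, bool]]) -> Tuple[float, float]:
    """Return (precision, recall) for a trigger predicate against labeled prompts."""
    tp = fp = fn = 0
    for prompt, should_fire in dataset:
        fired = fires(prompt)
        if fired and should_fire:
            tp += 1        # fired when it should: true positive
        elif fired and not should_fire:
            fp += 1        # fired on the wrong prompt: false positive
        elif not fired and should_fire:
            fn += 1        # missed a prompt it should catch: false negative
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall
```

Low precision means the skill burns context on prompts it shouldn't touch; low recall means it silently never loads. Scoring both is what catches drift before users do.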
TaskFox Newsletter
Every weekday a four-model curation pipeline produces TaskFox Dev News — a high-signal brief on what changed in AI yesterday. No press releases, no hype, every link earned. Designed for builders who need signal density.
Trusted-first sourcing pulls from frontier model updates, BlueSky developer voices, and curated engineering blogs. Action Queue items become GitHub follow-ups the same day through the action-to-issue skill, so what's worth doing actually gets filed.
Our Process
Trusted-first sourcing — frontier model updates, BlueSky developer voices, curated engineering blogs
Visual HTML edition with TaskFox-branded layout for paid subscribers
Daily routine fires via the newsletter-daily scheduled task
Action-to-issue conversion turns newsletter Action Queue items into GitHub follow-ups same-day
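The action-to-issue conversion can be sketched as a mapping from an Action Queue item onto the GitHub issues payload shape. The item fields and labels below are hypothetical, not the skill's real schema:

```python
# Sketch of turning a newsletter Action Queue item into a GitHub issue payload.
# Input field names and labels are illustrative assumptions.
from typing import Dict

def to_issue(item: Dict[str, str]) -> Dict[str, object]:
    """Map an action item onto the title/body/labels shape the GitHub API expects."""
    return {
        "title": f"[dev-news] {item['action']}",
        "body": f"{item['why']}\n\nSource: {item['link']}",
        "labels": ["dev-news", "action-queue"],
    }
```

The payload would then be POSTed to the repository's issues endpoint; the filing step itself is omitted here.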
TaskFox Studio
Lightweight frontend for the TaskFox Factory API. Single-page app, real-time chat streaming, voice in/out, session management, browseable interfaces for skills, agents, and commands. Built to be fast to ship to and fast to use.
Zero build tooling. No npm client dependencies. Just modular ES6 and Alpine reactivity. Features include model selection, research mode toggle, image generation, and an events timeline. This is our in-house R&D playground — when we want to test a new TaskFox capability before it ships into client work, it lands here first.
Our Process
Real-time SSE streaming for chat and tool activity
Browse and invoke skills, agents, commands directly from the UI
OAuth flow with secure session management
Zero build pipeline — edit a file, refresh, done
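The streaming side can be sketched with a minimal Server-Sent Events parser. This follows the standard `event:`/`data:` framing from the SSE spec; TaskFox's actual event names are not shown:

```python
# Minimal sketch of parsing a Server-Sent Events stream into (event, data) pairs.
# Follows standard SSE framing: blank line dispatches the accumulated event.
from typing import Iterable, Iterator, List, Tuple

def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    event = "message"
    data: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line == "":                            # blank line: dispatch and reset
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
```

On the client side this is all a zero-dependency UI needs: consume the pairs and patch the DOM as chat tokens and tool-activity events arrive.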
AWS Bedrock AgentCore Runtime
Built a Claude-backed Bedrock AgentCore runtime — our standard production infrastructure for shipping conversational AI agents on AWS. Memory across containers, identity management via Cognito, code execution, browser automation, all behind the Claude Agent SDK.
Comprehensive test coverage. Least-privilege IAM. Observability dashboards. The kind of infrastructure that means client deployments (like our Private Claude Workspace) inherit enterprise-grade auth, persistence, and tooling on day one.
Our Process
Claude Agent SDK integrated into AgentCore lifecycle with streaming responses
Bedrock Memory for conversation persistence across containers
Code execution + browser automation with session management
Cognito-backed auth + S3 + CloudFront, deployable in any AWS region
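The cross-container persistence idea can be sketched with a session-keyed store. An in-memory dict stands in for Bedrock Memory, which is a managed service, so this is purely illustrative of the shape, not the AWS API:

```python
# Sketch of conversation persistence across stateless containers, keyed by session.
# A local dict stands in for the durable managed store; nothing here is the AWS API.
from typing import Dict, List

class MemoryStore:
    """Stand-in for a durable memory service shared by all containers."""
    def __init__(self) -> None:
        self._turns: Dict[str, List[str]] = {}

    def append(self, session_id: str, turn: str) -> None:
        self._turns.setdefault(session_id, []).append(turn)

    def history(self, session_id: str) -> List[str]:
        return list(self._turns.get(session_id, []))
```

Because every container reads and writes through the shared store, a conversation survives container restarts and scale-out: any instance can pick up the session's history and continue.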
ThinkFox Monitor
A local CLI dashboard that turns Claude Code and Codex session data into something you can actually look at. Reads from ~/.claude/ and ~/.codex/, generates self-contained HTML, opens in any browser. Stdlib-only Python — no external runtime dependencies.
Built because we run multiple agentic harnesses across both vendors and needed eyes on it. Stays local — no telemetry leaves the laptop unless you explicitly enable persistence. The default redaction policy strips prompt text, tool args, and account IDs before any snapshot exposure.
Our Process
Five tabbed views: searchable session list, three-column task kanban, team timeline, live Codex monitor, performance dashboard
Built-in OTLP receiver — POST /v1/metrics and POST /v1/logs for local telemetry ingestion
Hooks monitor with installable hook scripts for live Claude Code event capture
minimal_redacted_v1 policy by default — prompts, tool args, output bodies, account identifiers never collected
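The redaction policy can be sketched as a field filter applied before any snapshot is written. The exact field names below are assumptions drawn from the policy description above:

```python
# Sketch of a minimal_redacted_v1-style filter: drop sensitive fields pre-snapshot.
# The field names are illustrative assumptions based on the stated policy.
from typing import Any, Dict

REDACTED_KEYS = {"prompt", "tool_args", "output_body", "account_id"}

def redact(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a session record with sensitive fields removed."""
    return {k: v for k, v in record.items() if k not in REDACTED_KEYS}
```

Filtering at ingestion rather than at display is the safer default: sensitive values are never collected, so there is nothing to leak even if a snapshot is shared.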
What proves itself here ships in Client Work →