Building Your First Agent with Flue

A hands-on introduction to Flue, an experimental TypeScript framework for building server-side LLM agents. Covers the agent/harness/session model, three working examples from a simple translator to a container-backed coding agent, and an honest look at the trade-offs — so you can decide whether Flue fits your stack before you commit.

The mental model: agents, harnesses, sessions
Step 1: The smallest possible agent
Step 2: Give the agent a filesystem
Step 3: A real Linux box for a coding agent
Skills, tasks, and tools
Trade-offs: when Flue fits, and when it doesn’t
Where to go next

You’ve decided to put an LLM agent into production. Not a chatbot in a notebook — a real one: it runs server-side, fields requests from your app, reads and writes files, calls tools, and has to survive deploys and restarts. You reach for the obvious building blocks and immediately hit the awkward questions. Where does the agent’s filesystem live? How do you give it a shell without handing a model rm -rf on your production box? How do you keep one customer’s session separate from another’s? How do you run the same agent in CI today and on Cloudflare next week?

Flue is a TypeScript framework that answers those questions with one idea: a built-in agent harness. If you’ve used Claude Code, Codex, or OpenCode, you already know the shape of it — a model loop with a sandboxed shell, file tools, and skills. Flue takes that harness and makes it headless and programmable. No TUI, no human-in-the-loop assumption, just functions you call from your own code.

This post walks through building agents with Flue from a standing start, aimed at backend and TypeScript developers who are comfortable with Node but new to agent frameworks. We’ll cover the core model (agents, harnesses, sessions), build three increasingly capable agents, and end with an honest look at where Flue fits and where it doesn’t.

One caveat before we start: Flue is experimental as of June 2026. The README states plainly that APIs may change. The code below is adapted from the project’s official examples against the current main branch — treat it as a faithful map of the current shape, not a frozen contract, and check the docs before you ship.

The mental model: agents, harnesses, sessions

Three nouns do most of the work in Flue, and getting them straight early saves confusion later.

An agent is a definition that lives in a source file: which model to use, which tools and skills are available, and what sandbox it runs in. You create one with createAgent(...).

A harness is what you get when you initialize an agent with init(agent). It’s the configured, running handle: model defaults, tools, the sandbox, the filesystem. Think of the agent as the blueprint and the harness as the building.

A session lives inside a harness and holds conversation history. You call harness.session() to open one, then session.prompt(...) to send a message. Reuse a session to continue a conversation; open a new one to start fresh.

For agents exposed over HTTP or WebSocket, there’s a fourth concept worth knowing: the instance <id> in the URL POST /agents/<name>/<id>. That <id> is the durable boundary — one customer, one repo, one conversation space. Reuse it to pick up where you left off; change it to start clean.
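If the nouns still feel abstract, a toy model can help. The sketch below is not Flue’s implementation: ToySession, ToyHarness, harnessFor, and the canned replies are all hypothetical stand-ins, written only to show how an instance id, a harness, and its sessions relate.

```typescript
// Toy model of blueprint -> harness -> session.
// Every name here is hypothetical; none of this is Flue's real API.

interface AgentBlueprint {
  model: string;
}

class ToySession {
  readonly history: string[] = [];

  // Stand-in for a model call: records the message and returns a canned reply.
  prompt(message: string): string {
    this.history.push(`user: ${message}`);
    const reply = `reply #${Math.ceil(this.history.length / 2)}`;
    this.history.push(`assistant: ${reply}`);
    return reply;
  }
}

class ToyHarness {
  readonly blueprint: AgentBlueprint;

  constructor(blueprint: AgentBlueprint) {
    this.blueprint = blueprint;
  }

  // Every call opens a fresh session with empty history.
  session(): ToySession {
    return new ToySession();
  }
}

// One harness per instance id, so two customers never share state.
const harnesses = new Map<string, ToyHarness>();

function harnessFor(instanceId: string, blueprint: AgentBlueprint): ToyHarness {
  let harness = harnesses.get(instanceId);
  if (!harness) {
    harness = new ToyHarness(blueprint);
    harnesses.set(instanceId, harness);
  }
  return harness;
}
```

Prompting the same ToySession twice grows one history. Calling session() again, or asking for a different instance id, starts from empty.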

Step 1: The smallest possible agent

Here’s a complete agent. It takes text, translates it, and returns typed, schema-validated data.

// .flue/workflows/hello-world.ts
import { createAgent, type FlueContext, type WorkflowRouteHandler } from '@flue/runtime';
import * as v from 'valibot';

// Exporting this Hono middleware is what makes the workflow publicly reachable over HTTP.
export const route: WorkflowRouteHandler = async (_c, next) => next();

const translator = createAgent(() => ({ model: 'anthropic/claude-sonnet-4-6' }));

export async function run({ init, payload }: FlueContext) {
  const harness = await init(translator);
  const session = await harness.session();

  const { data } = await session.prompt(
    `Translate this to ${payload.language}: "${payload.text}"`,
    {
      result: v.object({
        translation: v.string(),
        confidence: v.picklist(['low', 'medium', 'high']),
      }),
    },
  );

  return data; // { translation: "...", confidence: "high" }
}

Two things are worth pausing on. First, the result schema: passing a valibot object to prompt() gets you back parsed, typed data instead of a raw string — which is what makes it practical to orchestrate based on an agent’s output rather than just display it. Second, notice what’s not here: no container, no Docker, no infrastructure setup.
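To see why typed output matters for orchestration, here is a dependency-free sketch. parseTranslation is a hand-rolled stand-in for what the valibot schema does (it is not how Flue validates results), and the publish/review/retry routing in nextStep is a hypothetical policy, not something Flue prescribes.

```typescript
// Hand-rolled stand-in for the valibot schema, so the sketch has no dependencies.
type Confidence = 'low' | 'medium' | 'high';

interface Translation {
  translation: string;
  confidence: Confidence;
}

const CONFIDENCE_LEVELS: readonly string[] = ['low', 'medium', 'high'];

// Returns the typed value, or null when the input is not valid JSON
// or does not have the expected shape.
function parseTranslation(raw: string): Translation | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== 'object' || value === null) return null;

  const { translation, confidence } = value as Record<string, unknown>;
  if (typeof translation !== 'string') return null;
  if (typeof confidence !== 'string' || !CONFIDENCE_LEVELS.includes(confidence)) {
    return null;
  }
  return { translation, confidence: confidence as Confidence };
}

// Hypothetical routing rule: only high-confidence translations ship unreviewed.
function nextStep(result: Translation | null): 'publish' | 'review' | 'retry' {
  if (result === null) return 'retry';
  return result.confidence === 'high' ? 'publish' : 'review';
}
```

With a raw string you could only display the answer. With a validated object, the calling code can branch on confidence without parsing prose.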

That’s because, unless you opt into a full container, Flue defaults to a virtual sandbox powered by just-bash: an in-process, simulated shell and filesystem. With no container to boot, it’s dramatically faster and cheaper than spinning one up per agent, which matters when you’re running many agents at once. The trade-off, which we’ll return to, is that it’s a simulated environment, not a real Linux box.

Run it from the CLI:

flue run hello-world --target node \
  --payload '{"text": "Hello world", "language": "French"}'

Or start the dev server (flue dev, default port 3583 — “FLUE” on a phone keypad) and curl it.
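If you’d rather call the dev server from TypeScript than from curl, the request can be assembled like this. It’s a sketch under assumptions: it borrows the POST /agents/<name>/<id> route shape mentioned earlier and sends the payload directly as the JSON body, and neither is confirmed here for workflows, so check the docs for the exact path and body your version expects. prepareRequest is a hypothetical helper.

```typescript
interface PreparedRequest {
  url: string;
  init: {
    method: 'POST';
    headers: Record<string, string>;
    body: string;
  };
}

// Builds the URL and fetch options without sending anything.
// ASSUMPTION: the route is /agents/<name>/<id> and the body is the raw payload.
function prepareRequest(
  name: string,
  instanceId: string,
  payload: unknown,
  baseUrl: string = 'http://localhost:3583',
): PreparedRequest {
  const path = `/agents/${encodeURIComponent(name)}/${encodeURIComponent(instanceId)}`;
  return {
    url: `${baseUrl}${path}`,
    init: {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify(payload),
    },
  };
}

const request = prepareRequest('hello-world', 'demo-1', {
  text: 'Hello world',
  language: 'French',
});
```

Pass request.url and request.init to fetch to send it.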

Step 2: Give the agent a filesystem

The virtual sandbox isn’t just for isolation — it’s a workspace the agent can read and search. Here’s a support agent that gets seeded with help articles, then uses its built-in grep, glob, and read tools to find relevant ones.

// .flue/workflows/support.ts
import { createAgent, type FlueContext, type WorkflowRouteHandler } from '@flue/runtime';

export const route: WorkflowRouteHandler = async (_c, next) => next();

const support = createAgent(() => ({ model: 'openrouter/moonshotai/kimi-k2.6' }));

export async function run({ init, payload }: FlueContext) {
  const harness = await init(support);
  const session = await harness.session();

  // Seed the virtual filesystem with the articles the agent is allowed to search.
  await session.fs.mkdir('/workspace/articles', { recursive: true });
  await session.fs.writeFile(
    '/workspace/articles/reset-password.md',
    '# Reset your password\n\nUse the account settings page to request a password reset email.',
  );

  return await session.prompt(
    `You are a support agent. Search the workspace for articles relevant
    to this request, then write a helpful response.\n\nCustomer: ${payload.message}`,
  );
}

The pattern (populate a filesystem, let the agent search it) is a clean alternative to stuffing everything into the prompt or wiring up a vector database. For a bounded corpus (a help center, a set of docs, a repo’s worth of code), giving the agent grep over real files is often simpler and more transparent than retrieval-augmented generation, because you can see exactly what the agent looked at. It is not a replacement for semantic search over millions of documents — grep matches strings, not meaning. Pick this when your corpus is small enough that keyword search finds what’s relevant.
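The “strings, not meaning” limit is easy to demonstrate. Below is a toy, in-memory keyword search. It is an illustration of the idea only, not Flue’s grep tool, and the billing article is made up for the example.

```typescript
// A toy grep over an in-memory corpus (illustrative; not Flue's built-in tool).
const articles = new Map<string, string>([
  [
    '/workspace/articles/reset-password.md',
    '# Reset your password\n\nUse the account settings page to request a password reset email.',
  ],
  [
    // Hypothetical second article, added for contrast.
    '/workspace/articles/billing.md',
    '# Update billing details\n\nOpen the billing tab to change your card.',
  ],
]);

interface Match {
  path: string;
  line: number; // 1-based line number within the file
  text: string;
}

// Case-insensitive substring search, reported per line like grep -n.
function grepCorpus(corpus: Map<string, string>, needle: string): Match[] {
  const matches: Match[] = [];
  const lowered = needle.toLowerCase();
  corpus.forEach((content, path) => {
    content.split('\n').forEach((text, index) => {
      if (text.toLowerCase().includes(lowered)) {
        matches.push({ path, line: index + 1, text });
      }
    });
  });
  return matches;
}

const keywordHits = grepCorpus(articles, 'password');
const paraphraseHits = grepCorpus(articles, 'forgot my login');
```

Here keywordHits contains two lines from reset-password.md, and you can see exactly which ones. paraphraseHits is empty, even though the same article answers that question. An agent can work around this by trying several search terms, but that only goes so far as the corpus grows.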

Note the model string too: openrouter/moonshotai/kimi-k2.6. Models are namespaced provider/model, and the same agent code runs against Anthropic, OpenAI, OpenRouter, and others by changing that one string.
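One detail worth noticing is that the OpenRouter id contains two slashes. A reasonable reading is that the first segment names the provider and everything after it is the provider’s own model id. The sketch below encodes that reading. parseModelId is a hypothetical helper and the split-on-first-slash rule is an assumption drawn from the examples in this post, not Flue’s documented parsing.

```typescript
interface ModelRef {
  provider: string;
  model: string;
}

// ASSUMPTION: the provider is everything before the first slash.
function parseModelId(id: string): ModelRef {
  const slash = id.indexOf('/');
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Expected "provider/model", got "${id}"`);
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}

const viaOpenRouter = parseModelId('openrouter/moonshotai/kimi-k2.6');
// viaOpenRouter.provider is 'openrouter'; viaOpenRouter.model is 'moonshotai/kimi-k2.6'
```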

Step 3: A real Linux box for a coding agent

The virtual sandbox covers a lot, but a coding agent needs the real thing: git, npm, a browser, an actual repo on disk. Flue handles this with connectors — adapters for sandbox providers like Daytona or E2B.

Connectors are an unusual design choice worth flagging. They aren’t npm packages. They’re Markdown installation instructions hosted by the project, and you install one by piping it to your AI coding agent:

flue add daytona | claude    # or | codex, | opencode, | cursor-agent

Your coding agent reads the Markdown and writes a small TypeScript adapter (connectors/daytona.ts) into your project, which you then import. You can point flue add at any provider’s docs URL and get an adapter scaffolded in seconds. The catch: your dependency setup now runs through an LLM rather than a lockfile. If reproducible, audited builds matter to you, that’s worth thinking through.

// .flue/workflows/code.ts
import { createAgent, type FlueContext, type WorkflowRouteHandler } from '@flue/runtime';
import { Daytona } from '@daytona/sdk';
import { daytona } from '../connectors/daytona';

export const route: WorkflowRouteHandler = async (_c, next) => next();

export async function run({ init, payload, env }: FlueContext) {
  // Provision a fresh Daytona sandbox for this run.
  const client = new Daytona({ apiKey: env.DAYTONA_API_KEY });
  const sandbox = await client.create();

  const agent = createAgent(() => ({
    sandbox: daytona(sandbox),
    cwd: '/workspace/project',
    model: 'openai/gpt-5.5',
  }));
  const harness = await init(agent);
  const session = await harness.session();

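  // Clone the repo and install dependencies inside the sandbox before prompting.
  // payload.repo ends up in a shell command, so validate it if callers are untrusted.
  await session.shell(`git clone ${payload.repo} /workspace/project`);
  await session.shell('npm install', { cwd: '/workspace/project' });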

  return await session.prompt(payload.prompt);
}
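One thing the example glosses over: payload.repo is interpolated straight into a shell command. Inside a throwaway container the blast radius is limited, but it’s still worth rejecting anything that isn’t a plain URL. A minimal sketch follows. assertSafeRepoUrl and its deliberately strict pattern are assumptions of this sketch, not something Flue provides.

```typescript
// Accept only plain https URLs built from a conservative character set, so the
// value cannot carry shell metacharacters such as ; | $ ` or whitespace.
const SAFE_REPO_URL = /^https:\/\/[A-Za-z0-9.-]+(\/[A-Za-z0-9._~-]+)+$/;

// Returns the URL unchanged if it is safe to interpolate; throws otherwise.
function assertSafeRepoUrl(repo: unknown): string {
  if (typeof repo !== 'string' || !SAFE_REPO_URL.test(repo)) {
    throw new Error('Refusing to clone: repo must be a plain https URL');
  }
  return repo;
}
```

In the workflow above you would call assertSafeRepoUrl(payload.repo) once and use the returned string in the git clone command. The pattern will reject some legitimate URLs (SSH remotes, query strings), and that is the right default when the input comes from a request body.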

There’s a third sandbox option, local() (Node target only), which gives the agent direct access to the host filesystem and shell. The README is explicit about when that’s appropriate: CI runners, where gh, git, and npm are already on $PATH and the runner itself is your isolation boundary. By default local() inherits only a small allowlist of env vars; you opt extras in explicitly. Do not point local() at a machine you care about and then hand the agent untrusted input — that’s the failure mode to recognize.
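The allowlist idea is simple enough to show in a few lines. This is an illustration of the principle, not Flue’s code: pickEnv is a hypothetical helper, and which variables Flue actually allows by default, or how you opt extras in, isn’t covered here.

```typescript
type EnvVars = Record<string, string | undefined>;

// Copies only the named variables; everything else stays invisible to the agent.
function pickEnv(source: EnvVars, allowlist: readonly string[]): Record<string, string> {
  const picked: Record<string, string> = {};
  for (const key of allowlist) {
    const value = source[key];
    if (value !== undefined) picked[key] = value;
  }
  return picked;
}

// Hypothetical host environment with one secret the agent should never see.
const hostEnv: EnvVars = {
  PATH: '/usr/bin',
  HOME: '/home/ci',
  AWS_SECRET_ACCESS_KEY: 'do-not-leak',
};

const agentEnv = pickEnv(hostEnv, ['PATH', 'HOME', 'LANG']);
```

agentEnv ends up with PATH and HOME only. The secret is dropped because nobody asked for it, which is the safer direction for a default to fail in.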

Skills, tasks, and tools

Skills are Markdown files (.agents/skills/<name>/SKILL.md) discovered at runtime and activated by name: session.skill('triage'). It’s the same skills concept as in Claude Code: most of an agent’s logic lives in Markdown (skills, context, AGENTS.md) rather than in TypeScript. That keeps agent behavior editable by people who don’t write code, which matters on mixed teams.

session.task(...) spins up a focused, one-shot child agent in a detached session, useful for parallel research or delegation. The child shares the sandbox but gets its own history.
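Parallel delegation is a fan-out: start every child task, then wait for all of them. The sketch below shows that shape with a stub in place of a real child agent. TaskRunner, fanOut, and stubRunner are hypothetical names, and since the exact signature of session.task(...) isn’t shown in this post, wiring it in as the runner is left as an assumption.

```typescript
// Hypothetical stand-in for "delegate this prompt to a child agent".
type TaskRunner = (prompt: string) => Promise<string>;

// Starts every task before awaiting any, so they run concurrently.
// Results come back in the same order as the prompts.
async function fanOut(runTask: TaskRunner, prompts: readonly string[]): Promise<string[]> {
  return Promise.all(prompts.map((prompt) => runTask(prompt)));
}

// Stub runner so the sketch runs without a model.
const stubRunner: TaskRunner = async (prompt) => `summary of: ${prompt}`;

const research = fanOut(stubRunner, [
  'changelog for the auth module',
  'open issues tagged "login"',
]);
```

Promise.all rejects as soon as one task fails. If partial results are acceptable, Promise.allSettled is the usual alternative.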

Remote MCP servers connect via connectMcpServer(...). Connect in trusted code, pass the connection’s tools to your agent (github.tools, for a connection you’ve named github), and keep secrets in env. As of this version, Flue defaults to streamable HTTP, doesn’t auto-detect transports, and doesn’t handle OAuth callbacks, so plan to manage tokens yourself.
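Managing tokens yourself starts with failing fast when one is missing. Here is a small sketch of that step. bearerHeaders and the GITHUB_MCP_TOKEN variable name are hypothetical, and how the resulting headers are handed to connectMcpServer(...) depends on its options, which aren’t shown here.

```typescript
type SecretSource = Record<string, string | undefined>;

// Builds an Authorization header from an env-style object, or throws if the
// secret is absent. The error names the variable, never the token value.
function bearerHeaders(env: SecretSource, variable: string): Record<string, string> {
  const token = env[variable];
  if (!token) {
    throw new Error(`Missing required secret: ${variable}`);
  }
  return { Authorization: `Bearer ${token}` };
}
```

Calling this at the top of run() turns a confusing 401 from the remote server into an immediate, readable error at startup.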

Trade-offs: when Flue fits, and when it doesn’t

Flue is experimental and has the trade-offs to match.

The virtual sandbox is a simulation, not Linux. just-bash is fast and cheap, but it implements a subset of a real shell. Anything that needs genuine system calls, real binaries, or a network namespace needs a container (Daytona, E2B) or local(). Don’t assume a script that runs in bash will run identically in the virtual sandbox. Test it.

The API is unstable. You’ll be an early adopter debugging things the docs haven’t covered yet. For a side project or an internal tool, that’s probably fine. For a system with uptime commitments, weigh that seriously.

The connector model is a genuine trade-off. LLM-scaffolded adapters are convenient, but they’re not the same as a versioned, audited dependency. If your org requires deterministic builds, you’ll want to review and pin whatever the agent writes.
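One lightweight way to pin a scaffolded adapter is to record its hash after review and have CI fail when the file drifts. A minimal sketch using Node’s built-in crypto module; sha256 and verifyPinned are hypothetical helpers, and nothing here is a Flue feature.

```typescript
import { createHash } from 'node:crypto';

// Hex-encoded SHA-256 of a file's contents.
function sha256(content: string): string {
  return createHash('sha256').update(content, 'utf8').digest('hex');
}

// Throws when the content no longer matches the hash recorded at review time.
function verifyPinned(content: string, expectedSha256: string): void {
  const actual = sha256(content);
  if (actual !== expectedSha256) {
    throw new Error(`Connector changed: expected ${expectedSha256}, got ${actual}`);
  }
}
```

In CI you would read connectors/daytona.ts from disk, pass its contents to verifyPinned along with the hash you committed after reviewing it, and let the thrown error fail the build. Regenerating the adapter then forces a fresh review instead of slipping through unnoticed.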

On alternatives: if you want a thinner building block and don’t need the harness, the Vercel AI SDK or a provider SDK used directly is less opinionated and gives you less to learn. If you want a hosted platform rather than a framework you deploy yourself, a managed offering will get you moving faster. And if your “agent” is really a single LLM call with a tool or two, Flue is overkill. Reach for Flue when you genuinely need the sandbox, sessions, and multi-target deployment.

Choose Flue if you’re building autonomous, server-side agents in TypeScript, want the Claude Code harness without the TUI, and can live with an evolving API. Skip it if you need a stable, long-term-supported API today, your use case is a simple chat completion, or your team can’t accept dependencies scaffolded by an LLM.

Where to go next

Start by running the translator agent from Step 1 under flue dev. It needs nothing but a model API key, and it’ll show you the prompt-and-typed-result loop in under five minutes. Once that clicks, try seeding a filesystem for the support agent, since that pattern generalizes to most real work. Save the container and local() sandboxes for when you actually hit the limits of the virtual one.

The sandbox choice drives everything else: virtual for fast, scalable, bounded agents; a container connector when you need real Linux; local() only where the environment is already your isolation boundary. Get that right and the rest of Flue’s model — agents, harnesses, sessions — falls into place.

“Frameworks emerge from the need to execute meaningfully, consistently, and fast.” - Rushi

Rushi's

Ctrl+AI+Ship