How to Build an AI Pipeline That Runs Anywhere: the runfox Backend Model
Building an AI pipeline in development and running it in production are usually two separate problems. Development wants fast iteration: run locally, inspect intermediate outputs, restart from a failed step without losing work. Production wants durability: distributed workers, persistent state, dead-letter queues, retry budgets. The common response is to write the pipeline twice – once as a script, once as a proper workflow definition – or to bring in a full platform (Airflow, Prefect, Dagster) from day one and absorb the infrastructure weight while still proving the idea.
runfox is a library, not a platform.
It has no scheduler, no UI, no database you did not provision.
The specific design decision that addresses this: a Backend owns all workflow state, and the backend is swappable.
The same workflow definition runs in-memory for tests, against SQLite for single-process development, and against SQS and DynamoDB for production – with no changes to the workflow code.
The Core Model
Three concepts carry the library:
A Workflow is a stateless handle. It holds the workflow definition and a reference to a backend, but owns no state itself.
Every method call loads current state from the backend, operates on it, and writes back.
Two Workflow instances with the same execution ID and backend are interchangeable – which is what makes distributed execution work.
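The load-operate-write discipline can be sketched in plain Python. DictStore and Handle below are toy stand-ins, not runfox classes; the point is only that a handle holding no state of its own is interchangeable with any other handle that shares its execution ID and store:

```python
class DictStore:
    """Toy store: one state record per execution id."""
    def __init__(self):
        self._records = {}

    def load(self, execution_id):
        return dict(self._records.get(execution_id, {"count": 0}))

    def save(self, execution_id, state):
        self._records[execution_id] = dict(state)


class Handle:
    """Stateless handle: every call round-trips through the store."""
    def __init__(self, execution_id, store):
        self.execution_id = execution_id
        self.store = store

    def increment(self):
        state = self.store.load(self.execution_id)   # load current state
        state["count"] += 1                          # operate on it
        self.store.save(self.execution_id, state)    # write back
        return state["count"]


store = DictStore()
a = Handle("exec-1", store)
b = Handle("exec-1", store)   # same id, same store: interchangeable
a.increment()
b.increment()
print(store.load("exec-1")["count"])  # 2
```

Two processes on two machines holding "instances" of the same execution behave the same way, provided the store is shared and writes are atomic.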
A Backend is composed of a Store and a Runner.
The Store owns the workflow table: long-term state, one record per execution.
The Runner owns the tasks table: short-term state, one row per dispatched step.
The two can use different physical storage independently.
SqliteStore and SqliteRunner share a .db file by convention; DynamoDBStore and SQSRunner use separate DynamoDB tables by design.
An executor is a plain callable with no runfox dependency: execute(op, inputs) -> dict.
It receives the step’s dispatch token (the op) and a resolved input dict, and returns an output dict.
It can be tested without any runfox infrastructure.
Workflow Definitions
Workflows are defined in YAML. Steps declare their operation token, inputs, and dependencies. Inputs are literals or JSON Logic expressions (via json-logic-path) that reference workflow inputs, prior step outputs, or shared state:
    name: example
    steps:
      - op: make_greeting
        input:
          name: {"var": "input.name"}
      - op: shout
        depends_on: [make_greeting]
        input:
          text: {"var": "steps.make_greeting.output.text"}
    outputs:
      message: {"var": "steps.shout.output.text"}
The op value is both the unique step identifier within the workflow and the dispatch token the executor receives.
runfox resolves dependencies, determines which steps are ready, and dispatches them.
The executor handles the work; runfox handles the sequencing.
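To make the input resolution concrete, here is a minimal stand-in for the "var" lookup. This is a hypothetical sketch, not the json-logic-path library runfox actually uses: it just walks a dotted path through a context dict of workflow inputs and step outputs.

```python
def resolve_var(expr, context):
    """Resolve a {"var": "dotted.path"} expression against a context dict."""
    node = context
    for key in expr["var"].split("."):
        node = node[key]
    return node

context = {
    "input": {"name": "world"},
    "steps": {"make_greeting": {"output": {"text": "hello, world"}}},
}
resolve_var({"var": "input.name"}, context)                       # "world"
resolve_var({"var": "steps.make_greeting.output.text"}, context)  # "hello, world"
```

Because the expressions are data, the same lookup works whether the context was assembled from an in-memory dict or from rows fetched out of DynamoDB.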
Running the Same Workflow at Different Scales
The executor function is written once and does not change between backends:
    def execute(op, inputs):
        if op == "make_greeting":
            return {"text": f"hello, {inputs['name']}"}
        if op == "shout":
            return {"text": inputs["text"].upper()}
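Because the executor has no runfox dependency, unit-testing it needs nothing but plain function calls (the definition is repeated here so the snippet stands alone):

```python
def execute(op, inputs):
    if op == "make_greeting":
        return {"text": f"hello, {inputs['name']}"}
    if op == "shout":
        return {"text": inputs["text"].upper()}

# No backend, no queue, no workflow: just a function under test.
assert execute("make_greeting", {"name": "world"}) == {"text": "hello, world"}
assert execute("shout", {"text": "hello, world"}) == {"text": "HELLO, WORLD"}
```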
In-process, for development and tests:
    import runfox as rfx
    from runfox.backend import InMemoryStore, InProcessRunner, InProcessWorker

    runner = InProcessRunner()
    worker = InProcessWorker(runner, execute)
    backend = rfx.Backend(store=InMemoryStore(), runner=runner)

    wf = rfx.Workflow.from_yaml(SPEC, backend, inputs={"name": "world"})
    result = wf.run(worker=worker)
    # result.outcome == {"message": "HELLO, WORLD"}
SQLite, for persistence across restarts without AWS:
    from runfox.backend import SqliteStore, InProcessRunner, InProcessWorker

    runner = InProcessRunner()
    worker = InProcessWorker(runner, execute)
    backend = rfx.Backend(store=SqliteStore("workflow.db"), runner=runner)

    wf = rfx.Workflow.from_yaml(SPEC, backend)
    result = wf.run(worker=worker)
Distributed, with DynamoDB and SQS:
    from runfox.backend.aws import DynamoDBStore, SQSRunner

    store = DynamoDBStore(table="workflows")
    runner = SQSRunner(
        queue_map={
            "instruct": "https://sqs.eu-west-2.amazonaws.com/.../instruct",
            "image-embedding": "https://sqs.eu-west-2.amazonaws.com/.../image-embedding",
        },
        tasks_table="tasks",
    )
    backend = rfx.Backend(store=store, runner=runner)

    wf = rfx.Workflow.from_yaml(SPEC, backend, inputs={"name": "world"})

    # distributed pattern uses advance() rather than run()
    result = wf.advance()
    if isinstance(result, rfx.Dispatch):
        backend.dispatch(wf.id, result.jobs)
Workflow.run() is not available in the distributed backend – it would require blocking the caller while waiting for remote workers.
The event-driven pattern (advance(), dispatch, on_step_result(), advance() again) is the correct model for distributed execution.
This is a deliberate restriction rather than a missing feature.
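The shape of that event-driven loop can be sketched with stubs. Dispatch, Pending, and Complete below are local dataclasses standing in for runfox’s result types, send_jobs and poll_results are placeholders for the SQS dispatch and result-collection plumbing, and the on_step_result(step, output) signature is assumed for illustration:

```python
from dataclasses import dataclass

@dataclass
class Dispatch:   # stub for rfx.Dispatch
    jobs: list

@dataclass
class Pending:    # stub for rfx.Pending
    pass

@dataclass
class Complete:   # stub for rfx.Complete
    outcome: dict

def drive(wf, send_jobs, poll_results):
    """Driver loop for the advance / dispatch / on_step_result pattern."""
    while True:
        result = wf.advance()
        if isinstance(result, Complete):
            return result.outcome
        if isinstance(result, Dispatch):
            send_jobs(result.jobs)            # e.g. backend.dispatch(wf.id, ...)
        for step, output in poll_results():   # e.g. drain a result queue
            wf.on_step_result(step, output)
```

In a real deployment there is no single drive() caller: the dispatch half runs wherever the workflow is started, and the on_step_result/advance half runs in whatever consumes worker results (a Lambda, a queue consumer), which is exactly why a blocking run() cannot exist here.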
Branch Conditions and Loops
Branch conditions are JSON Logic expressions evaluated against step outputs after a step completes.
Three actions are available: halt terminates the workflow immediately with a result payload; complete exits the step cleanly, bypassing retry logic; set resets a named step and its dependents to ready, which is how data-driven loops work:
    steps:
      - op: classify
        input:
          text: {"var": "input.text"}
        branch:
          - condition: {">=": [{"var": "steps.classify.output.score"}, 0.9]}
            action: complete
          - condition: {"<": [{"var": "steps.classify.output.score"}, 0.9]}
            action: halt
            result: {status: low_confidence}
A loop may reset itself by naming its own step in a set action:
    - op: iterate
      branch:
        - condition: {"<": [{"var": "state.count"}, 10]}
          action: {set: "steps.iterate.status", value: ready}
        - condition: {">=": [{"var": "state.count"}, 10]}
          action: complete
Because branch conditions are JSON Logic data rather than Python code, the entire workflow definition – steps, dependencies, inputs, and branching logic – is plain YAML and JSON. It can be stored, versioned, loaded dynamically, and inspected without executing anything.
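Evaluating such a condition takes only a few lines. This toy evaluator handles just the "var", ">=", and "<" operators used above; a real JSON Logic library covers the full operator set, and this is not runfox’s implementation:

```python
def evaluate(expr, context):
    """Evaluate a tiny JSON Logic subset against a context dict."""
    if not isinstance(expr, dict):
        return expr  # literal value
    op, args = next(iter(expr.items()))
    if op == "var":
        node = context
        for key in args.split("."):
            node = node[key]
        return node
    left, right = (evaluate(a, context) for a in args)
    if op == ">=":
        return left >= right
    if op == "<":
        return left < right
    raise ValueError(f"unsupported operator: {op}")

context = {"steps": {"classify": {"output": {"score": 0.95}}}}
cond = {">=": [{"var": "steps.classify.output.score"}, 0.9]}
evaluate(cond, context)  # True -> the complete action fires
```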
Result Types
wf.run() and wf.advance() return typed results:
| Type | Meaning |
|---|---|
| Complete(outcome) | All steps finished; outcome contains resolved outputs |
| Halt(result) | A branch condition fired halt; result is the branch payload |
| Dispatch(jobs) | Steps claimed and dispatched; caller handles execution |
| Pending() | In-progress steps exist; nothing new is ready yet |
State of the Library
runfox is at version 0.0.7 and in active development.
It is used in production in our Marigold product for inference pipeline orchestration.
The public API – Backend, Workflow, the result types, and the YAML workflow format – is stable enough to build on, but the version number is honest: breaking changes are possible before 1.0.
The examples directory covers abstract patterns (accumulation, fan-out, fan-in, branching, retry) and worked use cases (document parsing, validation pipelines, a Fibonacci sequence, Conway’s Game of Life as a loop demonstration).
These double as the regression test suite.
(The source is at github.com/bayinfosys/runfox and the package at pypi.org/project/runfox. If workflow orchestration for AI pipelines is a problem you are working on, get in touch.)
This article reflects runfox v0.0.7 as of April 2026. The backend architecture and YAML workflow format are stable. AWS backend table schemas and the distributed execution pattern may change before 1.0 – verify against the current README at github.com/bayinfosys/runfox before deploying.