Marigold: Privately Hosted AI Inference on AWS
The major AI providers bundle two things that should be separate: the model and the infrastructure. When you call the OpenAI or Anthropic API, your data travels to their servers, runs against their model, and returns a result. The provider sits between you and the model. Whatever their policy says about logging, retention, or training, the architecture makes them a party to every exchange.
Open-weight models remove that party. Llama, Mistral, Qwen, and others publish their weights publicly. The model can run anywhere – on our infrastructure, on yours, or both. The provider is no longer in the room.
Marigold hosts these models on private AWS infrastructure in London. We consider ourselves a communication provider: the post office cannot read your mail, and neither should we. The inference runs, the result returns, nothing is retained. We do not train models. We collect usage metadata – model selection, request volumes, error rates – to improve the service. We do not collect content.
When your compliance requirements or scale demand it, the same Marigold interface runs on your own hardware. The application does not change. The model does not change. We simply step further out of the room.
A drop-in replacement for the major AI APIs
Marigold exposes the same interface as OpenAI and Anthropic. Existing application code that calls those APIs can point at Marigold instead without changes. The underlying models are open-weight equivalents running on private AWS infrastructure in London – not on shared cloud servers, not routed through a third-party API, not subject to a provider’s data retention policy.
(For a detailed account of the architecture and the case for private inference in regulated environments, see Private Inference: Running AI Inside Your Own Infrastructure.)
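To make the drop-in claim concrete, here is a minimal sketch using the official OpenAI Python SDK with only the base URL and API key changed. The endpoint URL and model name below are illustrative placeholders rather than confirmed Marigold values; the actual endpoint and model list are documented at marigold.run.

```python
# Minimal sketch of the drop-in pattern using the official OpenAI Python SDK.
# The base URL and model name are placeholders, not confirmed Marigold values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.marigold.example/v1",  # hypothetical Marigold endpoint
    api_key="YOUR_MARIGOLD_API_KEY",
)

response = client.chat.completions.create(
    model="llama-3.1-70b-instruct",  # an open-weight model; actual names may differ
    messages=[
        {"role": "user", "content": "Summarise this contract clause in one sentence."},
    ],
)
print(response.choices[0].message.content)
```

The rest of the application is unchanged: the same request and response shapes apply, so swapping providers is a configuration change rather than a rewrite.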
Infrastructure
The infrastructure runs on AWS in London within a private network boundary. GPU capacity handles larger models and high-throughput workloads. No request leaves that boundary.
If you are looking to establish your own boundaries, contact us to discuss deployment on your own infrastructure.
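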
Workflows and pipelines
Single model calls handle straightforward tasks. More complex automation requires composing multiple steps: embed a document, classify its content, generate a structured summary, evaluate the output. Marigold supports this through a typed workflow layer – each step conditions on the outputs of the previous one, and the whole pipeline is declared rather than hand-coded.
This is the same pattern described in the runfox workflow engine, which integrates directly with Marigold as its execution substrate.
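The sketch below illustrates the general pattern only: a pipeline declared as data, where each typed step reads the accumulated outputs of the steps before it. It is not runfox's actual API, and the step names and stub bodies are purely illustrative; in practice each step would be a model call through Marigold.

```python
# Illustrative sketch of the declared-pipeline pattern, not runfox's actual API.
# Each step is a typed unit whose input is the accumulated output of earlier
# steps, and the pipeline itself is data rather than hand-coded glue.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    name: str
    run: Callable[[dict], dict]  # reads accumulated state, returns new fields

def run_pipeline(steps: list[Step], document: str) -> dict:
    state: dict = {"document": document}
    for step in steps:
        state |= step.run(state)  # each step conditions on everything before it
    return state

# Stand-in step bodies; real steps would call embedding, classification,
# summarisation, and evaluation models.
pipeline = [
    Step("embed",     lambda s: {"embedding": [float(len(s["document"]))]}),
    Step("classify",  lambda s: {"label": "contract" if "agree" in s["document"] else "other"}),
    Step("summarise", lambda s: {"summary": s["document"][:80]}),
    Step("evaluate",  lambda s: {"score": 1.0 if s["summary"] else 0.0}),
]

result = run_pipeline(pipeline, "The parties agree to the following terms...")
print(result["label"], result["score"])
```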
Getting access
Marigold is available at marigold.run. API keys, documentation, and access requests are handled there. For organisations evaluating private inference infrastructure for a specific use case, get in touch.