Why Private Inference?
Sending data to a commercial AI API is, structurally, the same decision as sending it to any third-party processor. The data leaves your infrastructure, travels to someone else’s servers, and something happens to it. What exactly happens depends on the provider’s policies – and policies change.
Private inference keeps the data inside your infrastructure. The model runs where the data lives. Nothing crosses that boundary.
The case for doing this falls into three distinct risk categories: regulatory risk, exposure risk, and plagiarism risk. Each behaves differently, and knowing which applies to you determines the conversation you need to have.
Regulatory risk - you know if you have this
Some organisations are not permitted to send certain categories of data to external processors, regardless of the provider’s assurances. NHS data frameworks, FCA conduct rules, GDPR Article 44 restrictions on international transfers, legal professional privilege: these are all operational constraints.
Regulatory risk is binary and externally imposed: either the framework prohibits the transfer or it does not. Commercial AI APIs are external processors. The analysis is usually straightforward, and the answer is usually no.
If you are in this category, you already know. The question is not whether private inference is necessary but how quickly a compliant arrangement can be put in place.
Exposure risk - you feel if you have this
Not all sensitive data is regulated. Competitive intelligence, acquisition strategy, unannounced product development, client relationships, internal disputes: none of this is necessarily covered by a regulatory framework, but the consequences of it leaving the organisation are severe.
The relevant question is not “are we allowed to send this?” but “what happens if this appears somewhere it should not?” A commercial API provider suffers a breach. A disgruntled employee exports query logs. A provider updates its data retention policy retroactively. These are not hypotheticals – each has happened to a major provider in the last three years.
Exposure risk is probabilistic rather than binary, which makes it easier to dismiss. The organisation that dismisses it usually discovers, too late, that the ICO’s definition of a reportable breach covers data sent to a third-party processor. The provider’s breach is your breach.
If you feel a degree of discomfort sending certain data through a commercial API, that discomfort is calibrated correctly. The architecture should match the instinct.
Plagiarism risk - you might not know if you have this
This is the least understood of the three categories and the one most likely to affect organisations that have passed the previous two tests and believe they are fine.
Commercial AI providers train on data. They train on data submitted through their APIs. The opt-out mechanisms they offer vary in effectiveness and are subject to change. More importantly, the mechanism by which training data reappears in model outputs is documented, demonstrated in court, and not fully under the provider’s control.
In its 2023 lawsuit against OpenAI, the New York Times demonstrated that GPT-4 could reproduce substantial portions of its articles verbatim under certain prompting conditions. The model had been trained on the content and could be induced to reproduce it.
The implication for any organisation that has submitted proprietary content through a commercial API is direct, regardless of copyright issues. A client’s legal strategy, submitted as context for a drafting task. A patient history, used to ground a clinical summarisation. A board paper, provided as background for an executive communication. Any of these could, under the right conditions, be reproduced in a response.
Could you defend that outcome to the client? To the ICO? To the board?
For Studio Ghibli the issue was wholly different. Hayao Miyazaki, the studio’s co-founder, had stated publicly that AI-generated animation was “an insult to life itself”. His position was not ambiguous. Still, OpenAI’s image generation model proved able to produce convincing Ghibli-style images on demand, demonstrating beyond reasonable doubt that it had been trained on Studio Ghibli’s work. His explicit objection had not prevented it.
The lesson applies to any organisation that expresses a preference about the use of its data and assumes the preference will be honoured. A preference is neither a contractual constraint nor a technical limitation.
Open-weight models separate this concern at the architectural level. The weights are trained publicly, on declared datasets, before you use them. When you run inference on open weights, you are consuming a fixed, published model – not contributing data to the next version of someone else’s. There is no feedback loop. What you send is processed and returned. It does not become training material.
This is the separation of concerns that matters. Model training is one activity, performed by specialists on declared data. Inference is another. Keeping them separate is not a preference; it is architecture.
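To make the point concrete, here is a minimal sketch of what inference against a self-hosted open-weight model can look like, assuming an OpenAI-compatible server such as vLLM running on a host inside your own network. The endpoint, model name, and prompt below are illustrative placeholders, not a description of any particular deployment.

```python
# Minimal sketch: inference against a self-hosted open-weight model.
# Assumes an OpenAI-compatible server (e.g. vLLM) inside your own network;
# the host, port, and model name are illustrative placeholders.
import requests

PRIVATE_ENDPOINT = "http://inference.internal:8000/v1/chat/completions"

payload = {
    # Fixed, published weights: training finished before you deployed them.
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [
        {"role": "system", "content": "You summarise documents for internal use."},
        {"role": "user", "content": "Summarise the attached board paper: ..."},
    ],
    "temperature": 0.2,
}

# The prompt travels only to a host you control. It is processed, a response
# is returned, and nothing is retained as training material.
response = requests.post(PRIVATE_ENDPOINT, json=payload, timeout=120)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```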
(Marigold hosts open-weight models on private AWS infrastructure in London. We do not train models. The ICO’s guidance on processor liability is at ico.org.uk.)