Technical Articles
Structured knowledge from client work, explorations, and observation.
AI Systems (15)
Every approach to context generation -- from basic chunking to knowledge compilation -- is an instance of the same pattern. The quality of the build step determines what your agent can do at runtime.
A reference guide to evaluation datasets, metrics, and methodology organised by output type and professional domain.
Most organisations have a formal model of their customer relationships and a real one that differs from it. Embeddings and community detection surface the real structure. Here is how the pattern works and where it applies.
The format you use to pass data to a language model affects reliability and cost more than most practitioners expect. This is a taxonomy of the main options and the conditions under which each performs well.
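A toy sketch of one cost difference the taxonomy covers: row-oriented JSON repeats every field name per record, while CSV states the header once. The records and the character-count proxy for token cost are illustrative assumptions, not figures from the article.

```python
import csv, io, json

# The same three records serialised two ways; sizes diverge because JSON
# repeats every field name per record while CSV states them once.
records = [
    {"name": "Ada", "role": "engineer", "score": 91},
    {"name": "Grace", "role": "admiral", "score": 88},
    {"name": "Alan", "role": "logician", "score": 95},
]

as_json = json.dumps(records)

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "role", "score"])
writer.writeheader()
writer.writerows(records)
as_csv = buf.getvalue()

# Character count is only a crude proxy for token count, but the direction
# holds: repeated keys make flat row-oriented JSON the more expensive format.
print(len(as_json), len(as_csv))
```

The gap widens with more rows and longer field names, which is why the choice matters most for high-volume, fixed-schema payloads.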
Foundation models changed the cost of the mechanism. They did not change the first question: where are the labels? A labelled dataset is a measurement instrument, not training fuel.
Most production AI workloads are high-volume and fixed-task. Dynamic planning adds cost and reduces auditability while offering nothing a static pipeline cannot already do.
Context bleeds between agents in multi-agent systems. The XBOW experiment suggests that for exploratory tasks, this is a feature rather than a failure mode.
Vector search in production needs metadata filters, business rules, and combined ranking signals. How index structure determines what actually executes efficiently.
LLMs are trained in stages with different objectives. How the shift from statistical training to human preference ranking produces behaviours that metrics alone cannot explain.
Imbalanced datasets cause models to ignore the minority class. The sampling strategies that actually work for fraud detection, medical diagnosis, and quality control.
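A minimal sketch of the simplest strategy in that family, naive random oversampling; the row format and class labels are hypothetical, and SMOTE-style synthesis or undersampling are the usual alternatives.

```python
import random

def oversample_minority(rows, label_key="label", seed=0):
    """Duplicate minority-class rows at random until every class is as
    frequent as the largest one. A sketch of one strategy only."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # rng.choices with k=0 is a no-op for the already-largest class.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced

# Hypothetical 2:8 fraud/ok split, balanced to 8:8 after oversampling.
data = [{"label": "fraud"}] * 2 + [{"label": "ok"}] * 8
balanced = oversample_minority(data)
```

Balancing must happen after the train/test split, never before, or duplicated rows leak into the evaluation set.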
LLM outputs are probabilistic and context-dependent. A structured approach to evaluating language models in production across multiple dimensions.
One-hot, ordinal, target, embedding -- categorical encoding choices affect model performance significantly. A practical guide with model-specific recommendations.
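As a minimal sketch of the first option, one-hot encoding in plain Python; the colour column is an illustrative example, not data from the guide.

```python
def one_hot(values):
    """One-hot encode a categorical column: one binary indicator per
    distinct category. A sorted vocabulary keeps column order stable
    between fitting and transforming."""
    vocab = sorted(set(values))
    index = {cat: i for i, cat in enumerate(vocab)}
    rows = []
    for value in values:
        row = [0] * len(vocab)
        row[index[value]] = 1
        rows.append(row)
    return vocab, rows

vocab, rows = one_hot(["red", "green", "red", "blue"])
# vocab -> ["blue", "green", "red"]; each row has exactly one 1
```

The width of each row equals the number of distinct categories, which is exactly why high-cardinality columns push you towards target or embedding encodings instead.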
Vector databases are harder than they look. A technical examination of similarity search, indexing trade-offs, and why naive implementations fail at scale.
RAG is not one technique. A breakdown of retrieval-augmented generation approaches by complexity, and how to match tooling to the technique you actually need.
Memory is what separates an AI agent from a stateless function. The types of memory available, how they work, and when each is appropriate.
Product & Strategy (13)
Eval datasets, scoring pipelines, and deployment gates for ML systems are not new concepts. Software engineers have been doing this for years under different names.
Vertically integrated AI companies collect data, train models, and sell inference. This is a business process error. The organisation that understands the task should own the data that defines it.
GEO tools claim to measure your visibility in AI-generated responses. Understanding how they actually collect data reveals why the problem is harder than SEO.
The Chicago school asks what AI allows you to stop paying for. The Austrian school asks what latent structure already exists in your accumulated data, waiting to be read.
Productivity growth has slowed across OECD economies. One explanation: the statistics measure the wrong competition. Capital and labour are not on the same curve.
Indirection -- accessing something through a reference rather than directly -- is a C programming concept. It also explains a surprising number of modern frustrations.
A model with 95% accuracy can destroy business value. How to choose evaluation metrics that reflect what the business actually needs.
Software engineers expect delivery to end at deployment. In ML, that is where the work begins. Four structural differences that change how ML projects are scoped, managed, and judged.
The traps that derail early AI products are predictable. Five failure modes from years of helping startups build and ship.
A practical guide for business and product owners who need to communicate clearly with AI engineers without speaking code.
LLMs introduced a new kind of problem into software: the model's interpretation of intent can diverge from the developer's. What alignment actually means in practice.
Choosing the right ML design pattern matters more than choosing the right model. Key patterns with applications to marketing and audience intelligence.
ML is a well-defined problem with specific costs. Using F(X)=Y to break down what you are actually paying for and who needs to do it.
Data Strategy (7)
Data engineering is one label for two distinct disciplines that require different skills and, when conflated, fail in different ways. Understanding the split is the first step to hiring and structuring a data team that actually works.
Siloed data is a list. Connected data encodes structure -- communities, gaps, relationships -- that accumulated through ordinary operation and has never been made visible.
Structured databases contain context that LLMs need to generate accurate queries. How to expose schema information in a way that preserves relationships and business logic.
Cross-validation is usually framed as a model evaluation technique. It is also the most reliable way to find which training examples are hurting your model.
Data maturity is a spectrum from ad hoc storage to strategic data infrastructure. What it is, how to assess it, and why it determines what AI is actually feasible.
Communication metadata is a structured sample of how an organisation actually works. What AI methods can surface from it.
A checklist for integrating machine learning into a product without overcommitting data infrastructure before you know what the problem is.
Architecture & Deployment (9)
From raw PyTorch to managed private API services. What each layer of the inference stack does, where the tools come from, and how they relate.
Multi-tenancy is not a single pattern. It is a spectrum from shared tables to fully separate infrastructure, and the right point on that spectrum depends on what varies between tenants, where your operational complexity budget sits, and what failure looks like.
The medallion architecture -- bronze for raw ingestion, silver for enriched and validated records, gold for aggregated outputs -- maps cleanly onto PostgreSQL. Each layer has a single concern. The transitions between them are where the interesting engineering lives.
PostgreSQL's LISTEN/NOTIFY mechanism lets you trigger embedding generation the moment a row is inserted, without polling, without a separate scheduler, and without coupling your embedding service to your database schema.
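A sketch of the shape of that pattern. The documents table, channel name, and psycopg2-style listener are illustrative assumptions, not a prescribed implementation.

```python
import select

# Database side: an AFTER INSERT trigger publishes the new row's id on a
# channel. Table and channel names here are hypothetical.
TRIGGER_SQL = """
CREATE OR REPLACE FUNCTION notify_new_row() RETURNS trigger AS $$
BEGIN
    PERFORM pg_notify('row_inserted', NEW.id::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER documents_notify
AFTER INSERT ON documents
FOR EACH ROW EXECUTE FUNCTION notify_new_row();
"""

def listen_for_inserts(conn, handle, channel="row_inserted"):
    """Block on the connection's socket until Postgres pushes a
    notification, then hand each payload (a row id) to the embedding
    service. `conn` is assumed to be a psycopg2 connection in
    autocommit mode."""
    cur = conn.cursor()
    cur.execute(f"LISTEN {channel};")
    while True:
        # select() sleeps until the socket has data: no polling loop.
        if select.select([conn], [], [], 60) == ([], [], []):
            continue  # timeout with no notifications; wait again
        conn.poll()
        while conn.notifies:
            note = conn.notifies.pop(0)
            handle(note.payload)
```

The embedding service only ever sees row ids on a channel, which is what keeps it decoupled from the table schema.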
Data egress is the constraint that kills AI projects in regulated sectors. Open-weight models deployed inside a VPC remove the objection before it reaches legal review.
Facebook's progression from a single database to a sharded architecture contains practical lessons for any system that needs to grow. The constraints that drove each decision.
AWS, Borg, chaos engineering -- successful organisations build platforms rather than optimise pipelines. The distinction determines whether engineering effort compounds.
AI-assisted development accelerates a long-running separation between writing code and building systems. Why that distinction matters and what it means for teams.
LAMP, JAMstack, microservices, serverless -- what these terms actually mean and how to navigate architecture decisions without a technical co-founder.
Infrastructure (3)
Three risk categories that make private inference the right architectural choice -- regulatory constraint, exposure risk, and the largely unacknowledged risk of your data appearing in someone else's model output.
Open-source models on serverless infrastructure cost less than incumbent AI services. A practical guide to deploying on AWS Lambda.
The Linux primitives behind Docker containers -- namespaces, cgroups, and how they make containers faster than virtual machines.
Developer Tools (6)
A drop-in replacement for OpenAI and Anthropic endpoints, running open-weight models on private AWS infrastructure in London. Your data does not leave your network. We do not train on it.
Most workflow libraries couple the pipeline definition to the execution environment. runfox separates them: the same YAML definition runs in-process, against SQLite, or distributed across SQS and DynamoDB, with no changes to the workflow code.
Standard JSON Logic resolves single values via dot notation. When rules need to match against lists -- model outputs, tag arrays, multi-value fields -- dot notation is the wrong tool. json-logic-path adds a vars operator backed by JSONPath.
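A minimal illustration of the gap, not json-logic-path's actual API: both helpers below are hypothetical, showing why a dot path can only name one fixed list position while a wildcard path can test every element.

```python
def dot_get(data, path):
    """JSON-Logic-style `var` lookup: each dot segment indexes one level.
    A list can only be addressed by a literal position, e.g. 'tags.0'."""
    for key in path.split("."):
        if isinstance(data, list):
            data = data[int(key)]
        else:
            data = data.get(key)
    return data

def path_get(data, segments):
    """Minimal JSONPath-flavoured lookup: '*' fans out over every list
    element and returns all matches, so a rule can ask 'does any tag
    equal x' instead of naming a position."""
    results = [data]
    for seg in segments:
        next_results = []
        for node in results:
            if seg == "*" and isinstance(node, list):
                next_results.extend(node)
            elif isinstance(node, dict) and seg in node:
                next_results.append(node[seg])
        results = next_results
    return results

doc = {"tags": [{"name": "urgent"}, {"name": "billing"}]}
first = dot_get(doc, "tags.0.name")              # one fixed position
matches = path_get(doc, ["tags", "*", "name"])   # every element
```

With the wildcard form, "any tag is billing" becomes a membership test over `matches`, which is the kind of rule dot notation cannot express.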
API Gateway configuration and Python route definitions live in separate files and drift apart. fastapi-aws generates both the AWS integration spec and the public API documentation from a single Python source.
DynamoDB rewards developers who define access patterns before writing schema. boto3 enforces none of that discipline. dynawrap moves the pattern onto the class and lets you swap the backend for PostgreSQL when AWS is not available.
Rate limits slow down early-stage development. APICache is an open-source library that caches API responses locally so you can iterate without hitting limits.
Security & Resilience (1)
Publicly deployed AI systems introduce specific vulnerabilities. A breakdown of the key risks and what to do about them.
Perspectives (10)
HTTP content negotiation was designed to decouple content from form. It stopped at format selection. Generative inference completes the original intention.
AI communication inherited natural language from human communication. That constraint is not a technical necessity.
The measure-optimise-adjust loop now running through AI first ran through advertising technology. The pathologies are the same. Adtech just got there a decade earlier.
Model weights are the fat layer -- general, beneath everything, available at marginal cost. Workflows are thin clients. Most AI infrastructure is built the wrong way round.
Single source of truth, idempotency, rollback, least privilege -- software engineers formalised governance problems that other institutions still handle by convention.
Generative AI removed the cost constraint that forced editorial discipline. When creation is free and unlimited, the signal collapses. Curation is the new scarcity.
The 21 million bitcoin limit is not in the whitepaper. It derives from a 32-bit integer ceiling, a type migration, and a patch written under pressure after a 184 billion BTC exploit.
Bitcoin and AI both convert electrical energy into discrete computational units via open protocols. The economic structure is closer than it first appears.
DevOps operationalised code. MLOps operationalised models. PMFOps is the emerging third layer -- treating the audience as a testable, versioned artefact.
Foundation models are not fast humans. Treating them as a non-deterministic data store opens more useful questions than the labour displacement framing does.