Generative Engine Optimisation: A Measurement Problem
A user asks ChatGPT which consultancies specialise in ML deployment for regulated sectors. My name either appears in the response or it does not. There is no rank to track, no position 7.7 to improve, no impressions column. Generative engine optimisation (GEO) addresses this: the question of whether your organisation appears in AI-generated responses, and what to do about it. The tools claiming to measure it are doing something fundamentally different from SEO tooling, at considerably higher cost, with considerably noisier results – and most organisations treating GEO as a direct extension of search have not yet understood why.
SEO measures a deterministic system cheaply
Traditional SEO tooling collects data in three main ways: web crawlers that build their own index, clickstream data purchased from browser extensions and ISPs, and direct access to search engine data. The thing being measured – search rankings – is largely deterministic. Query a keyword, get a ranked list, record position 1 through N. Running that measurement repeatedly produces stable results. The cost per data point is low, and the signal does not decay between measurements.
GEO measures a probabilistic system at API rates
GEO tools have no index to crawl and no ranked list to record. Instead, they send large batches of queries to AI systems (ChatGPT, Perplexity, Gemini, Claude) via API or scraped interface, parse the responses for brand mentions and cited URLs, and aggregate those into a share-of-voice metric. Your score is, roughly, how often your brand or domain appears across a sample of queries relevant to your sector.
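In code, the core loop is small. The sketch below assumes a hypothetical query_model() wrapper around whichever vendor APIs are in play and an invented brand name; the aggregation logic is the point, not the client.

```python
import re
from itertools import product

def query_model(model: str, prompt: str) -> str:
    """Hypothetical wrapper around a vendor API; a real tool plugs in its own clients."""
    raise NotImplementedError

BRAND = re.compile(r"\bAcme Analytics\b", re.IGNORECASE)   # illustrative brand name
MODELS = ["chatgpt", "perplexity", "gemini", "claude"]
QUERIES = [
    "Which consultancies specialise in ML deployment for regulated sectors?",
    "Who should I hire to put a fraud-detection model into production?",
]

def share_of_voice(repeats: int = 20) -> float:
    """Fraction of sampled responses that mention the brand at least once."""
    mentions = total = 0
    for model, query, _ in product(MODELS, QUERIES, range(repeats)):
        response = query_model(model, query)
        total += 1
        mentions += bool(BRAND.search(response))
    return mentions / total
```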
The cost structure follows directly from probabilistic outputs. The same query submitted twice may produce different responses – sometimes materially different. A single query run tells you almost nothing reliable. To build a stable distribution, GEO tools run each query many times, across multiple models. Characterising your visibility for a single sector across a handful of AI systems may require thousands of API calls, run continuously, to track how that visibility shifts over time. This is not a minor overhead compared to recording rank 4 for a keyword. It is a structural difference in the cost of measurement.
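A rough cost estimate makes the point concrete. The arithmetic below is the standard sample-size formula for estimating a proportion; the library size and margin are illustrative assumptions, not figures from any particular tool.

```python
import math

def repeats_needed(margin: float = 0.10, p: float = 0.5, z: float = 1.96) -> int:
    """Runs per query/model pair to pin a mention rate within ±margin at ~95% confidence."""
    return math.ceil(p * (1 - p) * (z / margin) ** 2)

queries = 25                     # illustrative sector query library
models = 4                       # ChatGPT, Perplexity, Gemini, Claude
runs = repeats_needed()          # 97 repeats for a ±10-point margin
print(queries * models * runs)   # 9,700 API calls for a single snapshot
```

Loosen the margin or shrink the library and the number falls, but it never approaches the single lookup that recording a search rank costs.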
Citation-based systems are the tractable case
Not all AI systems create the same measurement problem. Perplexity and ChatGPT with browsing enabled cite sources explicitly, giving a URL to record alongside each mention. Measuring visibility in those systems means counting how often your domain appears in citations across the query sample – noisy, but tractable.
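The counting itself is straightforward once citations are exposed. A minimal sketch, assuming each parsed response carries a list of cited URLs under a field named here purely for illustration:

```python
from collections import Counter
from urllib.parse import urlparse

def citation_share(responses: list[dict], domain: str) -> float:
    """Fraction of responses that cite the given domain at least once.

    Each response is assumed to look like {"text": ..., "citations": [url, ...]};
    real providers expose citations under different names and structures.
    """
    def domains(urls):
        return {urlparse(u).netloc.removeprefix("www.") for u in urls}

    hits = sum(domain in domains(r.get("citations", [])) for r in responses)
    return hits / len(responses) if responses else 0.0
```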
Non-citing models require brand-name parsing on free text. Whether your organisation’s name appears in a response is detectable; whether the response was informed by your content is not. The attribution signal collapses, and you are left measuring the surface of the output rather than its provenance.
The query library determines what you are actually measuring
The query sample is doing work that is easy to underestimate. GEO tools maintain curated sets of queries per industry, attempting to approximate what real users ask. Scoring high on one provider's query library and low on another's is not a contradiction – it reflects different assumptions about which questions get asked, not a genuine difference in visibility. Share-of-voice is only meaningful relative to the query set it was measured against, and that query set is an educated guess, not a validated sample.
Defining what to optimise precedes optimising it
Improving GEO visibility involves two phases that most advice conflates. The first is characterising where you currently sit: which queries surface your brand, in which contexts, alongside which competitors. This is an experimental design problem. You are deciding what questions are worth asking of a distribution you cannot directly inspect. The query library is a sampling strategy; designing a better one is the harder, more consequential task.
The second phase, once you have defined what you are measuring, is optimisation. Find the content types and distribution channels that move your score on the metrics you have chosen. These phases require different thinking, and running the second without completing the first produces local optimisation against the wrong objective.
Specific, falsifiable content gets cited; hedged generalities do not
Some patterns in what AI systems cite are consistent enough to act on. Content that makes concrete, verifiable claims is more citable than content that hedges around general observations. A sentence with a specific claim – “self-hosted inference typically reaches cost parity with cloud APIs above a few million tokens per day” – gives a retrieval system something to attribute. A sentence describing general trends in ambiguous terms does not.
Being cited by sources that AI systems draw on heavily matters more than publishing volume. A single mention in a widely-indexed technical publication carries more weight than ten articles on a low-traffic domain, because the training data and retrieval corpora underlying these systems are not evenly distributed.
Structured content with clear headings, explicit claims, and consistent terminology is easier to parse and attribute. This is the same discipline that aids readability and search indexing, but the mechanism differs: retrieval systems match against chunks, not pages.
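To make "chunks, not pages" concrete: a retriever typically scores heading-delimited or fixed-length passages independently, so a specific claim only surfaces if the chunk containing it stands on its own. A simplified splitter, assuming Markdown-style headings:

```python
def heading_chunks(text: str) -> list[str]:
    """Split a document at headings so each section can be retrieved on its own."""
    chunks, current = [], []
    for line in text.splitlines():
        if line.startswith("#") and current:   # a new heading closes the previous chunk
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return chunks
```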
Summary
GEO matters to any organisation whose prospective clients use AI systems for research and discovery. The measurement problem is harder than SEO by construction: the underlying system is probabilistic, the cost of building reliable distributions is high, and the query libraries used by commercial tools are unvalidated assumptions rather than observed behaviour.
The practical starting point is not a GEO tool subscription but a structured probe set: a curated collection of queries your prospective clients plausibly submit to AI systems, run manually across the main platforms, with results recorded. This is slower than an automated tool but produces the same output: a characterisation of where you currently stand. The optimisation follows from that characterisation, not from generic tactics applied before it.
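Recording the results needs nothing more than a flat row per query, platform, and date. A minimal sketch, with field names chosen for illustration:

```python
import csv
from dataclasses import dataclass, asdict, field, fields

@dataclass
class ProbeResult:
    run_date: str                      # ISO date of the run
    platform: str                      # e.g. "chatgpt", "perplexity"
    query: str                         # the question exactly as submitted
    brand_mentioned: bool
    cited_urls: list = field(default_factory=list)
    competitors_mentioned: list = field(default_factory=list)

def append_results(path: str, results: list) -> None:
    """Append probe results to a CSV so repeated runs accumulate into a time series."""
    columns = [fld.name for fld in fields(ProbeResult)]
    with open(path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        if handle.tell() == 0:         # new file: write the header once
            writer.writeheader()
        writer.writerows(asdict(r) for r in results)
```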
(Bay Information Systems helps organisations understand and improve their visibility in AI-generated responses. If the measurement question is one you are working on, that is a reasonable starting point for a conversation.)