Bronze, Silver, Gold: A Layered Data Architecture in PostgreSQL
The medallion architecture has become the standard framing for data pipelines over the past few years, mostly through Databricks and the data lakehouse ecosystem. The names – bronze, silver, gold – are the durable part. The underlying principle predates the branding by decades: raw data lands in one place, validated and enriched data lives in another, and aggregated outputs consumed by dashboards and models live in a third. Each layer has one concern. Nothing that belongs in one layer bleeds into another.
The pattern is usually described in the context of distributed systems: Spark clusters, Delta Lake, object storage at scale. It maps onto PostgreSQL just as cleanly, and for teams that do not need distributed compute, PostgreSQL is a better substrate. The schema system enforces layer boundaries. Views make promotion logic explicit and auditable. The whole pipeline fits in one database, runs without a scheduler, and can be inspected with a SQL client.
The three layers
Bronze is the raw landing zone. Data arrives from sources – API responses, event
streams, file imports – and is written with minimal transformation. The schema is
permissive. Fields are stored as received, including nulls, inconsistencies, and values
that will later fail validation. Every record carries an ingested_at timestamp and,
where possible, the raw payload as a JSONB column alongside the parsed fields. The
bronze layer is append-only by convention: records are never updated or deleted, because
the raw record is the audit trail.
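The append-only convention can be backed by privileges rather than left to discipline. A minimal sketch, assuming an ingestion role named pipeline_writer (the role name is illustrative, not part of the schema above):

```sql
-- Let the ingestion role insert and read, but never rewrite history.
-- 'pipeline_writer' is an illustrative role name.
GRANT USAGE ON SCHEMA bronze TO pipeline_writer;
GRANT SELECT, INSERT ON bronze.documents TO pipeline_writer;
REVOKE UPDATE, DELETE, TRUNCATE ON bronze.documents FROM pipeline_writer;
```

Any component that does need to write a derived field back to bronze (such as an embedding step) can be given a narrow column-level UPDATE grant on that one column, rather than blanket update rights.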
CREATE TABLE bronze.documents (
    id          BIGSERIAL PRIMARY KEY,
    source      TEXT NOT NULL,
    environment TEXT NOT NULL DEFAULT 'live',  -- 'live' or 'testing'
    raw_payload JSONB NOT NULL,
    content     TEXT,
    metadata    JSONB,
    embedding   VECTOR(384),  -- written back after ingestion; read by the promotion view
    ingested_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
The environment column deserves comment. Any pipeline that ingests data from external
providers will run internal tests against those same providers: contract validation,
integration testing, load testing. Without a flag, test records land in bronze alongside
production data and contaminate every downstream layer. Adding environment at the
schema level – and enforcing it as the first filter in every promotion view – costs
nothing and prevents a category of downstream corruption that is surprisingly difficult
to clean up after the fact.
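The flag is also cheap to enforce at the schema level, so a typo in an ingestion script cannot invent a third environment. A minimal sketch, assuming only the two values mentioned above are ever valid:

```sql
-- Reject anything other than the two known environments at write time.
ALTER TABLE bronze.documents
    ADD CONSTRAINT environment_known
    CHECK (environment IN ('live', 'testing'));
```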
Bronze tables grow large. A pipeline that stores HTTP responses as JSONB will accumulate
payload data quickly; a third-party enrichment response that returns several kilobytes
per record adds up fast at any meaningful volume. This is expected and correct. The
bronze layer is not optimised for query performance; it is optimised for completeness and
recoverability. Partition by ingested_at once the table grows, and consider a retention
policy for the raw_payload column specifically if storage becomes a constraint, but do
not pre-optimise by discarding the raw record.
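For reference, a sketch of what the partitioned form might look like. Declarative range partitioning requires the partition key to be part of the primary key, so the table must be created partitioned (or migrated to it); the table name here is illustrative:

```sql
-- A partitioned variant of the bronze table. The partition key must be
-- included in the primary key, hence the composite (id, ingested_at).
CREATE TABLE bronze.documents_partitioned (
    id          BIGSERIAL,
    source      TEXT NOT NULL,
    environment TEXT NOT NULL DEFAULT 'live',
    raw_payload JSONB NOT NULL,
    content     TEXT,
    metadata    JSONB,
    ingested_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (id, ingested_at)
) PARTITION BY RANGE (ingested_at);

-- One partition per month; create the next ahead of time or automate it.
CREATE TABLE bronze.documents_2025_01
    PARTITION OF bronze.documents_partitioned
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
```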
Silver is the enriched, validated layer. Records promoted from bronze have been checked against quality rules, normalised, and augmented with derived fields. For a document pipeline, silver is where embeddings live alongside the cleaned text. For a transactional pipeline, silver is where entity resolution has been applied, where referential integrity is enforced with foreign keys, and where the grain has been defined explicitly.
Promotion from bronze to silver happens through a view that encodes the validation logic, and a materialisation step that writes passing records to the silver table. Records that fail validation stay in bronze and are flagged for review, not silently dropped.
CREATE TABLE silver.documents (
    id             BIGINT PRIMARY KEY REFERENCES bronze.documents(id),
    content        TEXT NOT NULL,
    content_length INT NOT NULL,
    embedding      VECTOR(384),
    source         TEXT NOT NULL,
    promoted_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
    CONSTRAINT content_not_empty CHECK (char_length(content) > 0)
);
CREATE VIEW silver.documents_ready AS
SELECT
    id,
    content,
    char_length(content) AS content_length,
    embedding,
    source
FROM bronze.documents
WHERE
    environment = 'live'
    AND content IS NOT NULL
    AND char_length(content) > 0
    AND embedding IS NOT NULL;
Gold is the consumption layer. It exists to serve specific use cases: dashboards, search interfaces, ML features, reporting aggregates. Gold tables are often pre-computed joins or aggregations over silver, shaped to the access pattern of whatever consumes them. A gold table for a document search interface might pre-join documents with their topic clusters, computed nightly. A gold table for a forecasting model might pre-aggregate transaction features at the customer grain, recomputed hourly.
Gold is the layer most likely to have multiple variants of the same underlying data, each shaped for a different consumer. That is correct behaviour, not duplication.
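One way to realise a gold table in PostgreSQL is a materialised view refreshed on the consumer's cadence. A sketch for the document search case, assuming a hypothetical silver.topic_assignments table (not defined in this article) holding the nightly cluster assignments:

```sql
-- Gold table for search: documents pre-joined with their topic cluster.
-- silver.topic_assignments is assumed here purely for illustration.
CREATE MATERIALIZED VIEW gold.document_search AS
SELECT
    d.id,
    d.content,
    d.source,
    t.topic_id,
    t.topic_label
FROM silver.documents d
JOIN silver.topic_assignments t ON t.document_id = d.id;

-- A unique index lets the nightly refresh run without locking out readers.
CREATE UNIQUE INDEX ON gold.document_search (id);
REFRESH MATERIALIZED VIEW CONCURRENTLY gold.document_search;
```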
The promotion step as the enforcement point
The most important engineering decision in a medallion architecture is where validation lives. The temptation is to validate at ingestion – reject bad records before they reach bronze. This is wrong for two reasons. First, you lose the raw record, which is your only evidence of what the source system actually sent. Second, validation rules change as you understand the data better, and if you have discarded records that failed an early rule, you cannot re-promote them when the rule is revised.
Bronze accepts everything. Validation is a promotion gate, not an ingestion gate.
In PostgreSQL, the promotion step is a function that queries silver.documents_ready,
inserts passing records into silver.documents, and logs failures with the reason for
rejection.
CREATE TABLE silver.promotion_log (
    bronze_id      BIGINT REFERENCES bronze.documents(id),
    attempted_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    failure_reason TEXT NOT NULL
);
CREATE OR REPLACE FUNCTION silver.promote_documents()
RETURNS void AS $$
BEGIN
    INSERT INTO silver.documents (id, content, content_length, embedding, source)
    SELECT id, content, content_length, embedding, source
    FROM silver.documents_ready
    ON CONFLICT (id) DO NOTHING;

    INSERT INTO silver.promotion_log (bronze_id, failure_reason)
    SELECT
        id,
        CASE
            WHEN content IS NULL THEN 'null content'
            WHEN char_length(content) = 0 THEN 'empty content'
            WHEN embedding IS NULL THEN 'missing embedding'
        END
    FROM bronze.documents
    WHERE environment = 'live'
      AND id NOT IN (SELECT id FROM silver.documents)
      AND ingested_at > now() - INTERVAL '1 day'
      AND (content IS NULL
           OR char_length(content) = 0
           OR embedding IS NULL);
END;
$$ LANGUAGE plpgsql;
This function can be called from the embedding listener after it writes an embedding
back to bronze, from pg_cron on a schedule, or from an external orchestrator.
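With pg_cron, scheduling the promotion is a single statement. A sketch, assuming the extension is installed in this database and a five-minute cadence suits the pipeline:

```sql
-- Requires: CREATE EXTENSION pg_cron, configured for this database.
SELECT cron.schedule(
    'promote-documents',              -- job name
    '*/5 * * * *',                    -- every five minutes
    $$SELECT silver.promote_documents()$$
);
```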
Silver branches
The standard medallion diagram shows one silver layer. In practice, a pipeline serving multiple consumers usually develops multiple silver branches, each shaped for a different downstream purpose.
A pipeline ingesting third-party enrichment data illustrates this clearly. The bronze record is the raw HTTP response. From there, three things need to happen, and they have different owners and different concerns. The analytics team wants a clean, validated representation of the enriched entity – that is one silver table, filtered to live data, with the standard promotion logic. The QA team needs to validate the enrichment provider’s responses against the agreed contract: are the fields present, are the types correct, are the values within expected ranges – that is a second silver path, running over both live and testing records, producing a contract validation log rather than a clean entity record. The engineering team running integration tests wants to inspect testing records in isolation without them appearing in production reporting at all.
-- Silver branch for analytics: clean entities, live data only
CREATE TABLE silver.entities (
    id BIGINT PRIMARY KEY REFERENCES bronze.documents(id),
    ...
);

-- Silver branch for QA: contract validation across all environments
CREATE TABLE silver.contract_checks (
    bronze_id   BIGINT REFERENCES bronze.documents(id),
    environment TEXT NOT NULL,
    field       TEXT NOT NULL,
    passed      BOOLEAN NOT NULL,
    checked_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Silver branch for engineering: testing records only
CREATE VIEW silver.test_responses AS
SELECT * FROM bronze.documents
WHERE environment = 'testing';
The branches share the same bronze source and the same promotion discipline: log failures, never discard raw records. What differs is the audience, the filters, and the output shape. Adding a branch is a schema change, not an architectural change.
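Populating the QA branch is itself a promotion step over the raw payload. A minimal sketch of one rule, assuming the contract requires a string 'name' field in the response (the field name is illustrative; real checks would cover the full contract):

```sql
-- One contract rule: the payload must contain a string 'name' field.
-- jsonb_typeof returns NULL for a missing key, so COALESCE marks absence as a failure.
INSERT INTO silver.contract_checks (bronze_id, environment, field, passed)
SELECT
    id,
    environment,
    'name',
    COALESCE(jsonb_typeof(raw_payload -> 'name') = 'string', FALSE)
FROM bronze.documents
WHERE ingested_at > now() - INTERVAL '1 day';
```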
The gold layer follows naturally from whichever silver branches are relevant to each
consumer. Analytics gold aggregates over silver.entities. A provider dashboard reads
from silver.contract_checks. Engineering tooling queries silver.test_responses
directly. Each consumer sees data shaped for its purpose, traced back to the same raw
bronze record.
What this architecture does not solve
The medallion pattern is not a substitute for schema design. A poorly designed silver schema with wrong grain or missing constraints will produce bad gold data regardless of how cleanly it is separated from bronze. The previous article in this series covers that ground: know what a data product engineer is responsible for before designing the silver layer, because silver is where that responsibility is discharged.
It is also not a solution to volume. PostgreSQL handles tens of millions of rows in a single table without special effort. For hundreds of millions, partitioning is the first tool. For billions, the distributed systems that originally popularised the medallion pattern become relevant. The architecture is the same; the substrate differs.
The pattern scales down as well as up. A pipeline that ingests one source, promotes a few hundred records an hour, and serves a single dashboard is still worth structuring in three layers. The discipline of separation pays dividends the first time a validation rule needs to change without losing historical data, which is usually sooner than expected.
(Bay Information Systems builds data pipelines for clients who need reliable data infrastructure without the overhead of a full MLOps platform. If you are working through the architecture decision, start a conversation.)