Bronze, Silver, Gold: A Layered Data Architecture in PostgreSQL

The medallion architecture has become the standard framing for data pipelines over the past few years, mostly through Databricks and the data lakehouse ecosystem. The names – bronze, silver, gold – are the durable part. The underlying principle predates the branding by decades: raw data lands in one place, validated and enriched data lives in another, and aggregated outputs consumed by dashboards and models live in a third. Each layer has one concern. Nothing that belongs in one layer bleeds into another.

The pattern is usually described in the context of distributed systems: Spark clusters, Delta Lake, object storage at scale. It maps onto PostgreSQL just as cleanly, and for teams that do not need distributed compute, PostgreSQL is a better substrate. The schema system enforces layer boundaries. Views make promotion logic explicit and auditable. The whole pipeline fits in one database, runs without a scheduler, and can be inspected with a SQL client.
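
Concretely, each layer is a schema. A minimal sketch of the layout – the reporting role in the grants is illustrative, standing in for whatever consumer should only ever see gold:

CREATE SCHEMA bronze;
CREATE SCHEMA silver;
CREATE SCHEMA gold;

-- A consumer that should only read gold gets access to gold alone
-- ('reporting' is a placeholder role name).
GRANT USAGE ON SCHEMA gold TO reporting;
GRANT SELECT ON ALL TABLES IN SCHEMA gold TO reporting;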


The three layers

Bronze is the raw landing zone. Data arrives from sources – API responses, event streams, file imports – and is written with minimal transformation. The schema is permissive. Fields are stored as received, including nulls, inconsistencies, and values that will later fail validation. Every record carries an ingested_at timestamp and, where possible, the raw payload as a JSONB column alongside the parsed fields. The bronze layer is append-only by convention: records are never updated or deleted, because the raw record is the audit trail.

CREATE TABLE bronze.documents (
    id           BIGSERIAL PRIMARY KEY,
    source       TEXT NOT NULL,
    environment  TEXT NOT NULL DEFAULT 'live',  -- 'live' or 'testing'
    raw_payload  JSONB NOT NULL,
    content      TEXT,
    metadata     JSONB,
    embedding    VECTOR(384),  -- written back by the embedding listener; requires pgvector
    ingested_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

The environment column deserves comment. Any pipeline that ingests data from external providers will run internal tests against those same providers: contract validation, integration testing, load testing. Without a flag, test records land in bronze alongside production data and contaminate every downstream layer. Adding environment at the schema level – and enforcing it as the first filter in every promotion view – costs nothing and prevents a category of downstream corruption that is surprisingly difficult to clean up after the fact.
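
To make that concrete – the source name and payload below are invented – the only thing a test harness does differently from a production writer is set the flag:

INSERT INTO bronze.documents (source, environment, raw_payload, content)
VALUES ('provider-x', 'testing', '{"status": "ok"}'::jsonb, 'sample text');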

Bronze tables grow large. A pipeline that stores HTTP responses as JSONB will accumulate payload data quickly; a third-party enrichment response that returns several kilobytes per record adds up fast at any meaningful volume. This is expected and correct. The bronze layer is not optimised for query performance; it is optimised for completeness and recoverability. Partition by ingested_at once the table grows, and consider a retention policy for the raw_payload column specifically if storage becomes a constraint, but do not pre-optimise by discarding the raw record.
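
If partitioning does become necessary, declarative range partitioning on ingested_at is the natural fit. A sketch, assuming monthly partitions; note that PostgreSQL requires the partition key to be part of the primary key, which also means downstream foreign keys would reference (id, ingested_at) rather than id alone:

-- Sketch only: the same bronze table, declared range-partitioned from the start.
CREATE TABLE bronze.documents (
    id           BIGSERIAL,
    source       TEXT NOT NULL,
    environment  TEXT NOT NULL DEFAULT 'live',
    raw_payload  JSONB NOT NULL,
    content      TEXT,
    metadata     JSONB,
    embedding    VECTOR(384),
    ingested_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (id, ingested_at)
) PARTITION BY RANGE (ingested_at);

-- One partition per month, created ahead of the data arriving.
CREATE TABLE bronze.documents_2025_01
    PARTITION OF bronze.documents
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');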

Silver is the enriched, validated layer. Records promoted from bronze have been checked against quality rules, normalised, and augmented with derived fields. For a document pipeline, silver is where embeddings live alongside the cleaned text. For a transactional pipeline, silver is where entity resolution has been applied, where referential integrity is enforced with foreign keys, and where the grain has been defined explicitly.

Promotion from bronze to silver happens through a view that encodes the validation logic, and a materialisation step that writes passing records to the silver table. Records that fail validation stay in bronze and are flagged for review, not silently dropped.

CREATE TABLE silver.documents (
    id              BIGINT PRIMARY KEY REFERENCES bronze.documents(id),
    content         TEXT NOT NULL,
    content_length  INT NOT NULL,
    embedding       VECTOR(384),
    source          TEXT NOT NULL,
    promoted_at     TIMESTAMPTZ NOT NULL DEFAULT now(),
    CONSTRAINT content_not_empty CHECK (char_length(content) > 0)
);

CREATE VIEW silver.documents_ready AS
SELECT
    id,
    content,
    char_length(content) AS content_length,
    embedding,
    source
FROM bronze.documents
WHERE
    environment = 'live'
    AND content IS NOT NULL
    AND char_length(content) > 0
    AND embedding IS NOT NULL;

Gold is the consumption layer. It exists to serve specific use cases: dashboards, search interfaces, ML features, reporting aggregates. Gold tables are often pre-computed joins or aggregations over silver, shaped to the access pattern of whatever consumes them. A gold table for a document search interface might pre-join documents with their topic clusters, computed nightly. A gold table for a forecasting model might pre-aggregate transaction features at the customer grain, recomputed hourly.

Gold is the layer most likely to have multiple variants of the same underlying data, each shaped for a different consumer. That is correct behaviour, not duplication.
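
As a sketch of the first of those – assuming a hypothetical silver.topic_assignments table standing in for the output of the nightly clustering job – the search-facing gold table could be a materialised view refreshed on the same schedule:

-- Sketch: gold table for the search interface. silver.topic_assignments
-- is hypothetical, representing the nightly clustering output.
CREATE MATERIALIZED VIEW gold.document_search AS
SELECT
    d.id,
    d.content,
    d.embedding,
    t.topic_label
FROM silver.documents d
LEFT JOIN silver.topic_assignments t ON t.document_id = d.id;

-- Refreshed nightly by whatever runs the clustering job:
-- REFRESH MATERIALIZED VIEW gold.document_search;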


The promotion step as the enforcement point

The most important engineering decision in a medallion architecture is where validation lives. The temptation is to validate at ingestion – reject bad records before they reach bronze. This is wrong for two reasons. First, you lose the raw record, which is your only evidence of what the source system actually sent. Second, validation rules change as you understand the data better, and if you have discarded records that failed an early rule, you cannot re-promote them when the rule is revised.

Bronze accepts everything. Validation is a promotion gate, not an ingestion gate.

In PostgreSQL, the promotion step is a function that queries silver.documents_ready, inserts passing records into silver.documents, and logs failures with the reason for rejection.

CREATE TABLE silver.promotion_log (
    bronze_id       BIGINT REFERENCES bronze.documents(id),
    attempted_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
    failure_reason  TEXT NOT NULL
);

CREATE OR REPLACE FUNCTION silver.promote_documents()
RETURNS void AS $$
BEGIN
    -- Promote everything the readiness view accepts; ON CONFLICT keeps re-runs idempotent.
    INSERT INTO silver.documents (id, content, content_length, embedding, source)
    SELECT id, content, content_length, embedding, source
    FROM silver.documents_ready
    ON CONFLICT (id) DO NOTHING;

    -- Log why recent records failed; repeated runs within the one-day window
    -- re-log the same record until it promotes or ages out.
    INSERT INTO silver.promotion_log (bronze_id, failure_reason)
    SELECT
        id,
        CASE
            WHEN content IS NULL          THEN 'null content'
            WHEN char_length(content) = 0 THEN 'empty content'
            WHEN embedding IS NULL        THEN 'missing embedding'
        END
    FROM bronze.documents
    WHERE environment = 'live'
      AND id NOT IN (SELECT id FROM silver.documents)
      AND ingested_at > now() - INTERVAL '1 day'
      AND (content IS NULL
           OR char_length(content) = 0
           OR embedding IS NULL);
END;
$$ LANGUAGE plpgsql;

This function can be called from the embedding listener after it writes an embedding back to bronze, from pg_cron on a schedule, or from an external orchestrator.
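
With pg_cron installed, the scheduled variant is one statement; the job name and five-minute interval here are illustrative:

SELECT cron.schedule(
    'promote-documents',                   -- job name, illustrative
    '*/5 * * * *',                         -- every five minutes, illustrative
    'SELECT silver.promote_documents()'
);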


Silver branches

The standard medallion diagram shows one silver layer. In practice, a pipeline serving multiple consumers usually develops multiple silver branches, each shaped for a different downstream purpose.

A pipeline ingesting third-party enrichment data illustrates this clearly. The bronze record is the raw HTTP response. From there, three things need to happen, and they have different owners and different concerns. The analytics team wants a clean, validated representation of the enriched entity – that is one silver table, filtered to live data, with the standard promotion logic. The QA team needs to validate the enrichment provider’s responses against the agreed contract – are the fields present, are the types correct, are the values within expected ranges? – which is a second silver path, running over both live and testing records and producing a contract validation log rather than a clean entity record. The engineering team running integration tests wants to inspect testing records in isolation, without them appearing in production reporting at all.

-- Silver branch for analytics: clean entities, live data only
CREATE TABLE silver.entities (
    id           BIGINT PRIMARY KEY REFERENCES bronze.documents(id),
    ...
);

-- Silver branch for QA: contract validation across all environments
CREATE TABLE silver.contract_checks (
    bronze_id    BIGINT REFERENCES bronze.documents(id),
    environment  TEXT NOT NULL,
    field        TEXT NOT NULL,
    passed       BOOLEAN NOT NULL,
    checked_at   TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Silver branch for engineering: testing records only
CREATE VIEW silver.test_responses AS
SELECT * FROM bronze.documents
WHERE environment = 'testing';

The branches share the same bronze source and the same promotion discipline: log failures, never discard raw records. What differs is the audience, the filters, and the output shape. Adding a branch is a schema change, not an architectural change.

The gold layer follows naturally from whichever silver branches are relevant to each consumer. Analytics gold aggregates over silver.entities. A provider dashboard reads from silver.contract_checks. Engineering tooling queries silver.test_responses directly. Each consumer sees data shaped for its purpose, traced back to the same raw bronze record.
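
As a sketch, the provider dashboard’s gold layer can be a plain aggregate over the contract checks – pass rate per field per day:

CREATE VIEW gold.contract_pass_rates AS
SELECT
    field,
    date_trunc('day', checked_at)               AS check_day,
    count(*)                                    AS checks,
    (count(*) FILTER (WHERE passed))::numeric
        / count(*)                              AS pass_rate
FROM silver.contract_checks
GROUP BY field, date_trunc('day', checked_at);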


What this architecture does not solve

The medallion pattern is not a substitute for schema design. A poorly designed silver schema with wrong grain or missing constraints will produce bad gold data regardless of how cleanly it is separated from bronze. The previous article in this series covers that ground: know what a data product engineer is responsible for before designing the silver layer, because silver is where that responsibility is discharged.

It is also not a solution to volume. PostgreSQL handles tens of millions of rows in a single table without special effort. For hundreds of millions, partitioning is the first tool. For billions, the distributed systems that originally popularised the medallion pattern become relevant. The architecture is the same; the substrate differs.

The pattern scales down as well as up. A pipeline that ingests one source, promotes a few hundred records an hour, and serves a single dashboard is still worth structuring in three layers. The discipline of separation pays dividends the first time a validation rule needs to change without losing historical data, which is usually sooner than expected.

(Bay Information Systems builds data pipelines for clients who need reliable data infrastructure without the overhead of a full MLOps platform. If you are working through the architecture decision, start a conversation.)


Questions about this? Get in touch.