dbt Testing Best Practices for Financial Data
The four categories of dbt tests that caught 15 silent bugs in a production financial BI system — schema tests, data tests, unit tests, and reconciliation tests — with real examples from each.

Key Takeaway
We found 15 bugs in a production financial system during a migration to dbt + Snowflake. Not because our code was bad — because the legacy system had never been tested against an independent reference. This post breaks down the four testing categories that would have caught every one of them before production, with real YAML and SQL examples from the engagement.
We found 15 silent bugs in a production financial BI system. Not edge cases. Not cosmetic issues. A 118% undocumented price increase. A fiscal quarter that spanned 15 months. Customer accounts that quietly disappeared from revenue reports. All in a system that had been running production financial reporting for years, maintained by a competent team. Here is what would have caught them before they went live.
The four test categories that caught everything
Every bug we found falls into one of four testing categories. No single category catches everything. Together, they cover the full surface area of a financial dbt project.
1. Schema tests: the guardrails you set once
Schema tests are YAML declarations that run against your models on every build. They are the cheapest tests to write and the easiest to forget. Most dbt projects start here and stop here — which is why they only catch the most obvious failures.
On our engagement, schema tests would have caught three of the fifteen bugs outright: the disappearing customer accounts, the duplicate grain rows, and the orphaned regional records.
Here is what the schema test for our ARR model looked like:
models:
  - name: arr_by_customer
    columns:
      - name: customer_id
        tests:
          - not_null
          - relationships:
              to: ref('dim_customers')
              field: customer_id
      - name: arr_amount
        tests:
          - not_null
      - name: fiscal_quarter
        tests:
          - accepted_values:
              values: ['Q1', 'Q2', 'Q3', 'Q4']
That relationships test is the one people skip. It checks that every customer_id in your fact table actually exists in your dimension table. Without it, orphaned records silently inflate or deflate totals depending on how your downstream joins are written. Bug #3 — where Australian accounts vanished from quarterly ARR — would have surfaced as a referential integrity failure here instead of as a mysterious revenue variance three months later.
The key insight: schema tests do not validate business logic. They validate structural assumptions. Think of them as assertions about the shape of your data, not the correctness of your calculations. On a financial dbt project, the minimum set is: not_null on every key and amount column, unique on your primary key, relationships on every foreign key, and accepted_values on any categorical column that feeds conditional logic. Write them before your first model reaches production. They take an hour. The bugs they prevent take days.
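That minimum set can be sketched as a single YAML block. The model and column names below are illustrative, not from the engagement:

```yaml
models:
  - name: fct_revenue            # hypothetical model name
    columns:
      - name: revenue_id         # primary key: unique + not_null
        tests:
          - unique
          - not_null
      - name: customer_id        # foreign key: relationships check
        tests:
          - not_null
          - relationships:
              to: ref('dim_customers')
              field: customer_id
      - name: revenue_amount     # amount column: not_null at minimum
        tests:
          - not_null
      - name: product_line       # categorical column feeding conditional logic
        tests:
          - accepted_values:
              values: ['subscription', 'services', 'usage']
```

The accepted values and key names are assumptions; the pattern — every key, every amount, every foreign key, every categorical — is the point.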
2. Data tests: custom SQL that encodes business rules
Data tests are standalone SQL files in your tests/ directory that return rows when something is wrong. Zero rows means passing. Any rows means failure. They are where you encode the business rules that schema tests cannot express.
The most valuable data test we wrote was a grain check — a query that asserts exactly one row per expected key combination:
-- tests/assert_one_row_per_customer_month.sql
select
    customer_id,
    fiscal_month,
    count(*) as row_count
from {{ ref('fct_revenue') }}
group by customer_id, fiscal_month
having count(*) > 1
Simple. Brutal. This test would have caught Bug #6 (duplicate grain rows inflating quarterly totals) and Bug #9 (a unit mismatch that created phantom rows at a grain level that should not have existed). Both bugs produced plausible-looking numbers. The quarterly totals were in the right ballpark. But the underlying grain was wrong, and everything downstream inherited the error.
We also wrote data tests for business-specific invariants: revenue should never be negative for these product lines, monthly totals should not swing more than 30% month-over-month without a known pricing event, and every fiscal quarter should contain exactly three months. That last one would have caught the 15-month fiscal quarter bug on the first CI run.
-- tests/assert_fiscal_quarter_has_three_months.sql
select
    fiscal_quarter,
    count(distinct fiscal_month) as month_count
from {{ ref('dim_fiscal_calendar') }}
group by fiscal_quarter
having count(distinct fiscal_month) != 3
3. Unit tests: isolating transformation logic
dbt introduced native unit tests in v1.8. They let you define fixed inputs, run a model, and assert against expected outputs — without touching the warehouse. For financial models with complex transformation logic, they are the only way to test edge cases that do not exist in your production data yet.
We used unit tests for our fiscal quarter conversion macro — the one that had a typo mapping Q2FY25 to 2023 instead of 2024. A unit test with boundary dates would have caught it immediately:
unit_tests:
  - name: test_fiscal_quarter_boundaries
    model: fct_revenue
    given:
      - input: ref('stg_billing')
        rows:
          - {billing_date: '2024-05-01', amount: 100}
          - {billing_date: '2024-04-30', amount: 200}
    expect:
      rows:
        - {fiscal_quarter: 'Q2FY25', amount: 100}
        - {fiscal_quarter: 'Q1FY25', amount: 200}
The test is a dozen lines of YAML. The bug it catches cost days of debugging and produced incorrect quarterly ARR values across the entire dataset. The economics here are absurd: a dozen lines of YAML versus multiple days of senior engineer time spent reconciling a phantom variance.
Unit tests also caught our Snowflake-native dbt macro issue (Bug #7) where calling a macro inside a PARTITION BY clause compiled successfully but produced wrong results. We would not have found this through data tests because the output looked reasonable — it just omitted three quarters of data silently. Only a unit test with known inputs and expected outputs would have surfaced the discrepancy.
A note on where to put unit tests: focus them on transformation logic that has conditional branches, date boundary handling, or platform-specific behavior. Do not unit-test simple passthrough models or straightforward aggregations — data tests handle those more efficiently. The sweet spot is models where the transformation could be correct for 99% of inputs and wrong for the 1% that hits an edge case. Fiscal calendar conversions, tiered pricing lookups, and regional multiplier logic are prime candidates. If a model has a CASE statement with more than three branches, it probably deserves a unit test.
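As an example of that sweet spot, a tiered-pricing model with a multi-branch CASE statement might get a unit test pinned to the tier boundaries. Everything here — model name, column names, tier cutoffs — is hypothetical:

```yaml
unit_tests:
  - name: test_pricing_tier_boundaries   # hypothetical test
    model: fct_priced_revenue            # hypothetical model
    given:
      - input: ref('stg_usage')
        rows:
          - {usage_units: 999,  rate_plan: 'standard'}   # just below cutoff
          - {usage_units: 1000, rate_plan: 'standard'}   # exactly at cutoff
          - {usage_units: 1001, rate_plan: 'standard'}   # just above cutoff
    expect:
      rows:
        - {usage_units: 999,  tier: 'tier_1'}
        - {usage_units: 1000, tier: 'tier_2'}
        - {usage_units: 1001, tier: 'tier_2'}
```

The inputs deliberately straddle the boundary, because off-by-one comparisons (> versus >=) are exactly the class of bug that looks correct for 99% of rows.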
4. Reconciliation tests: the nuclear option
Reconciliation tests compare your dbt output against an independent reference — a legacy system, an export, a manually validated spreadsheet. They are the most expensive tests to write and maintain, and they are the only tests that catch bugs where your code does exactly what you told it to but you told it the wrong thing.
Twelve of our fifteen bugs were found through reconciliation. Not schema tests. Not unit tests. Reconciliation. The reason is simple: schema tests validate structure, data tests validate rules you already know, unit tests validate logic you already wrote. Reconciliation validates against reality.
The core pattern is a FULL OUTER JOIN variance query:
select
    coalesce(v1.customer_id, v2.customer_id) as customer_id,
    coalesce(v1.fiscal_month, v2.fiscal_month) as fiscal_month,
    v1.revenue as legacy_revenue,
    v2.revenue as dbt_revenue,
    abs(v1.revenue - v2.revenue) as variance,
    case
        when v1.revenue is null then 'MISSING_IN_LEGACY'
        when v2.revenue is null then 'MISSING_IN_DBT'
        when abs(v1.revenue - v2.revenue) > 0.01 then 'VARIANCE'
        else 'MATCH'
    end as status
from legacy_output v1
full outer join dbt_output v2
    on v1.customer_id = v2.customer_id
    and v1.fiscal_month = v2.fiscal_month
where v1.revenue is null
   or v2.revenue is null
   or abs(v1.revenue - v2.revenue) > 0.01
The FULL OUTER JOIN is critical. An INNER JOIN hides missing records — the most dangerous category of bug. If a customer exists in the legacy system but not in dbt, an INNER JOIN simply omits them from the comparison. You get a clean pass and a wrong answer.
We ran this at three grain levels: monthly totals, by product line, and by customer. A variance that disappears when you drill down is a grain issue. A variance that persists at every level is a logic error. The validation methodology explains the full hierarchical approach.
The hard part of reconciliation tests is not writing them. It is maintaining the reference data. Legacy system outputs change. Manual exports get stale. The most sustainable pattern we found was snapshotting the legacy output at a known-good point — a validated month-end close — and freezing it as a dbt seed or a static table. The seed does not change. The reconciliation query runs against it on every build. If your new model drifts from the snapshot, CI catches it. The trade-off is that you are validating against a frozen point in time rather than a live system, but for historical data that should be immutable, this is exactly the right behavior.
The test we wish we had written first
If we could go back and write one test before anything else, it would be the grain check on fct_revenue.
Not the reconciliation queries — those require a reference system. Not the unit tests — those require knowing which macros have edge cases. The grain check. Seven lines of SQL.
The reason is compounding errors. When your fact table has duplicate rows at the wrong grain, every downstream model inherits the problem. Aggregations double-count. Joins fan out. Variance queries produce false positives that send you chasing phantom bugs instead of real ones. We spent almost a full day investigating a revenue variance that turned out to be a grain issue in a staging model, not a logic error in the mart layer. The grain check would have caught it in CI before it ever reached the mart layer.
The compounding is what makes it expensive. A bug in a staging model is a ten-minute fix. A bug in a staging model that has propagated through three downstream models and produced misleading reconciliation results is a day-long investigation. The grain check is a circuit breaker that prevents the propagation.
What we would add next time
These are the tests we did not write on this engagement but would include from day one on the next financial dbt project:
- Seed-based reconciliation for pricing. Every hardcoded rate should live in a seed file with a change history. Test that the seed values match what is actually applied in the mart layer. Bug #1 — the invisible 118% price increase — existed because the rate lived in a database row with no version history.
- Snapshot-based regression tests. After each validated release, snapshot the mart layer outputs. On the next build, compare against the snapshot. Any variance is either an intentional change (documented in the PR) or a regression. This catches the class of bugs where a model change in domain A silently affects domain B.
- Source freshness tests with business-calendar awareness. Standard dbt source freshness checks whether data arrived. Business-calendar-aware freshness checks whether the right data arrived — was this month's close data actually loaded, or are we still running on last month?
- Cross-domain join validation. When a model in the pricing domain joins to a model in the customer domain, test that the join does not drop or duplicate rows. Our domain isolation architecture prevented cross-domain breakage at the model level, but the dashboard layer joined across domains freely. A join validation test at the dashboard input layer would have caught the filter scope bug (Bug #6) before it reached any chart.
If your dbt project has models but no tests, we can set up the testing framework in days, not weeks. Book a quick assessment.
Arturo Cárdenas
Founder & Chief Data Analytics & AI Officer
Arturo is a senior analytics and AI consultant helping mid-market companies cut through data chaos to unlock clarity, speed, and measurable ROI.


