Repeatable Analytics with Snowflake and dbt

Starting a new year usually comes with some immediate analytics challenges: reporting on year-end consumption, generating target lists, and forecasting metrics for the coming months. Quickly addressing these challenges enables organizations to kickstart their year. How quickly is your organization answering these questions? And are the answers accurate? A critical component of accomplishing this is repeatability. Building repeatable analytics solutions starts with building repeatable data pipelines. Doing so ultimately serves your stakeholders by supporting fast and accurate insights for their analysis.

Snowflake, the global data cloud platform, provides an enterprise-ready data service, and dbt (data build tool) complements it by managing the transformation pipelines that curate your data.

With dbt, your data pipelines are visualized in a data lineage graph and supported by tightly coupled documentation and testing. You’re still writing SQL (or Python if you want) to describe your transformations, but dbt manages the instantiation of your pipelines in Snowflake (or another supported data platform).

Managing your Snowflake Data Cloud (or other data platform) with dbt will enable you… But how?

  • Modularity: Pipelines are built with modular pieces so you can reuse rather than rewrite.
  • Portability: A project can be deployed automatically to separate environments, and redeployments are idempotent.
  • Understandability: Rich metadata shows where and how data is used across pipelines.
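To make the modularity and lineage ideas concrete, here is a minimal Python sketch of a dependency graph between models. It is only an illustration, not how dbt is implemented, and every model name in it (stg_orders, customer_orders, and so on) is hypothetical. It shows two things a lineage graph gives you: an order in which models can be built, and the set of downstream models affected when one piece changes.

```python
from graphlib import TopologicalSorter

# Hypothetical model names; each maps to the set of models it selects from.
MODEL_DEPENDENCIES = {
    "stg_orders": set(),
    "stg_customers": set(),
    "customer_orders": {"stg_orders", "stg_customers"},
    "year_end_consumption": {"customer_orders"},
    "target_list": {"customer_orders"},
}


def build_order(dependencies):
    """Return model names in an order where every upstream model comes first."""
    return list(TopologicalSorter(dependencies).static_order())


def downstream_of(model, dependencies):
    """Return every model that directly or indirectly depends on `model`."""
    affected = set()
    frontier = [model]
    while frontier:
        current = frontier.pop()
        for name, parents in dependencies.items():
            if current in parents and name not in affected:
                affected.add(name)
                frontier.append(name)
    return affected


if __name__ == "__main__":
    print(build_order(MODEL_DEPENDENCIES))
    print(sorted(downstream_of("stg_orders", MODEL_DEPENDENCIES)))
```

Because customer_orders is defined once and reused by both reports, a change to it reaches year_end_consumption and target_list without rewriting either one.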

This results in repeatable data pipelines which support repeatable analytics. Or in other words, fast and accurate insights about your business.

Without this repeatability, you’re stuck in a cycle of ad hoc queries, disparate datasets, email chains, and long stares out the window questioning your sanity, not to mention the pain of trying to reproduce a report you created last year with “just a couple extra dimensions.” This process fails at scale because things always change, including your data and your stakeholders. A repeatable analytics process doesn’t just make it easier to recreate last year’s report; it also makes that report easier to change. Snowflake and dbt make change easy, so you can keep up with stakeholders asking for extra dimensions and integrate inevitable data changes.

[Image: repeatable analytics with Snowflake and dbt (source: dbt)]

The visual above, directly from dbt’s homepage, shows how dbt manages data pipelines in your warehouse. Snowflake provides many options for capturing raw data (e.g., Snowpipe, external tables, data shares). You develop and document SELECT statements representing components in your pipeline (these are materialized by dbt, usually as views or tables in your warehouse). You assign tests to specific columns; a failing test can raise an alert or stop downstream pipelines from running. You deploy the project and, importantly, orchestrate your dbt runs to keep data fresh. The final layer of your transformation pipelines is a set of curated datasets, ready for consumption by end users: stakeholders using reports in Tableau, data scientists building machine learning models (not to be confused with dbt models), and analysts looking for key insights directly in Snowflake.
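The flow described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not dbt's implementation: an in-memory SQLite database stands in for Snowflake, the table and view names (raw_orders, stg_orders) are hypothetical, and the two checks are modeled loosely on dbt's not_null and unique tests. It shows a SELECT statement materialized as a view, rebuilt twice to illustrate an idempotent redeployment, and column tests that report how many rows fail.

```python
import sqlite3


def materialize_view(conn, name, select_sql):
    """Recreate a view from a SELECT statement; rerunning gives the same result."""
    conn.execute(f"DROP VIEW IF EXISTS {name}")
    conn.execute(f"CREATE VIEW {name} AS {select_sql}")


def failing_rows_not_null(conn, relation, column):
    """Count rows where `column` is NULL (0 means the test passes)."""
    query = f"SELECT COUNT(*) FROM {relation} WHERE {column} IS NULL"
    return conn.execute(query).fetchone()[0]


def failing_rows_unique(conn, relation, column):
    """Count values of `column` that appear more than once (0 means pass)."""
    query = (
        f"SELECT COUNT(*) FROM ("
        f"SELECT {column} FROM {relation} "
        f"WHERE {column} IS NOT NULL "
        f"GROUP BY {column} HAVING COUNT(*) > 1)"
    )
    return conn.execute(query).fetchone()[0]


def run_pipeline():
    """Load hypothetical raw data, build a staging view, and test both layers."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for the warehouse
    conn.execute("CREATE TABLE raw_orders (order_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?)",
        [(1, 10.0), (2, 25.5), (2, 25.5), (None, 4.0)],
    )
    # The transformation is just a SELECT: deduplicate and drop keyless rows.
    select_sql = (
        "SELECT DISTINCT order_id, amount FROM raw_orders "
        "WHERE order_id IS NOT NULL"
    )
    for _ in range(2):  # deploying twice leaves the same view in place
        materialize_view(conn, "stg_orders", select_sql)
    results = {
        "raw_not_null": failing_rows_not_null(conn, "raw_orders", "order_id"),
        "raw_unique": failing_rows_unique(conn, "raw_orders", "order_id"),
        "stg_not_null": failing_rows_not_null(conn, "stg_orders", "order_id"),
        "stg_unique": failing_rows_unique(conn, "stg_orders", "order_id"),
    }
    conn.close()
    return results


if __name__ == "__main__":
    print(run_pipeline())
```

In this sketch the raw table fails both tests while the staging view passes them, which is the kind of signal an orchestrated run could use to alert someone or halt before curated datasets are refreshed.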

Snowflake and dbt benefits for end users

Those end users — the people searching for insights in data to help them make decisions — will benefit from this process because of:

  • Trust: Data is trusted to be accurate because of verified tests, rich documentation, consistent definitions, and clear lineage.
  • Access: Curated datasets are accessed in a consistent way and governed by role-based and row-level access control.
  • Changes: Transformation pipelines can be changed, deployed, and iterated on quickly with version control and CI/CD.

It’s clear to me (and hopefully to you) that developers and stakeholders both benefit from using Snowflake and dbt to build a repeatable analytics solution — starting with repeatable data pipelines. Atrium helps organizations create and grow repeatable business processes across Snowflake, Tableau, and Salesforce. I personally love working with these best-in-class tools and using them to help my clients build their data solutions — and I’m stoked to do more this year!

Learn more about how we can help extend, optimize, or get started on your repeatable analytics solutions in Snowflake.
