TPC-H Data Warehouse Pipeline

Role: Data Engineer
Keywords: Airflow, dbt, Docker, uv, Python, Astronomer, Snowflake
Year: 2026

About

Github
Tech Stack
  • Snowflake – cloud data warehouse used as the data source and storage layer
  • dbt – transformation framework used to build staging, intermediate, and fact models
  • Apache Airflow – workflow orchestrator used to schedule and execute the pipeline
  • Astronomer Cosmos – integration layer for running dbt projects inside Airflow DAGs
  • Python – used for Airflow configuration and pipeline orchestration
  • Docker – containerized environment for running Airflow and dependencies
  • uv – Python environment and dependency management

Introduction

This mini project implements a simple ELT pipeline using Snowflake, dbt, and Apache Airflow. It reads from Snowflake's TPC-H SF1 sample dataset (TPCH_SF1) and transforms the data through staging and intermediate models into an analytics-ready fact table.
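
As a rough illustration of how these pieces fit together, the sketch below shows one way the dbt project could be wired into Airflow with Astronomer Cosmos. The project path, connection ID, executable path, and Snowflake database/schema names are placeholders, not the actual values used in this repo.

```python
# dags/tpch_dbt_dag.py -- hypothetical sketch of a Cosmos-based Airflow DAG.
# Paths, connection IDs, and Snowflake object names are illustrative only.
from datetime import datetime

from cosmos import DbtDag, ExecutionConfig, ProfileConfig, ProjectConfig
from cosmos.profiles import SnowflakeUserPasswordProfileMapping

# Map an Airflow Snowflake connection to a dbt profile at runtime,
# so credentials live in Airflow rather than in profiles.yml.
profile_config = ProfileConfig(
    profile_name="tpch_dwh",
    target_name="dev",
    profile_mapping=SnowflakeUserPasswordProfileMapping(
        conn_id="snowflake_conn",
        profile_args={"database": "ANALYTICS", "schema": "MARTS"},
    ),
)

# Cosmos parses the dbt project and renders each model (and its tests)
# as individual Airflow tasks with the correct dependencies.
tpch_dbt_dag = DbtDag(
    dag_id="tpch_dbt_pipeline",
    project_config=ProjectConfig("/usr/local/airflow/dags/dbt/tpch_dwh"),
    profile_config=profile_config,
    execution_config=ExecutionConfig(
        dbt_executable_path="/usr/local/airflow/dbt_venv/bin/dbt",
    ),
    schedule_interval="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
)
```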

Objectives

The main goals of this project were:
  • Practice building a modern ELT pipeline
  • Learn dbt modeling patterns (staging, marts)
  • Implement data quality tests using dbt
  • Orchestrate transformation workflows with Apache Airflow
The project focuses on pipeline structure and tooling rather than complex business logic.
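
One detail worth noting for the data quality objective: Cosmos can surface dbt tests as their own Airflow tasks. The snippet below is a hedged sketch of that configuration, meant to plug into a DbtDag like the one sketched in the Introduction.

```python
# Hypothetical sketch: run each model's dbt tests immediately after the
# model builds, so a failing test blocks downstream models in the DAG.
from cosmos import RenderConfig
from cosmos.constants import TestBehavior

render_config = RenderConfig(
    test_behavior=TestBehavior.AFTER_EACH,  # dbt test task right after each model task
)
# Passed as DbtDag(..., render_config=render_config) alongside the other configs.
```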

Process
