TPC-H Data Warehouse Pipeline
Role: Data Engineer
Keywords: Airflow, dbt, Docker, uv, Python, Astronomer, Snowflake
Year: 2026

About
Tech Stack
- Snowflake – cloud data warehouse used as the data source and storage layer
- dbt – transformation framework used to build staging, intermediate, and fact models
- Apache Airflow – workflow orchestrator used to schedule and execute the pipeline
- Astronomer Cosmos – integration layer for running dbt projects inside Airflow DAGs
- Python – used for Airflow configuration and pipeline orchestration
- Docker – containerized environment for running Airflow and dependencies
- uv – Python environment and dependency management
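To make the division of labor between dbt and Airflow more concrete, here is a minimal sketch of how an orchestrator can derive a run order from model dependencies. It loosely mirrors the idea behind running a dbt project as a DAG, but it uses only Python's standard-library `graphlib` rather than Airflow or Cosmos. The model names (`stg_orders`, `int_order_items`, `fct_orders`, and so on) are hypothetical and are not taken from the project.

```python
from graphlib import TopologicalSorter

# Hypothetical dbt models, each mapped to the models it depends on.
# These names are illustrative assumptions, not the project's real models.
MODEL_DEPENDENCIES = {
    "stg_orders": set(),
    "stg_line_items": set(),
    "int_order_items": {"stg_orders", "stg_line_items"},
    "fct_orders": {"stg_orders", "int_order_items"},
}


def execution_batches(dependencies):
    """Group models into batches; models in one batch have no pending upstream work."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the output stable
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished so downstream models unlock
    return batches


batches = execution_batches(MODEL_DEPENDENCIES)
for step, models in enumerate(batches, start=1):
    print(f"step {step}: {', '.join(models)}")
```

With these assumed dependencies, both staging models come first, followed by the intermediate model and then the fact model.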
Introduction
This mini project implements a simple ELT pipeline with Snowflake, dbt, and Apache Airflow. The pipeline reads Snowflake’s TPC-H sample dataset at scale factor 1 (the TPCH_SF1 schema) and transforms it into an analytics-ready fact table.
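As a rough illustration of what "analytics-ready fact table" can mean here, the sketch below aggregates line items to one row per order in plain Python. The input column names follow the TPC-H `ORDERS` and `LINEITEM` tables, but the rows are made up, and the output columns (`order_key`, `item_count`, `net_sales`) are assumptions for illustration. In the project itself this logic would live in dbt SQL models, which are not shown on this page.

```python
from collections import defaultdict
from decimal import Decimal

# Made-up sample rows shaped like TPC-H ORDERS and LINEITEM.
orders = [
    {"o_orderkey": 1, "o_custkey": 370, "o_orderdate": "1996-01-02"},
    {"o_orderkey": 2, "o_custkey": 781, "o_orderdate": "1996-12-01"},
]
line_items = [
    {"l_orderkey": 1, "l_extendedprice": Decimal("100.00"), "l_discount": Decimal("0.10")},
    {"l_orderkey": 1, "l_extendedprice": Decimal("50.00"), "l_discount": Decimal("0.00")},
    {"l_orderkey": 2, "l_extendedprice": Decimal("200.00"), "l_discount": Decimal("0.05")},
]


def build_fact_orders(orders, line_items):
    """Return one row per order with an item count and discounted sales total."""
    totals = defaultdict(lambda: {"item_count": 0, "net_sales": Decimal("0")})
    for item in line_items:
        agg = totals[item["l_orderkey"]]
        agg["item_count"] += 1
        # Discounted price: extended price reduced by the fractional discount.
        agg["net_sales"] += item["l_extendedprice"] * (1 - item["l_discount"])
    return [
        {
            "order_key": order["o_orderkey"],
            "customer_key": order["o_custkey"],
            "order_date": order["o_orderdate"],
            **totals[order["o_orderkey"]],  # orders without items get zero totals
        }
        for order in orders
    ]


fact_orders = build_fact_orders(orders, line_items)
```

For the sample rows, order 1 ends up with two items and net sales of 140, and order 2 with one item and net sales of 190.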
Objectives
The main goals of this project were:
- Practice building a modern ELT pipeline
- Learn dbt modeling patterns (staging, intermediate, and mart layers)
- Implement data quality tests using dbt
- Orchestrate transformation workflows with Apache Airflow
The project focuses on pipeline structure and tooling rather than complex business logic.
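For the data quality objective, it may help to see what dbt's built-in `unique` and `not_null` tests check. The sketch below reimplements those two checks in plain Python over a list of rows. It is an approximation for illustration, not dbt's own code, and the `order_key` column and sample rows are hypothetical. As in dbt, a check passes when it returns no failing records.

```python
from collections import Counter


def not_null_failures(rows, column):
    """Return the rows where the column is missing or None."""
    return [row for row in rows if row.get(column) is None]


def unique_failures(rows, column):
    """Return the non-null values that occur more than once in the column."""
    counts = Counter(row[column] for row in rows if row.get(column) is not None)
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical fact rows containing one duplicate key and one null key.
sample_rows = [
    {"order_key": 1},
    {"order_key": 2},
    {"order_key": 2},
    {"order_key": None},
]

null_rows = not_null_failures(sample_rows, "order_key")
duplicate_keys = unique_failures(sample_rows, "order_key")
print(f"not_null failures: {len(null_rows)}, duplicated keys: {duplicate_keys}")
```

Both checks fail on the sample rows: one row has a null key, and the key 2 appears twice.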
Process
