Projects

Open Source

Small, focused Python libraries born from real data engineering work — extracted, polished, and shared so other teams can avoid solving the same problem twice.

pytest-pyspark-utils

Reusable Spark fixture and Delta table caching for PySpark tests

Active

A pytest plugin that provides a session-scoped spark fixture and automated Delta table caching — eliminating boilerplate setup while keeping each test fully isolated, fast, and reproducible across runs.

$ pip install pytest-pyspark-utils
Python Pytest PySpark Testing

spalah

PySpark helpers for everyday DataFrame surgery

Maintained

A small collection of PySpark utilities for things you end up writing over and over — schema diffs, column renaming, nested-field cleanup, and everyday DataFrame transformations ready to drop into any pipeline.
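As an illustration of the schema-diff idea, the sketch below compares two column listings in the `(name, type)` shape that PySpark's `DataFrame.dtypes` returns. It is a standalone example and not spalah's API. The function name `diff_dtypes` and the sample columns are hypothetical.

```python
from typing import Dict, List, Tuple

# Same shape as PySpark's DataFrame.dtypes: a list of (column name, type string).
DTypes = List[Tuple[str, str]]


def diff_dtypes(left: DTypes, right: DTypes) -> Dict[str, list]:
    """Report columns that were added, removed, or changed type (hypothetical helper)."""
    left_map, right_map = dict(left), dict(right)
    shared = set(left_map) & set(right_map)
    return {
        "only_in_left": sorted(set(left_map) - set(right_map)),
        "only_in_right": sorted(set(right_map) - set(left_map)),
        "type_changed": sorted(
            (name, left_map[name], right_map[name])
            for name in shared
            if left_map[name] != right_map[name]
        ),
    }


# Illustrative data; with real DataFrames you would pass df_before.dtypes and df_after.dtypes.
before = [("id", "bigint"), ("name", "string"), ("amount", "double")]
after = [
    ("id", "bigint"),
    ("name", "string"),
    ("amount", "decimal(18,2)"),
    ("loaded_at", "timestamp"),
]

result = diff_dtypes(before, after)
print(result)
```

This only inspects top-level columns. Nested struct fields would need to be flattened first, which is the sort of repetitive work a helper library is meant to absorb.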

$ pip install spalah
Python PySpark Databricks DataFrames

Contribute

Found a bug or want a feature?

Issues, ideas, and pull requests are always welcome on GitHub.

github.com/avolok →