Reshape and model data using PySpark in Databricks

Natalia Mizerska
4 min read · Dec 15, 2021

Apache Spark is present in almost all modern Big Data architectures and is one of the most popular open-source frameworks. It has proven extremely useful when dealing with gigabytes of data. In this article, I will present useful commands that let you reshape, transform, manipulate and aggregate PySpark dataframes.

Natalia Mizerska

Software Engineer. Using Python, T-SQL and Unix/Shell as my weapons of choice. Cloud Nine since 1994. Cloud Azure since 2018.