PySpark

9 / 43 posts

substring() in PySpark

📌 What is substring() ? The substring() function in PySpark is used to extract a portion of a string from a column in a DataFrame. It is part of the PySpa…

substr substring substring() vs substr()
Mar 19, 2026 2 min read

concat() and concat_ws() in PySpark

In PySpark, both concat() and concat_ws() are used to combine multiple columns into a single string column. ✅ concat() – Combines columns without any delim…

Combines columns concat concat_ws
Mar 19, 2026 2 min read

PySpark Convert String to Array Column

To convert a string column (StringType) to an array column (ArrayType) in PySpark, you can use the split() function from the pyspark.sql.…

PySpark Convert String to Array Column SPLIT PySpark
Mar 19, 2026 1 min read

Working with NULL/None Values in PySpark

🔍 What's fillna() or fill() in PySpark? In PySpark, both fillna() and fill() are used to replace null or missing values in a DataFrame. Both fillna() and …

dropna dropna() fill
Mar 19, 2026 1 min read

PySpark Pivot and Unpivot DataFrame

✅ What is Pivot and Unpivot? Pivot = Convert rows into columns Unpivot = Convert columns into rows 🌀 Sample DataFrame Let’s start with a small DataFrame t…

pivot Unpivot PySpark
Mar 19, 2026 2 min read

PySpark SQL Date and Timestamp Functions

🔧 Setup First (Optional for Reference) from pyspark.sql import functions as F from pyspark.sql import types as T data = df = spark.createDataFrame(data, )…

Date Datetime PySpark SQL Date and Timestamp Functions
Mar 19, 2026 2 min read

PySpark Built-in Functions

These functions are commonly used with groupBy() , agg() , or select() to compute things like sum, average, max, min, count, etc. PySpark functions come fr…

Aggregate apache spark for beginners big data tutorial
Mar 19, 2026 2 min read

union(), unionAll(), and unionByName() in PySpark

Here's the corrected explanation of union() , unionAll() , and unionByName() in PySpark along with appropriate examples. 1. union() The union() method is u…

UNION unionAll unionByName
Mar 19, 2026 2 min read

orderBy() and sort() in PySpark

PySpark provides two functions, sort() and orderBy() , to arrange data in a structured manner. 1. Understanding sort() in PySpark from pyspark.sql.function…

OrderBy Sort PySpark
Mar 19, 2026 1 min read