PySpark

substring() in PySpark

📌 What is substring()? The substring() function in PySpark is used to extract a portion of a string from a column in a DataFrame. It is part of the PySpa…

Tags: substr, substring, substring() vs substr()
Mar 19, 2026 · 2 min read

concat() and concat_ws() in PySpark

In PySpark, both concat() and concat_ws() are used to combine multiple columns into a single string column. ✅ concat() – Combines columns without any delim…

Tags: Combines columns, concat, concat_ws
Mar 19, 2026 · 2 min read

PySpark Convert String to Array Column

To convert a string column (StringType) to an array column (ArrayType) in PySpark, you can use the split() function from the pyspark.sql.…

Tags: PySpark Convert String to Array Column, SPLIT, PySpark
Mar 19, 2026 · 1 min read

Working with NULL/None Values in PySpark

🔍 What's fillna() or fill() in PySpark? In PySpark, both fillna() and fill() (called as df.na.fill()) are used to replace null or missing values in a DataFrame. Both fillna() and …

Tags: dropna, dropna(), fill
Mar 19, 2026 · 1 min read

PySpark Pivot and Unpivot DataFrame

✅ What are Pivot and Unpivot? Pivot: convert rows into columns. Unpivot: convert columns into rows. 🌀 Sample DataFrame: Let’s start with a small DataFrame t…

Tags: pivot, Unpivot, PySpark
Mar 19, 2026 · 2 min read

PySpark SQL Date and Timestamp Functions

🔧 Setup First (Optional for Reference): from pyspark.sql import functions as F; from pyspark.sql import types as T; data = …; df = spark.createDataFrame(data, …)…

Tags: Date, Datetime, PySpark SQL Date and Timestamp Functions
Mar 19, 2026 · 2 min read