PySpark
6 / 43 posts
substring() in PySpark
📌 What is substring()? The substring() function in PySpark extracts a portion of a string from a column in a DataFrame. It is part of the PySpa…
concat() and concat_ws() in PySpark
In PySpark, both concat() and concat_ws() are used to combine multiple columns into a single string column. ✅ concat() – Combines columns without any delim…
PySpark Convert String to Array Column
To convert a string column (StringType) to an array column (ArrayType) in PySpark, you can use the split() function from the pyspark.sql.…
Working with NULL/None Values in PySpark
🔍 What are fillna() and fill() in PySpark? In PySpark, both fillna() and fill() are used to replace null or missing values in a DataFrame. Both fillna() and …
PySpark Pivot and Unpivot DataFrame
✅ What are Pivot and Unpivot? Pivot = convert rows into columns. Unpivot = convert columns into rows. 🌀 Sample DataFrame: Let’s start with a small DataFrame t…
PySpark SQL Date and Timestamp Functions
🔧 Setup First (Optional for Reference) from pyspark.sql import functions as F from pyspark.sql import types as T data = df = spark.createDataFrame(data, )…