PySpark
9 / 43 posts
substring() in PySpark
📌 What is substring()? The substring() function in PySpark extracts a portion of a string from a column in a DataFrame. It is part of the PySpa…
concat() and concat_ws() in PySpark
In PySpark, both concat() and concat_ws() are used to combine multiple columns into a single string column. ✅ concat() – Combines columns without any delim…
PySpark Convert String to Array Column
To convert a string column (StringType) to an array column (ArrayType) in PySpark, you can use the split() function from the pyspark.sql.…
Working with NULL/None Values in PySpark
🔍 What's fillna() or fill() in PySpark? Both fillna() and fill() (the latter reached through df.na) replace null or missing values in a DataFrame. Both fillna() and …
PySpark Pivot and Unpivot DataFrame
✅ What are Pivot and Unpivot? Pivot = convert rows into columns. Unpivot = convert columns into rows. 🌀 Sample DataFrame: Let’s start with a small DataFrame t…
PySpark SQL Date and Timestamp Functions
🔧 Setup First (Optional for Reference) from pyspark.sql import functions as F from pyspark.sql import types as T data = df = spark.createDataFrame(data, )…
PySpark Built-in Functions
These functions are commonly used with groupBy(), agg(), or select() to compute things like sum, average, max, min, count, etc. PySpark functions come fr…
union(), unionAll(), and unionByName() in PySpark
This post explains union(), unionAll(), and unionByName() in PySpark, with examples. 1. union() The union() method is u…
orderBy() and sort() in PySpark
PySpark provides two DataFrame methods, sort() and orderBy(), to order the rows of a DataFrame by one or more columns. 1. Understanding sort() in PySpark from pyspark.sql.function…