🔍 What's fillna() or fill() in PySpark?
In PySpark, fillna() and fill() both replace null (None) values in a DataFrame. They are aliases: fillna() is called on the DataFrame itself, while fill() is accessed through the df.na property, and the two produce identical results:
df.fillna(0) == df.na.fill(0)
# Sample data (hypothetical values for illustration)
data = [("James", None, 3000), ("Anna", "NY", None), ("Robert", "CA", 4000)]

# Creating DataFrame
columns = ["name", "state", "salary"]
df = spark.createDataFrame(data, columns)

# Show original DataFrame
print("Original DataFrame:")
df.show()



PySpark Drop Rows with NULL or None Values
The PySpark drop() function (available as df.na.drop()) takes three optional parameters that control how rows with NULL values are removed, whether checking a single column, any column, all columns, or a subset of columns.
drop(how='any', thresh=None, subset=None)
All three parameters are optional.
- how – Accepts 'any' or 'all'. With 'any', a row is dropped if it contains a NULL in any column; with 'all', a row is dropped only if every column is NULL. Default is 'any'.
- thresh – An int: drop rows that have fewer than thresh non-null values. When set, it overrides how. Default is None.
- subset – A list of column names to consider when checking for NULL values. Default is None.
Alternatively, you can also use the DataFrame.dropna() function to drop rows with null values; it behaves the same way.
# Sample data (hypothetical values for illustration)
data = [("James", None, 3000), ("Anna", "NY", None), ("Robert", "CA", 4000), (None, None, None)]

# Creating DataFrame
columns = ["name", "state", "salary"]
df = spark.createDataFrame(data, columns)

# Show original DataFrame
print("Original DataFrame:")
df.show()


