Working with NULL/None Values in PySpark

🔍 What's fillna() or fill() in PySpark?

In PySpark, both fillna() and fill() are used to replace null or missing values in a DataFrame.

Both fillna() and fill() work the same:

Python
df.fillna(0) == df.na.fill(0)
Python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("fillna-example").getOrCreate()

# Sample data (illustrative values; None marks a missing entry)
data = [("Alice", 25, None), ("Bob", None, 3000), (None, 30, 4000)]

# Creating DataFrame
columns = ["name", "age", "salary"]
df = spark.createDataFrame(data, columns)

# Show original DataFrame
print("Original DataFrame:")
df.show()

PySpark Drop Rows with NULL or None Values

The PySpark drop() function takes three optional parameters that control how rows with NULL values are removed: based on any column, all columns, or a chosen subset of columns.

Python
drop(how='any', thresh=None, subset=None)

All these parameters are optional.

  • how – Takes 'any' or 'all'. With 'any', a row is dropped if it contains a NULL in any column; with 'all', a row is dropped only if all columns are NULL. Default is 'any'.
  • thresh – Takes an int value. Drops rows that have fewer than thresh non-null values; when set, it overrides how. Default is None.
  • subset – Column names to consider when checking for NULL values. Default is None, meaning all columns are checked.

Alternatively, you can use the DataFrame.dropna() function, which is equivalent, to drop rows with null values.

Python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dropna-example").getOrCreate()

# Sample data with NULLs (illustrative values; the last row is all-NULL)
data = [("Alice", 25, None), ("Bob", None, 3000), (None, None, None)]

# Creating DataFrame
columns = ["name", "age", "salary"]
df = spark.createDataFrame(data, columns)

# Show original DataFrame
print("Original DataFrame:")
df.show()
