concat() and concat_ws() in PySpark


Mar 19, 2026 2 min read

In PySpark, both concat() and concat_ws() are used to combine multiple columns into a single string column.

✅ `concat()` – Combines columns without any delimiter

Syntax:

from pyspark.sql.functions import concat, lit

concat(col1, col2, ...)

Example:

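from pyspark.sql import SparkSession
from pyspark.sql.functions import concat, col, lit

spark = SparkSession.builder.getOrCreate()

# Sample employee data
data = [("John", "Doe", 101), ("Jane", "Smith", 102)]
df = spark.createDataFrame(data, ["FirstName", "LastName", "EmpID"])

df.show()

# concat() has no separator argument, so each "_" is added explicitly with lit()
df_concat = df.withColumn(
    "FullID",
    concat(col("FirstName"), lit("_"), col("LastName"), lit("_"), col("EmpID")),
)
df_concat.show()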

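Output:

+---------+--------+-----+--------------+
|FirstName|LastName|EmpID|        FullID|
+---------+--------+-----+--------------+
|     John|     Doe|  101|  John_Doe_101|
|     Jane|   Smith|  102|Jane_Smith_102|
+---------+--------+-----+--------------+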

✅ `concat_ws()` – Combines columns with a delimiter

Syntax:

from pyspark.sql.functions import concat_ws

concat_ws(delimiter, col1, col2, ...)  # delimiter is a plain Python string, not a Column

Example:

from pyspark.sql.functions import concat_ws

df_concat_ws = df.withColumn("FullID", concat_ws("_", "FirstName", "LastName", "EmpID"))
df_concat_ws.show()

Output:

+---------+--------+-----+--------------+
|FirstName|LastName|EmpID|        FullID|
+---------+--------+-----+--------------+
|     John|     Doe|  101|  John_Doe_101|
|     Jane|   Smith|  102|Jane_Smith_102|
+---------+--------+-----+--------------+

🔍 Difference Between `concat()` and `concat_ws()`

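| Feature | `concat()` | `concat_ws()` |
|---|---|---|
| Delimiter | No separator argument; add separators with `lit()` | Separator passed once, as the first argument |
| Null handling | If any input is null, the result is null | Null inputs are skipped (empty strings are kept) |
| Use case | Different separators between columns, or none at all | The same separator between every column |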
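To make the null-handling row concrete without starting a Spark session, here is a plain-Python model of the two rules for string inputs. It is only an illustration: `model_concat` and `model_concat_ws` are hypothetical helpers, not PySpark functions, and `None` stands in for a SQL null.

```python
from typing import Optional

# Plain-Python model of the null rules described in the table.
# These helpers are hypothetical illustrations, not part of PySpark.


def model_concat(*parts: Optional[str]) -> Optional[str]:
    """Mimics concat(): a single None (SQL null) makes the whole result None."""
    if any(p is None for p in parts):
        return None
    return "".join(parts)


def model_concat_ws(sep: str, *parts: Optional[str]) -> str:
    """Mimics concat_ws(): None inputs are dropped, empty strings are kept."""
    return sep.join(p for p in parts if p is not None)


row = ("John", None, "101")  # LastName is null

print(model_concat(row[0], "_", row[1], "_", row[2]))  # None
print(model_concat_ws("_", *row))                      # John_101
print(model_concat_ws("_", "John", "", "101"))         # John__101
```

In Spark itself, `concat(col("FirstName"), lit("_"), col("LastName"))` returns null for any row whose LastName is null, while `concat_ws("_", "FirstName", "LastName")` returns just the first name for that row.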

🔧 Real-Life Use Cases

1. Creating unique keys or IDs

# Combine BranchCode and EmpID with underscore
df.withColumn("UniqueKey", concat_ws("_", "BranchCode", "EmpID"))

2. Preparing data for export or logging

# Build a comma-separated string (values containing commas are not quoted or escaped)
df.withColumn("ExportRow", concat_ws(",", "Name", "Department", "Salary"))

3. Audit Logs

# Create a string like: "Updated Salary for EmpID 123 to 90000"
df.withColumn("AuditMsg", concat_ws(" ", lit("Updated Salary for EmpID"), col("EmpID"), lit("to"), col("NewSalary")))

4. Address Fields

# Build a full address; null parts are skipped, so a missing State does not null the whole address
df.withColumn("FullAddress", concat_ws(", ", "Street", "City", "State", "Zip"))
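One caveat for address fields: `concat_ws()` skips only true nulls, so an empty string still gets its delimiter and can leave gaps such as `12 Main St, , CA`. The plain-Python sketch below models that effect and one fix, converting blank values to `None` before joining. The helper names and the sample address are hypothetical, and the model covers string inputs only.

```python
from typing import Optional


def model_concat_ws(sep: str, *parts: Optional[str]) -> str:
    """Mimics concat_ws(): drops None (SQL null) but keeps empty strings."""
    return sep.join(p for p in parts if p is not None)


def blank_to_none(value: Optional[str]) -> Optional[str]:
    """Treat empty or space-only strings as missing (None)."""
    return value if value is not None and value.strip(" ") else None


# Hypothetical address row: City is an empty string, Zip is null.
street, city, state, zip_code = "12 Main St", "", "CA", None
fields = (street, city, state, zip_code)

raw = model_concat_ws(", ", *fields)
clean = model_concat_ws(", ", *(blank_to_none(p) for p in fields))

print(raw)    # 12 Main St, , CA
print(clean)  # 12 Main St, CA
```

In PySpark, the same normalization can be written per column (for a column name `c`) as `when(trim(col(c)) != "", col(c))`, which yields null when the condition is false or null, and the results can then be passed to `concat_ws()`.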
