In PySpark, both concat() and concat_ws() are used to combine multiple columns into a single string column.
✅ concat() – Combines columns without any delimiter
Syntax:
from pyspark.sql.functions import concat, lit
concat(col1, col2, ...)
Example:
from pyspark.sql.functions import concat, col, lit
data = [("John", "Doe", "101"), ("Jane", "Smith", "102")]
df = spark.createDataFrame(data, ["FirstName", "LastName", "EmpID"])
df.show()
df_concat = df.withColumn("FullID", concat(col("FirstName"), lit("_"), col("LastName"), lit("_"), col("EmpID")))
df_concat.show()
Output:
+---------+--------+-----+--------------+
|FirstName|LastName|EmpID|        FullID|
+---------+--------+-----+--------------+
|     John|     Doe|  101|  John_Doe_101|
|     Jane|   Smith|  102|Jane_Smith_102|
+---------+--------+-----+--------------+
✅ concat_ws() – Combines columns with a delimiter
Syntax:
from pyspark.sql.functions import concat_ws
concat_ws(delimiter, col1, col2, ...)
Example:
from pyspark.sql.functions import concat_ws
df_concat_ws = df.withColumn("FullID", concat_ws("_", "FirstName", "LastName", "EmpID"))
df_concat_ws.show()
Output:
+---------+--------+-----+--------------+
|FirstName|LastName|EmpID|        FullID|
+---------+--------+-----+--------------+
|     John|     Doe|  101|  John_Doe_101|
|     Jane|   Smith|  102|Jane_Smith_102|
+---------+--------+-----+--------------+
🔍 Difference Between concat() and concat_ws()
| Feature | concat() | concat_ws() |
|---|---|---|
| Delimiter | No delimiter (must use lit()) | Built-in delimiter support |
| Null Handling | If any column is null → result is null | Skips nulls |
| Use Case | More control with lit() | Simpler when you need separators |
🔧 Real-Life Use Cases
1. Creating unique keys or IDs
# Combine BranchCode and EmpID with underscore
df.withColumn("UniqueKey", concat_ws("_", "BranchCode", "EmpID"))
2. Preparing data for export or logging
# Create CSV-style string columns
df.withColumn("ExportRow", concat_ws(",", "Name", "Department", "Salary"))
3. Audit Logs
# Create a string like: "Updated Salary for EmpID 123 to 90000"
df.withColumn("AuditMsg", concat_ws(" ", lit("Updated Salary for EmpID"), col("EmpID"), lit("to"), col("NewSalary")))
4. Address Fields
# Create full address from multiple columns
df.withColumn("FullAddress", concat_ws(", ", "Street", "City", "State", "Zip"))