concat() and concat_ws() in PySpark

In PySpark, both concat() and concat_ws() are used to combine multiple columns into a single string column.


concat() – Combines columns without any delimiter

Syntax:

Python
from pyspark.sql.functions import concat, lit

concat(col1, col2, ...)

Example:

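Python
from pyspark.sql import SparkSession
from pyspark.sql.functions import concat, col, lit

# Reuse the active session, or start one if none exists
spark = SparkSession.builder.getOrCreate()

# Sample rows reconstructed from the output table in the concat_ws() section
# (EmpID is assumed to be a string; its original type is not shown)
data = [("John", "Doe", "101"), ("Jane", "Smith", "102")]
df = spark.createDataFrame(data, ["FirstName", "LastName", "EmpID"])

df.show()

# concat() has no separator argument, so each "_" is added explicitly with lit()
df_concat = df.withColumn("FullID", concat(col("FirstName"), lit("_"), col("LastName"), lit("_"), col("EmpID")))
df_concat.show()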

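Output (from df_concat.show()):

+---------+--------+-----+--------------+
|FirstName|LastName|EmpID|        FullID|
+---------+--------+-----+--------------+
|     John|     Doe|  101|  John_Doe_101|
|     Jane|   Smith|  102|Jane_Smith_102|
+---------+--------+-----+--------------+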

concat_ws() – Combines columns with a delimiter

Syntax:

Python
from pyspark.sql.functions import concat_ws

concat_ws(delimiter, col1, col2, ...)

Example:

Python
from pyspark.sql.functions import concat_ws

df_concat_ws = df.withColumn("FullID", concat_ws("_", "FirstName", "LastName", "EmpID"))
df_concat_ws.show()

Output:

+---------+--------+-----+--------------+
|FirstName|LastName|EmpID|        FullID|
+---------+--------+-----+--------------+
|     John|     Doe|  101|  John_Doe_101|
|     Jane|   Smith|  102|Jane_Smith_102|
+---------+--------+-----+--------------+

🔍 Difference Between concat() and concat_ws()

Feature       | concat()                                     | concat_ws()
Delimiter     | None built in (add one with lit())           | Delimiter is the first argument
Null handling | If any column is null, the result is null    | Skips null values
Use case      | More control over each piece, using lit()    | Simpler when you need separators

🔧 Real-Life Use Cases

1. Creating unique keys or IDs

Python
# Combine BranchCode and EmpID with underscore
df.withColumn("UniqueKey", concat_ws("_", "BranchCode", "EmpID"))

2. Preparing data for export or logging

Python
# Create CSV-style string columns
df.withColumn("ExportRow", concat_ws(",", "Name", "Department", "Salary"))

3. Audit Logs

Python
# Create a string like: "Updated Salary for EmpID 123 to 90000"
df.withColumn("AuditMsg", concat_ws(" ", lit("Updated Salary for EmpID"), col("EmpID"), lit("to"), col("NewSalary")))

4. Address Fields

Python
# Create full address from multiple columns
df.withColumn("FullAddress", concat_ws(", ", "Street", "City", "State", "Zip"))
