
Complex Data Types in PySpark: StructType, ArrayType, and MapType


Let's break down PySpark's complex data types (StructType, ArrayType, and MapType) in a simple and clear way. We'll go over:

  1. What they are
  2. When to use or not use them
  3. Simple code examples
  4. A combined example showing all three in action

🔹 1. StructType

✅ What is it?

StructType lets you define nested columns (like a structure inside a structure). It’s useful when your data has subfields, like a person having a first, middle, and last name.

📌 When to use:

  • Use when your data is hierarchical or you want nested columns.
  • Avoid if the structure is very shallow or adds unnecessary complexity.

🧾 Example:

Python
from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([
    StructField("name", StructType([
        StructField("firstname", StringType(), True),
        StructField("lastname", StringType(), True)
    ]), True),
    StructField("age", StringType(), True)
])

data = [(("James", "Smith"), "30"), (("Anna", "Rose"), "41")]
df = spark.createDataFrame(data, schema=schema)
df.printSchema()
df.show(truncate=False)

🔹 2. ArrayType

✅ What is it?

ArrayType is used when you want a list of values in a column (e.g., a person knows multiple languages).

📌 When to use:

  • Use when a field has multiple values of the same type (like languages, hobbies).
  • Avoid if the number of values is always one or if a separate row per value is better for analysis.

🧾 Example:

Python
from pyspark.sql.types import StructType, StructField, ArrayType, StringType

schema = StructType([
    StructField("name", StringType(), True),
    StructField("languages", ArrayType(StringType()), True)
])

data = [("James", ["Java", "Scala"]), ("Anna", ["Python", "SQL"])]
df = spark.createDataFrame(data, schema=schema)
df.printSchema()
df.show(truncate=False)

🔹 3. MapType

✅ What is it?

MapType is like a Python dict—key-value pairs in a column.

📌 When to use:

  • Use when values are associated with keys, like {"hair": "black", "eye": "brown"}.
  • Avoid if keys are fixed and can just be separate columns.

🧾 Example:

Python
from pyspark.sql.types import StructType, StructField, MapType, StringType

schema = StructType([
    StructField("name", StringType(), True),
    StructField("attributes", MapType(StringType(), StringType()), True)
])

data = [("James", {"hair": "black", "eye": "brown"}),
        ("Anna", {"hair": "blonde", "eye": "blue"})]
df = spark.createDataFrame(data, schema=schema)
df.printSchema()
df.show(truncate=False)

🔹 4. Combined Example: StructType + ArrayType + MapType

Let’s combine them all into one DataFrame:

Python
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, ArrayType, MapType

schema = StructType([
    StructField("person", StructType([
        StructField("firstname", StringType(), True),
        StructField("lastname", StringType(), True)
    ]), True),
    StructField("hobbies", ArrayType(StringType()), True),
    StructField("attributes", MapType(StringType(), StringType()), True),
    StructField("age", IntegerType(), True)
])

data = [
    (("James", "Smith"), ["reading", "cycling"], {"hair": "black", "eye": "brown"}, 30),
    (("Maria", "Jones"), ["painting", "chess"], {"hair": "blonde", "eye": "blue"}, 28)
]

df = spark.createDataFrame(data, schema=schema)
df.printSchema()
df.show(truncate=False)

🔹 Summary Table: When to Use

| Data Type  | Description                   | When to Use                   | Avoid When                                      |
|------------|-------------------------------|-------------------------------|-------------------------------------------------|
| StructType | Nested fields inside a column | When data has a sub-structure | If a flat structure is enough                   |
| ArrayType  | List of items                 | When you have multiple values | If only one value, or you can normalize by rows |
| MapType    | Key-value pairs like a dict   | When keys vary or are dynamic | If keys are fixed (use StructType instead)      |

🔹 1. Accessing StructType Fields

If you have a column that's a StructType, you can access its subfields using dot notation or with the col() function.

✅ Example:

Python
from pyspark.sql.functions import col

df.select(
    col("person.firstname").alias("First Name"),
    col("person.lastname").alias("Last Name"),
    "age"
).show()

🔹 2. Accessing ArrayType Elements

You can access elements of an array by index or explode it into multiple rows.

✅ Example: Access by index

Python
df.select(
    "person.firstname",
    col("hobbies")[0].alias("First Hobby")
).show()

✅ Example: Explode array into rows

Python
from pyspark.sql.functions import explode

df.select(
    "person.firstname",
    explode("hobbies").alias("Each Hobby")
).show()

🔹 3. Accessing MapType Values

You can access map values by key.

✅ Example:

Python
df.select(
    "person.firstname",
    df.attributes.hair.alias("Hair Color"),
    col("attributes")["eye"].alias("Eye Color")
).show()

🔸 Bonus: Flatten All Columns in One Go

Here’s how you might pull all useful fields into a flat structure:

There are a few more key things you should know when working with StructType, ArrayType, and MapType in PySpark, especially as a data analyst or engineer.

Here’s a breakdown of advanced but very useful concepts that help you master these complex data types:


🔸 1. Nesting: You can combine them together!

PySpark allows you to nest these complex types inside each other:

✅ Struct inside Struct:

Python
StructType([
    StructField("name", StructType([
        StructField("firstname", StringType()),
        StructField("lastname", StringType())
    ])),
    StructField("age", IntegerType())
])

✅ Struct inside Array:

Python
ArrayType(StructType([
    StructField("language", StringType()),
    StructField("proficiency", StringType())
]))

✅ Map inside Struct or vice versa:

Python
StructType([
    StructField("attributes", MapType(StringType(), StringType()))
])

🔸 2. Create Your Own Schema

If you’re reading from a file or JSON column, define your custom schema using StructType:

Python
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True)
])

🔸 3. Explode vs Inline

  • explode() is for arrays and maps: it expands each element (or key-value pair) into its own row.
  • inline() is for arrays of StructType: it expands each struct into its own row, with the struct's fields as columns.
Python
from pyspark.sql.functions import inline

data2 = [
    ("James", [("Java", "expert"), ("Scala", "intermediate")]),
    ("Anna",  [("Python", "expert")])
]

schema2 = "name string, skills array<struct<language:string, proficiency:string>>"
df2 = spark.createDataFrame(data2, schema=schema2)

df2.select("name", inline("skills")).show(truncate=False)


One more variant worth seeing: a MapType column can hold the whole record value, as when a name itself is stored as a dictionary.

Python
from pyspark.sql.types import MapType, StringType, StructType, StructField

# Schema using MapType
schema = StructType([
    StructField("name", MapType(StringType(), StringType()), True)
])

# Data: each "name" is a map (i.e., dictionary)
data = [({"first": "James", "last": "Smith"},),
        ({"first": "Anna", "last": "Rose"},)]

df = spark.createDataFrame(data, schema=schema)
df.show(truncate=False)
