PySpark Data Manipulation

Author

SQLDataDev Editorial Team

Mar 19, 2026 1 min read

Select a table into a DataFrame:

df = spark.read.table(tableName="samples.tpch.customer").limit(5)

df = spark.table(tableName="samples.tpch.customer").limit(5)

df = spark.sql('''Select * FROM samples.tpch.customer''').limit(5)

%sql
SELECT * FROM samples.tpch.customer limit 5

How to select Columns

from pyspark.sql import functions as F  # needed for F.col below

df = df.selectExpr("*")
df = df.selectExpr("ColName1","ColName2")
df = df.select("ColName1","ColName2")
df.select(df['patientid'], df['2018_hospitalid']).show(1)
df.select(F.col("patientid"),F.col("2018_hospitalid")).show(1)

How to filter data: use where or filter; where is an alias for filter, so both behave the same

df = df.filter((df.c_custkey == 412446) & (df.c_nationkey == 20))

df = df.filter((df["speciesname"] == "Dog") & (df["hospitalid"] == 153))

df = df.filter((F.col("speciesname") == "Dog") & (F.col("hospitalid") == 153))

table2019 = df.where('''speciesname="Dog" and hospitalid=153''')
