Select a table into a DataFrame:
df = spark.read.table(tableName="samples.tpch.customer").limit(5)
df = spark.table(tableName="samples.tpch.customer").limit(5)
df = spark.sql("SELECT * FROM samples.tpch.customer").limit(5)
%sql
SELECT * FROM samples.tpch.customer LIMIT 5
How to select columns:
df = df.selectExpr("*")
df = df.selectExpr("ColName1","ColName2")
df = df.select("ColName1","ColName2")
df.select(df['patientid'], df['2018_hospitalid']).show(1)
from pyspark.sql import functions as F  # F is the conventional alias for pyspark.sql.functions
df.select(F.col("patientid"), F.col("2018_hospitalid")).show(1)
How to filter data: use where or filter; where is an alias for filter, so both behave the same
df = df.filter((df.c_custkey == 412446) & (df.c_nationkey == 20))
df = df.filter((df["speciesname"] == "Dog") & (df["hospitalid"] == 153))
df = df.filter((F.col("speciesname") == "Dog") & (F.col("hospitalid") == 153))
table2019 = df.where("speciesname = 'Dog' AND hospitalid = 153")  # SQL string form; single quotes mark the string literal