Back to all posts

Spark SQL useful command

Spark SQL provides a variety of commands for managing databases, tables, and performing SQL operations. CREATE DATABASE IF NOT EXISTS demo; SHOW DATABASES;…

Spark SQL provides a variety of commands for managing databases, tables, and performing SQL operations.

SQL

CREATE DATABASE IF NOT EXISTS demo;
SHOW DATABASES;
DESCRIBE DATABASE demo;
DESCRIBE DATABASE EXTENDED demo;

DESCRIBE TableName
DESCRIBE EXTENDED TableName

--which database select
SELECT current_database();

USE demo;

SHOW TABLES IN demo;

--CASCADE: This option automatically drops all associated objects (like tables, views, etc.) within the database.

DROP DATABASE IF EXISTS demo CASCADE;

In Spark SQL, views provide a way to create reusable queries. There are two primary types of views you can create: regular views (often referred to simply as "views") and temporary views.

A Spark application is an instance of a Spark job that runs on a cluster.

Temporary Views

  1. Session Scope: Temporary views are visible only within the session in which they are created. They are not accessible in other sessions.
  2. Lifecycle: They exist only for the duration of the Spark session. Once the session ends, the temporary view is dropped automatically.
  3. Creation: Created using the CREATE OR REPLACE TEMP VIEW statement.
  4. Storage: No data is stored; only the definition (the SQL query) is saved.
  5. Use Case: Useful for intermediate computations and when you need a quick way to reuse query logic within a single session.
SQL
CREATE OR REPLACE TEMP VIEW temp_view_name AS
SELECT * FROM some_table WHERE condition;

Global Temporary Views

  1. Application Scope: Global temporary views are visible across all sessions within the same Spark application. They are not limited to a single session.
  2. Lifecycle: They persist until the Spark application terminates. When the application ends, the global temporary views are dropped.
  3. Creation: Created using the CREATE OR REPLACE GLOBAL TEMP VIEW statement.
  4. Namespace: They are stored in the global_temp database.
  5. Storage: Similar to temporary views, only the definition (the SQL query) is saved.
  6. Use Case: Useful when multiple sessions within the same application need to share the same view.
SQL
CREATE OR REPLACE GLOBAL TEMP VIEW global_temp_view_name AS
SELECT * FROM some_table 
WHERE condition;

Regular Views

  1. Persistent: Regular views are stored in the catalog and persist independently of any session or application. They remain available until explicitly dropped.
  2. Creation: Created using the CREATE VIEW statement.
  3. Storage: They store the query definition, but the data is dynamically queried from the underlying tables when the view is accessed.
  4. Use Case: Useful for creating permanent query logic that can be reused across multiple applications and sessions.
SQL
CREATE VIEW view_name AS
SELECT * FROM some_table 
WHERE condition;

https://spark.apache.org/docs/latest/api/python/reference/index.html

Create table using CSV Source

SQL
Create Table IF NOT EXISTS databaseName.tableName
(
   Id INT,
   Name string
USING csv
    OPTIONS (
        path 'csv_file_path',
        header 'true'
    )

Create table using JSON

SQL
CREATE TABLE IF NOT EXISTS table_name (
        Id INT,
        Name STRING,
        Address STRUCT<Street: STRING, City: STRING, Zip: INT>
    )
    USING json
    OPTIONS (
        path 'json_file_path'
    )
CSS
CREATE TABLE demo.SampleTable
USING PARQUET
AS
SELECT * FROM SampleData

Keep building your data skillset

Explore more SQL, Python, analytics, and engineering tutorials.