Spark SQL provides a variety of commands for managing databases, tables, and performing SQL operations.
CREATE DATABASE IF NOT EXISTS demo;
SHOW DATABASES;
DESCRIBE DATABASE demo;
DESCRIBE DATABASE EXTENDED demo;
DESCRIBE TableName
DESCRIBE EXTENDED TableName
--which database select
SELECT current_database();
USE demo;
SHOW TABLES IN demo;
--CASCADE: This option automatically drops all associated objects (like tables, views, etc.) within the database.
DROP DATABASE IF EXISTS demo CASCADE;
In Spark SQL, views provide a way to create reusable queries. There are two primary types of views you can create: regular views (often referred to simply as "views") and temporary views.
A Spark application is an instance of a Spark job that runs on a cluster.
Temporary Views
- Session Scope: Temporary views are visible only within the session in which they are created. They are not accessible in other sessions.
- Lifecycle: They exist only for the duration of the Spark session. Once the session ends, the temporary view is dropped automatically.
- Creation: Created using the
CREATE OR REPLACE TEMP VIEWstatement. - Storage: No data is stored; only the definition (the SQL query) is saved.
- Use Case: Useful for intermediate computations and when you need a quick way to reuse query logic within a single session.
CREATE OR REPLACE TEMP VIEW temp_view_name AS
SELECT * FROM some_table WHERE condition;
Global Temporary Views
- Application Scope: Global temporary views are visible across all sessions within the same Spark application. They are not limited to a single session.
- Lifecycle: They persist until the Spark application terminates. When the application ends, the global temporary views are dropped.
- Creation: Created using the
CREATE OR REPLACE GLOBAL TEMP VIEWstatement. - Namespace: They are stored in the
global_tempdatabase. - Storage: Similar to temporary views, only the definition (the SQL query) is saved.
- Use Case: Useful when multiple sessions within the same application need to share the same view.
CREATE OR REPLACE GLOBAL TEMP VIEW global_temp_view_name AS
SELECT * FROM some_table
WHERE condition;
Regular Views
- Persistent: Regular views are stored in the catalog and persist independently of any session or application. They remain available until explicitly dropped.
- Creation: Created using the
CREATE VIEWstatement. - Storage: They store the query definition, but the data is dynamically queried from the underlying tables when the view is accessed.
- Use Case: Useful for creating permanent query logic that can be reused across multiple applications and sessions.
CREATE VIEW view_name AS
SELECT * FROM some_table
WHERE condition;
https://spark.apache.org/docs/latest/api/python/reference/index.html
Create table using CSV Source
Create Table IF NOT EXISTS databaseName.tableName
(
Id INT,
Name string
USING csv
OPTIONS (
path 'csv_file_path',
header 'true'
)
Create table using JSON
CREATE TABLE IF NOT EXISTS table_name (
Id INT,
Name STRING,
Address STRUCT<Street: STRING, City: STRING, Zip: INT>
)
USING json
OPTIONS (
path 'json_file_path'
)
CREATE TABLE demo.SampleTable
USING PARQUET
AS
SELECT * FROM SampleData