Results for "Apache Spark"

9 / 184 posts

Databricks, Apache Spark, Data Engineering and Science etc.

Azure Databricks is a platform on Microsoft Azure that helps with big data analysis and machine learning. It lets you work with large datasets easily and c…

Analytics Apache Spark azure
Mar 19, 2026 3 min read

What is Managed and External table in Spark

In Apache Spark, both Managed and External tables are used to store the data. However, there are significant differences in how Spark manages the data for …

azure data-engineering data-science
Mar 19, 2026 3 min read

Spark Transformations, Actions and Lazy Evaluation and DAG.

Apache Spark RDD supports two types of Operations: Transformations Actions A Transformation is a function that produces new RDD from the existing RDDs but …

Apache Spark azure cloud
Mar 19, 2026 4 min read

What is catalog in Spark

In Apache Spark, the catalog refers to the internal management system that keeps track of all the metadata related to tables, databases, functions, and oth…

azure database databricks
Mar 19, 2026 2 min read

What is Resilient Distributed Datasets (RDDs)

Resilient Distributed Datasets (RDDs) are a data structure of Apache Spark. They represent an immutable, distributed collection of objects that can be proc…

ai artificial-intelligence data-engineering
Mar 19, 2026 3 min read

Spark session vs Spark context

In Apache Spark, SparkSession and SparkContext are both essential components, but they serve different purposes and have different scopes. Here's a detaile…

data-science Pandas Python
Mar 19, 2026 3 min read

Application,Job,Stage,Task in Spark

In Apache Spark, the execution of a program is broken down into multiple levels of granularity: applications, jobs, stages, and tasks. Understanding these …

PySpark
Mar 19, 2026 3 min read

Applying Functions in PySpark

PySpark, the Python API for Apache Spark, provides multiple ways to apply functions to DataFrame columns. This flexibility allows data engineers and analys…

apply Apply function lower
Mar 19, 2026 2 min read

PySpark Built-in Functions

These functions are commonly used with groupBy() , agg() , or select() to compute things like sum, average, max, min, count, etc. PySpark functions come fr…

Aggregate apache spark for beginners big data tutorial
Mar 19, 2026 2 min read