Results for "collect"

7 / 184 posts

What is Data Ingestion and DataFrame API

Data ingestion: Data ingestion refers to the process of collecting and importing data from various sources into a system or storage environmen…

DAta ingestion Data Load etl
Mar 19, 2026 4 min read

What is cluster in Spark

What is a cluster: In computing, a cluster refers to a collection of interconnected computers that work together as a single system. These computers, often…

Cluster SQL sql-server
Mar 19, 2026 3 min read

What is Resilient Distributed Datasets (RDDs)

Resilient Distributed Datasets (RDDs) are a core data structure of Apache Spark. They represent an immutable, distributed collection of objects that can be proc…

ai artificial-intelligence data-engineering
Mar 19, 2026 3 min read

List, Tuple, Set and Dictionary in Python

🔹 List 📌 Definition: A list is a collection that is ordered and mutable. Duplicate values are allowed. m…

Python
Mar 19, 2026 2 min read

Data Pipeline and ETL (Extract, Transform, Load) Process/Tool and ELT

The ETL (Extract, Transform, Load) process/tool is used to collect, clean, and store data in a structured format. Extract: First, data from various …

DATA Pipeline ELT etl
Mar 19, 2026 7 min read

collect() in PySpark

PySpark collect() Function – The collect() function in PySpark is used to retrieve all the rows of a DataFrame (or RDD) from the distributed cluster back t…

collect PySpark
Mar 19, 2026 2 min read

PySpark Built-in Functions

These functions are commonly used with groupBy(), agg(), or select() to compute things like sum, average, max, min, count, etc. PySpark functions come fr…

Aggregate apache spark for beginners big data tutorial
Mar 19, 2026 2 min read