Apache-spark
11 posts
Install Apache Spark in a Standalone Mode on Windows
Last Updated: 10 April 2023
Apache Spark is a lightning-fast unified analytics engine for cluster computing on large data sets, such as BigData and Hadoop workloads, with the aim of running programs in parallel acro...
Java
Scala
Hadoop
DSA
Apache-spark
How to Create Multiple Lags in PySpark
Last Updated: 28 April 2025
In this article, we are going to learn how to create multiple lags using PySpark in Python. What is lag in PySpark? The lag lets us query more than one row of a table a...
Technical Scripter
Python
Python Programs
Picked
Technical Scripter 2022
Python Framework
Apache-spark
How to add a column to a nested struct in PySpark
Last Updated: 28 April 2025
In this article, we are going to learn how to add a column to a nested struct using PySpark in Python. Have you ever worked with a PySpark data frame? If yes, then you might sure...
Technical Scripter
Python
Python Programs
Picked
Technical Scripter 2022
Python Framework
Apache-spark
Drop a column with the same name using column index in PySpark
Last Updated: 28 April 2025
In this article, we are going to learn how to drop a column with a duplicate name by its column index using PySpark in Python. PySpark offers the essential function 'drop' ...
Technical Scripter
Python
Python Programs
Picked
Technical Scripter 2022
Python Framework
Apache-spark
PySpark lag() Function
Last Updated: 28 April 2025
The function that lets the user query more than one row of a table, returning the previous row in the table, is known as lag in PySpark. Apart from returning the offs...
Technical Scripter
Python
Python Programs
Picked
Technical Scripter 2022
Apache-spark
Python-Pyspark
Spark - Split array to separate column
Last Updated: 28 April 2025
Apache Spark is a potent big data processing system that can analyze enormous amounts of data concurrently over distributed computer clusters. PySpark is a Python-based in...
Python
Picked
Python-Library
Apache-spark
Python-Pyspark
Identify corrupted records in a dataset using PySpark
Last Updated: 07 April 2025
Some datasets may contain corrupt records. Those records don't follow the data-specific rules that correct records follow, e.g., a corrupt record may have...
Python
Apache Spark
Apache-spark
Python-Pyspark
Spark vs Impala
Last Updated: 01 December 2023
Spark and Impala are two of the most common tools used for big data analytics. This article discusses the pros, cons, and differences between the two tools. What i...
Data Science
Geeks Premier League
Software Testing
Data Analytics
Apache-spark
Geeks Premier League 2023
AI-ML-DS
Top 7 Big Data Frameworks in 2025
Last Updated: 13 November 2024
Big data, the enormous and ever-expanding volume of information, presents both challenges and opportunities for businesses. Robust big-data frameworks are necessa...
GBlog
Picked
BigData
Apache-spark
Frameworks
GBlog 2024
GBlog 2025
Apache Flink vs Apache Spark: Top Differences
Last Updated: 24 April 2025
Apache Flink and Apache Spark are two popular competitors in the rapidly growing field of big data, where information flows like a roaring torrent. These distributed pr...
GBlog
Picked
BigData
Apache-spark
GBlog 2024
vs
Wide and Narrow Dependencies in Apache Spark
Last Updated: 03 May 2024
Apache Spark, a powerful distributed computing framework, is designed to process large-scale datasets efficiently across a cluster of machines. However, dependencies play ...
Picked
BigData
Apache-spark
AI-ML-DS
Data Engineering