PySpark Data Engineer
Congregate Technologies
34 LPA
Location: Pune, Mumbai, Kolkata, Chennai, Bangalore, Hyderabad, Gurugram
Posted: June 03, 2026
Posted By: System Administrator
Job Description
Job Title: Data Engineer – PySpark & AWS
Total Years of Experience: 6+ years
Location: Pune, Mumbai, Bangalore, Chennai, Hyderabad, Gurugram & Kolkata.
Work Culture: Hybrid (3 days in office, 2 days WFH)
Job Overview:
PySpark Data Engineer:
Hands-on expertise in designing, building, and maintaining Apache Spark pipelines in production environments.
Proven experience building and scaling data ingestion frameworks that integrate data from multiple source systems, with a focus on reliability, reusability, and scalability.
Deep understanding of Spark architecture (driver/executors, DAG, partitioning, shuffles, caching, cluster resource management) and experience operating pipelines at scale, including data transformations on datasets of roughly 500 GB or more.
Strong understanding of Oracle SQL and HDFS, including handling file formats and applying appropriate data cleansing, normalization, and formatting to produce curated output datasets.
Ability to write Python, PySpark, and shell scripts to process, transform, and automate data workflows. The candidate should be skilled at writing application programs and at automating manual data processing steps using Python.
Key Notes (Important for Candidates):
PySpark expertise is mandatory and will be the primary evaluation criterion.
AWS experience should be hands-on but can be secondary.
Candidates should be comfortable with a 5-week onboarding process.