Senior Big Data Engineer - Scala, PySpark
Salary undisclosed
We are seeking a Senior Big Data Engineer with 10+ years of expertise in Spark (Scala or PySpark) and Hive. The ideal candidate has a proven track record in designing, developing, and performance-tuning Spark applications. This role requires strong programming skills in Java, Scala, or Python and an in-depth understanding of big data processing tools, the Hadoop ecosystem, and distributed systems.
Key Responsibilities:
- Design, develop, and optimize Spark applications for large-scale data processing.
- Implement performance tuning techniques in Spark to handle complex, high-volume datasets (see the sketch after this list).
- Collaborate with cross-functional teams to define and implement big data solutions.
- Work with Hadoop-based technologies to manage, analyze, and process big data.
- Utilize distributed systems to ensure efficient data processing across multiple platforms.
- Develop and manage ETL data pipelines using workflow orchestration tools such as Airflow or Control-M.
- Write efficient and maintainable code in Java, Scala, or Python to process large datasets.
- Integrate RDBMS and data warehouses with big data systems to enable advanced data analytics.
- Use Unix Shell scripting for automating workflows and data management tasks.
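For context, here is a minimal PySpark sketch of the kind of aggregation and tuning work listed above. All paths, table layouts, and column names are hypothetical placeholders, not details from this posting:

```python
# Minimal PySpark sketch: broadcast join plus shuffle tuning on a daily rollup.
# All paths and column names below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("daily-events-rollup")
    # Size shuffle parallelism to the cluster instead of the 200-partition default.
    .config("spark.sql.shuffle.partitions", "400")
    # Let Spark 3.x adaptive query execution coalesce small shuffle partitions.
    .config("spark.sql.adaptive.enabled", "true")
    .getOrCreate()
)

events = spark.read.parquet("s3://example-bucket/events/")        # hypothetical path
dim_users = spark.read.parquet("s3://example-bucket/dim_users/")  # hypothetical path

# Broadcast the small dimension table to avoid shuffling the large fact table.
enriched = events.join(F.broadcast(dim_users), "user_id")

daily = (
    enriched
    .groupBy("event_date", "country")
    .agg(
        F.count("*").alias("event_count"),
        F.countDistinct("user_id").alias("unique_users"),
    )
)

# Partition output by date so downstream Hive queries can prune partitions.
daily.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-bucket/rollups/daily_events/"
)
```

The broadcast join and partitioned output are representative of the tuning decisions this role involves: avoiding unnecessary shuffles on large fact tables and laying data out so Hive can prune partitions at query time.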
Required Skills and Experience:
- 10+ years of hands-on experience with Spark (Scala or PySpark) and Hive.
- Strong expertise in Java, Scala, or Python programming.
- In-depth knowledge of the Hadoop ecosystem (e.g., HDFS, YARN, MapReduce).
- Familiarity with big data processing tools and frameworks.
- Good understanding of distributed systems and their role in big data processing.
- Working knowledge of RDBMS, data warehouses, and Unix shell scripting.
- Experience with Airflow, Control-M, or similar workflow orchestration tools (a minimal DAG sketch follows this list).
- Excellent analytical and problem-solving abilities.
- Strong communication and collaboration skills, with the ability to work effectively in a team environment.
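For context, here is a minimal Airflow sketch of the kind of orchestrated Spark ETL pipeline described above, assuming Airflow 2.4+ for the `schedule` argument; the DAG id, schedule, spark-submit flags, and paths are all hypothetical:

```python
# Minimal Airflow sketch: submit a Spark job, then refresh Hive partition metadata.
# DAG id, schedule, and paths are hypothetical placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_events_etl",
    start_date=datetime(2024, 1, 1),
    schedule="0 2 * * *",  # run daily at 02:00; requires Airflow 2.4+
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    # Submit the Spark rollup job to YARN; flags are illustrative only.
    run_rollup = BashOperator(
        task_id="run_daily_rollup",
        bash_command=(
            "spark-submit --master yarn --deploy-mode cluster "
            "--num-executors 20 --executor-memory 8g "
            "/opt/jobs/daily_events_rollup.py"
        ),
    )

    # Register the newly written date partition with the Hive metastore.
    repair_table = BashOperator(
        task_id="repair_hive_table",
        bash_command='hive -e "MSCK REPAIR TABLE analytics.daily_events"',
    )

    run_rollup >> repair_table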
Preferred Qualifications:
- Experience with performance tuning in Spark and distributed systems.
- Familiarity with cloud platforms and big data solutions on AWS, Azure, or Google Cloud.
- Hands-on experience with data orchestration and ETL pipelines.