
Senior Big Data Engineer - Scala, PySpark

Salary undisclosed


We are seeking a Senior Big Data Engineer with 10+ years of expertise in Spark (Scala or PySpark) and Hive. The ideal candidate will have a proven track record in designing, developing, and performance-tuning Spark applications. The role requires strong programming skills in Java, Scala, or Python, along with an in-depth understanding of big data processing tools, the Hadoop ecosystem, and distributed systems.

Key Responsibilities:

  • Design, develop, and optimize Spark applications for large-scale data processing.
  • Implement performance-tuning techniques in Spark to handle complex, high-volume datasets (a brief illustrative sketch follows this list).
  • Collaborate with cross-functional teams to define and implement big data solutions.
  • Work with Hadoop-based technologies to manage, analyze, and process big data.
  • Utilize distributed systems to ensure efficient data processing across multiple platforms.
  • Develop and manage data pipelines using workflow orchestration frameworks such as Airflow or Control-M.
  • Write efficient and maintainable code in Java, Scala, or Python to process large datasets.
  • Integrate RDBMS and Data Warehouses with big data systems to enable advanced data analytics.
  • Use Unix Shell scripting for automating workflows and data management tasks.
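
For a concrete flavor of these responsibilities, here is a minimal PySpark sketch of a tuned aggregation job. It is an illustrative assumption rather than part of the posting: the input paths, column names, and shuffle-partition setting are hypothetical, and the broadcast join stands in for the standard technique of avoiding a shuffle on the high-volume side.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.functions import broadcast

    spark = (
        SparkSession.builder
        .appName("daily-counts")                        # hypothetical app name
        .config("spark.sql.shuffle.partitions", "400")  # tune shuffle width to the cluster
        .getOrCreate()
    )

    # Hypothetical inputs: a large fact table and a small lookup table.
    events = spark.read.parquet("/data/events")
    regions = spark.read.parquet("/data/regions")

    # Broadcasting the small side avoids shuffling the high-volume table.
    daily_counts = (
        events.join(broadcast(regions), "region_id")
              .groupBy("region_name", F.to_date("event_ts").alias("day"))
              .count()
    )

    daily_counts.write.mode("overwrite").partitionBy("day").parquet("/data/daily_counts")

In practice, choices such as the shuffle-partition count depend on data volume and cluster size, which is exactly the tuning judgment this role calls for.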

Required Skills and Experience:

  • 10+ years of hands-on experience with Spark (Scala or PySpark) and Hive.
  • Strong expertise in Java, Scala, or Python programming.
  • In-depth knowledge of the Hadoop ecosystem (e.g., HDFS, YARN, MapReduce).
  • Familiarity with big data processing tools and frameworks.
  • Good understanding of distributed systems and their role in big data processing.
  • Working knowledge of RDBMS, Data Warehouses, and Unix Shell scripting.
  • Experience with Airflow, Control-M, or similar workflow orchestration tools (an illustrative Airflow sketch follows this list).
  • Excellent analytical and problem-solving abilities.
  • Strong communication and collaboration skills, with the ability to work effectively in a team environment.
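
As a sketch of the orchestration side, the following minimal Airflow DAG (assuming Airflow 2.x) submits a Spark job and then verifies its output on HDFS. The DAG id, schedule, and file paths are hypothetical placeholders, not details from this posting.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # Hypothetical daily pipeline: submit a Spark job, then verify its output.
    with DAG(
        dag_id="daily_counts_pipeline",      # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ):
        run_spark_job = BashOperator(
            task_id="run_spark_job",
            bash_command="spark-submit --master yarn /jobs/daily_counts.py",  # hypothetical job path
        )
        check_output = BashOperator(
            task_id="check_output",
            bash_command="hdfs dfs -test -d /data/daily_counts",  # fails if the output dir is missing
        )
        run_spark_job >> check_output

Keeping the output check as a separate task lets a failed validation be retried or alerted on without resubmitting the Spark job.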

Preferred Qualifications:

  • Experience with performance tuning in Spark and distributed systems.
  • Familiarity with cloud platforms and big data solutions on AWS, Azure, or Google Cloud.
  • Hands-on experience with data orchestration and ETL pipelines.
