
Senior Big Data Engineer - Scala, PySpark

Salary undisclosed


We are seeking a Senior Big Data Engineer with 10+ years of expertise in Spark (Scala or PySpark) and Hive. The ideal candidate will have a proven track record in designing, developing, and performance-tuning Spark applications. The role requires strong programming skills in Java, Scala, or Python, along with an in-depth understanding of big data processing tools, the Hadoop ecosystem, and distributed systems.

Key Responsibilities:

  • Design, develop, and optimize Spark applications for large-scale data processing.
  • Implement performance-tuning techniques in Spark to handle complex, high-volume datasets (a brief illustrative sketch follows this list).
  • Collaborate with cross-functional teams to define and implement big data solutions.
  • Work with Hadoop-based technologies to manage, analyze, and process big data.
  • Utilize distributed systems to ensure efficient data processing across multiple platforms.
  • Develop and manage data pipelines using workflow orchestration frameworks such as Airflow or Control-M.
  • Write efficient and maintainable code in Java, Scala, or Python to process large datasets.
  • Integrate RDBMS and Data Warehouses with big data systems to enable advanced data analytics.
  • Use Unix Shell scripting for automating workflows and data management tasks.
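
For a concrete flavor of these responsibilities, here is a minimal PySpark sketch of a tuned aggregation job. It is an illustrative assumption rather than part of the posting: the input paths, column names, and shuffle-partition setting are hypothetical, and the broadcast join stands in for the standard technique of avoiding a shuffle on the high-volume side.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.functions import broadcast

    spark = (
        SparkSession.builder
        .appName("daily-counts")                        # hypothetical app name
        .config("spark.sql.shuffle.partitions", "400")  # tune shuffle width to the cluster
        .getOrCreate()
    )

    # Hypothetical inputs: a large fact table and a small lookup table.
    events = spark.read.parquet("/data/events")
    regions = spark.read.parquet("/data/regions")

    # Broadcasting the small side avoids shuffling the high-volume table.
    daily_counts = (
        events.join(broadcast(regions), "region_id")
              .groupBy("region_name", F.to_date("event_ts").alias("day"))
              .count()
    )

    daily_counts.write.mode("overwrite").partitionBy("day").parquet("/data/daily_counts")

In practice, choices such as the shuffle-partition count depend on data volume and cluster size, which is exactly the tuning judgment this role calls for.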

Required Skills and Experience:

  • 10+ years of hands-on experience with Spark (Scala or PySpark) and Hive.
  • Strong expertise in Java, Scala, or Python programming.
  • In-depth knowledge of the Hadoop ecosystem (e.g., HDFS, YARN, MapReduce).
  • Familiarity with big data processing tools and frameworks.
  • Good understanding of distributed systems and their role in big data processing.
  • Working knowledge of RDBMS, Data Warehouses, and Unix Shell scripting.
  • Experience with Airflow, Control-M, or similar workflow orchestration tools (an illustrative Airflow sketch follows this list).
  • Excellent analytical and problem-solving abilities.
  • Strong communication and collaboration skills, with the ability to work effectively in a team environment.
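
As a sketch of the orchestration side, the following minimal Airflow DAG (assuming Airflow 2.x) submits a Spark job and then verifies its output on HDFS. The DAG id, schedule, and file paths are hypothetical placeholders, not details from this posting.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # Hypothetical daily pipeline: submit a Spark job, then verify its output.
    with DAG(
        dag_id="daily_counts_pipeline",      # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ):
        run_spark_job = BashOperator(
            task_id="run_spark_job",
            bash_command="spark-submit --master yarn /jobs/daily_counts.py",  # hypothetical job path
        )
        check_output = BashOperator(
            task_id="check_output",
            bash_command="hdfs dfs -test -d /data/daily_counts",  # fails if the output dir is missing
        )
        run_spark_job >> check_output

Keeping the output check as a separate task lets a failed validation be retried or alerted on without resubmitting the Spark job.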

Preferred Qualifications:

  • Experience with performance tuning in Spark and distributed systems.
  • Familiarity with cloud platforms and big data solutions on AWS, Azure, or Google Cloud.
  • Hands-on experience with data orchestration and ETL pipelines.
