Job Summary:
We are seeking a skilled and proactive Data Engineer with strong experience in PySpark, Scala, and Kubernetes. The ideal candidate will be responsible for building and maintaining scalable data pipelines.
Key Responsibilities:
- Design, develop, and maintain data pipelines using PySpark and Scala (a minimal sketch follows this list).
- Optimize Spark jobs for performance and scalability.
- Deploy and manage data workflows in containerized environments using Kubernetes.
- Work closely with data scientists, analysts, and business stakeholders to understand data needs.
- Ensure data quality, integrity, and security across all pipelines.
- Participate in code reviews, design discussions, and performance tuning.
- Monitor, troubleshoot, and resolve production issues promptly.
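For illustration, a minimal sketch of the kind of PySpark pipeline this role owns, including one common optimization (broadcasting a small dimension table before a join) and a basic data-quality gate. All paths, table names, and columns (orders, customers, order_id, order_ts) are hypothetical:

    # Minimal PySpark ETL sketch; paths, tables, and columns are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("orders-pipeline").getOrCreate()

    # Extract: raw fact data plus a small dimension table.
    orders = spark.read.parquet("/data/raw/orders")
    customers = spark.read.parquet("/data/raw/customers")

    # Transform: broadcast the small side of the join to avoid a full shuffle,
    # a common Spark optimization when one table fits in executor memory.
    enriched = orders.join(F.broadcast(customers), on="customer_id", how="left")

    # Data-quality gate: fail fast if required keys are missing.
    null_keys = enriched.filter(F.col("order_id").isNull()).count()
    if null_keys > 0:
        raise ValueError(f"{null_keys} rows with null order_id; aborting load")

    # Load: partition by date so downstream readers can prune partitions.
    (enriched
        .withColumn("order_date", F.to_date("order_ts"))
        .write.mode("overwrite")
        .partitionBy("order_date")
        .parquet("/data/curated/orders"))

A production pipeline would add schema enforcement, incremental loads, and tests, but the extract-transform-load shape stays the same.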
Technical Skills Required:
Languages: Strong in Python (with PySpark) and Scala
Big Data: Apache Spark, Hadoop ecosystem
Orchestration: Airflow / Argo / Prefect (nice to have; a minimal Airflow DAG sketch follows this list)
DevOps: Git, Jenkins, Helm, CI/CD pipelines
Databases: SQL and NoSQL databases
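As one illustration of the orchestration layer, a minimal Airflow 2.x DAG that schedules the pipeline sketch above daily. The DAG id, schedule, application path, and Kubernetes container image are hypothetical, and SparkSubmitOperator requires the apache-airflow-providers-apache-spark package:

    # Minimal Airflow DAG sketch; ids, paths, and images are hypothetical.
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

    with DAG(
        dag_id="orders_pipeline",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",  # Airflow 2.4+; earlier versions use schedule_interval
        catchup=False,
    ) as dag:
        SparkSubmitOperator(
            task_id="run_orders_pipeline",
            application="/opt/jobs/orders_pipeline.py",  # hypothetical job path
            conn_id="spark_default",  # connection pointing at the Spark cluster
            # When Spark runs on Kubernetes, the executor image is set via conf:
            conf={"spark.kubernetes.container.image": "registry.example.com/spark:latest"},
        )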