Responsibilities
1. Design, develop, and optimize the data warehouse architecture to support large-scale e-commerce data processing, ensuring high performance, reliability, and scalability.
2. Build and maintain ETL pipelines that efficiently extract, transform, and load data from various sources, ensuring data accuracy and consistency.
3. Collaborate with engineering, analytics, and business teams to define data requirements and deliver solutions that support business intelligence (BI), reporting, and real-time analytics.
4. Implement and manage data modeling, storage strategies, and indexing techniques to optimize query performance for large datasets.
5. Ensure data governance, security, and compliance, including access control, encryption, and data anonymization where necessary.
6. Monitor and troubleshoot data pipeline failures, ensuring minimal downtime and data integrity.
7. Research and apply new data technologies, frameworks, and best practices to improve scalability, cost efficiency, and processing speed.
8. Support the implementation of real-time and batch data processing solutions to drive machine learning, recommendation systems, and personalization in e-commerce.
Qualifications
1. Bachelor’s Degree or higher in Computer Science, Data Engineering, or a related field, with strong foundations in data structures and algorithms.
2. Proficiency in SQL and experience with large-scale relational and analytical databases (e.g., MySQL, PostgreSQL, ClickHouse, Snowflake, Redshift, BigQuery).
3. Hands-on experience with ETL development, data warehousing best practices, and data modeling techniques (e.g., Star Schema, Snowflake Schema).
4. Experience with big data processing frameworks such as Apache Spark, Flink, Hive, or Presto.
5. Proficiency in Python, Java, or Scala for data engineering and automation.
6. Strong knowledge of cloud data platforms (AWS, GCP, or Azure) and experience with data lake architectures (S3, Delta Lake, Iceberg).
7. Familiarity with workflow orchestration tools like Apache Airflow, Prefect, or Dagster.
8. Experience with real-time data streaming technologies (Kafka, Kinesis, Pulsar) is a plus.
9. Ability to troubleshoot data quality and performance issues in a high-traffic, high-volume environment.
10. Strong problem-solving skills, the ability to work in a fast-paced environment, and clear communication for cross-team collaboration.