Data Engineer
Date posted:
Posted 2 hours ago
Number of vacancies:
1 vacancy
Job Summary
Role Summary
Build robust, observable data pipelines that power research and production AI. Success means high pipeline reliability (on-time SLAs), strong data quality (validation & lineage), and fast experimentation. You will partner with AI/ML, analytics, and product teams to make data trustworthy and usable.
Responsibilities
- Architect and operate batch/stream pipelines (Airflow; Spark optional) for ETL/ELT.
- Model/manage schemas; enforce data quality and lineage/governance.
- Support ML workflows with DVC (data versioning) and MLflow or Weights & Biases.
- Build feature stores/data services; expose datasets via secure REST endpoints.
- Optimize performance/cost across storage/compute; implement monitoring/alerting.
- Maintain documentation and internal catalogs; enable self-service analytics.
Qualifications
- Skills: Programming in C or Java; SQL & NoSQL; Pandas/NumPy; PySpark; Airflow; API development; Docker.
- MLOps: DVC; MLflow or W&B; model packaging/deployment fundamentals.
- Cloud: AWS SageMaker, Azure ML, or GCP AI experience.
- Nice to have: Unreal Engine exposure.
- Environment: Solid Linux background for development and deployment.
- Education/Experience: Proven experience building reliable pipelines in production.
Required Skills
- Apache Hive
- S3
- Hadoop
- Redshift
- Spark
- AWS
- Apache Pig
- NoSQL
- Big Data
- Data Warehouse
- Kafka
- Scala