AI / Data Engineer

Business Umbrella

Job location: Abu Dhabi - UAE

Monthly salary: Not disclosed
Required experience: 5 years
Date posted: 12-09-2025
Number of vacancies: 1

Job Summary

Build ingestion pipelines for structured/unstructured data using Python

Clean, normalize, and prepare data in formats suitable for LLM fine-tuning (e.g., JSONL, CSV)

Create high-quality, task-specific datasets for training and evaluation

Apply versioning to datasets using DVC or LakeFS for reproducibility

Generate embeddings using HuggingFace or Sentence Transformers

Manage vector indexes (FAISS, Weaviate) and optimize retrieval workflows

Tokenize and chunk long-form data for context window optimization



Requirements

10 years of experience in a Data Engineering role

2 years of experience in an AI-adjacent data role

Proficiency in Python, pandas, and text processing tools

Familiarity with tokenization libraries (HuggingFace Tokenizers, SentencePiece)

Experience managing datasets and object storage (MinIO, NFS)

Understanding of LLM data constraints (context windows, formatting, prompt injection)



Required Skills

  • Apache Hive
  • S3
  • Hadoop
  • Redshift
  • Spark
  • AWS
  • Apache Pig
  • NoSQL
  • Big Data
  • Data Warehouse
  • Kafka
  • Scala