AI / Data Engineer

Business Umbrella

Job Location: Abu Dhabi - UAE

Monthly Salary: Not Disclosed
Experience Required: 5 years
Posted on: 12-09-2025
Vacancies: 1

Job Summary

Build ingestion pipelines for structured/unstructured data using Python

Clean, normalize, and prepare data in formats suitable for LLM fine-tuning (e.g., JSONL, CSV)

Create high-quality, task-specific datasets for training and evaluation

Apply versioning to datasets using DVC or LakeFS for reproducibility

Generate embeddings using HuggingFace or Sentence Transformers

Manage vector indexes (FAISS, Weaviate) and optimize retrieval workflows

Tokenize and chunk long-form data for context window optimization



Requirements

10 years' experience in a Data Engineering role

2 years' experience in an AI-adjacent data role

Proficiency in Python, pandas, and text processing tools

Familiarity with tokenization libraries (HuggingFace Tokenizers, SentencePiece)

Experience managing datasets and object storage (MinIO, NFS)

Understanding of LLM data constraints (context windows, formatting, prompt injection)



Key Skills

  • Apache Hive
  • S3
  • Hadoop
  • Redshift
  • Spark
  • AWS
  • Apache Pig
  • NoSQL
  • Big Data
  • Data Warehouse
  • Kafka
  • Scala