Data Engineer - PySpark

Virtusa

Job Location: Dubai - UAE

Monthly Salary: Not Disclosed
Posted on: Yesterday
Vacancies: 1 Vacancy

Job Summary

About the Role

We are seeking a highly skilled Data Engineer with deep expertise in PySpark and the Cloudera Data Platform (CDP) to join our data engineering team. As a Data Engineer, you will be responsible for designing, developing, and maintaining scalable data pipelines that ensure high data quality and availability across the organization. This role requires a strong background in big data ecosystems, cloud-native tools, and advanced data processing. The ideal candidate has hands-on experience with data ingestion, transformation, and optimization on the Cloudera Data Platform, along with a proven track record of implementing data engineering best practices. You will work closely with other data engineers to build solutions that deliver business impact.

Responsibilities

  • Data Pipeline Development: Design, develop, and maintain highly scalable and optimized ETL pipelines using PySpark on the Cloudera Data Platform, ensuring data integrity.
  • Data Ingestion: Implement and manage data ingestion processes from a variety of sources (e.g. relational databases, APIs, file systems) to the data lake or data warehouse.
  • Data Transformation and Processing: Use PySpark to process, cleanse, and transform large datasets into meaningful formats that support analytical and business needs.
  • Performance Optimization: Conduct performance tuning of PySpark code and Cloudera components, optimizing resource utilization and reducing ETL runtime.
  • Data Quality and Validation: Implement data quality checks, monitoring, and validation routines to ensure data accuracy and reliability.
  • Automation and Orchestration: Automate data workflows using Apache Oozie, Airflow, or similar orchestration tools within Cloudera.

Education and Experience

  • Bachelor's or Master's degree in Computer Science, Data Engineering, Information Systems, or a related field.
  • 3 years of experience as a Data Engineer with a strong focus on PySpark and the Cloudera Data Platform.

Skills

  • PySpark: Advanced proficiency in PySpark, including working with RDDs, DataFrames, and optimization.
  • Cloudera Data Platform: Strong experience with Cloudera Data Platform (CDP) components, including Cloudera Manager, Hive, Impala, and HDFS.
  • Data Warehousing: Knowledge of data warehousing concepts and ETL best practices, and experience with SQL-based tools (e.g. Hive, Impala).
  • Big Data Technologies: Familiarity with Hadoop, Kafka, and other distributed computing tools.
  • Orchestration and Scheduling: Experience with Apache Oozie, Airflow, or similar orchestration tools.
  • Scripting and Automation: Strong scripting skills in Linux.

Key Skills

  • Apache Hive
  • S3
  • Hadoop
  • Redshift
  • Spark
  • AWS
  • Apache Pig
  • NoSQL
  • Big Data
  • Data Warehouse
  • Kafka
  • Scala

About Company

Inside every Virtusan is a spirit defined by the drive to explore new frontiers, an intellectual curiosity, a need to challenge the status quo, and the inspiration to mind the greater good–all while impacting the bottom line. It all adds up to a culture of innovation.