PySpark - Data Architect

Virtusa


Job location:

Dubai - UAE

Monthly salary: Not disclosed
Date posted: Yesterday
Number of vacancies: 1

Job summary

PySpark JD:

Responsibilities

  • Data Pipeline Development: Design, develop, and maintain highly scalable and optimized ETL pipelines using PySpark on the Cloudera Data Platform, ensuring data integrity and [...].
  • Data Ingestion: Implement and manage data ingestion processes from a variety of sources (relational databases, APIs, file systems) to the data lake or data warehouse on [...].
  • Data Transformation and Processing: Use PySpark to process, cleanse, and transform large datasets into meaningful formats that support analytical needs and business [...].
  • Performance Optimization: Conduct performance tuning of PySpark code and Cloudera components, optimizing resource utilization and reducing runtime of ETL [...].
  • Data Quality and Validation: Implement data quality checks, monitoring, and validation routines to ensure data accuracy and reliability throughout the [...].
  • Automation and Orchestration: Automate data workflows using tools like Apache Oozie, Airflow, or similar orchestration tools within the Cloudera [...].
  • Monitoring and Maintenance: Monitor pipeline performance, troubleshoot issues, and perform routine maintenance on the Cloudera Data Platform and associated data [...].
  • Collaboration: Work closely with other data engineers, analysts, product managers, and other stakeholders to understand data requirements and support various data-driven [...].
  • Documentation: Maintain thorough documentation of data engineering processes, code, and pipeline [...].

Education and Experience

  • Bachelor's or Master's degree in Computer Science, Data Engineering, Information Systems, or a related field.
  • 3 years of experience as a Data Engineer with a strong focus on PySpark and the Cloudera Data Platform.

Technical Skills

  • PySpark: Advanced proficiency in PySpark, including working with RDDs, DataFrames, and optimization [...].
  • Cloudera Data Platform: Strong experience with Cloudera Data Platform (CDP) components, including Cloudera Manager, Hive, Impala, HDFS, and [...].
  • Data Warehousing: Knowledge of data warehousing concepts, ETL best practices, and experience with SQL-based tools (Hive, Impala).
  • Big Data Technologies: Familiarity with Hadoop, Kafka, and other distributed computing [...].
  • Orchestration and Scheduling: Experience with Apache Oozie, Airflow, or similar orchestration [...].
  • Scripting and Automation: Strong scripting skills in [...].

Soft Skills

  • Strong analytical and problem-solving skills.
  • Verbal and written communication skills.
  • Ability to work independently and collaboratively in a team.
  • Attention to detail and commitment to data quality.

Required skills

  • Money management
  • Drafting and editing
  • End-user support
  • Infrastructure
  • Airlines
  • CATIA

About the company

Virtusa Corporation (NASDAQ: VRTU) is a global information technology (IT) services company providing IT consulting, technology and outsourcing services. Using our enhanced global delivery model, innovative platforming approach and industry expertise, we provide cost-effective service ...
