Manager Site Reliability Engineer

Dubai - UAE

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

What you'll be doing:

Leading the SRE team, setting objectives, and guiding the team towards achieving high reliability while balancing cost and performance SLAs.
Collaborating with platform & product engineering teams to embed reliability and operational best practices into the software development lifecycle.
Developing and implementing SRE policies and practices, including service level objectives (SLOs), service level indicators (SLIs), and error budgets.
Driving automation across operations to reduce toil, improve system performance, and ensure scalability, with a reasonable amount of allergic response towards repetitive manual work.
Overseeing incident management, post-mortem analyses, and root cause investigations to prevent future outages and enhance system reliability.
Facilitating capacity planning and scalability exercises to manage growth and ensure the efficient use of resources.
Facilitating disaster recovery plans & testing to ensure business continuity for our customers' webstores.
Encouraging a culture of continuous improvement by mentoring team members and fostering innovation within the team.
Staying up to date with the latest trends and technologies in SRE and advocating for their adoption where appropriate.

Qualifications:

What you'll bring:

Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field.
At least 5 years of experience in Site Reliability Engineering, with 2 years in a leadership or management role.
Proven expertise in cloud computing platforms (e.g. AWS, Azure, GCP) and experience with container orchestration (e.g. Kubernetes).
A deep understanding of network protocols, load balancing, and high availability configurations.
Experience in applying software development solutions to SRE and familiarity with programming languages, preferably PowerShell and C#, or else Python, Go, Java, etc.
Experience with automation tools and infrastructure as code (e.g. Terraform, Ansible).
Proficiency in monitoring and logging tools (e.g. Prometheus, Grafana, ELK Stack) and in implementing comprehensive monitoring solutions. Dynatrace knowledge is a plus.
Excellent problem-solving skills, with a proven ability to tackle complex issues under pressure.
Outstanding leadership qualities, with a track record of mentoring and developing high-performing teams.
Exceptional communication and collaboration skills, with the ability to work effectively with cross-functional teams.

Job descriptions can be tough to interpret. We have ambitious plans for our Dubai office, so even if you do not tick all the boxes, we encourage you to apply if you share our vision and look forward to growing with us. Apply now.

Additional Information:

#LI-Hybrid

Remote Work:

Employment Type:

Full-time

About Company

Sana Commerce

Sana Commerce is a ready-to-use commerce platform engineered for B2B. We've paired decades of B2B expertise with smart, integrative software to help our customers easily bring complex processes online, unburden their teams and drive value faster. The result? A supportive, personaliz ...
