Title: Senior Site Reliability Engineer
Location: Santa Ana, CA
Position Type: Contract to Hire
Salary/Pay: $65 - $70 per hour
We are unable to provide visa sponsorship at this time.
Our client, a Fortune 500 firm consistently ranked among Fortune's 100 Best Companies to Work For, offers exciting career growth prospects, a healthy work-life balance, and a renowned work culture. If you're a Senior Site Reliability Engineer seeking these benefits, this role may be a perfect fit for you. They are hiring a Senior Site Reliability Engineer on a contract-to-hire basis.
As a Senior Site Reliability Engineer, your responsibilities include:
- Monitoring and assessing the availability and health of systems and environments, making recommendations to improve services.
- Developing and implementing monitoring and recovery tools to ensure optimal delivery and resilience.
- Collaborating with partner groups to establish Service Level Objectives, Indicators, and Error Budgets.
- Providing expert operational support and engineering for multiple large-scale distributed software applications.
- Leading the development and implementation of departmental automation processes and procedures.
- Acting as a technical point of contact for internal and external customers, offering guidance and support for application and service delivery.
- Advising development and engineering teams on automation and optimization of service availability, scalability, performance, monitoring, and alerting.
- Participating in technical evaluations and proof-of-concept programs to evaluate and introduce new technologies and tools.
- Being available for on-call support outside business hours on a rotating schedule, including weekends and holidays.
Requirements
To be successful in this role, you should have the following:
- Strong understanding of cloud services and architecture, including AWS and Azure.
- Experience with distributed systems (architectures, microservices, high availability) and proficiency in large-scale enterprise environments.
- Knowledge of container computing, including Docker, Kubernetes, and service meshes.
- Ability to build and configure AWS and Azure services, such as AWS Lambda and Azure Functions.
- Understanding of proxies and load balancing, including Nginx, HAProxy, and Envoy.
- Monitoring and tools:
  - Knowledge of log event aggregation, metric collection, application monitoring, and event handling, using tools such as Elastic, SCOM, AppD, Uptrends, AppInsights, and CloudWatch.
  - Strong proficiency in Windows and UNIX/Linux technologies, as well as network triage, including packet loss and routing issues.
  - Ability to define Service Level Objectives (SLOs) and Service Level Indicators (SLIs), and to work with error budgets and burn rates.
- Development:
  - Knowledge of "everything as code" methodologies for configuration, infrastructure, and orchestration.
  - Familiarity with programming languages and frameworks, such as .NET, C#, C++, and Python.
  - Experience with continuous integration and automation tooling, including Chef, Ansible, Jenkins, and Stash/Git.
  - Experience with configuration management and infrastructure-as-code tools, such as Puppet, Hiera, Terraform, Terragrunt, and Ansible.
  - Ability to use scripting languages or other tools for workflow automation.
- Additionally:
  - Strong analytical and problem-solving skills to troubleshoot infrastructure issues, potentially across multiple technical disciplines.
  - Ability to work effectively as a member of a multi-cultural, multi-location team.
So if you are a Senior Site Reliability Engineer looking for a new role with an outstanding company, apply today!
Benefits
What's in it for you:
- The opportunity to transition from a contract to a full-time, direct hire position.
- Exciting career growth and development opportunities, including training.
- Opportunities to be acknowledged for your hard work and achievements.
- An award-winning and positive work culture.
- A balanced work-life dynamic.
- Generous and comprehensive benefits for full-time employees.