Role: Site Reliability Engineer
Location: Hybrid / Remote (UK-based)
Tech Stack: AWS, MongoDB, Docker, CI/CD, Prometheus, Python
Why This Role
Looking to work at the intersection of DevOps, backend engineering and real-time problem-solving? Here's your chance to make a real impact in a high-scale cloud environment, keeping production systems fast, reliable and resilient for thousands of users.
You'll join a collaborative, tech-savvy team dedicated to making things just work better. From improving observability across microservices to responding to high-priority incidents, this is your platform to shape how scalable applications are delivered and supported.
What You'll Be Doing
- Fix and improve: Hunt down bugs in live microservices and make production more stable every day.
- Pair up with engineers: Collaborate with dev teams to sharpen code quality, boost resilience and embed observability from the start.
- Own the cloud: Configure and manage cloud infrastructure (AWS), keeping everything humming at scale.
- Watch the signals: Build better monitoring and alerting systems to catch issues before they escalate.
- Troubleshoot deeply: Solve complex technical puzzles and help guide others through them.
- Automate everything: Write and maintain SOPs and automation scripts to reduce manual toil.
- Be the calm in the storm: Participate in the on-call rota and take ownership of live issues when they arise.
What We're Looking For
- Solid experience debugging live applications and resolving production issues fast.
- Background in building and supporting microservice-based applications.
- Confidence working with MongoDB, AWS services and containerisation tools like Docker or ECS.
- Familiarity with infrastructure-as-code and CI/CD pipelines (CloudFormation, CodeBuild, etc.).
- Comfort using monitoring/observability tools like Prometheus, New Relic, Grafana or Datadog.
- Good grasp of scripting (Python or JS) for automation and tooling.
- Clear thinking in the face of incidents, plus the drive to learn from them.
Bonus Points For
- Knowledge of REST, GraphQL and async messaging systems.
- Experience with Git workflows and CI/CD pipelines.
- Understanding of SRE principles (SLIs, SLOs, error budgets, etc.).
- Awareness of security and compliance (GDPR, privacy, risk management).
- Clear communicator with a team-first attitude.
Why You'll Love It Here
- You'll work with brilliant engineers who care about quality, automation and clean code.
- You'll have the freedom to shape infrastructure as we scale and evolve.
- You'll gain deep exposure to modern DevOps tooling, incident response strategy and production engineering.
- Your voice will matter, from tech choices to process improvements.
Apply direct or contact