Site Reliability Engineer
We are seeking an experienced Site Reliability Engineer (SRE) to join our Group Technology Team in Milton Keynes.
ConnellsX is Connells Group Technology’s internal developer platform, built on Microsoft Azure. It simplifies cloud hosting, embeds security and compliance by default, and enables a frictionless developer experience. As part of the team building and operating this platform, you will play a hands-on role in ensuring it is reliable, scalable, and observable.
You will help establish and mature SRE practices, focusing on:
- Monitoring and observability
- Incident response
- Post-incident review
- Reliability testing and capacity planning
- Toil reduction
- Enabling development velocity
We offer a hybrid working arrangement with one day per week in our Milton Keynes office.
Key Responsibilities:
- Support teams using ConnellsX and respond to incidents in a structured, blameless way
- Investigate root causes and drive post-incident actions to completion
- Define SLIs, contribute to SLOs, and monitor error budgets
- Build dashboards, alerts, and runbooks to improve visibility
- Automate repetitive tasks to reduce operational toil
- Collaborate with cross-functional teams to enhance reliability and observability
- Support performance testing and capacity planning
- Proactively identify and prioritise reliability improvements
Experience & Skills Required:
- Hands-on experience with Azure Monitor (Application Insights, Alerts, Action Groups)
- Strong knowledge of OpenTelemetry, including its use with Kubernetes
- Scripting/automation using PowerShell and/or Azure CLI
- Experience with Terraform and GitHub Actions
- Ability to define SLIs/SLOs and manage error budgets
- Incident response and post-incident review experience
- Familiarity with Docker and Kubernetes
- Strong communication and documentation skills
Desirable:
- Working knowledge of .NET/C# and React/Next.js
- Experience with cloud cost optimisation
- Knowledge of Azure networking (DNS, VNets, Firewalls)
- Understanding of security frameworks (e.g. ISO/IEC 27002, NIST CSF)
- Azure certifications
About You:
You may come from an SRE, DevOps, platform engineering, or operations background. What matters is hands-on experience running production systems, managing incidents, writing runbooks, and automating repetitive work. You will focus on identifying root causes and systemic issues, reducing manual toil through automation, and maintaining reliability by applying SRE principles and data-driven metrics (SLIs/SLOs).
You understand reliability is about balance, not perfection, and can make data-driven trade-offs between stability and delivery. You are curious, collaborative, and take shared responsibility for system reliability.
Please note that we are unable to provide visa sponsorship. Applicants must have the right to work in the UK.
Connells Group UK is an equal opportunities employer and positively encourages applications from suitably qualified and eligible candidates regardless of sex, race, disability, age, sexual orientation, transgender status, religion or belief, marital status, or pregnancy and maternity.
CF00864