Senior Site Reliability Engineer
ENSEK builds the cloud‑native SaaS software that’s transforming how energy retailers operate, innovate and manage at scale. We help retailers lower operating costs, improve billing accuracy for consumers, and enhance customer experience through automation and AI‑driven insight, all underpinned by modern, cloud‑native architecture.
ENSEK is at an exciting inflection point as we scale at pace towards new international horizons. If you’re driven by solving complex, real‑world problems and want to build reliable, resilient infrastructure that accelerates the global energy transition, you’ll feel right at home with us.
About the role
As we transition to a truly product‑led organisation, SRE becomes the pulse of engineering: the centre of excellence for reliability, monitoring, and observability.
You’ll shape our new foundational platform, harden its resilience, and ensure it consistently meets the expectations of our customers. You’ll automate away manual toil, streamline operations, and build the systems and tooling that allow engineering teams to move faster with confidence.
Alongside the new platform, you’ll also take ownership of our existing estate: tuning, optimising, and evolving it to modern standards. This is a hands‑on role embedded deeply in the engineering community, operating with a product mindset and delivering value just like any other high‑performing team.
Key responsibilities:
Implementing best practices for monitoring, alerting, and incident response using Datadog and other tools.
Designing, building, and maintaining cost-effective, reliable, and scalable AWS infrastructure.
Collaborating with cross-functional teams to identify and address performance bottlenecks and reliability issues.
Conducting post-incident reviews to analyse root causes and implement preventive measures.
Automating routine tasks and processes to improve efficiency and reduce manual intervention.
Participating in an on-call rotation to respond to system outages and emergencies.
Stable, observable platform: Services meet agreed SLAs/SLOs with clear dashboards, playbooks and automated remediation where appropriate.
Reduced incident impact: Measurable reductions in MTTD/MTTR and clear evidence of prevention from RCA actions.
Broad adoption of SRE practices: Cross‑team improvements in reliability, testing and operational readiness guided by SRE principles.
Monitoring and observability best practice, including tools like Datadog, Prometheus, and Grafana.
Expertise in setting up and managing alerts, dashboards, and logging
Understanding of networking concepts, security best practices, and performance optimization in AWS.
Proficiency in AWS services: EKS, EC2, ECS, S3, RDS, VPC, IAM, Route 53, etc.
Experience with containerization and orchestration tools like Docker and Kubernetes.
Strong knowledge of Infrastructure as Code (IaC) tools such as Terraform, CDK or CloudFormation.
Knowledge of scripting and automation using languages like Python, Bash, or PowerShell.
Experience with CI/CD pipelines for deploying and testing applications in AWS.
Bonus points if you have any additional experience that includes working with AWS ECS, exposure to .NET CDK for infrastructure provisioning, or a good understanding of cloud cost optimisation and FinOps practices.
Benefits
25 days’ holiday + bank holidays
Option to buy or sell 5 extra annual leave days per year
Vitality Health Insurance, including private healthcare, virtual GP access, mental‑health support and wellbeing perks (50% off gym memberships: Virgin Active, Nuffield, PureGym)
Pension with 5% matched contribution
Regular team‑wide and company‑wide events
2 volunteering days per year to give back
Remote‑first working environment with offices in London and Nottingham
Originally posted on Himalayas