jobgether Lever · Posted 24d ago

Site Reliability Engineer (SRE)

US Full-time

IT Security & IT Lever

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Site Reliability Engineer (SRE) in the United States.

This opportunity is ideal for an experienced reliability engineer who thrives in highly scalable, distributed environments and is passionate about operational excellence. In this role, you will play a critical part in ensuring the stability, scalability, and performance of modern cloud-native systems while collaborating closely with engineering and infrastructure teams. The position offers the chance to work on long-term, high-impact initiatives focused on automation, observability, resilience, and continuous delivery. You will help shape reliability standards, optimize production systems, and reduce operational overhead through engineering-driven solutions. The environment encourages innovation, technical leadership, and proactive problem-solving, making it a strong fit for professionals who enjoy balancing software engineering with systems operations. This is a fully remote opportunity within the United States, offering long-term career growth and exposure to complex, enterprise-grade platforms.

Accountabilities:

    • Define, implement, and continuously improve service reliability standards through SLOs, SLIs, and error budget management for critical production services.
    • Lead incident response efforts, coordinate production issue resolution, and conduct detailed post-incident reviews to strengthen system resilience and operational maturity.
    • Design and maintain observability frameworks using monitoring, logging, and tracing tools such as Prometheus, Grafana, OpenTelemetry, ELK/EFK, or Datadog.
    • Develop automation tools and operational workflows using Python, Go, Bash, or similar technologies to eliminate repetitive manual tasks and improve system efficiency.
    • Architect, manage, and optimize Kubernetes-based infrastructure, including autoscaling, networking, capacity planning, and container orchestration.
    • Build and improve CI/CD pipelines that support safe deployments, automated testing, canary releases, and progressive rollout strategies.
    • Partner with development teams to embed reliability, fault tolerance, and graceful degradation practices early in the software design lifecycle.
    • Drive initiatives related to chaos engineering, performance testing, security hardening, failover readiness, and platform resiliency improvements.
    • Mentor engineers on SRE best practices while contributing to a collaborative, blameless culture focused on continuous improvement.

Requirements:

    • Bachelor’s degree in Computer Science, Engineering, or a related technical field.
    • 5+ years of professional experience in Site Reliability Engineering, DevOps, production engineering, or infrastructure-focused roles supporting distributed systems.
    • Strong programming and scripting experience with Python, Go, Java, Bash, or similar languages used for automation and tooling development.
    • Deep expertise in Linux systems administration, networking concepts, systems troubleshooting, and performance optimization.
    • Hands-on experience managing Kubernetes clusters and containerized production workloads at scale.
    • Strong understanding of observability practices and modern monitoring ecosystems including Prometheus, Grafana, OpenTelemetry, ELK/EFK, or equivalent platforms.
    • Experience designing and maintaining CI/CD pipelines and deployment automation processes.
    • Solid knowledge of distributed systems concepts, including reliability engineering, failure handling, partitioning, and scalability principles.
    • Proven experience leading incident management processes and conducting actionable post-mortem reviews.
    • Excellent communication, collaboration, and technical documentation skills.
    • Additional exposure to cloud platforms such as AWS, Azure, or GCP, along with service mesh technologies or chaos engineering practices, is highly valued.

Benefits:

    • 100% remote work opportunity across the Continental United States.
    • Full-time direct W2 employment with long-term project stability.
    • Competitive base salary aligned with experience and technical expertise.
    • Comprehensive employee benefits package, including healthcare coverage and additional employee perks.
    • Opportunity to work on multi-year engineering initiatives involving modern cloud-native technologies and enterprise-scale infrastructure.
    • Supportive environment focused on technical growth, mentorship, and career advancement.
    • Exposure to cutting-edge reliability, automation, and observability practices in a collaborative engineering culture.
    • H1B transfer support available for qualified candidates currently holding valid H1B status.
    • Flexible, remote-first work environment designed to support productivity and work-life balance.

How Jobgether works:

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team. We appreciate your interest and wish you the best!

Data Privacy Notice:

By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.