SWE - Backend Infrastructure Engineer
Responsibilities
- Design and build secure, maintainable, self-serve core infrastructure that engineering teams can rely on and operate independently
- Architect and evolve modern ML training infrastructure that is scalable, reproducible, and built for rapid experimentation
- Build and operate a modern model serving architecture with a focus on reliability, cost efficiency, and low latency
- Own and scale the low-latency voice interface and audio processing pipeline — a technically demanding, performance-sensitive system at the core of Sesame's product
- Build developer tooling, server infrastructure, and data infrastructure that is high leverage and low maintenance — the kind that makes other engineers faster without creating new dependencies on you
- Set technical direction within your domain, bring others along through clear communication and well-reasoned proposals, and raise the engineering bar across the team
Qualifications
- A strong systems thinker who is equally comfortable setting direction and getting hands-on with implementation
- Hands-on reliability engineering experience — you have well-formed convictions about observability, monitoring, deployment systems, and loosely coupled architectures, and you've put them into practice at scale
- Proven track record of shipping services at scale, with all the operational complexity that comes with it
- Kubernetes — significant production experience operating and scaling Kubernetes clusters
- Experience designing and shipping flexible domain models and APIs — you think carefully about boundaries, contracts, and long-term maintainability
- A default toward automation — you've consistently delivered efficiency gains through automation and have the track record to show it
- Strong communication skills — you can set your own direction, write clearly about tradeoffs, and bring engineers and stakeholders along with you
- 3+ years of software engineering experience, with significant time in infrastructure, platform, or ML systems roles
- Infrastructure as Code at scale — significant IaC experience, preferably Terraform; CloudFormation, Pulumi, or Kubernetes-based approaches also welcome. Ideally you've architected, maintained, or contributed to a multi-stack, self-serve IaC system and understand the challenges of building infra that teams can own independently
- ML infrastructure — any combination of the following:
  - PyTorch experience, especially model optimization for serving
  - ML training or serving experience in general
  - Experience building ML serving and/or training infrastructure (TorchServe, Seldon, KServe, Ray Serve, or similar)
  - Experience building large-scale distributed training and serving systems
- Data engineering — pipeline design, dataset management, or data platform experience
- Database design — complex schema design, query optimization, and hard data modeling decisions across relational and non-relational stores
- Real-time communication systems — low-latency audio, video, or streaming infrastructure
Full-time Employee Benefits
- 401(k) with employer match of up to 3.5% of compensation
- 100% employer-paid health, vision, and dental benefits for you and your dependents
- Unlimited PTO and sick time
- Medical flexible spending account (FSA) with employer matching up to $1,650/year
- Guardian Employee Assistance Program (EAP)
- Opportunity to share in the company's success with competitive stock options
Compensation Range: $175K - $280K