Senior Infrastructure Engineer (Backend/Data Performance)
In this role you will
- Design and optimize high-performance data pipelines for distributed training and storage (using tools like Arrow, DuckDB, LanceDB, BigQuery, vector databases).
- Focus on low-level optimizations (latency, throughput, reliability, GPU usage).
- Build monitoring and visualization tools for tracking data quality, pipeline performance, and experiments.
- Optimize distributed AI workloads for reliability, latency, and efficiency.
- Scope and supervise projects so that interns, PhD students, and post-docs can contribute and collaborate effectively.
- Support recruiting efforts and help shape the growth of the infrastructure team.
Your background looks something like
- 5+ years of backend or infrastructure engineering experience
- Strong Python programming skills (bonus points for lower-level languages)
- Experience with distributed systems and cloud platforms (AWS, GCP, Azure)
- Hands-on experience with containerization (Docker, Kubernetes) and infrastructure as code (Terraform)
- Experience building or supporting ML/AI infrastructure in production
- Experience with high-performance data tools (DuckDB, Apache Spark, Delta Lake)
- GPU orchestration and large-scale model training experience
- Familiarity with ML platforms (SageMaker, Vertex AI) and frameworks (PyTorch, JAX)
- Experience mentoring junior engineers, interns, or researchers and breaking down complex projects into manageable tasks
- Experience participating in technical hiring processes and evaluating candidates
It would be even better if you
- Have deep knowledge of training architectures, CUDA programming, or TPU optimization
- Have full-stack development experience with frameworks like React for building web applications
- Have experience managing HPC infrastructure with tools like Slurm or Kubernetes clusters
- Have a background in monitoring stacks (Prometheus, Grafana) for ML pipeline observability
About the hiring process
- Initial interview: 30-minute discussion to align on experience and expectations
- Technical screening: Two interviews and a take-home exercise covering coding and system design
- Panel interview: Assess team alignment
- Final interview: Conversation with our Chief Scientist
Originally posted on Himalayas