Senior Machine Learning Software Engineer (MLOps)
Position Description
Calico seeks a Senior Machine Learning Software Engineer (MLOps) to own the scaling, reliability, and deployment of our machine learning workloads. You will build the computational backbone to accelerate, scale, and productionize our ML research, serving as the critical bridge between frontier research and production-grade engineering.
If you are passionate about standardizing ML infrastructure, conquering complex distributed systems challenges, and being the core glue that wires up modern toolchains, this is the role for you.
Please note: No biology or life sciences background is required for this role.
Key Responsibilities
You will work with a team of researchers and engineers on Calico’s key cross-functional drug discovery and development initiatives. Your responsibilities will include:
- Architect the Computational Engine: Design, build, and deploy highly scalable distributed training environments, and evaluate and establish the company-wide blueprint for orchestration systems and infrastructure supporting therapeutic discovery and development
- Build the Serving Infrastructure: Architect the high-throughput inference and serving pipelines necessary for our models to propose and evaluate synthesis recommendations
- Define the MLOps Standard: Architect and implement rigorous, automated systems for ML experiment tracking, model versioning, checkpointing, and artifact registries, ensuring our path to drug discovery is fully reproducible and data-driven
- Mentorship & Leadership: Serve as a subject matter expert and technical mentor, working closely with our researchers and academic collaborators to integrate their research into our production pipelines
Qualifications
- A strong intellectual curiosity for life sciences
- 5+ years of relevant work experience in industry or academia
- Strong software engineering skills with deep expertise in Python and JAX or PyTorch
- Hands-on experience with MLOps practices (e.g., model registries, monitoring, CI/CD for ML pipelines) and modern open-source tools (e.g., Ray, Kubernetes, MLflow, Airflow, Flyte)
- Must be willing to work onsite at least four days a week
Preferred Qualifications
- Hands-on experience troubleshooting distributed training bottlenecks
- Prior experience productionizing ML models for biological or molecular data
- Contributions to open-source projects or academic publications in ML