DevOps/MLOps Engineer (Cloud & SRE)
What we’re looking for
We are looking for a highly skilled engineer with strong experience in DevOps, MLOps, Kubernetes, Cloud Infrastructure, and SRE to help build and operate scalable ML and cloud-native platforms.
Key skills we expect you to have:
- Kubernetes across major platforms (GKE, EKS, AKS)
- Cloud infrastructure on AWS, GCP, or Azure
- Docker and Linux
- Infrastructure-as-code with Terraform
- Helm and ArgoCD for GitOps-driven deployments
- CI/CD and GitOps practices
- MLOps and ML lifecycle operations
- Observability, reliability, and automation
- MongoDB and distributed systems
- Scripting/programming with Python, Go, Bash, and/or JavaScript
How you’ll work with us: You’ll partner with multiple technical teams (Data Scientists, Data Engineers, Backend, and Infrastructure) and translate platform requirements into reliable production systems. We value engineers who can iterate fast without compromising reliability, who communicate clearly, and who take ownership of operational outcomes.
Location: This is a 100% remote role, with a preference for candidates based in Portugal, Spain, or LATAM.
Projects
Sidis AI builds AI-centric business products as services, with SIDIS® as our main offering. SIDIS® is an Artificial Intelligence suite designed to empower Sales, Finance, and Marketing departments—turning data and workflows into measurable business outcomes. We are looking for an experienced engineer to help build and operate scalable ML and cloud-native platforms that power the SIDIS® ecosystem. The work spans multi-cloud and hybrid environments, combining MLOps and SRE practices to ensure production-grade deployments, strong reliability, and automation across the full ML lifecycle.
Job Responsibilities
You will build, deploy, and operate scalable ML and cloud-native platforms across multi-cloud and hybrid environments. Day to day, you will:
- Collaborate closely with Data Scientists, Data Engineers, Backend, and Infrastructure teams to deploy production-grade ML systems.
- Design and operate reliable Kubernetes-based platform components that support the SIDIS® ecosystem.
- Own operational excellence using SRE and observability practices to maintain high availability.
- Automate infrastructure and deployments with Terraform, Helm, ArgoCD, and GitOps/CI/CD workflows.
- Support MLOps and ML lifecycle operations, ensuring smooth progression from experimentation to production.
- Operate distributed services and data platforms, including MongoDB and related distributed systems.
- Improve reliability, performance, and automation across cloud infrastructure (AWS/GCP/Azure).
We’ll rely on you to deliver robust, repeatable processes that keep ML services dependable at scale.
What we offer
- 100% Remote 🌍
- Flexible hiring model: Freelance / Contractor / Full-time
- International high-impact AI & ML projects
- Cutting-edge cloud-native infrastructure environment
We’ll support you with a collaborative, cross-team environment to deliver reliable ML platforms that power SIDIS®.
Desirable (nice to have)
- Experience operating end-to-end ML systems in production (training/serving pipelines, model deployment, and monitoring).
- Strong reliability engineering background (incident response, SLO/SLI thinking, performance tuning).
- Deep knowledge of GitOps workflows and deployment promotion strategies across environments.
- Hands-on exposure to multi-cloud operational patterns (cost, networking, security, and consistency).
- Experience with additional observability tooling and automated alerting/diagnostics.