AI DevOps Engineer (MLOps & Cloud)
Key Responsibilities
- Design, implement, and maintain scalable, secure cloud infrastructure for AI/ML solutions
- Build and manage Infrastructure as Code (IaC) using tools such as Terraform or CloudFormation
- Develop and maintain CI/CD pipelines for AI applications and model deployment
- Support end-to-end MLOps lifecycle (training, versioning, deployment, monitoring)
- Automate deployments using best practices (blue/green, canary releases, rollback strategies)
- Configure monitoring, alerting, and observability (logs, metrics, tracing) for production systems
- Optimize performance, scalability, and cost (compute, storage, GPU usage)
- Use AI tools (e.g. Copilot, ChatGPT, Claude, Cursor) to accelerate scripting, automation, and incident resolution
- Troubleshoot production issues and ensure high system reliability and availability
- Support developers and data scientists on DevOps tooling and best practices
- Implement DevSecOps standards, including security, secrets management, and compliance
- Maintain clear technical documentation for infrastructure and processes
Required Skills and Experience
- 5+ years of experience in DevOps, SRE, or platform engineering
- Strong experience in MLOps and deployment of AI/ML solutions in production
- Proficiency with cloud platforms (AWS, Azure, or GCP) and related AI services
- Expertise in Infrastructure as Code (Terraform, CloudFormation, ARM templates)
- Hands-on experience with containerisation and orchestration (Docker, Kubernetes, Helm)
- Strong CI/CD experience (GitHub Actions, GitLab CI, Jenkins, Azure DevOps)
- Familiarity with MLOps tools (MLflow, Kubeflow, Weights & Biases, SageMaker Pipelines)
- Proficiency in scripting and automation (Python, Bash)
- Experience with monitoring and observability tools (Prometheus, Grafana, ELK, Datadog)
- Knowledge of DevSecOps practices and security standards
- Experience with secrets and configuration management (Vault, AWS Secrets Manager)
- Experience optimising AI workloads using GPUs
- Ability to diagnose and resolve issues using AI-assisted tools
Personal Attributes
- Strong problem-solving and analytical thinking
- Pragmatic approach balancing automation, speed, and reliability
- Attention to detail in infrastructure design and security
- Proactive, with strong ownership and initiative
- Strong collaboration and support mindset
- Comfortable working in fast-paced, agile environments
Work Environment
- Agile, fast-paced innovation environment
- Close collaboration with AI engineers, data scientists, and infrastructure teams
- Strong focus on automation, scalability, and production-ready AI systems
- English-speaking environment (fluency required)
Recruitment fraud is a scheme in which fictitious job opportunities are offered to job seekers, typically through online services such as false websites or through unsolicited emails claiming to be from the company. These emails may ask recipients to provide personal information or to make payments as part of an illegitimate recruiting process. DXC does not make offers of employment via social media networks, never asks applicants for money or payments at any point in the recruitment process, and never asks a job seeker to purchase IT or other equipment on its behalf. More information on employment scams is available here.