AI Platform DevOps Engineer
- Cloud & Platform Infrastructure (IaC): Deploy and maintain scalable, secure cloud workspaces for AI enablement on AWS infrastructure (VPC, PrivateLink, IAM, S3, Lambda, EKS, and Fargate) using Terraform.
- Architecture guidance: Provide architectural oversight and technical guidance to engineering teams.
- Unity Catalog Implementation: Automate the governance layer, including metastore configuration, external locations, and access controls within Unity Catalog.
- Security & Compliance: Ensure the platform adheres to enterprise security standards by implementing and managing automated security controls for cloud infrastructure and data protection.
- Workspace Lifecycle Management: Use Terraform for end-to-end workspace provisioning, ensuring consistent setup across Dev, Acc, and Prod environments.
- Governance & Cost Control (Policies): Design and implement policies and guardrails to enforce standards.
- Identity & Access Automation: Automate assignment of permissions using Terraform. Manage Service Principals for pipelines and map groups to specific Workspace roles and Unity Catalog grants.
- DevOps & Automation (CI/CD)
- Pipeline Architecture: Oversee GitLab CI/CD pipelines for data and GenAI projects with automated workflows.
- Release Management: Implement branching strategies, code review policies, and environment promotion rules (Dev → Acc → Prod).
- Service Organization & Operations
- Observability: Configure monitoring, alerting, and logging (using system tables or integration with tools like CloudWatch) to ensure platform stability.
- Support & Incident Management: Serve as an escalation point for platform-related incidents.
- Knowledge Sharing: Document best practices and conduct workshops to upskill data engineers on effective platform usage.
- FinOps: Optimize resource allocation, scalability, and cost-efficiency.
Job Qualifications:
- Bachelor’s in computer science, software engineering, mathematics, or related field.
- Core cloud competencies, cross-platform technical skills, and ability to think strategically.
- Ability to convert business needs into technical solutions using the AWS Well-Architected Framework best practices.
- Expertise in cloud networking; security and compliance; automation and DevOps (CI/CD, Jenkins, GitLab, etc.); containerization (Docker) and orchestration (Amazon EKS); monitoring and logging; and cloud infrastructure optimization for FinOps.
- 5+ years industry experience in Data Engineering, Cloud Infrastructure, or DevOps; 3+ years with AWS in enterprise settings.
- Advanced Terraform skills for managing cloud infrastructure and AWS resources.
- Extensive AWS portfolio knowledge.
- Excellent communication, documentation, mentoring, and collaboration skills.