KMC Solutions Inc · Himalayas · Posted 5d ago

XTN-EF5F239 | SENIOR DEVOPS ENGINEER - CLOUD & HPC INFRASTRUCTURE

USD · Full time · Remote

Senior · DevOps Engineering · Cloud Engineering · Infrastructure Engineering

Position Overview

We are seeking a highly experienced Senior DevOps Engineer to lead the design, deployment, automation, and operational excellence of our AWS-based cloud infrastructure and high-performance computing (HPC) environments. This role requires deep expertise in AWS architecture, Linux systems administration, server deployment, containerization, virtualization, license server management, and cloud networking.

The ideal candidate is hands-on, automation-driven, security-focused, and comfortable operating in complex hybrid environments supporting research, engineering, and compute-intensive workloads.

Benefits

  • Health Insurance/HMO
  • Enjoy unlimited MadMax Coffee
  • Diverse learning & growth opportunities
  • Accessible Cloud HR platform (Sprout)
  • Above standard leaves

Key Responsibilities

Cloud Infrastructure & AWS Architecture

  • Design, deploy, and manage scalable, secure AWS infrastructure.
  • Architect and maintain VPCs, subnets, route tables, NAT gateways, transit gateways, and peering.
  • Manage AWS networking components including Route 53, Load Balancers (ALB/NLB), CloudFront, and PrivateLink.
  • Implement infrastructure-as-code (IaC) using Terraform, CloudFormation, or similar.
  • Optimize cloud cost, performance, and resource utilization.
  • Implement AWS best practices for security, resilience, and high availability.

Server Deployment & Systems Engineering

  • Architect and automate server provisioning across cloud and hybrid environments.
  • Deploy and manage EC2, Auto Scaling Groups, Launch Templates, and AMIs.
  • Build hardened Linux server images (CIS benchmarks preferred).
  • Implement configuration management using tools such as Ansible, Puppet, or Chef.
  • Manage patching, lifecycle management, and OS hardening strategies.

Expert Linux Administration

  • Advanced administration of RHEL, Rocky, Ubuntu, or similar distributions.
  • Kernel tuning and performance optimization for compute-intensive workloads.
  • Troubleshooting system-level performance (CPU, memory, I/O, networking).
  • Manage system services, storage, RAID, LVM, NFS, and distributed filesystems.
  • Shell scripting and automation (Bash, Python).

Containerization & Virtualization

  • Design and manage containerized workloads using Docker.
  • Deploy and maintain Kubernetes (EKS preferred).
  • Implement CI/CD pipelines for container-based applications.
  • Manage virtualization platforms (VMware, KVM, or similar).
  • Optimize container orchestration for HPC and compute workloads.

HPC Infrastructure Management

  • Deploy and maintain High Performance Computing clusters.
  • Manage job schedulers (Slurm, PBS, or similar).
  • Optimize cluster performance, storage throughput, and node scaling.
  • Integrate HPC workloads with AWS services (e.g., ParallelCluster).
  • Manage high-speed networking (InfiniBand or equivalent, where applicable).
  • Support GPU-based workloads where applicable.

License Server Administration

  • Deploy and manage FLEXlm or similar license servers.
  • Ensure high availability and redundancy for engineering license services.
  • Monitor license usage and optimize allocation.
  • Troubleshoot license connectivity and performance issues.

Cloud Networking & Security

  • Deep understanding of TCP/IP, DNS, routing protocols, and firewall design.
  • Implement secure connectivity (VPN, Direct Connect, site-to-site).
  • Manage security groups, NACLs, and IAM roles, and apply zero-trust principles.
  • Implement logging, monitoring, and alerting (CloudWatch, Prometheus, Grafana).
  • Support compliance frameworks and infrastructure security controls.

Automation & CI/CD

  • Build and maintain CI/CD pipelines (GitHub Actions, GitLab, Jenkins, etc.).
  • Automate infrastructure deployments and configuration management.
  • Implement DevSecOps best practices.
  • Develop reusable infrastructure modules and standards.

Monitoring & Observability

  • Implement centralized logging solutions.
  • Configure performance monitoring and alerting systems.
  • Perform root cause analysis and incident response.
  • Develop dashboards and operational metrics.

Required Qualifications

  • 7+ years of experience in DevOps, Infrastructure Engineering, or Systems Engineering.
  • 5+ years of hands-on AWS architecture experience.
  • Deep expertise in Linux systems administration.
  • Strong experience with containerization and Kubernetes.
  • Proven experience managing HPC environments.
  • Experience managing enterprise license servers.
  • Strong scripting skills (Bash, Python).
  • Experience with Infrastructure as Code (Terraform preferred).
  • Strong understanding of networking fundamentals and cloud networking.

Preferred Qualifications

  • AWS Solutions Architect Professional or DevOps Professional certification.
  • Experience with AWS ParallelCluster.
  • Experience with GPU workloads and AI/ML infrastructure.
  • Experience with enterprise storage solutions (NetApp, Isilon, etc.).
  • Experience supporting research or engineering compute environments.

Soft Skills

  • Strong troubleshooting and analytical skills.
  • Ability to work independently in high-complexity environments.
  • Clear documentation and communication skills.
  • Experience collaborating across engineering, security, and research teams.
  • Strategic mindset with hands-on execution capability.

What Success Looks Like

  • Highly available, secure, and automated AWS & HPC infrastructure.
  • Optimized cloud costs and compute performance.
  • Reliable license server infrastructure with minimal downtime.
  • Fully automated server deployments.
  • Secure, scalable cloud networking architecture.
  • Improved deployment velocity through CI/CD automation.

Originally posted on Himalayas
