NeuroDiscovery AI · LinkedIn · Posted 1mo ago

Lead DevOps Engineer

India

About The Role

We are building an Agentic AI platform for Neurology that processes sensitive clinical data across a multi-tenant, multi-cloud environment serving a large number of clients. Each tenant operates in a hybrid setup — a mix of cloud-hosted services and on-premise appliances deployed at client sites. We're looking for a Lead DevOps Engineer to own the infrastructure, establish production-grade orchestration, harden security across tenant boundaries, and build scalable, repeatable delivery pipelines. This is a hands-on leadership role. You'll shape the DevOps roadmap, enforce engineering discipline across deployments, and mentor a small team — all while keeping a complex hybrid, multi-tenant environment running reliably.

What You'll Do

Multi-Cloud & Multi-Tenant Infrastructure

  • Design and manage infrastructure across AWS and GCP, ensuring consistent networking, security, and deployment patterns across both clouds.
  • Architect tenant-isolated environments with secure VPC networking — no public-facing IPs, private subnets, VPC peering, endpoints, and VPN connectivity.
  • Build and operate production Kubernetes clusters to host containerized microservices at scale.
  • Define the strategy for which workloads run where — cloud vs. on-premise — based on data sensitivity, latency, and compliance requirements.

CI/CD & Deployment Governance

  • Own and evolve a centralized, modular CI/CD pipeline built on GitHub Actions as the single path to production.
  • Eliminate direct developer access to production environments; implement controlled deployment workflows using session-based access tools (e.g., AWS SSM Session Manager).
  • Establish branch protection, image signing, environment promotion gates, and tenant-aware deployment strategies.

On-Premise Appliance Management

  • Oversee configuration management for client-site appliances using a tool such as Chef in a client-server architecture.
  • Drive the strategy to progressively centralize microservices into cloud-hosted infrastructure, minimizing the on-premise footprint.
  • Define remote access procedures, failure runbooks, and contingency workflows for on-premise hardware.

Security & Compliance

  • Enforce infrastructure security best practices for a healthcare environment handling PHI and de-identified clinical data across tenant boundaries.
  • Manage VPN-based access to private cloud networks and implement least-privilege IAM, secrets management, and policy-as-code across all environments.
  • Ensure tenant data isolation at the network, storage, and compute layers.

Monitoring, Reliability & Backups

  • Build and maintain unified observability using Prometheus and Grafana across cloud and on-premise environments.
  • Own the backup and disaster recovery strategy — container registries, automated snapshots, and cross-cloud resilience.
  • Define and track SLOs for critical data pipelines and tenant-facing services.

Team & Process

  • Mentor junior DevOps/infrastructure engineers and collaborate closely with data engineering, AI, and IT teams.
  • Recommend and help hire for supporting roles (e.g., IT support for on-premise hardware operations).
  • Establish DevOps standards, documentation, and runbooks for the team.

What We're Looking For

Must Have

  • 6+ years of DevOps/Infrastructure/SRE experience, with at least 2 years in a lead or senior capacity.
  • Production experience across AWS and GCP — VPCs, IAM, compute, storage, and managed services on both platforms.
  • Hands-on experience running Kubernetes in production — cluster lifecycle, Helm charts, service mesh, autoscaling, and troubleshooting.
  • Deep expertise in CI/CD design using GitHub Actions (or comparable platforms) with a focus on security and governance.
  • Strong understanding of multi-tenant architecture patterns — network isolation, tenant-aware deployments, and data segregation.
  • Solid Docker and container lifecycle management experience.
  • Infrastructure-as-Code proficiency with Terraform (multi-provider) or equivalent.
  • Networking fundamentals — VPNs, VPCs, DNS, firewalls, load balancers, and zero-trust architectures.
  • Comfort with Python and shell scripting for automation.
  • A production-first, outcome-oriented mindset — you measure success by what's running reliably in production, not by what's in a slide deck. Customer value over story-point velocity.
  • Excellent communication skills — you can translate complex technical concepts for both engineering peers and business stakeholders.

Good to Have

  • 1+ years of DevOps team management experience: you've directly managed DevOps engineers, run standups, handled performance conversations, and built team culture.
  • Experience with the AI-native stack: vector databases (Pinecone, Weaviate, pgvector), RAG pipelines, feature stores, LLM orchestration frameworks (LangChain, LlamaIndex), and ML pipeline tooling (MLflow, Kubeflow, SageMaker).
  • Experience in healthcare, life sciences, or any environment with strict data privacy requirements (HIPAA, PHI handling).
  • Experience with configuration management tools such as Chef, Ansible, or Puppet.
  • Familiarity with Elasticsearch operations and management.
  • Experience managing hybrid environments with on-premise hardware alongside cloud infrastructure.
  • Exposure to Prometheus, Grafana, and alerting pipeline design.
  • Background working with data engineering teams running ETL/ELT pipelines.