Global Linkedin · Posted 16d ago

Senior GPU & LLM Infrastructure Engineer (NVIDIA, vLLM, OpenShift AI)

New Caledonia



Our banking client is building a large-scale private GenAI environment and is seeking experienced engineers to support enterprise-grade on-prem inference platforms powered by NVIDIA H200 GPU clusters and OpenShift AI. This role focuses entirely on high-performance LLM inference and runtime optimization, not on model training or fine-tuning.


What You’ll Do

  • Optimize large-scale LLM inference performance across NVIDIA GPU environments.
  • Drive runtime efficiency across token generation pipelines, including KV cache and prefill/decode optimization.
  • Deploy and operate modern inference frameworks including vLLM and TensorRT-LLM.
  • Manage GPU throughput, batching strategies, latency tuning, and workload orchestration using RunAI and Kubernetes.
  • Oversee the full Hugging Face model lifecycle including onboarding, deployment, versioning, and retirement.
  • Operate and maintain OpenShift AI as the core enterprise GenAI platform.
  • Support production-grade self-hosted open-source LLM environments, including Llama models.


Experience

  • Strong background in AI infrastructure, GPU platforms, or LLM runtime engineering.
  • Hands-on experience with NVIDIA H200 GPU clusters and large-scale inference optimization.
  • Deep understanding of KV cache management, token serving pipelines, and inference latency optimization.
  • Expertise with OpenShift AI, Kubernetes GPU orchestration, and RunAI.
  • Strong experience with vLLM and TensorRT-LLM in production environments.
  • Proven experience managing Hugging Face model deployment lifecycles.
