Senior Software Developer, AI Networking
What You'll Be Doing
- Develop AI networking communication frameworks and applications that run in production on the world’s largest supercomputers and data centers.
- Develop production tools and benchmarks used by multiple teams inside and outside NVIDIA.
- Enable new AI models within our benchmarking infrastructure and deliver insights through end-to-end analysis of large-scale workloads across hardware and software stacks.
- Design and implement automation systems, including large-scale parameter search to identify optimal configurations across complex systems.
- Collaborate closely with networking and hardware teams to co-design new features and software interfaces in a fast-paced, evolving environment.
What We Need to See
- B.Sc. or M.Sc. degree in Computer Science or Software Engineering, and 5+ years of experience (or equivalent experience).
- Professional Python development experience. We seek individuals who build maintainable, long-lived tools that do not impose a heavy maintenance burden on the team.
- Solid Linux expertise and a passion for working extensively in command-line environments.
- Ability to work across a broad and evolving stack, from hardware and networking up to large-scale AI systems running across entire clusters, with a strong drive to learn.
- Knowledge of and/or experience with the modern AI ecosystem: PyTorch, LLMs, inference, and training.
- Familiarity with cluster orchestration systems such as Slurm or Kubernetes.
- Knowledge of MPI and HPC, InfiniBand, Ethernet, and networking.
- Experience with performance optimization.
Requisition ID: JR2016186