GPU Senior Software Engineer
Job Summary
We are seeking a skilled software engineer to join our NPU software stack development team. This role involves developing high-performance GPU programming frameworks, runtime systems, and libraries for AI/ML workloads. You will be responsible for implementing, optimizing, and maintaining GPU software stack components to support distributed AI training and inference.
Key Responsibilities
- Identify, analyze, and optimize bottlenecks in the distributed NPU ecosystem
- Design and develop the NPU memory management system
- Design and develop an optimized NPU development framework, execution path, and debugging support
- Develop compatibility with AI frameworks (Triton, PyTorch, JAX)
- Write high-quality, well-tested code with comprehensive documentation
- Collaborate with other teams (Hardware, Network, QA, AI Framework Integration)
- Participate in code reviews and technical design discussions
Required Qualifications
- 5+ years of experience in distributed systems programming
- 3+ years of experience with NPU programming (Triton, CUDA, HIP, OpenCL)
- Expert-level C/C++ programming with focus on performance optimization
- Expert-level Python programming with focus on DL/ML frameworks (PyTorch, JAX, etc.)
- Deep understanding of NPU architecture, memory tiering, and programming models
- Knowledge of NPU runtime systems
- Experience with performance profiling and optimization tools
- Strong problem-solving and debugging skills
- Experience with version control systems, ticketing systems, and collaborative development
- Team player with excellent communication skills
- Fast learner who is highly organized, detail-oriented, and highly motivated
Preferred Qualifications
- Experience with NPU software stack development
- Experience with large-scale NPU systems (100+ NPUs)
- Experience with AI-oriented DL/ML workloads and distributed training and inference
- Familiarity with containerization and orchestration