Software Engineer, TT-Distributed
As our TT-Distributed Software Engineer, you will develop and optimize distributed software systems that power the most efficient and highest-performing AI and HPC clusters. In this role, you'll work on distributed programming across multiple nodes, using systems programming, inter-node communication, and Tenstorrent’s scalable architectures to advance the state of the art in distributed inference and training infrastructure.
This role is hybrid, based out of Santa Clara, CA; Austin, TX; or Toronto, ON.
We welcome candidates at various experience levels for this role. During the interview process, candidates will be assessed for the appropriate level, and offers will align with that level, which may differ from the one in this posting.
Who You Are
- Strong C or C++ engineer with solid foundations in systems programming, operating systems, and distributed systems principles.
- Enthusiastic about distributed computing, including IPC, socket programming, and cluster resource coordination.
- Comfortable reasoning about scalability, fault tolerance, and performance across multi-node environments.
- Curious, first-principles thinker who challenges conventional approaches to distributed system design.
- Motivated to grow into a deep technical expert in large-scale distributed AI infrastructure.
What We Need
- Architect, implement, and optimize distributed software systems that coordinate computation and communication across clusters of AI accelerators and CPUs.
- Design and build distributed APIs enabling data-parallel and tensor-parallel AI workloads.
- Leverage MPI-based technologies and related frameworks to scale programming models across multiple hosts and compute nodes.
- Develop robust systems using IPC, inter-node sockets, and distributed communication primitives to ensure reliability and high performance.
- Build and maintain testing, debugging, profiling, and monitoring tools for large-scale distributed workloads and collaborate with model and systems teams on cluster bring-up.
What You Will Learn
- How large-scale distributed inference and training systems are architected across thousands of accelerators.
- Advanced techniques in collective communication, synchronization, and parallel workload distribution.
- The performance characteristics of Tenstorrent hardware in multi-node cluster environments.
- How distributed programming models integrate with compilers, runtimes, and AI frameworks.
- The real-world challenges of deploying, debugging, and scaling next-generation AI clusters.