turbalance · LinkedIn · Posted 17d ago

AI Trace Generation Engineer

Heidelberg, Baden-Württemberg, Germany

Your mission

  • Design and implement a trace collection system for distributed LLM workloads, capturing compute operations, communication primitives, memory usage, and cluster topology across multi-GPU and multi-node setups
  • Validate that collected traces accurately reflect real workload behavior by verifying operation completeness, timing consistency, and data integrity across inference and training pipelines
  • Integrate with and instrument major LLM frameworks (vLLM, TensorRT-LLM, DeepSpeed, Megatron-LM and others) to extract meaningful execution data without disrupting performance
  • Use collected traces as input to discrete event simulations that model and replay distributed AI workload behavior at scale
  • Analyze trace data to surface bottlenecks and inefficiencies across the stack, from individual kernel execution to cluster-wide communication patterns

Your profile

  • 3+ years of experience in AI systems, ML infrastructure, or a closely related area
  • Hands-on experience with at least one major LLM serving or training framework
  • Strong proficiency in Python and C++, with a solid understanding of GPU architecture, memory bandwidth, and the difference between compute-bound and memory-bound operations
  • Solid understanding of distributed communication
  • Familiarity with parallelism strategies and how they shape execution behavior across large clusters
  • Open-source contributions or published research in relevant areas are a strong plus
  • Previous startup experience is a plus: we move fast and value people who are comfortable with that

Why us?

  • Build something big: Help build and scale a fast-growing AI infrastructure startup
  • Pay & perks: Competitive compensation with a performance-based incentive, subsidized Deutschlandticket, and access to a discount portal
  • Work your way: Flexible hours with hybrid and remote-friendly options
  • Fast lanes, no red tape: Flat hierarchies and rapid decision-making mean ideas ship quickly
  • Global team: Work with a diverse, international team across Germany and the USA
  • Modern headquarters: Well-stocked office near the Heidelberg Hauptbahnhof, available on a hybrid basis or as a place to connect during our quarterly team workshops
  • Top setup: Your choice of high-quality hardware and equipment
  • Relocation support: We’ll help make your move to join us as smooth as possible

About Us

turbalance is an innovative, emerging startup that transforms AI laws. We are a team of passionate problem-solvers who believe in what we’re building. We constantly push boundaries and embrace our inner nerds as we find new ways to tackle complex challenges. You will find a dynamic work environment here, with flat or even non-existent hierarchies and the chance to take on responsibility from day one.
