KSA INC
LinkedIn · Posted 1mo ago
C++ Developer || AI || HPC || Bangalore
Indexed description
We are seeking an experienced C++ AI Inference Engineer to design, optimize, and deploy high-performance AI inference engines using modern C++ and processor-specific optimizations. You will collaborate with research teams to productionize cutting-edge AI model architectures for CPU-based inference.
Key Responsibilities
- Collaborate with research teams to understand AI model architectures and requirements
- Design and implement AI model inference pipelines using C++17/20 and SIMD intrinsics (AVX2/AVX-512)
- Optimize cache hierarchy, NUMA-aware memory allocation, and matrix multiplication (GEMM) kernels
- Develop operator fusion techniques and CPU inference engines for production workloads
- Write production-grade, thread-safe C++ code with comprehensive unit testing
- Profile and debug performance using Linux tools (perf, VTune, flamegraphs)
- Conduct code reviews and ensure compliance with coding standards
- Stay current with HPC, OpenMP, and modern C++ best practices
- Modern C++ (C++17/20) with smart pointers, coroutines, and concepts
- Cache optimization - L1/L2/L3 prefetching and locality awareness
- NUMA-aware programming for multi-socket systems
- GEMM/blocked matrix multiplication kernel implementation
- OpenMP 5.0+ for parallel computing
- Linux performance profiling (perf, valgrind, sanitizers)
- High-performance AI inference engine development
- Operator fusion and kernel fusion techniques
- HPC (High-Performance Computing) experience
- Memory management and allocation optimization
- Bachelor's/Master's in Computer Science, Electrical Engineering, or related field
- 3-7+ years proven C++ development experience
- Linux/Unix expertise with strong debugging skills
- Familiarity with Linear Algebra, numerical methods, and performance analysis
- Experience with multi-threading, concurrency, and memory management
- Strong problem-solving and analytical abilities
- Knowledge of PyTorch/TensorFlow C++ backends
- Real-time systems or embedded systems background
- ARM SVE, RISC-V vector extensions, or Intel ISPC experience
- Production-grade AI inference libraries powering LLMs and vision models
- CPU-optimized inference pipelines for sub-millisecond latency
- Cross-platform deployment across Intel Xeon, AMD EPYC, and ARM architectures
- Performance optimizations reducing inference costs by 3-5x