Senior Software Engineer, AI Training & Infrastructure
Location: San Mateo, CA
Employment Type: On-site, Remote
Company: Skild AI, Inc.
Key Responsibilities
- Architecting, building, and maintaining distributed training pipelines and frameworks.
- Optimizing training performance and resource utilization.
- Integrating state-of-the-art ML techniques into production training systems.
- Implementing monitoring, logging, alerting, automated testing, and CI/CD for reliable training operations.
- Building developer tooling and writing documentation.
Qualifications
- Master's degree (or foreign equivalent) in Computer Science, Robotics, Engineering, or a related field.
- 2 years of experience in machine learning infrastructure.
- 2 years of experience designing and operating distributed training pipelines at scale.
- Experience with Python or C++ and at least one deep learning library (e.g., PyTorch, TensorFlow, JAX).
- Experience with CI/CD and automated testing for ML/infra services.
- Knowledge of optimizing data loading and I/O for deep learning workloads.
- Knowledge of processing multimodal datasets and formats, and image processing/compression.
- Experience with cloud-based training (AWS, Google Cloud, or Azure).
- Experience implementing monitoring, logging, and alerting for training systems.
- Knowledge of Linux OS fundamentals, distributed systems, and ML training techniques/models.
- Solid understanding of core software engineering principles.
Apply online at skild.ai/career