ML Model Serving Engineer
Responsibilities
- Turbocharge our serving layer, consisting of a variety of LLM, speech, and vision models.
- Partner with ML infrastructure and training engineers to build a fast, cost-effective, accurate, and reliable serving layer to power a new consumer product category.
- Modify and extend LLM serving frameworks like vLLM and SGLang to take advantage of the latest techniques in high-performance model serving.
- Work with the training team to identify opportunities to produce faster models without sacrificing quality.
- Use techniques like in-flight batching, caching, and custom kernels to speed up inference.
- Find ways to reduce model initialization times without sacrificing quality.
Qualifications
- Expert in some differentiable array computing framework, preferably PyTorch.
- Expert in optimizing machine learning models for serving reliably at high throughput, with low latency.
- Significant systems programming experience, e.g., working on high-performance server systems. You’d be just as comfortable with the internals of vLLM as you would with a complex PyTorch codebase.
- Significant performance engineering experience, e.g., bottleneck analysis in high-scale server systems or profiling low-level systems code.
- Always up to date on the latest techniques for model serving optimization.
- Familiarity with high-performance LLM serving, e.g., experience deploying vLLM or SGLang and working with their internals.
- Experience with a public cloud platform such as GCP, AWS, or Azure.
- Experience deploying and scaling inference workloads in the cloud using Kubernetes, Ray, etc.
- You like to ship and have a track record of leading complex multi-month projects without assistance.
- You’re excited to learn new things and work in a multitude of roles.
Full-time Employee Benefits
- 401(k) max employer match: 3.5% of compensation
- 100% employer-paid health, vision, and dental benefits for you and your dependents
- Unlimited PTO and sick time
- Flexible spending account with employer matching up to $1,650/year (medical FSA)
- Guardian Employee Assistance Program (EAP)
- Opportunity to share in the company's success with competitive stock options
Compensation Range: $175K - $280K