Lead AI Engineer
What you'll do: Lead AI Engineer in the Platforms and Products will…
We are seeking a highly motivated Applied AI Engineer with a strong foundation in Machine Learning and a deep interest in Large Language Models (LLMs) and Generative AI. This role focuses on building, optimizing, and evaluating production-grade LLM systems, including Retrieval-Augmented Generation (RAG), fine-tuning workflows, and scalable inference pipelines.
- Design and implement LLM-powered applications using state-of-the-art transformer models.
- Build and optimize RAG pipelines using embeddings, chunking strategies, and vector search.
- Experiment with prompt engineering, structured outputs (JSON schemas/function calling), and tool-augmented LLMs (agents/workflows).
- Fine-tune models using techniques such as LoRA, PEFT, and instruction tuning.
- Develop and evaluate embedding models for similarity search and semantic retrieval.
- Conduct LLM evaluation using automated and human-in-the-loop techniques (offline + online).
- Optimize inference workflows for latency, GPU utilization, and cost efficiency (quantization, batching, caching).
- Build and maintain REST API services (e.g., FastAPI) to deploy LLM/RAG endpoints, integrate with product systems, and support scalable inference.
- Contribute to the integration of AI systems into production software environments (CI/CD, monitoring, reliability).
- Research and prototype cutting-edge approaches in Generative AI and share learnings with the team.
- A master's or bachelor's degree in Computer Science or a related field from a top university
- 4+ years of hands-on experience in Machine Learning (ML), including production LLM systems
- Strong fundamentals in machine learning, deep learning, and LLM fine-tuning, including:
  - Understanding of transformer architectures
  - Prompt engineering expertise
  - Embeddings and vector search
- Experience in backend API design with FastAPI, including async patterns and rate limiting
- Experience with vector databases, including:
  - Pinecone, Weaviate, or Chroma
  - Embedding storage and similarity search
  - Hybrid search implementations
- Strong programming expertise in Python is a must, including:
  - Async programming (asyncio, async/await)
  - Type hints and Pydantic
  - SOLID principles and design patterns
- Experience in MLOps for measuring and tracking model performance, including:
  - MLflow for model tracking
  - Langfuse for LLM observability (strongly preferred)
  - Model versioning and A/B testing
- Experience working with NLP and computer vision
- Fluency in English
- Client-first mentality
- Intense work ethic
- Collaborative spirit and problem-solving approach