Senior AI Data Engineer
Key Responsibilities
- User Data Warehouse Construction & Architecture
  - Design, build, and maintain a scalable User Data Warehouse to consolidate data from fragmented sources
  - Design efficient data models to support high-performance querying and analytics
  - Implement ETL/ELT pipelines to ensure real-time or near-real-time data availability and quality
- Data Tagging & Profile System (User 360)
  - Establish a comprehensive User Tagging/Labeling System (User Portrait)
  - Develop algorithms to generate static, behavioral, and predictive tags to accurately segment users
  - Ensure the tagging system is dynamic and can update in real-time to reflect the latest user interactions
- LLM Integration & Data Intelligence
  - Lead the integration of Large Language Models with our internal data
  - Design and implement RAG (Retrieval-Augmented Generation) pipelines to feed structured user profile data and tags into LLMs for personalized outputs
- Intelligent Interaction Development
  - Develop APIs and middleware that allow downstream applications to interact with data using natural language
  - Optimize the "Data-to-AI" loop: ensure the LLM understands the context of the user data to provide accurate, hallucination-free responses
  - Monitor token usage, latency, and response quality of the AI interactions
Qualifications
- Education: Master's degree in Computer Science, Data Engineering, Artificial Intelligence, or a related field
- Experience: 3-5+ years of experience in Data Engineering or Backend Development with a focus on data
- Data Stack:
  - Proficiency in SQL and Python/Java/Scala
  - Hands-on experience with Data Warehouses (e.g. Snowflake, BigQuery, ClickHouse) and Big Data frameworks (Spark, Flink)
  - Familiarity with message middleware (Kafka) and containerization (Docker)
- User Data Experience: Proven experience in building CDP (Customer Data Platform), DMP, or User Profile/Tagging systems
- AI/LLM Skills:
  - Experience interacting with LLM APIs (OpenAI, etc.) and inference optimization (vLLM)
  - Familiarity with frameworks like LangChain, LlamaIndex, or Haystack
  - Understanding of Embedding, vector databases (FAISS, Milvus), and RAG architecture
- Soft Skills: Strong problem-solving abilities and the ability to translate business needs into technical data requirements
Preferred Qualifications
- Experience with Prompt Engineering and optimizing context windows for efficient data feeding
- Knowledge of Knowledge Graphs (Neo4j, NebulaGraph) and how to combine them with LLMs
- Experience in model fine-tuning (SFT, RLHF)
- Familiarity with privacy regulations (GDPR/CCPA) regarding user data and AI
- Experience with mature launched projects serving a large user base on cloud platforms (AWS, etc.)
The US base salary range for this full-time position is $100,000-$300,000, plus bonus, long-term incentives, and benefits. Our salary ranges are determined by role, level, and location.