Senior AI Platform Engineer - HexCore & Eval Systems - OPS00071
About this opportunity
We are seeking a Senior AI Platform Engineer to own the core platform layer that powers every EasyBee AI agent in production, from multi-tenant agent configuration and schema architecture to data pipeline contracts, evaluation harnesses, and customer onboarding automation.
This role sits at the intersection of backend platform engineering, LangGraph-based orchestration, and AI evaluation systems. You won't just build features; you'll own the infrastructure that makes all features possible: the agent orchestration graph, the customer configuration schema, end-to-end conversation logging, automated eval pipelines, and the scripts that deploy new customers in under 30 minutes.
If you love owning systems that other engineers depend on, ship at high velocity across a wide surface area, and take pride in leaving codebases cleaner than you found them, we want to hear from you.
Key responsibilities and your contribution
Is that you?
- Education:
- Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience
- Experience:
- Must: Proficient or advanced use of agentic coding workflows in tools like Cursor AI or Claude Code
- 4+ years building and owning production-grade backend systems in Python
- Proven experience owning a core platform or shared infrastructure layer used by multiple teams or customers
- Hands-on track record with multi-tenant system design: schema isolation, config-driven parameterization, and deployment automation
- Experience building evaluation harnesses for LLM-based systems with quantitative metrics
- Tools / Technologies:
- Python (advanced): async I/O, FastAPI, Pydantic, pytest, type hinting, data classes
- LangGraph: state machines, conditional edges, node composition, shared state management across modular agent layers
- PostgreSQL + pgvector: relational schema design, state persistence, multi-tenant data isolation
- RAG pipelines: vector DB (Pinecone or equivalent), embedding pipelines, retrieval evaluation
- Eval & tracing frameworks: LLM simulation testing, distributed tracing, automated scoring pipelines
- GitHub Actions / CI/CD: automated eval gates, schema validation hooks, environment promotion
- AWS: EC2, S3, RDS, IAM for production deployment and infrastructure operations
- YAML / config-driven deployment: customer configuration templating, parameterized onboarding scripts
- Skills:
- Strong systems thinking: ability to see how schema decisions in the core platform ripple downstream to eval, logging, onboarding, and customer deployments
- Comfort owning wide surface area: this role crosses platform, data, eval, and ops without a narrow specialization
- High individual shipping velocity: ability to close multiple GitHub issues per day with clean PRs and minimal back-and-forth
- Strong schema discipline: treats data contracts as first-class artifacts, not afterthoughts
- Ability to work autonomously with minimal supervision in a fast-moving startup environment
- Strong written communication for PR descriptions, Notion documentation, and deployment SOPs
- Experience with IP-aware architecture decisions or contributing to software patent documentation
- Familiarity with voice agent systems (Twilio, PSTN, LiveKit) and latency-constrained deployments
- Experience with multi-model evaluation (comparing models from OpenAI, Anthropic, Mistral) using quantitative benchmarks
- Prior work in self-storage, property management, or regulated verticals where data privacy and auditability matter
- Experience contributing to a modular / clean architecture codebase across multiple bounded contexts
- Prior experience in fast-growing startups where you owned infrastructure other engineers depended on daily
- A core platform where every new customer can be deployed in under 30 minutes from a configuration template + KB content, with zero custom code per customer
- An eval pipeline with automated simulation scenarios and distributed tracing that runs on every PR and blocks deployment when scores drop below threshold
- A conversation logging system that captures every production interaction with full metadata, enabling the data strategy and future fine-tuning
- A clean, schema-validated platform codebase where new nodes, customers, and capabilities can be added with predictable behavior and no silent regressions
- A deployment SOP so reliable that any engineer on the team can onboard a new customer without escalation
- 5 paid sick days, up to 60 days of medical leave, and 6 paid days off for family events like weddings, funerals, or having a baby
- Partially covered health insurance (after probation)
- Wellness bonus for gym memberships, sports nutrition, and similar needs
Our next steps:
Submit a CV in English → Intro call with a Recruiter → Internal interview → Client interview → Offer
Interested? Find out more:
How we work
LinkedIn Page
Our website
IG Page