Dev.Pro · LinkedIn · Posted 2mo ago

Senior AI Platform Engineer - HexCore & Eval Systems - OPS00071

Colombia, Huila, Colombia

🟢 At Dev.Pro, we work on projects that impact millions of people around the world — but we know it's the people behind the tech who make it all happen. We truly value what makes each person unique and are building a workplace that's inclusive, friendly, and supportive.

🟢 About this opportunity

We are seeking a Senior AI Platform Engineer to own the core platform layer that powers every EasyBee AI agent in production: multi-tenant agent configuration and schema architecture, data pipeline contracts, evaluation harnesses, and customer onboarding automation.

This role sits at the intersection of backend platform engineering, LangGraph-based orchestration, and AI evaluation systems. You won't just build features — you'll own the infrastructure that makes all features possible: the agent orchestration graph, the customer configuration schema, end-to-end conversation logging, automated eval pipelines, and the scripts that deploy new customers in under 30 minutes.

If you love owning systems that other engineers depend on, ship at high velocity across a wide surface area, and take pride in leaving codebases cleaner than you found them — we want to hear from you.

🧩 Key responsibilities and your contribution

Core Platform & Schema Architecture: own and evolve the core platform repository, the central Python package implementing our modular agent architecture across the orchestration, tools, state, retrieval, configuration, and extensibility layers. Design and maintain customer configuration schemas, including versioning metadata, lineage tracking, and component provenance fields aligned with our IP strategy. Implement backward-compatible schema extensions so all active customer deployments upgrade without breaking changes. Enforce schema validation on all node inputs and outputs to prevent data drift across multi-tenant environments.
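As a rough illustration of a versioned, backward-compatible configuration schema with strict validation, here is a minimal standard-library sketch. It uses a plain `dataclass` as a stand-in for the Pydantic models listed under the tools below; `CustomerConfig`, `load_config`, `SUPPORTED_VERSIONS`, and every field name are hypothetical, not EasyBee's actual schema.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, Optional

# Hypothetical schema versions the platform can load (illustrative only).
SUPPORTED_VERSIONS = {1, 2}


@dataclass(frozen=True)
class CustomerConfig:
    """Hypothetical customer configuration; field names are illustrative."""

    customer_id: str
    schema_version: int
    display_name: str
    # Field added in a later schema version. The default lets older configs
    # that omit it keep loading, which is what makes the extension backward compatible.
    escalation_phone: Optional[str] = None
    # Provenance/lineage metadata, e.g. which template produced this config.
    provenance: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not self.customer_id:
            raise ValueError("customer_id must be non-empty")


def load_config(raw: Dict[str, Any]) -> CustomerConfig:
    """Validate a raw config dict; unknown keys are rejected, not silently dropped."""
    known = {f.name for f in fields(CustomerConfig)}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    version = raw.get("schema_version")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported schema_version: {version!r}")
    # Missing required fields raise TypeError from the dataclass constructor.
    return CustomerConfig(**raw)


# A v1 config without the newer field still loads.
legacy = load_config({"customer_id": "acme", "schema_version": 1, "display_name": "Acme Storage"})
print(legacy.escalation_phone)  # None
```

Unlike Pydantic, a dataclass does not check field types, so this sketch only covers unknown keys, missing keys, and unsupported versions.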

Multi-Tenancy Architecture: build and maintain cross-client isolation across customer configuration, persistent state, and RAG pipelines. Implement multi-tenant tagging so conversation logs, eval datasets, and agent behaviors remain cleanly separated per customer. Design config-driven deploy parameterization so new customers can be onboarded without code changes (a configuration-only deployment model). Ensure all platform changes are backward compatible, with no per-customer code forks.

Data Pipelines & Conversation Logging: own the end-to-end conversation logging system, including the unified schema, row format, conversation capture, and metadata persistence to PostgreSQL and S3. Maintain and extend knowledge base ingestion pipelines (scraping, embedding, vector DB indexing, and retrieval validation) for each customer deployment. Define and freeze data contracts between capture specifications and their implementation, so downstream analytics, fine-tuning, and eval all receive consistent, well-structured inputs. Implement multi-tenant data tagging so every logged conversation is attributed to the correct customer, facility, and session.
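One way to make per-customer, per-facility, per-session attribution concrete is a frozen row type that cannot be constructed without its tenant tags. The sketch below assumes a JSON Lines row format and a `customer=/facility=` prefix for S3 object keys; `ConversationTurn`, `CONTRACT_VERSION`, `to_jsonl_line`, `object_key`, and the key layout are all hypothetical, not the platform's actual data contract.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict

# Hypothetical version tag for the frozen data contract.
CONTRACT_VERSION = "1.0"


@dataclass(frozen=True)
class ConversationTurn:
    customer_id: str
    facility_id: str
    session_id: str
    turn_index: int
    role: str  # "user" or "agent"
    text: str
    created_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    metadata: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Every row must be attributable to a customer, facility, and session.
        for name in ("customer_id", "facility_id", "session_id"):
            if not getattr(self, name):
                raise ValueError(f"{name} is required for multi-tenant attribution")
        if self.role not in {"user", "agent"}:
            raise ValueError(f"unexpected role: {self.role!r}")
        if self.turn_index < 0:
            raise ValueError("turn_index must be >= 0")


def to_jsonl_line(turn: ConversationTurn) -> str:
    """Serialize one turn as a JSON Lines row stamped with the contract version."""
    return json.dumps({"contract_version": CONTRACT_VERSION, **asdict(turn)}, sort_keys=True)


def object_key(turn: ConversationTurn) -> str:
    """Hypothetical tenant-partitioned S3 object key for a session's log file."""
    return (
        f"conversations/customer={turn.customer_id}/"
        f"facility={turn.facility_id}/{turn.session_id}.jsonl"
    )
```

Rejecting untagged rows at construction time means a missing tenant ID fails loudly at capture, rather than surfacing later as mixed-up analytics or eval data.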

Eval Systems & Quality Gates: own the eval suite end-to-end, including scenario design, ground-truth dataset curation, automated scoring (F1, precision, recall), and regression CI gates. Build and maintain LLM simulation test flows: parameterized test scenarios that exercise the agent across reservations, pricing, sizing, escalation, and context retention. Instrument distributed tracing at the LangGraph node level, capturing token usage, per-node latency, and score drift across deployments. Parameterize the eval suite so the same harness works across all customers with minimal configuration. Define and enforce production-readiness gates: eval score thresholds that must be met before any agent goes live.
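The scoring and gating described above can be sketched with micro-averaged precision, recall, and F1 over sets of predicted versus expected labels (for example, intents or extracted slots per scenario). This is an assumed formulation for illustration: the label-set framing, `Scores`, `score`, `passes_gate`, and the default `max_drop` of 0.02 are hypothetical choices, not the team's actual metric definitions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class Scores:
    precision: float
    recall: float
    f1: float


def score(pairs: Iterable[Tuple[Set[str], Set[str]]]) -> Scores:
    """Micro-averaged precision/recall/F1 over (predicted, expected) label sets."""
    tp = fp = fn = 0
    for predicted, expected in pairs:
        tp += len(predicted & expected)  # labels the agent got right
        fp += len(predicted - expected)  # labels it produced that were not expected
        fn += len(expected - predicted)  # expected labels it missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Scores(precision, recall, f1)


def passes_gate(current: Scores, threshold: float,
                baseline: Optional[Scores] = None, max_drop: float = 0.02) -> bool:
    """Absolute F1 threshold plus an optional regression check against a baseline run."""
    if current.f1 < threshold:
        return False
    if baseline is not None and current.f1 < baseline.f1 - max_drop:
        return False
    return True


results = score([
    ({"book_unit", "quote_price"}, {"book_unit"}),  # one extra label
    ({"escalate"}, {"escalate", "quote_price"}),    # one missed label
])
print(results)  # precision, recall, and F1 are each about 0.667 here
```

In CI, a small script could exit with a nonzero status whenever `passes_gate` returns False; a failing step is enough for GitHub Actions to fail the job and block the merge.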

Onboarding Automation & Deployment: build and maintain onboarding automation scripts that deploy a new customer in under 30 minutes, covering configuration templates, KB ingestion, eval suite setup, and run scripts. Own deploy parameterization: all customer-specific values are injected via config, never hardcoded. Keep shared platform code in sync across customer repositories without breaking customer-specific deployments. Document and enforce the deployment SOP so any engineer can execute a new deployment without escalation.
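Config-only onboarding can be approximated by rendering each customer's config from one shared template and failing fast if any customer-specific value is missing. This sketch uses the standard library's `string.Template`; the template contents, `render_customer_config`, `REQUIRED_KEYS`, and all keys and paths are hypothetical, not the actual onboarding scripts.

```python
from string import Template
from typing import Any, Mapping

# Hypothetical shared onboarding template (YAML text); keys are illustrative.
CUSTOMER_TEMPLATE = Template("""\
customer_id: $customer_id
display_name: "$display_name"
timezone: $timezone
kb:
  urls_file: customers/$customer_id/kb_urls.txt
eval:
  suite: default
  f1_threshold: $f1_threshold
""")

REQUIRED_KEYS = ("customer_id", "display_name", "timezone", "f1_threshold")


def render_customer_config(values: Mapping[str, Any]) -> str:
    """Render one customer's config; every customer-specific value comes from `values`."""
    missing = [key for key in REQUIRED_KEYS if key not in values]
    if missing:
        raise KeyError(f"missing onboarding values: {missing}")
    # substitute() (unlike safe_substitute()) also raises KeyError for any
    # placeholder left unfilled, so nothing silently falls back to a hardcoded value.
    return CUSTOMER_TEMPLATE.substitute(values)


print(render_customer_config({
    "customer_id": "acme",
    "display_name": "Acme Storage",
    "timezone": "America/Bogota",
    "f1_threshold": 0.85,
}))
```

Plain string substitution does not escape values, so a production script would more likely build the config as a data structure and serialize it with a YAML library.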

Reliability & Observability: ensure all platform APIs meet latency targets (P95 < 1.5 s on the voice path) through profiling, caching, and async optimization. Maintain structured logging at every critical-path node: conversation start/end, intent classification, retrieval hits, and booking outcomes. Implement CI/CD gates that run eval and schema validation automatically before any merge to the production branch. Contribute to incident diagnosis by keeping systems observable and well logged, with clear error paths.
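A latency target such as P95 < 1.5 s can be checked in a load test or CI job with a nearest-rank percentile over measured request latencies. This sketch assumes latencies are recorded in seconds; `percentile_nearest_rank` and `meets_latency_target` are illustrative helpers, and monitoring tools that interpolate percentiles may report slightly different values on small samples.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing so integer-valued pct and n stay exact in floating point.
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_latency_target(latencies_s: Sequence[float], target_s: float = 1.5,
                         pct: float = 95.0) -> bool:
    """True if the pct-th percentile latency is strictly below target_s (seconds)."""
    return percentile_nearest_rank(latencies_s, pct) < target_s


# 20 requests: one slow outlier does not breach P95, two do.
print(meets_latency_target([0.2] * 19 + [2.0]))      # True
print(meets_latency_target([0.2] * 18 + [2.0] * 2))  # False
```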

✅ Is that you?

Education:

Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience

Experience:

Must: proficient or advanced use of agentic coding workflows in tools like Cursor AI or Claude Code
4+ years building and owning production-grade backend systems in Python
Proven experience owning a core platform or shared infrastructure layer used by multiple teams or customers
Hands-on track record with multi-tenant system design — schema isolation, config-driven parameterization, and deployment automation
Experience building evaluation harnesses for LLM-based systems with quantitative metrics

Tools / Technologies:

Python (advanced): async I/O, FastAPI, Pydantic, pytest, type hinting, dataclasses
LangGraph: state machines, conditional edges, node composition, shared state management across modular agent layers
PostgreSQL + pgvector: relational schema design, state persistence, multi-tenant data isolation
RAG pipelines: vector DB (Pinecone or equivalent), embedding pipelines, retrieval evaluation
Eval & tracing frameworks: LLM simulation testing, distributed tracing, automated scoring pipelines
GitHub Actions / CI/CD: automated eval gates, schema validation hooks, environment promotion
AWS: EC2, S3, RDS, IAM — production deployment and infrastructure operations
YAML / config-driven deployment: customer configuration templating, parameterized onboarding scripts

Skills:

Strong systems thinking — ability to see how schema decisions in the core platform ripple downstream to eval, logging, onboarding, and customer deployments
Comfort owning wide surface area — this role crosses platform, data, eval, and ops without a narrow specialization
High individual shipping velocity — ability to close multiple GitHub issues per day with clean PRs and minimal back-and-forth
Strong schema discipline — treats data contracts as first-class artifacts, not afterthoughts
Ability to work autonomously with minimal supervision in a fast-moving startup environment
Strong written communication for PR descriptions, Notion documentation, and deployment SOPs

✅ Nice to have

Experience with IP-aware architecture decisions or contributing to software patent documentation
Familiarity with voice agent systems (Twilio, PSTN, LiveKit) and latency-constrained deployments
Experience with multi-model evaluation (comparing models from OpenAI, Anthropic, Mistral) using quantitative benchmarks
Prior work in self-storage, property management, or regulated verticals where data privacy and auditability matter
Experience contributing to a modular / clean architecture codebase across multiple bounded contexts
Prior experience in fast-growing startups where you owned infrastructure other engineers depended on daily

✅ What success looks like

A core platform where every new customer can be deployed in under 30 minutes from a configuration template plus KB content, with zero custom code per customer
An eval pipeline with automated simulation scenarios and distributed tracing that runs on every PR and blocks deployment when scores drop below the threshold
A conversation logging system that captures every production interaction with full metadata, enabling the data strategy and future fine-tuning
A clean, schema-validated platform codebase where new nodes, customers, and capabilities can be added with predictable behavior and no silent regressions
A deployment SOP so reliable that any engineer on the team can onboard a new customer without escalation

🎾 What's it like working at Dev.Pro?

✔️ 30 paid days off each year — use them for vacation, holidays, or personal time

✔️ 5 paid sick days, up to 60 days of medical leave, and 6 paid days off for family events like weddings, funerals, or having a baby

✔️ Partially covered health insurance after the probation period

✔️ Wellness bonus for gym memberships, sports nutrition, and similar needs

Our next steps:

✅ Submit a CV in English — ✅ Intro call with a Recruiter — ✅ Internal interview — ✅ Client interview — ✅ Offer

Interested? Find out more:

📋 How we work

💻 LinkedIn Page

📈 Our website

💻 IG Page
