Senior Agentic (AI) Engineer
Responsibilities
- Design and ship multi-step agentic systems (planner/executor, tool-using, multi-agent, human-in-the-loop) for onboarding, underwriting, case review, and continuous monitoring
- Architect agent graphs in LangGraph (or comparable — CrewAI, AutoGen, Claude Agent SDK) with explicit state, durable execution, retries, and safe fallbacks
- Build the retrieval layer powering our agents — chunking, hybrid search, reranking, and grounded citation
- Own the eval stack: golden sets, offline regression suites, LLM-as-judge, online A/B and shadow evals, and red-teaming for jailbreaks, prompt injection, and PII leakage
- Expose agents to production systems via well-typed tools and MCP servers. Treat tool surface area as a product
- Drive production MLOps: deployment, versioning, traffic shaping, cost/latency budgets, tracing, and on-call playbooks for agent incidents
- Partner with security and compliance to keep agents inside SOC 2, GDPR, CCPA, and fair-lending posture — auditability and explainability built in, not bolted on
- Mentor engineers on agent patterns, prompt hygiene, eval discipline, and LLM failure modes
Technology Stack
- Languages: Python, Node.js, TypeScript
- Agent / LLM frameworks: LangGraph, LangChain, Claude Agent SDK, MCP, OpenAI SDK
- Models: Anthropic Claude, OpenAI, open-weight where appropriate
- Retrieval & Data: PostgreSQL, pgvector, OpenSearch, Kafka, Redshift, Redis
- Infra: AWS, Kubernetes (EKS), ArgoCD, Terraform
- Evals & Observability: LangSmith / Langfuse / Braintrust-style tooling, Datadog
Requirements
- 5+ years of software engineering experience, with 2+ years building production LLM or agentic systems (not just notebooks or demos)
- Hands-on experience with a modern agent framework (LangGraph strongly preferred) and a track record of shipping agents that run, fail gracefully, and recover
- Strong RAG fundamentals (chunking, embeddings, hybrid retrieval, reranking, grounding) and judgment about when RAG isn't the right answer
- Real eval experience: golden sets and offline and online evaluations, used to make ship/no-ship calls
- Production MLOps fluency: deployed LLM workloads under real latency, cost, and reliability constraints
- Strong Python; comfortable in TypeScript / Node.js
- Solid systems engineering instincts: APIs, async patterns, queues, databases, and distributed-system failure modes
- Calibrated communicator; thrives in ambiguous, fast-moving environments
Nice to Have
- Prior experience in fintech, lending, payments, KYB/KYC, fraud, or AML
- Experience building MCP servers or other structured tool interfaces for LLMs
- Background in classical ML (ranking, scoring, calibration)
- Experience designing explainable / auditable AI workflows for regulated environments
- Open-source contributions to agent frameworks, eval tooling, or retrieval libraries
- AWS depth (EKS, MSK, RDS, S3, Lambda) and IaC with Terraform
Success Metrics
- Agent Quality: Measurable improvements in task success rate, grounding accuracy, and hallucination rate on our eval suites
- Production Reliability: Agents you own meet defined SLOs for latency (P90/P99), tool-call success, and cost per task
- Velocity: New agent capabilities go from prototype to production in weeks, without skipping evals or guardrails
- Risk Posture: Zero material incidents tied to prompt injection, PII leakage, or unsafe tool use on agents you own
- Force Multiplier: Patterns, tools, and eval scaffolding you build get adopted across engineering
Benefits
- Health Care Plan (Medical, Dental & Vision)
- Retirement Plan (401(k), IRA)
- Life Insurance
- Flexible Paid Time Off
- 9 Paid Holidays
- Family Leave
- Remote
- Hybrid work (for Orlando Associates)
- Free Food & Snacks (Orlando)
- Wellness Resources