AI Engineer
Responsibilities
- Agent Engineering & Productionization (40%)
  - Prototype, iterate, and productionize domain-aligned agent modules (plans, tool-use, task execution flows) that operate reliably within defined workflows. (Execute)
  - Build and maintain versioned agent assets (prompts, policies, tool schemas, configs) with clear change logs and reproducibility. (Execute)
  - Optimize agent performance for latency and token efficiency within defined constraints (especially for edge-targeted scenarios when applicable). (Execute)
- Evaluation, Testing & Quality Signals (25%)
  - Implement an AI system testing harness for assigned agents: regression suites, golden test sets (where applicable), and comparison reports for prompt/model variants. (Execute)
  - Maintain evaluation metadata (test versioning, metrics, correlation IDs) to support traceability and repeatability. (Execute)
  - Contribute to safety/quality checks (hallucination, toxicity, policy compliance) as part of evaluation workflows defined by the program. (Execute/Consult)
- Integration with Tools and MCPs (20%)
  - Implement or extend MCP clients/connectors for internal data products and approved enterprise apps using standardized interfaces, scopes, and audit patterns. (Execute)
  - Validate integration behaviour with sandbox credentials, representative test data, and end-to-end workflow tests with stakeholders. (Execute/Consult)
- Operational Readiness & Collaboration (15%)
  - Ensure owned components meet operational readiness expectations: logging/telemetry coverage, runbook notes, basic SLI/SLO alignment for agent health and integration reliability. (Execute/Consult)
  - Collaborate with platform and transformation teams to clarify requirements, triage issues, and incorporate feedback from internal/external teams into improvements. (Execute/Consult/Informed)
  - Identify and implement small process improvements that increase repeatability (evaluation templates, prompt versioning conventions, integration test scaffolds). (Execute)
Supervision Required: Moderate — receives design review and direction from L09/L10 AI leads for evaluation approach, routing standards, and sensitive integrations.
Complexity of Role: High (for L08) — requires balancing quality/latency, integrating multiple enterprise tools, and ensuring reproducible evaluation under evolving requirements.
Cross-Functional Interactions: Yes — frequent interactions with platform, product/domain, security, SRE/observability, and enterprise app owners.
Qualifications
Minimum Qualifications
- Bachelor’s/Master’s in CS/AI/ML/Data Science (or equivalent experience).
- Hands-on experience building LLM applications (agents/tool-use/prompting) and shipping production code in Python.
- Python engineering with production hygiene (testing, packaging, structured logging)
- Agentic AI frameworks/patterns: LangGraph/LangChain, CrewAI-style orchestration patterns; tool/function calling; prompt versioning
- Evaluation discipline: test sets, regression testing, offline eval metrics, A/B comparisons, failure taxonomy
- Integration engineering: APIs, auth concepts, schema-based tool integration; MCP-style interface implementation preferred
- Observability basics: correlation IDs, error analysis, latency instrumentation
- Cloud familiarity: enough to deploy and validate agents via platform pipelines (not owning infra)
- Ownership: takes components from prototype → tested → production-ready with clear artifacts
- Process improvement mindset: improves repeatability and reduces rework through templates and automation
- Collaboration & customer focus: works effectively with domain teams; builds what improves real workflows
- Adaptability: adjusts quickly to changing model/tool constraints and evolving requirements
- Communication: concise technical updates; can explain agent behaviour and evaluation results to non-experts