Gramian Consulting Group · Himalayas · Posted 25d ago

AI Evaluation Engineer (Data Analysis & Multi-Agent Systems)

Remote / flexible · USD · Full time


Indexed description

We are looking for an AI Evaluation Engineer to design benchmark tasks that simulate real-world analytical workflows, with a focus on data analysis, multi-agent systems, and verification logic.

Requirements

  • Design and develop multi-agent benchmark tasks focused on complex data analysis workflows
  • Create or curate realistic datasets (CSV, JSON, logs, reports, financial or operational data)
  • Build tasks requiring cross-referencing across multiple data sources, anomaly detection and contradiction identification, and statistical analysis and interpretation
  • Define task decomposition strategies across specialized sub-agents (e.g., financial, technical, operational analysis)
  • Develop verification logic to validate precise analytical outputs (not generic summaries)
  • Implement evaluation pipelines using Python and SQL
  • Create reproducible environments using Docker
  • Analyze task performance and refine for clarity, difficulty, and scoring accuracy

Originally posted on Himalayas
