Softgic Himalayas · Posted 2d ago

Agent Quality / Evals Engineer 1754

Full time · Remote

Indexed description

This is a remote position.

Owns the eval harness and quality gate from the beginning. This role replaces the old late-stage “Evals Specialist” model with a standing owner for measurable agent quality.

Key Responsibilities

Build and maintain the MVP eval harness: golden tasks, exception tasks, scorecard metrics, and regression packs.
Wire evals into CI so quality regressions fail builds and releases.
Define and maintain release-gate thresholds with Product and the Tech Lead.
Lay the path for later adversarial and drift-testing expansion without overbuilding MVP scope.

Requirements

Must-Have Qualifications

Experience evaluating ML, LLM, or non-deterministic systems.
Strong test and benchmark design capability.
Comfort working with noisy metrics, thresholds, and probabilistic behavior.
Good scripting and automation skills.

AI-First Expectations

Uses AI to generate candidate eval cases and failure hypotheses, but never confuses generated tests with validated quality.
Approaches AI quality as an operating system, not a QA afterthought.

What Success Looks Like in the First 90 Days

The first reference agent has a published scorecard and gated eval path.
Golden and exception tests run automatically.
The team can explain what “good enough to ship” means in measurable terms.

Originally posted on Himalayas

Softgic Company profile preview

Source: Himalayas
Compensation: Not listed
Open on Caio: 22 roles

Company stats

Current index details for Softgic, based on roles Caio has indexed from public sources.

22 open roles · 5 sources · 2 markets · Latest role: posted 2d ago