REAL · LinkedIn · Posted 2mo ago

Senior AI Engineer - AI Systems Evaluation Team

Israel

REAL is building an AI Execution Platform for real estate organizations.

Today, the data required to run real estate is scattered across fragmented systems, leading to missed insights and preventable financial leakage.

REAL transforms this complexity into connected intelligence and automated execution, enabling enterprises to operate with greater precision and confidence.

REAL Values

Ownership: We take responsibility and move decisively
Clarity: We simplify complexity to deliver meaningful impact
Accuracy: Precision matters in everything we build
Velocity: We work with urgency and intent
Partnership: We collaborate closely with customers and teammates

Role Overview

Own the systems that define, measure, and enforce AI quality at REAL
Translate ambiguous model behavior into measurable signals, automated tests, and release gates
Operate across evaluation design, tooling, and production integration

What You'll Do

Design evaluation architectures (benchmarks, regression suites, coverage)
Build automated pipelines to run and score evals across models and prompts
Implement scoring systems (LLM-as-judge, rubrics, hybrid approaches)
Create and maintain golden datasets and edge-case suites
Develop internal tools for prompt testing, dataset generation, and experiment tracking
Instrument systems for traces, outputs, and debugging
Detect regressions and enforce quality gates in CI/CD
Monitor model performance in production
Close the loop between eval insights and product improvements

Requirements

What We're Looking For

3-6 years of experience building production software, internal platforms, ML/data infrastructure, experimentation systems, or AI tooling
Strong backend and systems engineering fundamentals with hands-on applied AI experience
Strong Python skills and production-level systems experience
Experience building testing frameworks or validation systems end-to-end
Hands-on experience with LLMs, RAG, and agent workflows
Understanding of evaluation methods (benchmarking, A/B testing, LLM-as-judge, human-in-the-loop (HITL))
Experience with observability, logging, and experiment tracking
Strong systems thinking (coverage, reliability, reproducibility)
Comfort with non-deterministic systems

Nice to Have

Experience with eval, tracing, observability, or experimentation tooling (one or more of the following: LangSmith, Braintrust, Phoenix, MLflow, OpenTelemetry, PostHog, custom eval stacks)
Familiarity with dataset/versioning workflows, HITL systems, and production AI observability systems
CI/CD integration for model evaluation
Background in search, retrieval, or document systems
Experience building internal platforms or developer tools
Experience working in startups and business-driven environments

REAL Company profile preview

Source: Linkedin
Location: Israel
Compensation: Not listed
Open on Caio: 17 roles
