Machine Learning Engineer I
- Evaluate AI-generated responses across support use cases (ticket replies, summaries, intent labels, agent recommendations) using structured rubrics for correctness, relevance, tone, safety, and reasoning.
- Apply and refine standardized evaluation rubrics and scoring guidelines; train and onboard annotators when needed to ensure consistent assessments.
- Write clear, actionable feedback on model outputs that calls out errors, hallucinations, missing context, bias, or policy risks, and suggests concrete remediation steps.
- Create and maintain gold‑standard evaluation datasets, test cases, and regression suites used for model validation and release gating.
- Partner directly with ML engineers, researchers, product managers, and UX teams to refine evaluation criteria, define metrics, and translate findings into product priorities and model improvements.
- Analyze evaluation results to identify systemic failure modes and trends across model versions, produce dashboards/reports, and recommend monitoring or retraining strategies.
- Strong attention to detail and the ability to spot subtle errors, inconsistencies, and unsafe outputs in natural language model responses.
- Clear, empathetic written communication; able to summarize issues and propose improvement steps for multiple stakeholders (engineers, researchers, PMs).
- Practical experience with annotation, dataset curation, or building evaluation suites for ML or NLP systems.
- Familiarity with customer support workflows and the nuance of agent vs. end‑user messaging (preferred but not required).
- A collaborative mindset: you enjoy partnering cross‑functionally to move findings into action and care about measurement and reproducibility.
- Comfortable using Python and common ML tooling to run evaluations, analyze results, and maintain dataset artifacts.
- BS in Computer Science, Data Science, Statistics, Computational Linguistics or equivalent practical experience.
- 2–5 years of industry experience in ML/NLP evaluation, data annotation, or QA for language systems (internships and similar experience count).
- Proficient in Python and data manipulation tools (pandas, numpy); experience with Git and basic data pipeline concepts.
- Demonstrated ability to write clear, reproducible evaluation criteria and give concise, actionable feedback.
- Experience working with large language models (LLMs), prompt engineering, or fine‑tuning workflows.
- Hands‑on experience building and maintaining gold‑standard datasets, synthetic test cases, or adversarial examples.
- Familiarity with automated evaluation frameworks, A/B testing, and model monitoring/metrics.
- Prior experience in customer support or SaaS product domains, with an understanding of domain-specific safety and tone constraints.
- Advanced degree (MS/PhD) in a relevant field or equivalent research experience is a plus.
Hybrid: In this role, our hybrid experience is designed at the team level to give you a rich onsite experience packed with connection, collaboration, learning, and celebration, while also giving you flexibility to work remotely for part of the week. The person in this role must work from our local office for part of the week. The specific in-office schedule is to be determined by the hiring manager.
The Intelligent Heart Of Customer Experience
Zendesk software was built to bring a sense of calm to the chaotic world of customer service. Today we power billions of conversations with brands you know and love.
Zendesk believes in offering our people a fulfilling and inclusive experience. Our hybrid way of working enables us to purposefully come together in person, at one of our many Zendesk offices around the world, to connect, collaborate, and learn, whilst also giving our people the flexibility to work remotely for part of the week.
As part of our commitment to fairness and transparency, we inform all applicants that artificial intelligence (AI) or automated decision systems may be used to screen or evaluate applications for this position, in accordance with Company guidelines and applicable law.
Zendesk is an equal opportunity employer, and we’re proud of our ongoing efforts to foster global diversity, equity, & inclusion in the workplace. Individuals seeking employment and employees at Zendesk are considered without regard to race, color, religion, national origin, age, sex, gender, gender identity, gender expression, sexual orientation, marital status, medical condition, ancestry, disability, military or veteran status, or any other characteristic protected by applicable law. We are an AA/EEO/Veterans/Disabled employer. If you are based in the United States and would like more information about your EEO rights under the law, please click here.
Zendesk endeavors to make reasonable accommodations for applicants with disabilities and disabled veterans pursuant to applicable federal and state law. If you are an individual with a disability and require a reasonable accommodation to submit this application, complete any pre-employment testing, or otherwise participate in the employee selection process, please send an e-mail to [email protected] with your specific accommodation request.