Sr. Lead AI Engineer, Data - 11315
Why join Coupa?
🔹 Pioneering Technology: At Coupa, we're at the forefront of innovation, leveraging the latest technology to empower our customers with greater efficiency and visibility in their spend.
🔹 Collaborative Culture: We value collaboration and teamwork, and our culture is driven by transparency, openness, and a shared commitment to excellence.
🔹 Global Impact: Join a company where your work has a global, measurable impact on our clients, the business, and each other.
Learn more on the Life at Coupa blog and hear from our employees about their experiences working at Coupa.
The Impact of a Sr. Lead AI Engineer, Data at Coupa:
Coupa's data platform already handles anonymized data exports, commodity classification, supplier normalization, and benchmark metrics across 197+ enterprise tables. The Sr. Lead AI Engineer, Data will expand this foundation, building the data curation and pipeline infrastructure that feeds our growing AI model training capabilities. This is a high-volume workstream that processes data covering trillions of dollars of enterprise spend.
What You’ll Do
- Lead the design and implementation of data pipelines that prepare high-quality training data for AI models.
- Build data curation workflows that transform raw enterprise data into labeled, validated datasets.
- Design data quality frameworks: validation, profiling, anomaly detection, lineage tracking.
- Extend existing anonymized data export pipelines to support AI training workloads.
- Implement synthetic data generation pipelines.
- Design schema mappings across 197+ enterprise tables for feature extraction.
- Collaborate with ML engineers on training data format requirements.
- Establish data catalog and metadata management for AI training artifacts.
What You Will Bring to Coupa
- 10+ years of software engineering experience, with 5+ years in data engineering.
- Strong experience with Apache Spark / PySpark and large-scale data processing.
- Experience building ETL/ELT pipelines on cloud infrastructure (managed Spark, object storage, managed ETL, or equivalent).
- Knowledge of data quality frameworks and data governance.
- Experience with data anonymization and privacy-preserving data processing.
- Understanding of ML training data requirements.
- Proficiency in Python and SQL.
- Experience with data catalog tools and metadata management.
- BS/MS in Computer Science or equivalent experience.
- Experience in B2B SaaS with multi-tenant data preferred.
Originally posted on Himalayas