AI Infrastructure / Platform Engineer
We are engineers and investors working together to redefine how institutional investment decisions are made — faster, smarter, and more transparent.
The Role
We are looking for an AI Infrastructure / Platform Engineer to work on the foundational systems that power our data science and AI platform.
You will work across the infrastructure layer beneath our ML and AI workflows: data pipelines, orchestration, compute provisioning, model serving, and observability. You will also play a key role in operationalizing our agentic AI platform, ensuring agents are hosted, monitored, and integrated into production-grade systems.
What You’ll Do
Data Pipelines & Orchestration
- Design, build, and maintain production data pipelines that ingest, transform, and deliver structured and unstructured data to downstream ML workflows.
- Own and extend our Prefect-based orchestration layer, including flow scheduling, error handling, retry logic, and human-in-the-loop (HITL) suspend/resume patterns.
- Build and maintain feature stores, data contracts, and promotion workflows that ensure data quality and traceability from raw ingestion through model consumption.
- Collaborate with data scientists to operationalize experimental workflows into reliable, repeatable pipelines.
ML Infrastructure & Model Serving
- Build and maintain scalable infrastructure for model training, retraining, and inference (batch and real-time), including GPU compute provisioning and container orchestration.
- Implement and manage model serving infrastructure — including containerized endpoints, API gateways, and self-serve deployment frameworks for the data science team.
- Deploy and manage monitoring systems that track model health, data drift, prediction consumption, and pipeline reliability.
- Ensure all deployed systems are highly available, resilient, and well-documented with clear data lineage and runbooks.
Agentic AI Platform
- Support the buildout and operationalization of agentic AI workflows, including agent hosting, lifecycle management, and integration with Model Context Protocol (MCP) servers.
- Build shared tooling and infrastructure that enables data scientists to develop, test, and deploy agents with minimal friction.
- Design and implement evaluation frameworks and quality standards for AI agents, including automated benchmarking, regression testing, and production-readiness criteria.
- Ensure observability and reliability across agent execution environments, including logging, tracing, and performance monitoring.
Cloud Infrastructure & Platform Services
- Deploy, configure, and maintain shared AI platform services (e.g., observability tools, memory layers, evaluation platforms) as containerized workloads on Azure — including end-to-end ownership of networking, access, and connectivity between services.
- Manage cloud infrastructure (Azure) including container registries, managed identities, Key Vault secrets, storage backends, and virtual network configurations.
- Maintain CI/CD pipelines, branch protection policies, and release management workflows across data science repositories.
- Continuously evaluate and adopt tools and technologies that improve platform reliability, developer experience, and team velocity.
What You’ll Bring
- 3+ years of experience in data engineering, MLOps, or ML infrastructure roles — with a clear track record of building and maintaining production data and ML pipelines.
- Strong proficiency in Python and SQL, with hands-on experience building ETL/ELT pipelines and data transformation workflows.
- Experience with workflow orchestration tools (Prefect, Airflow, Dagster, or similar) in production environments.
- Solid understanding of containerization and cloud infrastructure — Docker, Kubernetes, and at least one major cloud provider (Azure preferred).
- Hands-on experience deploying and operating containerized services in cloud environments, including configuring networking, load balancing, and service-to-service connectivity.
- Experience with model serving and deployment patterns (batch inference, real-time APIs, feature stores).
- Familiarity with monitoring and observability tooling for pipelines and deployed models (data drift detection, health metrics, alerting).
- Strong documentation habits and the ability to communicate technical architecture clearly to diverse stakeholders.
Nice to Have
- Experience with Azure services: Container Apps, ACI, ACR, Blob Storage, Key Vault, Managed Identities, VNets.
- Familiarity with Prefect (especially cloud-managed work pools, result backends, and HITL patterns).
- Experience with dbt, Snowflake, or similar data transformation and warehousing tools.
- Exposure to LLM serving infrastructure and agentic workflow frameworks (e.g., MCP, LangChain, or similar).
- Experience standing up and maintaining third-party AI/ML platform tools (e.g., Langfuse, MLflow, or similar observability and evaluation platforms).
- Experience managing internal Python package distribution (private PyPI, Artifactory, or similar).
- Familiarity with Git-based release management, branch protection, and CI/CD for data science repos.
Why Join Us
- Build at the frontier of AI, data, and finance — where infrastructure directly shapes institutional investment decisions.
- Work on greenfield architecture with high autonomy and technical depth.
- Collaborate with a multidisciplinary team of data scientists, engineers, and investors.
- Culture grounded in technical excellence, transparency, and measurable impact.
Benefits
- Comprehensive health, dental, and vision insurance.
- Retirement savings plan with company match.
- Hybrid/flexible work arrangements and a supportive work environment.
How You Work
- Demonstrates a strong bias for action and executes quickly with limited guidance.
- Takes full ownership of outcomes and drives problems to resolution.
- Approaches challenges with a solutions-first mindset and delivers measurable results.
- Maintains composure under pressure while keeping momentum and focus.
- Simplifies complex issues into clear, actionable steps that move the work forward.
Compensation Range: $140K - $200K