Alignerr Linkedin · Posted 22d ago

Python Infrastructure Engineer - Model Evaluation

Colombia

Python Infrastructure Engineer — Model Evaluation (AI Training)

About The Role

What if your Python expertise could directly shape how the world's most advanced AI models are built, tested, and improved? We're looking for a senior Python engineer to design and build the data pipelines, evaluation harnesses, and annotation tooling that sit at the heart of cutting-edge AI development.

This is a fully remote, flexible contract role working alongside leading AI research labs on real production systems. If you're a strong Python engineer who wants to do meaningful, high-impact work at the frontier of AI, this is the role for you.

  • Organization: Alignerr
  • Type: Hourly Contract
  • Location: Remote
  • Commitment: 20–40 hours/week

What You'll Do

  • Design, build, and optimize high-performance Python systems supporting AI data pipelines and model evaluation workflows
  • Develop full-stack tooling and backend services for large-scale data annotation, validation, and quality control
  • Build and maintain evaluation harnesses that integrate with ML inference frameworks
  • Improve reliability, performance, and safety across existing Python codebases
  • Instrument systems with observability and metrics collection to monitor reliability and model performance
  • Identify bottlenecks and edge cases in data and system behavior, and implement scalable fixes
  • Collaborate with data, research, and engineering teams to support model training and evaluation workflows
  • Participate in synchronous design reviews to iterate on architecture and implementation decisions

Who You Are

  • Native or fluent English speaker with clear written and verbal communication skills
  • Full-stack developer with a strong systems programming background
  • 3–5+ years of professional experience writing production-grade Python
  • Experienced in building evaluation harnesses for ML models and integrating them with inference frameworks
  • Solid background in observability, metrics collection, and monitoring for production systems
  • Self-motivated and reliable — able to commit 20–40 hours per week

Nice to Have

  • Prior experience with data annotation, data quality, or evaluation systems
  • Familiarity with AI/ML workflows, model training, or benchmarking pipelines
  • Experience with distributed systems or developer tooling
  • Background in MLOps or AI infrastructure

Why Join Us

  • Work directly on cutting-edge AI projects alongside leading research labs
  • Fully remote and flexible — structure your work week around your life
  • Freelance autonomy with the depth and consistency of meaningful, long-term technical work
  • Make a tangible impact on how next-generation AI models are evaluated and improved
  • Potential for ongoing work and contract extension as new projects launch