Machine Learning Engineer
What You Will Do
- Develop AI-native applications that extract and process trade compliance data from central EU and member state government sources, both structured (XML, JSON, CSV, HTML tables, APIs) and unstructured (PDF documents, legal text, prose regulations, scanned publications)
- Design and build AI-native data pipelines that monitor sources, detect changes, and then extract, normalise, validate, and release tariff measures data to clients
- Integrate AI/LLM capabilities into extraction workflows — using large language models for document understanding, entity extraction, classification, and data structuring
- Design and maintain data models for customs duties, VAT/excise information, and tariff measure metadata
- Ensure data quality through automated testing, benchmarking against official sources, data source and legal source comparison, and data validation pipelines
- 3+ years of professional software development
- Proficiency in at least two programming languages — e.g., Python, JavaScript/TypeScript, Java, Go, C#, Rust, Kotlin, or similar. We value strong fundamentals over specific language experience
- Methodology / Framework: Bmad Method
- Data extraction and processing — web scraping, document parsing, API integration, ETL/ELT pipelines. Must be comfortable with both:
  - Structured data: XML, JSON, CSV, HTML tables, databases, REST/SOAP APIs
  - Unstructured data: PDFs, legal text, prose regulations, HTML without clear structure, scanned documents
- Database design and querying — relational (PostgreSQL or similar) and/or document-based databases. Schema design, migrations, indexing
- API development — building and consuming RESTful APIs
- Version control — Git workflow, branching strategy, code review
- Automated testing — unit, integration, and data validation tests as part of development workflow
- Self-sufficiency – ability to analyse, design, and build a complete solution
- Rapid development – ability to split development into phases to deliver results faster
- Influence – demonstrate your approach to other team members to raise the group's maturity
- Communication – ability to work with technical, business, and project team members, and to participate in and lead discussions
- LLM/AI API integration (OpenAI, Anthropic Claude, Google Gemini) — for data processing, document understanding, or content extraction
- AI-assisted development tools (Claude Code, Cursor, GitHub Copilot, Windsurf, or similar)
- NLP / document processing — OCR, text extraction, entity recognition, text classification
- Graph databases (Neo4j) or vector databases (pgvector)
- Observability and monitoring (SigNoz, Grafana, Langfuse, or similar)
- Containerisation and deployment (Docker, CI/CD pipelines)