NextGenPros Inc · LinkedIn · Posted 24d ago

Machine Learning Engineer

New York City, New York, United States


Indexed description

Requirements:

· 8+ years of experience building production ML inference systems.

· Demonstrated ownership of deep-learning inference optimization in production (quantization, distillation, compilation, kernel/profile-level performance work) for transformer NLP and/or CV models.

· Experience with both TensorFlow (SavedModel, tf.data, XLA, TFLite) and PyTorch (TorchScript, ONNX, FastAPI/TorchServe).

· Hands-on experience optimizing inference pipelines on AWS infrastructure, ideally across different types of media assets.

· Experience with video frameworks/tools (e.g., FFmpeg) and with large-scale frame-level inference.

· Demonstrated experience monitoring and debugging model latency, memory, and pipeline throughput.

· Experience with hybrid search architectures (BM25 + vector search + cross-encoder reranking).

· Familiarity with OpenAI APIs or other foundation model providers.

· Familiarity with open-source Hugging Face LLMs.

· Experience with data pipeline and workflow orchestration tools (e.g., Airflow).
