Google
LinkedIn · Posted 29d ago
Software Engineer III, AI/ML, Google Cloud
Minimum qualifications:
- Bachelor's degree or equivalent practical experience.
- 2 years of experience with software development in one or more programming languages (e.g., Python).
- 2 years of experience with software development in one or more programming languages, or 1 year of experience with an advanced degree.
- 1 year of experience with ML infrastructure (e.g., model deployment, model evaluation, data processing, debugging).
- 1 year of experience with GenAI concepts (Large Language Model, Multi-Modal, Large Vision Models) and experience with text, image, video, or audio generation.
Preferred qualifications:
- Master's degree or PhD in Computer Science or a related technical field.
- Experience with Generative AI, Large Language Models (LLM), or Machine Learning infrastructure, including model deployment, performance optimization, profiling, and debugging large-scale workloads.
- Experience with distributed computing leveraging Graphics Processing Units (GPUs) or Tensor Processing Units (TPUs).
- Ability to collaborate effectively with cross-functional teams.
- Ability to thrive in a changing environment where AI technologies are continuously advancing.
Responsibilities:
- Enable and optimize foundational models (e.g., LLMs and Diffusion) within key frameworks like vLLM, MaxText, and MaxDiffusion, providing Google Cloud customers with immediate access to AI capabilities.
- Partner with customers to measure Artificial Intelligence/Machine Learning (AI/ML) model performance on Google Cloud infrastructure. Work with Customer Engineer teams to identify and resolve technical bottlenecks and drive customer success.
- Collaborate with internal infrastructure teams to enhance support for demanding AI workloads. Contribute to product improvement by identifying bugs and recommending enhancements.
- Conduct performance profiling, debugging, and troubleshooting of training and inference workloads. Maintain and update documentation and educational content based on product changes and user feedback. Triage, debug, and resolve system issues by analyzing root causes and operational impact.
- Design and implement specialized Machine Learning solutions leveraging advanced ML infrastructure.