Roblox · Greenhouse · Posted today

Principal Software Engineer, Compute Fleet Management

San Mateo, California, United States

Software Engineering Greenhouse

Every day, tens of millions of people come to Roblox to explore, create, play, learn, and connect with friends in 3D immersive digital experiences, all created by our global community of developers and creators.

At Roblox, we’re building the tools and platform that empower our community to bring any experience that they can imagine to life. Our vision is to reimagine the way people come together, from anywhere in the world, and on any device. We’re on a mission to connect a billion people with optimism and civility, and we’re looking for amazing talent to help us get there.

A career at Roblox means you’ll be working to shape the future of human interaction, solving unique technical challenges at scale, and helping to create safer, more civil shared experiences for everyone.

As a Principal Software Engineer leading Fleet Management, you will be the overall technical lead across three pods and the person who sets the technical direction for the fleet management layer of Roblox. This is a hands-on, deeply technical leadership role that owns all of Roblox's compute capacity end to end: from low-level provisioning and the data plane, up through the control planes that operate it, and all the way to the UI and internal-facing products that let teams self-serve capacity. Your org centralizes security, maintenance operations, and the uptime of every Roblox Kubernetes cluster, and governs the internal customer contracts that drive automation across the fleet spanning Roblox data centers and cloud providers. You will guide architecture, raise the engineering bar, and make sure compute capacity supply and demand stay in balance as the fleet grows.

You will:

  • Serve as the overall technical lead for three Fleet Management pods, setting and aligning the technical direction across low-level provisioning, the data plane, and the control plane and product surfaces above them.
  • Architect the declarative, Kubernetes-style control planes that operate Roblox's compute fleet across on-prem and cloud, and define how capacity is provisioned, reconciled, and exposed at scale.
  • Own the design of the internal customer contracts and APIs that govern automation across the fleet, so that every infrastructure team can operate capacity safely and predictably.
  • Drive the strategy for self-serve capacity, including the internal-facing products and UIs that let teams request, manage, and reason about the compute they depend on.
  • Centralize and raise the bar on security, maintenance operations, and the uptime of all Roblox Kubernetes clusters, defining how fleet-wide changes ship reliably without impacting production.
  • Partner broadly with stakeholders inside and outside infrastructure to understand compute needs and drive innovation for our backend services, AI, and edge computing.
  • Write code daily, staying deep in the systems your org owns and leading by example on the hardest design and implementation problems.

You have:

  • 10+ years of experience building and operating large-scale distributed systems and infrastructure.
  • A track record as the technical anchor an organization relies on, with the leadership to set direction across multiple teams and up-level the engineers around you.
  • Strong proficiency in Go, with deep experience designing and operating production services at fleet scale.
  • Hands-on experience building declarative, Kubernetes-style control planes and the reconciliation patterns behind them.
  • Strong proficiency with gRPC for service-to-service APIs and with SQL and Postgres for durable, high-scale state.
  • Experience operating compute capacity across both on-prem data centers and cloud providers, and a feel for the realities of running fleets at the scale of hundreds of thousands of instances.
  • A history of being highly cross-functional, partnering with stakeholders across and beyond infrastructure to design systems that keep compute supply and demand in balance.

For roles that are based at our headquarters in San Mateo, CA: The starting base pay for this position is as shown below. The actual base pay is dependent upon a variety of job-related factors such as professional background, training, work experience, location, business needs and market demand. Therefore, in some circumstances, the actual salary could fall outside of this expected range. This pay range is subject to change and may be modified in the future. All full-time employees are also eligible for equity compensation and for benefits as described on this page.

Annual Salary Range: $345,040 to $399,420 USD

Roles that are based in an office are onsite Tuesday, Wednesday, and Thursday, with optional presence on Monday and Friday (unless otherwise noted).

Roblox provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws. Roblox also provides reasonable accommodations to candidates with qualifying disabilities or religious beliefs during the recruiting process.

For US-based roles only, please note the Company may not be able to employ candidates for this role who have United States work authorization related to certain U.S. visa categories, or support future H-1B sponsorship at this time.
