Software Development Engineer 3
Agility Partners is seeking a qualified Senior Operations / Reliability Engineer to fill an open position with a Fortune 15 company based in the greater Seattle area. This role is an exciting opportunity to support a confidential next-generation device initiative by driving operational stability, telemetry monitoring, release validation, and live issue resolution across prototype hardware and production service environments. The ideal candidate brings a strong background in software engineering, DevOps, SRE, or production operations, along with the ability to independently troubleshoot complex technical issues and improve operational reliability in a fast-paced engineering organization.
Responsibilities
Monitor telemetry, dashboards, alerts, logs, and metrics across services, applications, and prototype devices to identify operational issues, performance degradation, and emerging reliability risks
Support software releases and deployment validation by monitoring rollout health, troubleshooting live issues, validating fixes, and documenting operational readiness and stabilization outcomes
Investigate incidents, gather logs and diagnostics, communicate findings to engineering and product teams, and contribute to improvements in monitoring, alerting, and reliability practices
Provide hands-on support for prototype devices and test environments, including device validation, smoke testing, troubleshooting, documentation, and operational reporting
Qualifications
5+ years of experience in software engineering, DevOps, SRE, production operations, infrastructure, service reliability, or related technical operations roles
3+ years of experience with monitoring systems, telemetry analysis, logging platforms, alerting tools, and live issue troubleshooting
3+ years of experience independently driving technical workstreams, operational improvements, and reliability initiatives in fast-moving environments
Experience supporting software releases, deployments, production validation, and operational readiness activities
Strong troubleshooting and diagnostic skills with the ability to analyze logs, metrics, telemetry, and system behavior to identify root causes and recommend solutions
Familiarity with CI/CD workflows, cloud or hybrid infrastructure environments, incident response processes, and operational best practices
Experience documenting incidents, operational procedures, release observations, and known issues
Strong written and verbal communication skills with the ability to clearly communicate risks, findings, and recommendations to cross-functional teams
Experience with Android/mobile OS environments is highly preferred
Required: Bachelor’s degree in Computer Science, Computer Engineering, Software Engineering, or a related technical field, or equivalent practical experience
*Must be able to work onsite in Redmond, WA for this contract