Site Reliability Engineer (SRE)
Key Responsibilities
- Maintain high availability, performance, and reliability of production systems
- Participate in on-call rotation; troubleshoot incidents and perform root cause analysis
- Work within Agile/Scrum teams (sprint planning, stand-ups, retrospectives)
- Build and support data pipelines (batch and/or real-time)
- Develop and maintain CI/CD pipelines to improve deployment efficiency
- Automate operational tasks and improve system observability
Qualifications
- 8+ years of experience in SRE, DevOps, or Production Engineering
- Strong experience with production support and incident management
- Hands-on experience with CI/CD tools (Jenkins, GitHub Actions, etc.)
- Experience with data pipelines (Airflow, Spark, Kafka, etc.)
- Familiarity with cloud platforms (AWS, Azure, or Google Cloud Platform)
- Experience working in Agile/Scrum environments
- Experience in financial services or investment management
- Experience with monitoring/observability tools (Datadog, Prometheus, Grafana)
- Experience with Infrastructure as Code (Terraform, CloudFormation)
Duration: Long Term
Location: San Francisco, CA or Boston, MA
Work Arrangement: Hybrid (2-3 days per week)