Full Stack Developer with SRE Experience (W2)
Job Title: Production Engineer
Location: Plano, TX
Duration: Long Term
About CTC:
Founded in 1996, CTC is a global IT services, consulting, and business solutions partner dedicated to helping organizations innovate, optimize, and grow. With over 2,000 professionals worldwide, we support more than 100 clients in transforming complex challenges into lasting competitive advantages.
Job Description:
- 3-4 years of experience in production engineering and site reliability engineering (SRE), designing, implementing, and maintaining highly available, scalable, and resilient systems.
- Own end-to-end operational responsibilities, including monitoring, incident response, root cause analysis, capacity planning, and automation, to ensure optimal system performance and reliability in production environments.
- Collaborate cross-functionally with development, QA, and infrastructure teams to streamline CI/CD pipelines, automate deployments, and enforce best practices for security, compliance, and disaster recovery.
- Use a broad set of tools and technologies to proactively detect, troubleshoot, and resolve production issues, minimizing downtime and improving adherence to service-level objectives (SLOs) and service-level agreements (SLAs).
- Build, deploy, and maintain cloud-native microservices using Java, Spring Boot, and JavaScript frameworks, ensuring high availability and scalability.
- Design and implement RESTful APIs and event-driven architectures using AWS services such as Lambda, ECS/EKS, SQS, and SNS.
- Develop and maintain CI/CD pipelines with Jenkins, GitLab CI, or AWS CodePipeline for automated testing and deployment.
- Monitor application and infrastructure health using AWS CloudWatch, Prometheus, Grafana, and distributed tracing tools like Jaeger or AWS X-Ray.
- Troubleshoot production issues, perform root cause analysis, and implement fixes to improve system reliability.
- Implement security controls including IAM roles, OAuth2, JWT, and encryption for data in transit and at rest.
- Collaborate with cross-functional teams to design fault-tolerant, resilient systems with automated failover and recovery.
- Optimize cloud resource usage and cost through rightsizing and autoscaling configurations.
- Automate operational tasks and incident response using scripting and infrastructure as code (Terraform, CloudFormation).
- Maintain detailed documentation of system architecture, deployment processes, and operational runbooks.