Site Reliability Engineer
Spectraforce
Seattle, Washington
Job Description
Job Title: Sr. Systems Reliability Engineer
Location: Seattle, WA
Duration: 12 months, contract-to-hire (CTH)
At SPECTRAFORCE, we are committed to maintaining a workplace that ensures fair compensation and wage transparency in adherence with all applicable state and local laws. This position’s starting pay is $55.00/hr.
Key Responsibilities:
- Contribute to the SRE strategy and establish best practices for release management, automation, and system reliability.
- Mentor and guide SRE, Engineering, and Product teams in adopting core SRE principles such as service ownership, reducing toil, and continuous improvement.
- Lead initiatives across SLIs/SLOs, observability, incident management, and postmortem practices, ensuring insights and learnings are captured and acted upon.
- Champion SRE practices by implementing repeatable templates for logging, monitoring, and alerting frameworks.
- Drive observability and monitoring excellence using tools such as Grafana, AppDynamics (AppD), and Sumo Logic, ensuring proactive detection and resolution of issues.
- Partner with engineering to design reliable, fault-tolerant systems and reduce operational toil through automation.
- Implement and leverage the Ansible Automation Platform to help teams automate infrastructure provisioning, configuration management, and event-driven workflows.
- Enable teams to automate operational events and infrastructure changes, reducing manual intervention and improving system resilience.
- Exercise sound judgment to ensure operational compliance with security, privacy, audit, disaster recovery, and other company requirements.
Job-Specific Skills, Experience & Education
- Minimum of 5 years of experience in Site Reliability Engineering, IT operations, or related fields.
- Bachelor’s degree in computer science or engineering, or equivalent experience (2 additional years of experience in lieu of a degree).
- Technical expertise in system reliability, scalability, application design, and performance.
- Hands-on experience with observability and monitoring tools such as Grafana, AppDynamics, and Sumo Logic.
- Experience with automation platforms, particularly Ansible, for infrastructure and event-driven automation.
- Proven ability to mentor and guide engineers in adopting SRE practices and principles.
- Excellent communication and collaboration skills across diverse teams and vendors.
- Strong judgment and problem-solving capabilities.
- Experience working in multi-cloud environments.
- Strong interpersonal, organizational, communication, and customer service skills.
- Experience applying ITIL, SRE, and IT process best practices.
- Experience in tracking major incidents, rollbacks, and hotfixes; leading root cause analysis (RCA) processes; and ensuring resolution and completion of action items.
- Experience with technical engineering in IT operations.
Applicant Notices & Disclaimers
- For information on benefits, equal opportunity employment, and location-specific applicant notices, click here