We are seeking a highly skilled Site Reliability Engineer (SRE) to own the overall health, availability, performance, and resilience of our enterprise platform.
The platform spans SQL Server, .NET, Java, React.js, microservices, and Kafka, and operates in a hybrid cloud environment across Azure and on-premises infrastructure.
The SRE will lead reliability engineering practices across the stack, manage infrastructure deployment pipelines using Terraform, drive application deployments through GitHub and Azure DevOps, ensure timely remediation of security vulnerabilities, and implement world-class observability using Dynatrace and Splunk.
Experience with Kafka (producers, consumers, performance tuning).
Strong understanding of SRE fundamentals:
SLO/SLI design
Error budgets
Distributed systems concepts
Incident response
Preferred Qualifications
Experience with containerization and Kubernetes (AKS or on-prem K8s).
Experience with service mesh, API gateway technologies, or event-driven architectures.
Knowledge of secure coding practices and integrating security into CI/CD pipelines.
Familiarity with enterprise networking, firewalls, and hybrid connectivity.
Soft Skills
Strong communication and collaboration abilities.
Analytical mindset with strong problem-solving skills.
Ability to handle pressure during high-severity incidents.
Passion for automation, simplification, and continuous improvement.
Applicant Notices & Disclaimers
For information on benefits, equal opportunity employment, and location-specific applicant notices, click here
At SPECTRAFORCE, we are committed to maintaining a workplace that ensures fair compensation and wage transparency in adherence with all applicable state and local laws. This position’s starting pay is $40.00/hr.