Staff Software Engineer, Infrastructure
Job Description
About Gusto
At Gusto, we’re on a mission to grow the small business economy. We handle the hard stuff (payroll, health insurance, 401(k)s, and HR) so owners can focus on their craft and customers. With teams in Denver, San Francisco, and New York, we’re proud to support more than 400,000 small businesses across the country, and we’re building a workplace that represents and celebrates the customers we serve. Learn more about our Total Rewards philosophy.
We are looking for a Staff-level Site Reliability Engineer to build and improve our compute (Kubernetes) and networking (service mesh) platforms. This role is for someone who loves building foundational systems that empower engineers to move faster, safer, and smarter. You will execute on the technical strategy for how services run and communicate within your area, ensuring reliability, scalability, and developer experience.
What You’ll Do
- Execute on the technical strategy for Gusto's compute and networking platforms by building and improving Kubernetes, service mesh, and related systems that power our SaaS platform within your team's scope
- Lead architecture design, performance optimization, and security hardening of distributed systems at scale
- Partner with your team and cross-functional stakeholders to translate organizational goals into concrete, measurable technical outcomes for your area
- Treat infrastructure as a product by focusing on the developer experience, simplifying workflows, and accelerating delivery
- Mentor engineers on your team and within the Infrastructure organization by coaching, pairing, and modeling best practices
- Communicate with clarity by simplifying complex concepts and building alignment across teams and functions
- Lead incident response and ensure our systems meet the highest standards of availability and resiliency
What You’ll Bring
- 12 or more years of experience in Infrastructure, Platform, or Site Reliability Engineering roles operating large-scale distributed systems
- Proven hands-on coding experience in production environments and the ability to contribute directly to system development and automation
- Deep expertise running, scaling, and securing Kubernetes and service mesh technologies such as Istio, Envoy, or Cilium in production
- A demonstrated record of designing highly available systems that balance reliability, observability, and developer velocity
- Mastery of Infrastructure as Code using tools such as Terraform or Crossplane, with experience managing complex state for large engineering organizations
- Experience owning and evolving 24/7 SaaS infrastructure, including monitoring, alerting, and performance analysis
- A systems thinking mindset that identifies small levers capable of creating meaningful organizational impact
- Experience applying AI tools to automate infrastructure operations, improve observability, and accelerate incident response, along with familiarity with AI-assisted development workflows and a willingness to mentor others on effective AI usage in SRE contexts
- Experience integrating AI copilots or automation frameworks into developer workflows to accelerate delivery and reduce operational burden
- Exceptional communication skills that simplify complex ideas and bring teams together around a shared vision
- A strong bias toward action, resilience in the face of ambiguity, and the ability to transform big ideas into real, impactful systems