About the Role

As a Senior Site Reliability Engineer, you'll work at the intersection of cloud infrastructure and blockchain, building the platform that our product teams deploy to. You'll partner closely with product teams to define the tooling and abstractions that let them iterate fast, while keeping everything reliable underneath. The stack spans multiple clouds and Kubernetes clusters, and supports everything from APIs to full Ethereum testnets. You'll also own bringing AI into our engineering workflows, building and deploying autonomous agents and LLM-powered tooling that make the whole engineering org more productive.

Responsibilities

Design and implement infrastructure and tools that empower our product teams to rapidly and securely iterate, emphasizing reliability and automation.
Influence the strategic direction of our infrastructure and operational practices, ensuring that we are well-positioned to scale.
Take a proactive role in the resolution of production issues, ensuring we learn from them in a blameless manner.
Work closely with product teams on crucial initiatives such as production deployments, release management, and incident handling.
Build and deploy AI-powered tooling (autonomous coding agents, LLM-assisted CI/CD, automated incident triage) that makes the engineering org more productive.
Foster a culture of continuous learning and improvement, encouraging constructive review and adaptation processes.

Experience & Qualifications

Kubernetes expertise, with a strong understanding of its core concepts and the ability to manage and maintain clusters.
Expertise with modern cloud-native tools, e.g. ArgoCD for GitOps, Terraform/Crossplane for IaC, and the Grafana LGTM stack for observability.
3-5 years of experience with Infrastructure as Code and cloud provisioning tools.
3-5 years of development and scripting experience in Go, Python, or similar languages.
Expertise in Linux environments, containerization, and cloud technologies.
3-5 years in operational roles running production environments.
AI fluency: ability to build and deploy LLM-powered developer tooling and autonomous agents.
Advantage: Networking knowledge (service mesh, cross-cloud) and familiarity with the Ethereum ecosystem, staking, and blockchain technologies.

