About Cribl: Cribl is building the telemetry infrastructure for the AI era, partnering with IT and Security teams at many of the world’s biggest enterprises—including half of the Fortune 100. As the AI Platform for Telemetry, we give customers the choice, control, and flexibility to manage and analyze massive real-time datasets across both human and automated agent architectures.

Why You'll Love This Role

As a Senior Site Reliability Engineer at Cribl, you will join our growing engineering team in Poland. Our SREs are involved from early conception and architectural design through deployment and into production. You will contribute creative input on scaling frameworks, high-availability solutions, and infrastructure resiliency so that our customers remain in full control of their observability pipelines.

Key Responsibilities

Service Reliability Lifecycle: Engage with product squads from inception to production to build in resilience, high availability, and efficient service delivery.
Production Monitoring: Measure, track, and monitor distributed cloud platforms with a tight focus on availability, low latency, and overall infrastructure health.
Root Cause Resolution: Identify the root causes of runtime errors and platform instability in cloud systems, and drive squads toward operational excellence through a blameless culture.
Toil Automation: Eliminate repetitive operational overhead through scripting, automation, and new tooling.
On-Call Execution: Participate in scheduled standby and emergency on-call shifts, including off-hours remediation as needed.

Required Skills & Qualifications

Proven track record designing, implementing, and maintaining high-volume observability and telemetry systems.
Deep expertise deploying Infrastructure as Code (IaC) architectures using Terraform or Ansible alongside Cloud SDK libraries.
Strong experience working in hyperscaler cloud environments (specifically AWS and Azure) and with containerization and orchestration (Docker/Kubernetes).
Familiarity with enterprise monitoring platforms and logging suites (e.g., Splunk, New Relic, CloudWatch, Prometheus, Grafana, Kibana, Sentry).
Hands-on programming experience with JavaScript, TypeScript, and Node.js in Linux/macOS environments.
Solid background in Linux systems engineering and sustainable incident response practices, using tools such as PagerDuty, FireHydrant, or Blameless.
Comfortable working with a high degree of autonomy in a globally distributed engineering team.
Strong understanding of application security baselines, resilient data management patterns, and SLO/SLI tracking.

What We Offer

Opportunity to engineer systems at the absolute forefront of the AI data and telemetry industry.
Highly collaborative, remote-first environment ("software is a people business") with a passionate, distributed team.
Fully remote work from anywhere within Poland.
