Senior Site Reliability Engineer
Poland
Job Description
About Cribl: Cribl is building the telemetry infrastructure for the AI era, partnering with IT and Security teams at many of the world’s biggest enterprises—including half of the Fortune 100. As the AI Platform for Telemetry, we give customers the choice, control, and flexibility to manage and analyze massive real-time datasets across both human and automated agent architectures.
Why You'll Love This Role
As a Senior Site Reliability Engineer at Cribl, you will join our expanding engineering team in Poland. Our SREs are involved from early conception to architectural design, through deployment and into production. You will bring your creative input into scaling frameworks, high availability solutions, and infrastructure resiliency to ensure our customers remain in full control of their observability pipelines.
Key Responsibilities
- Service Reliability Lifecycle: Engage with product squads from inception to production to build in resilience, high availability, and efficient service delivery.
- Production Monitoring: Measure, track, and monitor distributed cloud platforms with a tight focus on availability, low latency, and overall infrastructure health.
- Root Cause Resolution: Identify the root causes of runtime errors and platform instability in cloud systems, and guide squads toward a blameless culture of operational excellence.
- Toil Automation: Eliminate repetitive operational work through scripting, automation, and new tooling.
- On-Call Execution: Participate in scheduled standby, emergency on-call shifts, and off-hours remediation protocols as needed.
Required Skills & Qualifications
- Proven track record designing, implementing, and maintaining high-volume observability and telemetry systems.
- Deep expertise deploying Infrastructure as Code (IaC) architectures using Terraform or Ansible alongside Cloud SDK libraries.
- Strong experience navigating hyperscaler cloud environments (specifically AWS and Azure) along with container orchestration layers (Docker/Kubernetes).
- Familiarity with enterprise monitoring platforms and logging suites (e.g., Splunk, New Relic, CloudWatch, Prometheus, Grafana, Kibana, Sentry).
- Hands-on programming experience with JavaScript, Node.js, and TypeScript in Linux/macOS environments.
- Solid background in Linux Systems Engineering and sustainable incident response patterns utilizing tools like PagerDuty, FireHydrant, or Blameless.
- Comfortable working with a high degree of autonomy in a globally distributed engineering environment.
- Strong understanding of application security baselines, resilient data management patterns, and SLO/SLI tracking.
What We Offer
- Opportunity to engineer systems at the absolute forefront of the AI data and telemetry industry.
- Highly collaborative, remote-first environment ("software is a people business") with a passionate, distributed team.
- 100% remote working flexibility based entirely within Poland.