About Honeycomb: Honeycomb is a distributed technology platform for production systems engineering and developer observability. We partner with digital brands such as Slack, HelloFresh, LaunchDarkly, and Vanguard to break down high-cardinality transactional pipelines into clear, actionable performance telemetry. Backed by Series D financing and recognized consistently on Forbes’ lists of America’s Best Startups, we foster a distributed-first environment where a fiercely inclusive, highly autonomous group of engineers builds the future of cloud-scale diagnostics.

Position Overview

We are seeking a highly analytical, systems-fluent, and customer-focused Field Reliability Engineer - LATAM to join our Platform and Managed Services Engineering cell in a permanent, full-time remote role based in Brazil. In this cross-functional, customer-facing technical role, you will work at the intersection of site reliability engineering, technical sales support, and open-source cloud telemetry, with accountability for infrastructure design, container pipelines, and tier-2 production incident triage across Latin American accounts. This is not a front-end CSS, entry-level helpdesk, or data entry role: you will run customer multi-tenant clusters, distributed tracing configuration, and infrastructure-as-code automation, partnering directly with client SRE leaders, internal Solutions Architects (SAs), and core Product teams. The position requires a systems engineer who deploys cloud primitives fluently using DevOps and infrastructure-as-code tooling, diagnoses network latency and memory bottlenecks across multi-region AWS accounts, stands up production collector pools within complex client environments, and contributes code confidently to upstream OpenTelemetry projects to reduce the friction of collecting telemetry data.

Key Responsibilities

Managed Services & Cluster Governance: Provision, scale, and maintain customer-facing managed service architectures, including Refinery as a Service (RaaS) and Honeycomb Private Cloud (HnyPC) deployments, using DevOps pipelines.
Infrastructure as Code (IaC) Engineering: Author, version, and scale centralized modular configuration components using Terraform, Helm, and Kubernetes manifests to govern multi-tenant AWS EKS clusters.
Senior Technical Escalation & Triage: Serve as the definitive tier-2 escalation authority for highly complex production incidents, debugging deep infrastructure issues across AWS PrivateLink, application load balancers (ALBs), multi-cluster VPC configurations, and service meshes.
Technical Sales & POC Lead Ownership: Partner directly with Solutions Architects during late-stage pilot evaluations, joining live client boardrooms to validate architecture designs, instrument collector pools, and provide the infrastructure credibility required to secure technical conversions.
Open Source Ecosystem Sponsorship: Contribute code, reviews, and bug fixes directly into public OpenTelemetry distributions, collectors, and exporters while participating in special interest groups (SIGs).
Telemetry Optimization: Formulate custom observability models, SLO workshops, and reference architecture metrics, using Honeycomb's own platform to rigorously audit Honeycomb infrastructure dependencies.

Required Skills & Qualifications

Proven professional history as a Site Reliability Engineer (SRE), Platform Engineer, Customer-Facing DevOps Architect, or in a closely related systems development role.
Expert Production-Grade Container History: Extensive, hands-on production experience deploying, scaling, and maintaining microservices on Kubernetes (EKS) and AWS infrastructure.
Detailed technical proficiency writing declarative configuration with Terraform (HCL) and Helm.
Deep experience diagnosing real-time distributed system anomalies, trace bottlenecks, high-cardinality workloads, and multi-region AWS networking loops under direct commercial time pressure.
Outstanding interpersonal and presentation skills in English, with a demonstrated ability to work smoothly with external enterprise developers.
Location Context: This is a remote position open to qualified engineers residing permanently in Brazil (visa sponsorship and visa transfer support are not available for this vacancy).

Preferred Strategic Indicators (Nice to Have)

Active background as a committer or maintainer in the public OpenTelemetry open-source repositories.
Prior experience setting up rule management interfaces, deployment UIs, or custom dashboard configurations using languages like Go, Python, or Node.js.
Familiarity with logging mechanisms, service level objectives (SLO) instrumentation, or streaming data layers.

What We Offer

Transparent Levels-Based Compensation: A competitive, experience-calibrated base salary structured transparently relative to established technical levels, supplemented by generous equity and employee-friendly stock programs.
100% remote workspace flexibility across Brazil under a deeply distributed-first engineering culture.
Generous Recharging Structures: Access to unlimited Paid Time Off (PTO) allowances alongside up to 16 weeks of fully paid parental leave across all paths to parenthood.
Comprehensive health insurance for employees, with optional extensions for dependents.
Remote Office Infrastructure Subsidies: Generous allowances covering home office equipment, continuous learning, co-working space, and monthly internet stipends.
