Staff Software Engineer - Inference & Performance
United Kingdom
Job Description
About Runware: Runware is a fast-growing AI infrastructure and developer tools company on a mission to build one of the fastest and most reliable AI inference platforms in the market. As a remote-first, distributed team, Runware removes traditional compute latency bottlenecks so that engineering teams around the world can run custom AI systems at scale without friction. From request ingress through bare-metal GPU execution to final results delivery, Runware prioritizes systems-level speed, throughput, and durability. The company fosters an ownership-driven engineering culture in which rapid release cycles are paired with real downtime, encouraging engineers to tackle complex architectural challenges with confidence.
Position Overview
We are seeking a systems-minded Staff Software Engineer - Inference & Performance to take technical ownership of latency, throughput, and reliability metrics across our core global AI inference infrastructure. This is a full-time remote position within the United Kingdom. It is a senior technical leadership role for an expert systems architect who obsesses over software performance engineering at web scale. Rather than building thin application wrappers, you will make high-impact architectural decisions that turn ambitious performance targets, such as sub-one-second inference responses, into production reality. This high-agency role suits a veteran engineer who thrives at the intersection of low-level systems design, hardware orchestration, and real-world infrastructure scale.
Key Responsibilities
- End-to-End Performance Stewardship: Command ultimate architectural accountability for platform-wide latency, concurrency thresholds, throughput capacities, and multi-tenant system reliability metrics.
- Distributed Inference Architecture: Design, optimize, and scale the core processing systems backing the inference grid, including advanced request routing services, async execution layers, message queuing networks, and intelligent GPU scheduling algorithms.
- Sub-Second Latency Optimization: Drive the underlying computing layers toward sub-one-second inference cycles, methodically isolating and resolving software bottlenecks across low-level networking, microservices, distributed storage, and GPU execution threads.
- Python and Systems Development: Write, refactor, and maintain core high-performance server-side services and bare-metal integrations in languages such as Python, Go, Rust, or PHP.
- Production-Ready Model Evaluation: Partner closely with internal Machine Learning and model training teams to ensure models are production-ready, managing cold starts, dynamic batching, memory saturation, and concurrency budgets.
- System Triage and Incident Governance: Lead deep-dive diagnostic investigations into sudden runtime latency spikes, processing degradations, and cascading failures under heavy multi-tenant traffic loads.
- Observability & Profiling Instrumentation: Build, upgrade, and maintain fine-grained client-side and server-side observability tooling, tracking request states through continuous tracing, metric collection, and hardware profiling.
- Engineering Bar Multiplication: Influence, guide, and mentor mid-level and senior software engineers across distributed teams, championing pragmatic distributed systems thinking, performance budgeting, and overall operational excellence.
Required Skills & Qualifications
- Extensive professional experience in backend systems engineering, low-latency distributed architecture design, low-level systems programming, or infrastructure-scale performance optimization.
- Proven commercial experience building and operating high-concurrency, low-latency distributed networks inside live production environments at global scale.
- Deep, authoritative technical mastery of asynchronous processing loops, distributed queues, advanced concurrency models, and network backpressure strategies.
- Strong, practical intuition evaluating computing trade-offs across CPU boundaries, GPU configurations, high-throughput storage systems, and application deployment layers.
- Hands-on technical proficiency troubleshooting real production defects under heavy stress, using metric diagnostics to resolve memory saturation or latency degradation.
- Strong scripting skills and demonstrable fluency writing code in Python, Go, Rust, or PHP.
- Outstanding verbal and written communication skills, with a proven history of influencing technical direction across highly distributed engineering and product teams.
- UK Right to Work: Existing right to work in the United Kingdom without requiring current or future visa sponsorship (Runware is unable to offer visa sponsorship at this time).
- Location: Full-time remote role, open only to candidates permanently based in the United Kingdom and working from home.
Preferred Strategic Indicators (Nice to Have)
- Direct professional experience developing or maintaining specialized AI/ML inference platforms, GPU-backed cloud workloads, or performance-critical compute systems.
- Knowledge of machine learning model optimization techniques, including dynamic batching, model quantization, warm starts, and VRAM management.
- Hands-on familiarity with Infrastructure-as-Code (IaC) architectures and modern automated DevOps deployment processes.
- Prior individual ownership or stewardship over complex throughput SLOs and latency SLAs at multi-million transaction scales.
What We Offer
- The opportunity to design, build, and deploy the low-latency infrastructure powering the next generation of real-time global artificial intelligence.
- Competitive base compensation supplemented by meaningful stock option grants, allowing you to share directly in the upside you create.
- Fully remote, work-from-home arrangements offering lifestyle flexibility, calendar autonomy outside core collaboration hours, and no office commute.
- Generous paid time off allocations, encompassing flexible vacation periods, sick days, and local UK public holidays.
- Paid family leave frameworks, incorporating dedicated maternity, paternity, and caregiver time-off options to support your home life.
- Twice-yearly in-person company retreats in inspiring locations worldwide to brainstorm, plan strategy, celebrate wins, and connect face-to-face.
- A balanced company release model featuring intense, fast-paced pushes followed immediately by real, expected downtime blocks to unplug, recharge, and return stronger.