Featherless
AI & Machine Learning
14d ago
Machine Learning Engineer — Inference Optimization
Remote (World)
Full-time
Not Mentioned
Job Description
This company is looking for an engineer to push the limits of model inference performance at scale. You will work at the intersection of research and production, taking cutting-edge models and turning them into fast, reliable, and cost-efficient systems that serve real users. This role suits people who enjoy deep technical work, including profiling systems down to the kernel and GPU level.
Key Responsibilities
- Performance Tuning: Optimize inference latency, throughput, and cost for large-scale ML models in production.
- Deep Profiling: Profile and identify bottlenecks in GPU/CPU pipelines (memory, kernels, batching, IO).
- Advanced Techniques: Implement optimizations such as reduced-precision and quantized inference (fp16, int8, fp8), KV-cache reuse, and speculative decoding.
- System Building: Build and maintain inference-serving systems using tools like Triton Inference Server or custom runtimes.
- Hardware Benchmarking: Benchmark performance across different hardware (NVIDIA vs. AMD) and cloud setups.
Requirements
- Core Experience: Strong experience in ML inference optimization or high-performance ML systems.
- Internals Knowledge: Solid understanding of deep learning internals (Attention mechanisms, memory layout, compute graphs).
- Tech Stack: Hands-on experience with PyTorch and familiarity with GPU tuning (CUDA, ROCm, Triton).
- Scale: Experience scaling inference for real users, not just theoretical research benchmarks.
Nice to Have
- Experience with LLM or long-context model inference.
- Knowledge of frameworks like TensorRT, ONNX Runtime, or vLLM.
- Background in distributed systems or low-latency services.
Benefits
- Equity: Meaningful equity at Series A stage.
- Impact: Direct impact on unit economics, since every saving on compute lowers the company's costs.
- Remote: Work from anywhere in the world.