Member of Technical Staff – AI Research Engineer (Image/Video Foundation Models)
Poland
Job Description
About GenPeach AI
GenPeach AI is a product-driven research lab building vertical multimodal foundation models for hyper-realistic human generation in image and video – designed for emotionally resonant, human-centered AI experiences. Our goal is to create tools that supercharge human creativity rather than replace it.
We train models from scratch: proprietary datasets at massive scale, novel architectures and training recipes, large GPU clusters, and tight product integration so research ships to users quickly.
We are a deeply technical team of around 10 people. We’re advised by Directors from Google DeepMind and backed by leading AI-focused funds and angels from OpenAI, Meta AI, Microsoft AI, Project Prometheus, and Fal. Collectively, our team, advisors, and angels have contributed to models including Meta’s Imagine/MovieGen and foundation-model work behind OpenAI’s Sora, plus Google’s Veo and Gemini.
About the Team
You’ll join the research team working across image/video generation and multimodal understanding. You’ll work closely with other Research Engineers and Scientists, as well as the founders, helping turn research into scalable training runs, strong evaluations, and production-ready systems.
About the Role
We’re hiring an AI Research Engineer to help build and scale GenPeach’s foundation models end-to-end – from implementing new model ideas and training recipes, to owning the parts of the training stack that determine quality and speed, to pushing models through production constraints.
This is a hands-on, high-ownership role. You’ll write research-grade code that becomes production-critical.
Responsibilities
- Implement and iterate on image/video generative model ideas (architecture, losses, conditioning, sampling, pre-training, distillation, post-training)
- Own training performance end-to-end (distributed training, throughput, memory, stability, debugging scaling failure modes)
- Build the experimentation loop (evals, ablations, reproducibility tooling, reporting, decision hygiene)
- Build and improve VLMs for image/video captioning (data recipes, training strategies, model variants, evaluation)
- Run high-iteration research: read papers when useful, implement ideas, validate empirically
- Create captioning pipelines that improve generation training and product quality
- Partner with inference/product to ship under real constraints (latency, cost, reliability, rollout safety)
- Build demos and prototypes to showcase capabilities and accelerate iteration
Requirements
- Strong Python and PyTorch skills (4+ years of experience)
- Experience implementing and training deep learning models (generative models, VLMs, LLMs, vision/video, or adjacent)
- Solid understanding of training dynamics, optimization, and practical debugging
- Ability to drive projects end-to-end with minimal supervision
Preferred Qualifications
- Hands-on experience with diffusion/flow-based image or video generation, or large-scale generative modeling in adjacent domains
- Experience with distributed training at scale (multi-node) and performance tuning (throughput/memory)
- Experience building evaluation frameworks (offline metrics + human eval + regression tracking)
- Strong intuition for data quality and dataset/labeling tradeoffs for training and captioning
- Publications are a plus, but shipped impact and strong technical evidence matter more
What makes this role unique
- Build frontier image/video models and the VLM captioning systems that power them
- Join a lean, senior team that holds a high engineering + research bar
- Direct product impact: your training runs become real user-facing capabilities
- Benchmark against the best in the world and compete on model quality through what we ship
How we work
- You own outcomes end-to-end and are trusted with real responsibility
- Direct, low-ego communication and fast feedback loops
- Bias toward impact: measure → iterate → ship
- Research discipline: clear ablations, reproducibility, and crisp decision-making
Logistics
- Location: Zurich (Switzerland) or Warsaw (Poland) — onsite or hybrid. If you’re elsewhere, we’re open to remote (team/timezone fit considered).
- Compensation: competitive salary + meaningful equity (level-dependent)
- Interview process: quick screen → 2x technical rounds (practical + systems) → team fit/values
What we offer
- Visa sponsorship (where applicable); we’ll make a strong effort to relocate you to Switzerland or Poland if desired
- Remote-friendly: work fully remote, hybrid, or on-site from our hubs
- Regular offsites and in-person events to collaborate and connect
- Flexible PTO