
Research Scientist - Model Team

Mirelo AI
Posted on: Feb 17, 2026
Location: Berlin, Berlin, Germany | Tübingen, Baden-Württemberg, Germany (Hybrid)
Employment type: Full-time

Mirelo AI is building the next generation of creative tools by generating realistic sound, speech and music from video.

We develop cutting-edge foundational generative AI models that "unmute" silent video content and create custom, hyper-realistic audio for gaming, video platforms, and creators. Our technology empowers global storytellers to transform their content.

We recently closed a $41 million Seed round co-led by Andreessen Horowitz and Index Ventures, with participation from Atlantic, and are rapidly expanding across Product, Engineering, Go-to-Market, and Growth.


About the Role

At Mirelo, you’ll work at the centre of how we build the next generation of multimodal video-to-audio models. This role is deeply hands-on and research-heavy: with a high H100/200-per-engineer ratio, you’ll explore and build new multimodal models and push the boundaries of what’s possible in music, sound, and speech generation. You’ll collaborate closely across research and engineering, run focused ablations, and translate experimental results into clear next steps for the team. From data curation to deployment, you’ll help shape the full lifecycle of the models that power our products and partnerships.

Key Responsibilities

  • Design, implement and train large-scale multimodal generative models for audio generation (diffusion and/or autoregressive models).

  • Explore new modeling ideas for audio generation (music, sound, speech) while taking inspiration from the language and image domains.

  • Develop and experiment with post-training for new capabilities (fine-grained control, in/out-painting, editing, …).

  • Conduct rigorous ablation studies, derive actionable insights, and communicate results to the team to inform new research directions.

  • Contribute hands-on to all stages of model development including data curation, experimentation, evaluation, and deployment.

Ideal Candidate Profile

  • Hands-on experience in training large-scale generative models in a fast-paced research environment.

  • Deep understanding of cutting-edge methods and ML research in at least one of the following domains: image, language, video, or audio (audio-specific experience is not required, but nice to have).

  • Strong proficiency in PyTorch, transformer architectures, and the full ecosystem of modern deep learning.

  • Solid understanding of distributed training techniques: FSDP, low-precision training, model parallelism.

  • Strong track record of working on generative models (publications in top-tier venues, open-source contributions, or applied ML projects).

Nice to Have

  • Proficiency with profiling, debugging, and optimizing single- and multi-GPU operations using tools like Nsight or stack trace viewers.

  • Strong software engineering skills and experience collaborating on large codebases that go beyond PhD research code.

  • Experience with generative models for audio (sound, music or speech) and audio codec design.

Why Join?

  • Join at a pivotal moment. We've secured fresh funding and are gaining traction. Now is when your contributions can make a real difference to our success.

  • True ownership from day one. You'll have genuine autonomy and responsibility. Your ideas and work will directly shape our product and company direction.

  • Competitive compensation and equity. We offer strong packages that ensure you share in the success you help create.

  • Build for the next generation of creators. Be part of the innovation that will transform how creators work and thrive.

We welcome applications from all individuals, regardless of ethnic origin, gender, disability, religion or belief, age, or sexual orientation and identity.

Mirelo AI is a Berlin-based startup building AI foundation models that generate synchronized sound effects and music for videos in seconds, addressing the gap in audio technology for generative AI.
