Job
- Level
- Senior
- Job Field
- Data
- Employment Type
- Full Time
- Contract Type
- Permanent employment
- Location
- Heidelberg
- Working Model
- Hybrid, Onsite
Job Summary
In this role, you will develop innovative methods in reinforcement learning, conduct large-scale experiments, and optimize training infrastructures to enhance the performance of our models.
Your role in the team
- Aleph Alpha is one of the few companies in Europe with end-to-end in-house model development including pre- and post-training. We're building models that have general-purpose capabilities, but also specifically excel at addressing the needs of our customers.
- We're growing our post-training team in Heidelberg (or hybrid in Germany) and are looking for an AI Researcher who combines a deep theoretical understanding of reinforcement learning methods with a desire to advance the state of the art and improve model capabilities in large-scale training.
- As a (senior) AI Researcher for reinforcement learning, you will shape and improve the underlying RL methodology, maintain a high-quality training code-base, and conduct large-scale experiments to hill-climb our performance benchmarks.
- In your day-to-day work, you will conduct large-scale reinforcement learning experiments, derive hypotheses from the results, and iterate on both the implementation and the methodology based on your observations.
- Together with a collaborative team, you will have direct impact on the models that we ship to our customers.
- Hill-climb in large-scale training: Conduct large-scale LLM training runs, analyze evaluation scores in depth, propose hypotheses for improvement and directly implement them in order to maximize performance on our benchmarks.
- Theoretical innovation: Stay at the bleeding edge of RL research. You will identify, implement, and iterate on novel approaches to multi-turn reinforcement learning.
- Scale our training infrastructure: Identify bottlenecks in our training setup and optimize our RL training loops for large-scale training.
- Cross-functional collaboration: Partner with our other post-training teams to turn raw feedback into actionable training signals, ensuring that our RL iterations lead to measurable improvements in downstream performance.
Our expectations of you
Qualifications
- A deep understanding of Reinforcement Learning theory and how it relates to modern RL methods.
- Familiarity with statistical methods for evaluation and experimental design.
- Ability to reason about what an evaluation/environment measures and whether it matters - not just run benchmarks, but understand them.
- Strong Python skills and familiarity with ML tooling (in particular torch distributed).
- Willingness to relocate to Heidelberg or travel regularly (potentially weekly).
- A history of contributions to top-tier venues (NeurIPS, ICML, ICLR, etc.) specifically regarding RL.
Experience
- Experience with multi-node LLM training (ideally using RL). You understand how to scale multi-node RL training runs and can reason about and implement distributed algorithms.
- PhD in reinforcement learning or equivalent research experience.
- Experience evaluating LLMs and crafting environments for training.
Benefits
Health, Fitness & Fun
This is your employer
Aleph Alpha
As a German AI startup based in Heidelberg, Aleph Alpha focuses on the development of large language models and generative AI. It offers solutions for companies looking to build their own AI capabilities and protect their data.
Description
- Company Type
- Startup
- Working Model
- Hybrid, Onsite
- Industry
- Internet, IT, Telecommunication