Agile Robots Ag

ML Platform Engineer

Job

  • Level
    Experienced
  • Job Field
    Software, Data
  • Employment Type
    Full Time
  • Contract Type
    Permanent employment
  • Location
    Munich
  • Working Model
    Onsite
  • Job Summary

    In this role, you build the infrastructure for distributed training, deployment, and experimentation, using technologies such as Kubernetes and PyTorch to move ML models into production efficiently.

    Your role in the team

    • The AI Research Division of Agile Robots is looking for an ML Platform Engineer to build and operate the distributed training, deployment, and experimentation infrastructure that research, data, and robotics teams depend on to move models from prototype to production.
    • Design and scale distributed training workflows for large models using tools such as PyTorch Distributed, DeepSpeed, and cluster schedulers like SLURM or Kubernetes.
    • Build and maintain containerized ML environments that support reproducible experimentation and benchmarking.
    • Develop and maintain CI/CD pipelines for machine learning systems to enable reliable testing, training, and deployment of models.
    • Implement experiment tracking, model versioning, and reproducibility workflows using tools such as ClearML or Weights & Biases.
    • Set up monitoring systems such as Prometheus and Grafana to track model performance and system health, and to detect drift in production.
    • Work with research, data, and robotics teams to connect new models to robust production systems.

    Our expectations of you

    Education

    • Degree in Computer Science, Software Engineering, or a related field, with professional experience building and operating ML or software infrastructure in production.

    Qualifications

    • Familiarity with Infrastructure-as-Code tools such as Terraform.
    • Exposure to high-performance or distributed compute environments.

    Experience

    • Experience designing and operating distributed training systems on Kubernetes and Docker, using PyTorch Distributed, DeepSpeed, and schedulers such as SLURM.
    • Experience building CI/CD pipelines that support reliable model testing, training, and deployment.
    • Experience operating ML workloads on cloud infrastructure, preferably AWS.
    • Hands-on experience with experiment tracking and model versioning using tools such as MLflow or Weights & Biases.
    • Experience with monitoring and drift detection using tools such as Prometheus and Grafana.
    • Python and system design skills, with experience building and operating ML systems beyond the prototype stage.
    • Experience with large-scale or multimodal ML systems such as vision-language-action models.
    • Experience with ML pipeline and orchestration tools.

    What we offer

    • A dynamic high-tech company backed by financial stability and world-class investors.
    • Join an interdisciplinary, international team with 60+ different nationalities in a collaborative work environment.
    • Many development opportunities as the company continues to grow.
    • Challenging tasks and impactful projects alongside experts that enable professional and personal growth.
    • Corporate Benefits Program covering health, mobility, and learning, with 100 € net per month.
    • Modern office facilities with a rooftop terrace overlooking Munich, free drinks and fruit, and regular company events.

    Benefits

    Health, Fitness & Fun

    Work-Life-Integration

    Job Locations

    • Munich, Bayern, Germany

    This is your employer

    Agile Robots Ag

    Agile Robots SE, founded by leading robotics researchers, focuses on the development of AI-controlled robots and has established itself as a pioneer in automation.

    Description

  • Company Type
    Established Company
  • Working Model
    Onsite
  • Industry
    Electronics, Automation
  • Diversity
    Open for all genders
  • English Only
    English only required