United Internet AG

Site Reliability Engineer

Job

  • Level
    Experienced
  • Job Field
    IT, DevOps
  • Employment Type
    Full Time
  • Contract Type
    Permanent employment
  • Location
    Berlin
  • Working Model
    Hybrid, Onsite
  • Job Summary

    In this role, you will collaborate closely with development teams, ensuring the stability, security, and performance of complex systems while maintaining monitoring solutions and automating infrastructure.

    Your role in the team

    • We are looking for a highly qualified and experienced Site Reliability Engineer to support our team in 24/7 shift operations.
    • The SRE Department L2 operates all IONOS Cloud IaaS and PaaS services.
    • As a Site Reliability Engineer, you are responsible for the stability, security, and performance of our complex, distributed systems.
    • You work closely with the development teams to design, implement, and operate scalable and reliable infrastructures, as well as automate and optimize processes.
    • Providing technical Level 2 support with direct customer contact.
    • Maintenance of monitoring, logging, and alerting solutions (e.g., Prometheus, Grafana, Loki) for proactive problem detection during shift operations and participation in resolving complex issues in distributed systems.
    • Troubleshooting in networks (LAN/WAN/VPN, DNS, DHCP) and storage systems (File/Object/Block); provisioning and operation of highly available services on Linux and Kubernetes (Helm charts).
    • Development and maintenance of Infrastructure-as-Code, automation, and playbooks with Ansible, Terraform, GitLab CI/CD, Argo CD, as well as scripting languages such as Bash, Python, and Go.
    • Collaboration with development teams to improve processes and deployments as well as to ensure the seamless integration of new services and applications into our cloud and Kubernetes environment.
    • Ensuring stable and secure platform operation, including end-to-end incident management from initial analysis through resolution to post-incident review as part of problem management.

    Our expectations of you

    Qualifications

    • Willingness to work in a 24×7 shift model (night, weekend, and holiday shifts) while demonstrating a strong problem-solving and troubleshooting mindset.
    • In-depth knowledge of automation tools (e.g., Ansible, SaltStack), monitoring and observability tools (Prometheus, Grafana, Loki), as well as logging and alerting solutions (ELK Stack).
    • Very good knowledge of at least one programming or scripting language (Go, Python, Bash) for automation and monitoring tasks.
    • Deep knowledge of Linux MD RAID (mdadm, sedadm) and LVM.
    • Expertise in Linux performance tuning and network stack debugging (ethtool, perf, tcpdump, ibstat, ibtop).
    • Practical experience with S3, Ceph, and software-defined networks.
    • Fluent in German and English (at least B2 according to the CEFR standard).

    Experience

    • Several years of experience as a Site Reliability Engineer or in a related role (Linux System Administrator, Platform Engineer, DevOps/Infrastructure Engineer, Full-Stack Developer).
    • Experience with virtualized environments (QEMU/KVM, OpenStack, Proxmox), cloud storage technologies (File, Object, Block), and proficient use of Docker and Kubernetes.
    • Experience in code management (merge conflicts, feature branches, merge requests, CI/CD) is advantageous.
    • Experience with RDMA, InfiniBand, and RoCE protocols.
    • Experience with established software development practices (code reviews, build processes, packaging, testing).

    What we offer

    • At the end of the application process, candidates must undergo a security check.
    • Hybrid working model.
    • Working hours organized in a shift model.
    • At some locations, a subsidized canteen and various free beverages.
    • Modern office spaces with excellent transport links.
    • Various employee discounts for activities and products.
    • Employee events such as summer and winter parties, as well as workshops.
    • Numerous opportunities for further training and development.
    • Various health offerings, such as sports and health courses.

    Benefits

    • Work-Life Integration
    • Health, Fitness & Fun
    • Food & Drink

    Job Locations

    • Berlin, Germany

    This is your employer

    United Internet AG

    As the leading provider of communication solutions in Germany, we offer our users a safe and reliable way to communicate through our strong brands 1&1, GMX, WEB.DE, and mail.com, even while handling 500 million incoming emails every day!

    Description

  • Company Type
    Established Company
  • Working Model
    Full Remote, Hybrid, Onsite
  • Industry
    Internet, IT, Telecommunication
  • Dev Reviews (by devworkplaces.com)
    Total: 3.4 (1 Review)
    • Engineering: 3.3
    • Working Conditions: 3.8
    • Career Growth: 3.2
    • Culture: 3.2
  • Diversity
    Open for all genders