Level: Experienced

Job Field: IT, DevOps

Employment Type: Full Time

Contract Type: Permanent employment

Location: Berlin

Working Model: Hybrid, Onsite

Job Summary

In this role, you will collaborate closely with development teams, ensuring the stability, security, and performance of complex systems while maintaining monitoring solutions and automating infrastructure.

Your role in the team

We are looking for a highly qualified and experienced Site Reliability Engineer to support our team in a 24/7 shift model.
The SRE Department L2 operates all IONOS Cloud IaaS and PaaS services.
As a Site Reliability Engineer, you are responsible for the stability, security, and performance of our complex, distributed systems.
You work closely with the development teams to design, implement, and operate scalable and reliable infrastructures, as well as automate and optimize processes.
Technical Level-2 Support with direct customer contact.
Maintenance of monitoring, logging, and alerting solutions (e.g., Prometheus, Grafana, Loki) for proactive problem detection during shift operations and participation in resolving complex issues in distributed systems.
Troubleshooting in networks (LAN/WAN/VPN, DNS, DHCP) and storage systems (File/Object/Block); provisioning and operation of highly available services on Linux and Kubernetes (Helm charts).
Development and maintenance of Infrastructure-as-Code, automation, and playbooks with Ansible, Terraform, GitLab CI/CD, Argo CD, as well as scripting languages such as Bash, Python, and Go.
Collaboration with development teams to improve processes and deployments as well as to ensure the seamless integration of new services and applications into our cloud and Kubernetes environment.
Ensuring a stable and secure platform operation, including end-to-end incident management from initial analysis through resolution to post-incident review within the scope of problem management.

Our expectations of you

Qualifications

Willingness to work in a 24/7 shift model (night, weekend, and holiday shifts), combined with a strong problem-solving and troubleshooting mindset.
In-depth knowledge of automation tools (e.g., Ansible, SaltStack), monitoring and observability tools (Prometheus, Grafana, Loki), as well as logging and alerting solutions (ELK Stack).
Very good knowledge of at least one programming or scripting language (Go, Python, Bash) for automation and monitoring tasks.
Deep knowledge of Linux MD RAID (mdadm, sedadm) and LVM.
Expertise in Linux performance tuning and network stack debugging (ethtool, perf, tcpdump, ibstat, ibtop).
Practical experience with S3, Ceph, and software-defined networks.
Fluent in German and English (at least B2 according to the CEFR standard).

Experience

Several years of experience as a Site Reliability Engineer or in a related role (Linux System Administrator, Platform Engineer, DevOps/Infrastructure Engineer, Full-Stack Developer).
Experience with virtualized environments (QEMU/KVM, OpenStack, Proxmox), cloud storage technologies (File, Object, Block), and secure handling of Docker & Kubernetes.
Experience in code management (merge conflicts, feature branches, merge requests, CI/CD) is advantageous.
Experience with RDMA, InfiniBand, and RoCE protocols.
Experience with established software development practices (code reviews, build processes, packaging, testing).

What we offer

At the end of the application process, candidates must undergo a security check.
Hybrid working model.
Shift model working hours.
At some locations, a subsidized canteen and various free beverages.
Modern office spaces with excellent transport links.
Various employee discounts for activities and products.
Employee events such as summer and winter parties, as well as workshops.
Numerous opportunities for further training and development.
Various health offerings, such as sports and health courses.

Benefits

Work-Life-Integration

Health, Fitness & Fun

Food & Drink

🍏 Fresh Fruit

Job Locations

Location Berlin

Germany

This is your employer

United Internet AG

As the leading provider of communication solutions in Germany, we offer our users a safe and reliable way to communicate through our strong brands 1&1, GMX, WEB.DE, and mail.com, while handling 500 million incoming emails every day.

Company Type: Established Company

Working Model: Full Remote, Hybrid, Onsite

Industry: Internet, IT, Telecommunication

Dev Reviews

by devworkplaces.com

Total (1 Review): 3.4

Engineering: 3.3
Working Conditions: 3.8
Career Growth: 3.2
Culture: 3.2

Site Reliability Engineer

United Internet AG

Location: Berlin
Working Model: Hybrid, Onsite
Diversity: Open for all genders
