Job
- Level: Experienced
- Job Field: IT, DevOps
- Employment Type: Full Time
- Contract Type: Permanent employment
- Location: Berlin
- Working Model: Hybrid, Onsite
Job Summary
In this role, you will collaborate closely with development teams, ensuring the stability, security, and performance of complex systems while maintaining monitoring solutions and automating infrastructure.
Your role in the team
- We are looking for a highly qualified and experienced Site Reliability Engineer to support our team in 24/7 shift operations.
- The SRE Department L2 operates all IONOS Cloud IaaS and PaaS services.
- As a Site Reliability Engineer, you are responsible for the stability, security, and performance of our complex, distributed systems.
- You work closely with the development teams to design, implement, and operate scalable and reliable infrastructures, as well as automate and optimize processes.
- Technical Level-2 Support with direct customer contact.
- Maintenance of monitoring, logging, and alerting solutions (e.g., Prometheus, Grafana, Loki) for proactive problem detection during shift operations and participation in resolving complex issues in distributed systems.
- Troubleshooting in networks (LAN/WAN/VPN, DNS, DHCP) and storage systems (File/Object/Block); provisioning and operation of highly available services on Linux and Kubernetes (Helm charts).
- Development and maintenance of Infrastructure-as-Code, automation, and playbooks with Ansible, Terraform, GitLab CI/CD, Argo CD, as well as scripting languages such as Bash, Python, and Go.
- Collaboration with development teams to improve processes and deployments as well as to ensure the seamless integration of new services and applications into our cloud and Kubernetes environment.
- Ensuring a stable and secure platform operation, including end-to-end incident management from initial analysis through resolution to post-incident review within the scope of problem management.
Our expectations of you
Qualifications
- Willingness to work in a 24×7 shift model (night, weekend, and holiday shifts) while demonstrating a strong problem-solving and troubleshooting mindset.
- In-depth knowledge of automation tools (e.g., Ansible, SaltStack), monitoring and observability tools (Prometheus, Grafana, Loki), as well as logging and alerting solutions (ELK Stack).
- Very good knowledge of at least one programming or scripting language (Go, Python, Bash) for automation and monitoring tasks.
- Deep knowledge of Linux MD RAID (mdadm, sedadm) and LVM.
- Expertise in Linux performance tuning and network stack debugging (ethtool, perf, tcpdump, ibstat, ibtop).
- Practical experience with S3, Ceph, and software-defined networks.
- Fluent in German and English (at least B2 according to the CEFR standard).
Experience
- Several years of experience as a Site Reliability Engineer or in a related role (Linux System Administrator, Platform Engineer, DevOps/Infrastructure Engineer, Full-Stack Developer).
- Experience with virtualized environments (QEMU/KVM, OpenStack, Proxmox), cloud storage technologies (File, Object, Block), and confident handling of Docker and Kubernetes.
- Experience in code management (merge conflicts, feature branches, merge requests, CI/CD) is advantageous.
- Experience with RDMA, InfiniBand, and RoCE protocols.
- Experience with established software development practices (code reviews, build processes, packaging, testing).
What we offer
- At the end of the application process, candidates must undergo a security check.
- Hybrid working model.
- Shift model working hours.
- At some locations, a subsidized canteen and various free beverages.
- Modern office spaces with excellent transport links.
- Various employee discounts for activities and products.
- Employee events such as summer and winter parties, as well as workshops.
- Numerous opportunities for further training and development.
- Various health offerings, such as sports and health courses.
Benefits
Work-Life-Integration
Health, Fitness & Fun
Food & Drink
This is your employer
United Internet AG
As the leading provider of communication solutions in Germany, we offer our users a safe and reliable way to communicate through our strong brands 1&1, GMX, WEB.DE, and mail.com, even while handling 500 million incoming emails every day!
Description
- Company Type: Established Company
- Working Model: Full Remote, Hybrid, Onsite
- Industry: Internet, IT, Telecommunication
Dev Reviews
by devworkplaces.com
- Total: 3.4 (1 Review)
- Engineering: 3.3
- Working conditions: 3.8
- Career Growth: 3.2
- Culture: 3.2