Job
- Level: Lead
- Job Field: IT, System
- Employment Type: Full Time
- Contract Type: Permanent employment
- Location: Heilbronn
- Working Model: Onsite
Job Summary
In this role, you will optimize the stability and availability of our storage infrastructure through monitoring, incident management, and automation using modern technologies while working on a robust storage architecture.
Your role in the team
- Stability & Reliability: You are responsible for maintaining and optimizing the stability and availability of our highly available, resilient storage infrastructure (block, object, backup, and file storage). You ensure this through proactive monitoring, independently resolving emerging issues, and preventing their future occurrence.
- Automation: You automate the deployment and operational processes in the storage environment with the aim of improving a little every day and continuously optimizing our products.
- Architecture: You and your team are responsible for a robust and efficient storage architecture, because it matters to you to build a stable, reliable, long-term solution that our customers enjoy using.
- End-to-End Responsibility: Identifying with the products we provide to our customers is very important to us. Therefore, we actively practice end-to-end responsibility and receive support from many internal STACKIT Service Teams to refine our services.
- Performance and Capacity Planning: You analyze and optimize the performance of our legacy systems with a view to scaling the landscape in the future. This includes proactive capacity planning.
- Incident and Postmortem Analysis: You investigate (Major) Incidents involving storage as part of the Incident & Problem Management process at STACKIT, derive mitigating measures for the future, and see them through to successful implementation.
Our expectations of you
Qualifications
- You're eager to make a significant impact and play a key role in shaping the solution with cutting-edge cloud technologies.
- You are an expert in operating storage infrastructure (e.g., solution scenarios, provisioning, scaling, migration, incident response) and automating these processes (e.g., using Golang/Python, Bash, Ansible).
- You are well-versed in containerized system landscapes within storage environments (e.g., k8s).
- You already work with APIs and develop them further (e.g., REST APIs with Golang and Python).
- You enjoy the challenges of operating storage systems (e.g., protocols, troubleshooting, performance analysis, high availability, lifecycle).
- You bring passion and enthusiasm for new technologies and topics related to various storage systems.
- You enjoy being part of a motivated team that always strives for improvement and continuously develops itself (and its products).
- Your excellent communication skills in German and English form the foundation for successful collaboration in international, agile teams.
Experience
- You have extensive experience with various storage products on the market (e.g., NetApp, Cohesity, Pure, Ceph) in block, object, backup, or file storage, and good knowledge of cloud environments and their architectures.
- You have experience in monitoring, alerting, and logging to ensure comprehensive system oversight (e.g., Prometheus, Grafana, Elasticsearch).
This is your employer
Schwarz Unternehmenskommunikation GmbH & Co. KG
The Schwarz Group, based in Neckarsulm, is a significant German conglomerate and one of the largest retail groups in Europe. It operates over 13,900 stores under the brands Lidl and Kaufland and employs around 575,000 people.
Description
- Company Type: Established Company
- Working Model: Full Remote, Hybrid, Onsite
- Industry: Trade