Start date:
May
End date:
End of 2026 + option
Employment type:
Freelance
Region:
Remote + Frankfurt am Main (FFM) or Berlin
Description:
For our client in the energy sector, we are looking for experienced support as Platform Operations Lead (m/f/d) starting in May. The work is performed remotely and, by arrangement, up to 25% on site in Frankfurt or Berlin.
Project Description
The team is building an internal platform for software product developers to accelerate the development and delivery of software products that tackle the massive challenges facing the energy sector. The Platform is a service-oriented, cloud-native platform that provides application teams with self-service capabilities to develop, run, and operate their software products. It offers services for application infrastructure, data, service lifecycle management, and application build and delivery, as well as services for operating software products. The Platform is deployed as a hybrid cloud, encompassing both a private cloud and selected public clouds.
General Description
Local Operations manages the on-premises production platform, which serves as the primary host for all mission-critical business applications. Local Operations is responsible for the following core areas:
• Platform Stability: Ensuring the high availability and performance of the on-premises private cloud environment.
• Application Hosting: Consulting on the seamless operation of Germany-specific production business applications.
• Incident Management: Resolving technical issues within standard business hours to minimize operational downtime.
• Lifecycle Maintenance: Executing routine updates, patches, and system optimizations within the local infrastructure.
Scope of Work
Objective: Enforce Operational Governance
Tasks:
• Define operational processes for incident, problem, change, release/deployment, and service requests (ITIL-aligned, DevOps/SRE-adapted), guide their implementation, and ensure adherence to them.
Objective: Prepare and Validate Operational Readiness
Tasks:
• Establish and verify readiness criteria for go-lives and releases, including cutover and rollback strategies, runbooks, monitoring and alerting setup, access readiness, and documentation review.
Objective: Manage Coordination and Escalation
Tasks:
• Facilitate cross-tier coordination between T1, T2, and T4 stakeholders.
• Drive stakeholder alignment and manage escalation and decision-making in high-severity situations.
Objective: Lead Continuous Improvement Initiatives
Tasks:
• Monitor and improve reliability KPIs, address recurring issues, reduce operational debt, and champion automation and toil reduction.
Objective: Lead, Coach, and Develop the T3 Team
Tasks:
• Guide Infra and Kubernetes/Data SMEs, manage onboarding and knowledge transfer, and build team capability.
Must-have experience
• The contractor must be a senior-level professional with proven experience in operations management of private cloud solutions and the following expertise:
• 10–15+ years in IT operations / service delivery / platform operations with demonstrated leadership in mission-critical environments.
• Proven experience implementing/leading Incident, Problem, Change, and Release governance in production.
• ITSM / Collaboration: Jira Service Management (JSM), Jira, Confluence.
• Platform delivery concepts: GitOps and IaC awareness (Terraform/OpenTofu, ArgoCD, Helm) to govern deployment/readiness standards.
• Experience defining/owning Operational Readiness / Transition to Operations practices.
Must-have language skills
• Proficiency in spoken and written English (at least C1).
• Proficiency in spoken and written German (at least C1).
Preferred experience
• Experience operating in regulated / high-availability industries (banking, telco, public sector, healthcare).
• Experience with SRE practices (SLOs/SLIs, error budgets) and reliability management.