Incident & Problem Manager

Job Overview

Location: London, England
Job Type: Full Time
Job ID: 7114
Date Posted: 9 months ago
Recruiter: Alice Lidze

Job Description

ASOS Technology is going through an exciting period of transition and major investment. This includes a number of strategic programmes to deliver the technology and business solutions that support our ambitious global growth plans. At the heart of these plans is the rebuilding of our digital platforms and channels to provide the best shopping experience for our customers. Our plan is designed to put our mobile experience first, enable personalisation and support a data-driven organisation. We are also making significant investments in all our Buying, Merchandising, Finance and People systems, with the latest toolsets and applications, to accelerate the next phase of our global growth. In addition, we are improving our ways of working within Technology to enable autonomous platform development and strengthen our engineering and agile practices.

ASOS is one of the UK’s top fashion and beauty destinations, expanding globally at a rapid pace. Our values are to be authentic, brave and creative, and we live and breathe these in everything we do.

We believe fashion can make you look, feel and be your best and, with technology in our DNA, we deliver the latest trends to our digital-obsessed 20-something market. Our award-winning Tech teams sit at the heart of our business. We deliver technical innovation and pioneer incredible solutions, which are crucial to our continued success. We’re extremely ambitious and thrive on the individuality of our amazing employees. Our values encompass everything needed for our tech people to be the thought leaders of tomorrow.

About the role:

As an Incident & Problem Manager, you will be responsible for effective incident management on our large-scale, distributed, mainly cloud-based platform, which is built and operated by various teams. You will provide 24/7 service and support for our Incident Management processes across the Tech estate when managing Major Incidents, and you will take part in an on-call rota to do so. You will be responsible for issuing timely and accurate communications to stakeholders during incidents, whilst enabling the relevant engineering teams to maintain full ownership of incident resolution. A key strength, therefore, is the ability to quickly understand how the applications and teams across our platforms are connected, from both a technical and a people perspective, what impact an incident has on our customers, and who needs to be involved for a quick resolution.

As part of the Incident & Problem Management team, you will collaborate closely to coordinate cross-domain incident response, cover for each other, and define processes and good practices together.

You will work with teams and stakeholders to continually raise our incident management maturity, making sure we have efficient processes, a high degree of automation, and clear KPIs and metrics to measure how well we are operating our services and platforms. You might also get involved in the definition and implementation of services and operating models, and work alongside our Service Management tooling team to mature the toolsets that support these processes.

You love to learn and acquire new skills and you enjoy teaching others. You are not afraid to get stuck in and work directly with teams.

The role calls for a multi-faceted approach: not only addressing incidents effectively as they arise, but also adding value through Problem Management, both proactively and at the conclusion of Major Incidents. This includes conducting Major Incident Reviews, initiating Problem Management investigations to identify and address root causes and reduce recurring incidents, and reporting accurately and effectively on that work.

What you’ll be doing:

Incident and Problem Management:

  • Help develop and maintain the incident management process as necessary to meet business needs.

  • Develop, co-ordinate and promote incident management activities across the whole of ASOS, and take responsibility for the effective functioning of the Incident Management processes across all support areas.

  • Take responsibility for all aspects of the management, communication, and resolution of Major (P1 or P2) incidents.

  • Provide any required SME advice to all support staff in the management and resolution of Incidents. 

  • Review support processes and recommend changes, as appropriate, to ensure continuous improvement of the Incident Management process.

  • Effectively prioritise Problem Management activities to ensure that the key business drivers are addressed as a primary objective. 

  • Ensure all necessary reporting and metrics are completed for Problem Management, and generate problem summaries for P1 issues.

  • Investigate the underlying root causes of Major and persistent or recurring business-critical incidents, and manage them with support teams through to resolution.

  • Contribute to the development of the existing Problem Management process as part of Continuous Service Improvement initiatives. 

Team:

  • Act as the on-call escalation point for the out-of-hours support rota.

  • Provide on-site out-of-hours support when requested and as part of a rota for defined sales periods.

  • Co-ordinate out-of-hours business sale events on a rotational basis.

Reporting:

  • Produce metrics for both service and performance against the Incident and Problem processes on a weekly and monthly basis.

  • Provide consistent, regular and effective communications around the progress and advancement of high priority (P1 and P2) Problems against a defined schedule.

  • Ensure regular reporting on determined KPI measures and the delivery of quality metrics (in relation to Incident and Problem management).

  • Regularly review performance in response to Incidents and Problems of all types and contribute to Continuous Service Improvement activities.

  • Produce and use trend data to identify, and subsequently address, any process failings or issues within the support environment.

Who we’d like to meet:

  • Strong experience in problem, change and incident management in an agile context

  • Strong experience in managing major incidents.

  • Strong communication skills with a proven track record of engaging senior technology and business stakeholders

  • A good understanding of, or experience with, DevSecOps ways of working and SRE practices

  • Hands-on experience delivering or supporting enterprise-scale services

  • Analytical skills to quantify and analyse the performance of operational and incident management processes and data

  • Influencing skills to drive improvements to existing Incident & Problem Management processes and procedures

  • Passionate about continuous improvement

  • Preferably familiarity with ITIL v4, or equivalent service management experience at scale

  • Experience in leading an Incident Response team using relevant ITSM tooling

  • Experience in using automated incident management communication tooling, such as xMatters, is desirable
