Principal Site Reliability Engineer

Job Overview

Location: London, England
Job Type: Full Time
Job ID: 30740
Date Posted: 1 year ago
Recruiter: John Apl

Job Description

Intelligent Conversation and Communication Cloud (IC3) powers billions of real-time customer conversations across Microsoft’s first-party (Teams, Skype) and second-party (Dynamics) solutions. IC3 enables reliable, high-quality audio/video calling, meeting, and messaging services that work every time, from anywhere, seamlessly across all customer touchpoints. IC3 makes conversations on our platforms more intelligent in real time, empowering best-in-class productivity tools for the modern workplace, where every call, meeting, or chat makes the next one better.

Responsibilities

Principal Site Reliability Engineer - Calling/Meeting Services 

Calling and meeting functionality is a critical capability that IC3 provides to the Microsoft Teams, Skype, and Skype for Business products. It is also the first consumer of the platform API that we provide, which helps ensure we deliver the most usable API to the developer community. We are looking for a talented and passionate Site Reliability Engineer to join the team that manages the large worldwide infrastructure for Calling and Meetings services.

 

Key responsibilities  

  • Design, write, and deliver software to optimize all aspects of deployments (resources and applications) as ‘infrastructure-as-code’.
  • Optimize service release by improving Azure DevOps release pipelines. 
  • Drive services toward reliable, predictable deployments, achieving better ‘time-to-deploy’ metrics for services across Microsoft Teams.
  • Develop safe rollout plans for a portfolio of services to prevent outages. 
  • Build, run, and improve critical service environments in large scale data centers. 
  • Learn and enhance existing tools, and develop new tools to meet new scale and feature requirements, with the aim of reducing manual intervention and improving prevention, detection, and mitigation of service impacts.
  • Manage world-wide capacity for a portfolio of services to meet the usage growth and efficiency requirements. 
  • Coordinate planning and execution with internal engineering teams, business partners and technical leaders across the division. 
  • Influence and collaborate across organizations to bring best practices, architectures, standards, and methods for large-scale distributed systems.
  • Analyze data and provide operational insights into service relia

Qualifications

Essential qualifications  

  • BS/BSE in Computer Science, Management Information Systems, or another technical discipline, or equivalent education.
  • 10+ years as a Site Reliability Engineer/Developer working on large-scale/distributed systems.
  • 5+ years implementing/automating using CI/CD tools.
  • Good knowledge of networking fundamentals and troubleshooting tools.
  • Proven experience creating distributed systems tools of moderate to high complexity.  
  • Ability to manage and deliver multiple project phases at the same time. 
  • Strong analytical, problem-solving, and organizational skills.
  • Excellent written and oral communication skills. 
  • Ability to deal with the ambiguity associated with working in a fast-paced and changing environment. 
  • Strong Windows OS / Linux troubleshooting experience. 

 

Preferred qualifications  

  • 3+ years of Azure development experience (ARM templates, Azure Monitor, PowerShell, Kubernetes, Docker, etc.).
  • 2+ years automating builds/releases using YAML. 

 

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances.  We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.
