Alex Aitken

Technical and Engineering Leadership, Coaching, and Mentorship

SRE role in team

July 23, 2018 By Alex

I don’t know about you, but lately I’ve been hearing quite a lot about SREs (Site Reliability Engineers, to spell out the acronym). There are probably a dozen different meanings for this role, and it varies from company to company. I’m going to talk about what we had in the Agoda Homes team and the impact it had on morale and on the actual reliability of our platform. For my definition, an SRE is an engineer within the team tasked with monitoring the reliability of the product, investigating the cause of bugs, and determining their priority.

The Job No Engineer Wanted

Initially, we created the role within our product because we had shipped almost 100% of our features and we had traffic. So, we needed someone (or a team) to monitor how our production environment was performing, determine which bugs were critical to the success of the project, and work out what the actual impact of each bug was. I can tell you now that if you create this role out of thin air, your engineers will probably hate you. I’m being dramatic (of course), but in the end, no engineer wanted to take on the role. It was rotated every sprint (we figured a week was too short and a month was probably too long).

First a Team

As I kind of alluded to above, we first started out assigning this SRE role to a team. We’d reduce the number of stories the team would need to produce and give them free rein over which bugs to tackle and how to determine their impact. Now, as I said, the point of the SRE is not to solve the bugs but to investigate them and determine priority. Can you already guess where I’m heading? Rather than investigating and determining priority, the team would usually investigate and solve. That sounds nice, until the team is spending a significant amount of time on bugs that probably aren’t a high priority while we have features that need to be completed.

In the end, though, assigning the SRE role to a team led to decreased morale (within the team chasing bugs) and very low productivity. We didn’t really change the reliability of our product, and we ended up hurting our velocity. With bugs being reported all the time, the team was constantly dropping product work and context switching within a sprint. The cost of this constant ramp-up (think: where did I get to with this story?) was too great.

Then – a Single Engineer

Right, so the team as an SRE role didn’t work. We also tried having a single engineer from the product act as SRE every sprint. This was better but still not good. Basically, the one poor software engineer ended up being named the bug buster. Or bug boy. Or any play on the word bug you could imagine. What happened is that this single engineer would need one to three days of handover from the previous bug boy. That’s a lot of time spent just getting to know what the bugs are in the system. Remember, this software engineer was not meant to solve the bugs, but to figure out where they were happening and how high a priority each one should be. That’s hard.

We had a rotating roster. We didn’t ask for volunteers; it was mandatory. Also not great for culture. But it worked. People got on with their jobs. But the bug boy was left isolated and alone. They were no longer part of the team (even though they came to stand-ups and meetings). They had different priorities from the rest of the engineers. What we found was that this role became very inefficient. So much time was spent on ramp-up and knowledge transfer each sprint that bugs were left on our radar for weeks at a time because they were not reproducible (which should mean low priority, right?).

We also found that engineers who were the SRE didn’t necessarily come back with knowledge of the different parts of the system (as you might expect). What ended up happening is that a high-priority bug would come through from the PO (Product Owner), the QAs (Quality Assurance/Testers), or customer feedback, and the SRE would have to drop the bug they were working on and figure out the new one. So their knowledge was reduced to the high-profile bug.

For the rest of the engineers, there were no more distractions. This was what we wanted, right? No POs nagging us and product work pushing ahead full steam. But having a member away from your sprint meant that the teams became disconnected. Knowledge of bugs was passed from SRE to SRE rather than shared among the team. It was like a “rite of passage” to be an SRE. No one looked forward to the role.

What We Do Now

We no longer have SREs within the Agoda Homes team. The toll the role took on the people and the effectiveness of the teams was too great. We still get high-priority bugs. We still investigate bugs. But it’s more like a Product task now. The PO chats with the QAs. The QAs help determine how much of an impact the bug has on the product. The PO weighs up product and bug work and determines what will bring the most business value. It’s not perfect, but as engineers, we work together as a team again.

Reposted on Medium.

Filed Under: Leadership

About the author

Alex is an AVP of Engineering currently working at Bukalapak. He is a leader in full-stack technologies.
