SREs are a rare bunch in the software community. But there’s little denying that the approach of Site Reliability Engineering is the future of software operations.
Here are some things that make SREs a unique breed in software work:
SREs look at the broader picture
Ask any developer what they’re working on and you’ll see a tiny sliver of the whole codebase. That makes sense given how focused coding work needs to be.
Systems, on the other hand, need a holistic view in order to make sure the whole unit works harmoniously.
SREs thrive in ambiguity
Because their scope spans the entirety of a software system, SREs can end up working on many types of problems. Some problems may be well-defined, like spinning up infrastructure based on known demand.
Other problems may be more open-ended, like working out how to cost-effectively autoscale a service that has inconsistent usage patterns and needs high performance.
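To make the autoscaling example a little more concrete, here is a minimal sketch of the simplest possible starting point: sizing a service in proportion to observed load, with a floor to protect performance and a ceiling to cap cost. The function name, parameters and default limits are all hypothetical and chosen for illustration only. A real autoscaler would also need smoothing, cooldowns and some handling of bursty traffic, which is exactly where the ambiguity comes in.

```python
import math


def desired_replicas(
    observed_rps: float,
    target_rps_per_replica: float,
    min_replicas: int = 2,    # hypothetical floor to protect performance
    max_replicas: int = 20,   # hypothetical ceiling to cap cost
) -> int:
    """Return a replica count proportional to load, clamped to [min, max]."""
    if target_rps_per_replica <= 0:
        raise ValueError("target_rps_per_replica must be positive")
    # Round up so that no replica is asked to serve more than its target.
    needed = math.ceil(observed_rps / target_rps_per_replica)
    return max(min_replicas, min(max_replicas, needed))
```

With these assumed defaults, a quiet period still keeps two replicas running, and a spike can never push the service past twenty. Choosing those two numbers well is the hard, open-ended part of the problem.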
SREs work beyond constraints like Scrum
Most developers work within some kind of agile framework like Scrum or XP. Some SREs do the same for planned software build work, which essentially timeboxes their efforts. Timeboxing might work for estimable problems, but it does not always work for production-level work.
Can an SRE stop working on a problem because it does not fit into the mould of a sprint? That could spell disaster for production software. Daniel Wilhite answers the question of “Can scrum be used effectively by SRE teams?” very well.
SREs don’t stay in their lane
You’d expect SREs to get used to developers throwing the code over the wall, but no. Many are ex-developers, so they will spend a large part of their time coding up solutions for infrastructure and software performance.
Sometimes, they may participate in feature teams as a means of job rotation. This helps them get a better understanding of their developer counterparts’ priorities. Overall, they should spend less than 50% of their time on feature work.
SREs don’t have a monolithic job description
SREs come in many shapes and sizes. In smaller companies, a single SRE may be the one-stop shop for all site reliability matters. As a company gets larger, SRE roles may get divided into specialised work.
For example, one SRE may focus on platforms like Kubernetes. Another SRE may spend their time supporting developers in taking up DevSecOps. Yet another may have general SRE responsibilities with the addition of Chaos Engineering.
Comparison with software developers
The two roles are chalk and cheese, so it’s worth considering the key differences in how SREs work compared to software developers. Chances are they will need to collaborate closely to make sure software works well in production.
I took inspiration from a Google recruiter’s interview with an SRE, Ciara Kamahele (link here). The key differences I uncovered are summarised in the table below:
There you have it. SREs may be popularly known as software developers who happen to run systems, but they are distinctive in how they work.