What Is a Site Reliability Engineer and How Do I Become One?

FT Contributor

Information technology is one of the most lucrative fields, with site reliability engineers among the top 10 highest-paying information technology jobs in the U.S.

The site reliability engineer (SRE) role is the brainchild of Google, which established it in 2003. The position was created to place a greater focus on the accessibility and reliability of Google’s services. Google also penned what many consider the official manual for all SREs, the aptly named Site Reliability Engineering, available for free online.

SRE Overview

Using automation, SREs teach websites to maintain themselves when it comes to the operational functions that keep a site running. Instead of relying on a traditional operations team, companies employ site reliability engineers, dedicated professionals who use their software and IT expertise to reprogram the way websites work.

As Ben Treynor Sloss, Google’s VP of Engineering, explains, “Fundamentally, it’s what happens when you ask a software engineer to design an operations function.”

This new department of dedicated engineers proved so successful that it is now a regular feature at many other companies, too. With companies like Amazon and Netflix also hiring for these positions, there is a much more diverse job market available with far greater earning potential.

An SRE position might pay big money, but job satisfaction is important, too. It is paramount that you determine whether this career is the right fit for you before you move forward, so you know whether it is worth getting your master’s degree in an applicable focus.

What Does a Site Reliability Engineer Do?

A site reliability engineer is a driving force behind the internet’s speed, efficiency, and reliability. SREs serve as the link between development and operations teams, writing code that automates the many processes related to the infrastructure and operations of a specific piece of software.

The SRE role resembles a modernized DevOps role, one that has evolved to ensure the constant and reliable automation of a site. Whereas much of this work was once done by hand, the introduction of SREs has paved the way for automation to transform the way the internet works.

As the digital realm continues to evolve each day, so does the role of an SRE. Therefore, it’s a position that demands flexibility. It is also needs-based, with the day’s responsibilities centered around both emergency and maintenance issues, like attending to outages and also compiling reports of historical data.

These are some of the daily responsibilities of a site reliability engineer:

  • Programming automation;
  • Creating scalable systems;
  • Monitoring and maintaining sites;
  • Managing risk;
  • Troubleshooting;
  • Resolving emergencies.

“Our rule of thumb,” says Sloss, “is that an SRE team must spend at least 50% of its time doing development.”
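
As a rough illustration of this rule of thumb, the following Python sketch computes the share of a team's logged hours spent on development and checks it against the 50% floor. The work categories and hour figures are hypothetical, chosen only for the example; they are not taken from Google.

```python
# Hypothetical sketch: check a team's time split against the 50% rule of thumb.
# The categories and hours below are illustrative assumptions, not real data.
DEVELOPMENT_FLOOR = 0.5  # "at least 50% of its time doing development"


def development_share(hours_by_category, development_categories):
    """Return the fraction of all logged hours spent on development work."""
    total = sum(hours_by_category.values())
    if total == 0:
        raise ValueError("no hours logged")
    dev = sum(
        hours
        for category, hours in hours_by_category.items()
        if category in development_categories
    )
    return dev / total


def meets_rule_of_thumb(hours_by_category, development_categories):
    """True when development work makes up at least half of the logged time."""
    share = development_share(hours_by_category, development_categories)
    return share >= DEVELOPMENT_FLOOR


# One made-up week for a small team: 110 of 200 hours are development.
week = {"automation": 70, "tooling": 40, "on_call": 50, "tickets": 40}
dev_work = {"automation", "tooling"}
print(f"development share: {development_share(week, dev_work):.0%}")
print("meets rule of thumb:", meets_rule_of_thumb(week, dev_work))
```

A team that falls below the floor in a check like this would, under the rule Sloss describes, need to shift time away from operational work and back toward development.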

What Is a Site Reliability Engineer’s Salary?

Based on user data, LinkedIn reports that the average base salary for a site reliability engineer is $117,000 annually, with total compensation of $128,000 per year. Total compensation can range from $78,000 to $223,000, depending on your employer, title, and location. An SRE’s salary also reflects the demands of the position.

Nearly half of all respondents reported receiving a bonus, with a median amount of over $10,500 each year. Some SREs report other perks, like a sign-on bonus and stock options.

Indeed reports slightly higher wages, with an average salary of $137,570 per year. Reported salaries range from $59,000 to $247,000 with an additional $10,000 cash bonus. Wages also vary depending on where you live.

Indeed shows the following average annual salaries for site reliability engineers based on location:

  • Boston, MA — $142,458;
  • New York, NY — $156,971;
  • San Francisco, CA — $163,479.

Demands of the Job

A crucial part of the position involves working with development teams on consultations and troubleshooting. An SRE is a core position, so there will be times when you will be on call to provide emergency support.

Most SREs work on a rotating schedule for on-call support, so not all of your time is dedicated to emergency resolution. These shifts can last anywhere from a few days to more than a week. While some issues can wait until morning, others may require immediate attention. As an SRE, you must be prepared to respond if there is an issue while you are on call.
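
To make the idea of a rotating schedule concrete, here is a minimal Python sketch that hands out consecutive on-call shifts in round-robin order. It is only one simple way to model a rotation; the engineer names, start date, and one-week shift length are hypothetical, and real teams typically rely on dedicated scheduling tools.

```python
from datetime import date, timedelta
from itertools import cycle, islice


def build_rotation(engineers, start, shift_days, shifts):
    """Assign consecutive on-call shifts to engineers in round-robin order."""
    if not engineers:
        raise ValueError("at least one engineer is required")
    if shift_days <= 0:
        raise ValueError("shift_days must be positive")
    schedule = []
    # cycle() repeats the roster; islice() stops after the requested shifts.
    for index, engineer in enumerate(islice(cycle(engineers), shifts)):
        begin = start + timedelta(days=index * shift_days)
        end = begin + timedelta(days=shift_days - 1)
        schedule.append((engineer, begin, end))
    return schedule


def on_call_for(schedule, day):
    """Return who is on call on the given day, or None if nobody is scheduled."""
    for engineer, begin, end in schedule:
        if begin <= day <= end:
            return engineer
    return None


# Hypothetical three-person team with one-week shifts.
rotation = build_rotation(["Ana", "Bo", "Chen"], date(2020, 3, 2), 7, 6)
for engineer, begin, end in rotation:
    print(engineer, begin.isoformat(), end.isoformat())
```

With three engineers and one-week shifts, each person in this example is on call one week out of every three, which is why not all of an SRE's time goes to emergency response.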

What Are Site Reliability Engineer Qualifications?

An SRE is a highly specialized, technical position that requires a lot of training and education. SREs are expected to combine engineering with math and science to create computer-based communication. It’s like working with one giant IT puzzle composed of programming languages, algorithms, and software performance.

Engineers need to be able to see the bigger picture and ensure that a program can adapt as it expands and grows. It’s not just about writing code; engineers need to be adaptable and able to adjust to a continuously evolving environment.

Google also looks for IT pros who blend systems engineering with a software background; coveted skills include network engineering and Unix system administration with expertise in infrastructure automation. Sloss specifically notes, “We hire engineers with software development ability and proclivity.”  

There are certain education standards, too. Most SREs possess a bachelor’s degree from an accredited four-year university. As the job market becomes increasingly competitive, however, some employers are looking for candidates with graduate degrees and at least two to five years of experience.

Site Reliability Engineer Career Outlook

Information technology is one of the most reliable and versatile careers that a person can choose. It is also a field in high demand, and that demand continues to grow.

SREs, in particular, enjoy a ton of mobility within their teams, regularly working with different departments to achieve a common goal. That presents enormous job security because an SRE’s skills can transfer over to many other IT positions and departments.  

Whether college is worth the cost is always a question, but many job sites show steady demand for site reliability engineers.

Cities like New York and San Francisco are hiring heavily, and international locations like London, France, and Toronto are home to top employers as well.

Should I Become a Site Reliability Engineer?

Site reliability engineering is a position that demands great patience. You also need to be a great communicator. After all, you are not only communicating in a new vocabulary of IT language, but you also need to work with many different kinds of teams to collectively accomplish your goal.

SREs are expected to utilize a combination of both hard skills and soft skills in the workplace, including problem-solving skills, excellent written and verbal communication, and the ability to remain calm in stressful, high-pressure situations.

A career as a site reliability engineer is a great stepping stone for developers, systems engineers, and software engineers looking for their next opportunity in an industry that is in high demand.

Working as an SRE also guarantees that no matter where you go or what happens in life, you have a versatile skill set that will serve you well across a number of different positions found all over the world.



This post was updated February 24, 2020. It was originally published February 24, 2020.