Reliability in a world of endless surprise

Interview with Aaron Blohowiak

Who should come hear your talk?

Anyone who is interested in learning about how we pursue reliability at scale: developers, operators and managers. I think there is a mentality that if we try hard enough, we can do things without error. While I always strive to be better, I think a "perfectionist" mentality causes more pain overall than accepting that errors are inevitable and focusing on reducing their impact when they do occur. I will try to build the case for this belief by following the evolution of notions about causality through history. If you already believe that recovery matters more than prevention, then my talk will help you convince your peers. If you strongly disagree, then please come to hear me out and debate me afterwards. There are few pleasures in life greater than when two people with different views come to a better understanding of each other.

What are some of the DevOps hurdles that Netflix faces?

We have over a thousand software engineers, a microservice architecture composed of hundreds of different services, and over 139M customers, and we operate 24/7/365. We do not have a centralized architecture committee, nor do we have a standardized change-control mechanism; each team of developers has full responsibility for the reliability of its services. Maintaining such scale and such independence while building a highly available system requires new ways of thinking.

What do you like most about your job?

My coworkers are brilliant, kind and challenging, and they inspire me every day. Our culture is one of freedom and frank discussion: we trust people to make good decisions and we tell each other when something could be better.

What is the foremost KPI in reliability at Netflix?

Stream Starts Per Second (SPS). This is how many people are successfully initiating playback of a movie or show each second. We are lucky in that we have quite an organization all working toward the single goal of having people watch things they love.

Can you comment on the technical challenges of introducing Black Mirror: Bandersnatch?

For those who don’t know it, Bandersnatch is an interactive film with multiple possible endings, where viewers make decisions for the main character. It was a large effort across many parts of the organization that influenced many different systems. While we were only peripherally engaged in the effort, I know enough to say that I am impressed at how quickly and how well the project came together. I am very excited to see the evolution of storytelling.

And finally... what is your favourite Netflix series?

So many! Hard to choose. In no particular order: The Rain, Kimmy Schmidt, BoJack, Altered Carbon, Maniac, Sex Education, Ozark, Disenchantment, The Last O.G., Stranger Things. I also like some of our original movies like Bird Box, IO, Bright, and The Ballad of Buster Scruggs.