Interview with Opening Keynote Speaker Michael Wildpaner

At Swiss Testing Day and DevOps Fusion 2022, Michael, Senior Engineering Director at Google, will talk about how Site Reliability Engineering is the key behind DevOps. In this interview, he also shares personal insights on what brought him into DevOps, how he thinks the field will evolve, and why he is passionate about it.

What’s your journey into DevOps?

Before Google, I built and ran distributed systems in various roles, but at the time I didn’t know that what I was doing would one day be called “DevOps”. Carrying a pager was natural to me, and while I knew that some organizations had dedicated Ops teams, I did not see how a software engineer could be successful in an operations role.

Google hired me due to the trials and tribulations I had to go through to run an HPC cluster for bioinformatics between 2000 and 2005, “uphill in both directions”. At least I learned that compute jobs should not be run on specific machines, and that Lord of the Rings is a great fantasy novel but not a viable source of host names.

At Google, I went from helping to launch the directions feature on Google Maps, improving Gmail’s message delivery pipeline, and co-inventing Google’s dominant rollout platform (Annealing) to building Google’s Security-SRE team and pushing for heavy investment in production security efforts driven by SRE.

Where do you see the field going 12 months from now?

The two major risks I see today are “reliability fatigue”, as some organizations have invested so heavily in reliability efforts that they think they’re done, and the loss of basic production skills with the advent and common usage of higher-level production platforms.

What are your thoughts on scaling up DevOps in a growing company, and what pitfalls do you see?

Being able to systematically teach new team members about reliability engineering is key to scaling up DevOps. In Google’s original SRE model, there was no formal training, and skills were transmitted by shadowing more experienced engineers, i.e. “osmosis”. This made training the next generation of reliability engineers a heroic effort of the few and allowed Google’s early SRE team to break up into tribes (Search, Traffic, Ads, Gmail) that all had different tech stacks and team cultures.

While forcing a whole company onto a single platform can be an unwise choice, watch out for unnecessary bifurcation in tools, processes, and culture.

Who or what inspires your passion for DevOps?

Sounds cheesy, but the production engineers (DevOps or SRE, doesn’t matter) I’ve had a chance to work with. I’ve not seen another group of engineers that is at once as critical of anything written or said, as receptive to the feedback of others, and as willing to step up, take responsibility and just fix the fine thing when something hits the proverbial fan.

Can you highlight one message as a sneak peek of the keynote “Bridging DevOps and SRE”?

DevOps and SRE are different points on a single continuum; there’s no “one size fits all”. The most important point is to establish an engineering culture that recognizes availability (and security) as basic qualities of any product that can’t be tested into the product later, but have to be designed in.

Last, but not least: How many engineers does it take to change a light bulb?

O(n^2) with the number of nines (of reliability) you need for that operation. Or with the number of colors of your bike shed. It’s definitely turtles all the way down.