In this episode, host Jon Collins speaks with Manuel Pais, the author of 'Team Topologies: Organizing Business and Technology Teams for Fast Flow' about the creation and set up of DevOps teams.
Guest
Manuel Pais is a DevOps and Delivery Coach and Consultant, focused on teams and flow first. He helps organizations adopt test automation and continuous delivery, as well as understand DevOps from both technical and human perspectives.
Manuel has been in the industry since 2000, having worked in Belgium, Portugal, Spain, and the UK. Manuel is the co-author of Team Guide to Software Releasability (2018). Manuel holds a BSc in Computer Science from the Instituto Superior Técnico and an MSc in Software Engineering from Carnegie Mellon University.
Transcript
Jon Collins: Hello, and welcome, or should I say, bem-vindo, to this week's edition of Voices in DevOps, where I'm delighted to welcome Manuel Pais, who is Portuguese, hence my throwing in words from Portuguese. That's all I know, I'm afraid, though. I'm going to stop there and hand [this] straight over to Manuel. You're a consultant in DevOps. You're a delivery coach. You've written at least one book and you're writing another. Maybe you can tell us a bit about yourself and what got you here. Well, I know we're going to be talking about the topics of the books, but why those books? I guess a book is just capturing what you felt needed to be said in some ways. Maybe, yeah, just start with you. What brought you here, Manuel?
Manuel Pais: Obrigado, Jon. Thank you for inviting me. It's just like my second life as a consultant since 2015. Before, I was a developer, tester, release engineer. Since 2015, I've been involved with different clients, helping them adopt DevOps, and we'll probably get to what does that actually mean, and continuous delivery practices as well. Sometimes it can take very different forms. It can be training, workshops, assessments of what teams are doing, and sometimes more strategic advice.
Through that work with all these different clients, me and Matthew Skelton, who co-wrote the book, Team Topologies, which is going to be published in the fall by IT Revolution Press, we've seen how there's one very important aspect that is not always considered when we're talking about DevOps, which is how are teams structured, how they interact, and how that can enable or be an obstacle for actual DevOps implementations and improving how the work gets done, especially in medium to large enterprises. The book in particular, is all about that: how to think about your team organization and what are the responsibilities of the teams and how they interact for a better flow of work and essentially, getting things done more effectively.
First, I'm intrigued, I have to say, to know what happened in 2015 that caused your–is it destiny or just a switch?
It was a switch. I'm also InfoQ lead editor for DevOps since about 2012, 2013. I was already very much interested in the DevOps movement since the first DevOpsDays conferences and starting to read about what Patrick Debois was saying about what DevOps is. In 2015, I switched from being an employee to become a consultant around continuous delivery and DevOps. That was very, very exciting. I started working with a small consulting [firm] called Skelton Thatcher in the UK. It was really interesting to see what different clients were doing and obviously, context is super important. It's something that we don't always think about.
DevOps is a lot of things, right? There's not a strict definition. I find [it’s] super important to understand the context of an organization. What are you actually–what are you doing? What kind of products do you have? What kind of services do you provide? How are you organized? What are you trying to achieve? Do you need to deliver faster? Do you need to improve your quality? Is the service poor? Do your customers have a poor experience? How can you improve that? Not a lot of organizations that I see actually think a bit more deeply about this. They just think about DevOps as a goal. DevOps, at most, is a way to reach some goals. When we hear some large organizations saying “we want to be DevOps by 2020” or what have you, what does that actually mean? It's very vague.
In some ways, DevOps is as much a symptom as anything in practice, which is why startups might be doing DevOps without even–why are we even talking about this stuff? “How else would you work?” is what I've heard before from more startup-y kind of people.
Yeah, especially when you get to large organizations, it's ironic. On one hand, it's good to have management with this kind of urgency that we need to do [it]–understand that DevOps brings value and they're important practices and this kind of urgency to improve, but then at the same time, they're looking for something too immediate. I've had this question from different potential clients: “In how many months are we going to have DevOps teams or are we going to be DevOps?” That's really, to me, not the right way to approach it. There is some uncertainty the larger the organization and how are we going to get to that better place, which first of all, needs to be tied into some goals and metrics that are more specific than ‘doing DevOps.’ Secondly, the way to get there, it's not straightforward. I can't say, “well, you're going to–by month one, you're going to do this; month two, you're going to do that.” You need to allow teams to also make progress by themselves and find out what works and what doesn't work in their context.
I want to get on to teams. Obviously, you want to get on to teams, because that's your thing these days, if I can put it that way. Equally, I want to also cover the notion of flows, which I know you've written a lot about. Clearly, a lot of these things, they're a meta level. It's a term, but if you build your own universe around the notion of flows, then you're going to get to a certain place versus building your universe around the notion of teams. That will get you to a certain place. What do you mean by flows and how have you been, over the past three years, helping organize this? Start with what do you mean by flows, but then, why did you see flows as an important thing to help people with?
The book I've co-written with Matthew Skelton is all about that, really. The flow of work is–it means that whatever teams are involved in delivering some piece of work essentially are not blocked. They can perform the work as soon as possible, but also with the required levels of quality. What we've seen with different clients is that that can mean different things at different times. Sometimes you need teams to collaborate more closely, because there's uncertainty around technology or the market where you're delivering a new product. You need teams to actually work very close together and find out what they need to do, so they're not blocked waiting for another team to do something. Other times, you're more in execution mode where we already have discovered the essential things and now it's just execution, getting things done and delivering.
In the book, what we recommend, what we've found out [is] that there are four, for us, fundamental team topologies, types of teams, and there are three fundamental modes of interaction between those teams. The kind of teams we recommend is start with a stream aligned team. We didn't want to call it a product team, because that seems to limit it to just one product per team. You have to think about what's the right size of software that the team should be responsible for, but you have a team that's aligned with some line of business or a segment of the business. They are working towards the goals of that business line. Ideally, they're autonomous and they have end-to-end ownership. That means they can take some idea and they can discover what needs to be done to deliver that idea to customers and then actually build, test, deploy and monitor the actual software.
The problem with that is that when you think about a technology stack and how many tools and how many frameworks and methodologies you need to know, that becomes very complex for one team, which is usually seven to nine people. We reduce that kind of cognitive load on that stream aligned team. That's where the other topologies come in. We've seen that a very successful pattern across many organizations is to have a platform which provides a set of services or at the very least, a set of documentation that helps the stream aligned teams do their work without having to know all the details about how to monitor, how to deploy, etc. So you will need some kind of platform teams.
You will need also ideally enabling teams. What we call enabling teams are those which are more expertise-focused, so you can have an enabling team around test automation, for example, or user experience, those kind of functions which don't always make sense to have a full-time person in a stream aligned team. You need to have that capability, though, so you can have an enabling team that is essentially orbiting around the stream aligned teams and helping them build those capabilities so that they can do the work without being experts.
A fourth kind of team is what we call complicated subsystem teams, so in very special, very, very rare occasions where you have PhD-level algorithms or components that really need super specialized people, then you might have a team that's dedicated to that, even if that component's not directly aligned to the business segment. That team should also provide–like a service, so other teams that are using that component should see it as a service that they're being given by the team.
Those are the four fundamental topologies: stream aligned, enabling, platform, and complicated subsystem, and then we have three interaction modes between them, which are essentially: 1) collaboration, 2) X-as-a-service, so we as a team are providing a service to other teams which are our customers, and 3) facilitation, which is what enabling teams do: facilitate other teams to do their work and to gain capabilities and be able to do all the things [like] testing, monitoring, deployment, etc., without requiring experts in each of those areas.
I'm just drawing a picture in my head. I get the stream team; they're essentially the people trying to build something of value, if you like.
Yeah, product teams, if you like.
Building customer or business value, yes, which links very strongly into another podcast that I just had with Christina Noren of Cloudbees, who's very product-oriented. I'm sure what you're saying would appeal to what she's saying. Then if you need a deep dive team, that kind of ‘domain excellence’ team, as you said, could be algorithmic or whatever. Then you've got the enabling team and the platform team. Could you just remind me of the difference between the two? I was getting a bit lost in platform enablement.
That's strongly related to the way those teams interact with stream aligned or product teams. An enabling team, its major goal is to enable stream aligned teams to do their work, so provide them expertise around different areas that they need help with, while a platform team essentially–most of the time, it will work by providing a service or set of services that the stream aligned teams can use.
Got you, okay.
This changes over time–yeah, I think just to make that point, I think that's the difficulty that people tend to think of this as a static thing; this team does this and that team does that. The way they interact over time changes, so... imagine we have been using AWS and now we want to adopt Kubernetes. That's a really big technological change, so you shouldn't just say “This product team now is going to use Kubernetes instead of AWS.” You need to see that there's a phase of discovery where if you have that platform team in-house, then that platform team and the product teams need to work together closely for a certain period of time so they understand how are we going to effectively use Kubernetes or whatever new technology so that it makes sense for us. We establish what is a good way to use them; what kind of services do we need from the platform? Then you move on and it becomes more of an execution type of interaction.
Most organizations, I think, have some kind of DevOps activity going on now. [In] most large organizations, there'll be a pocket of DevOps or several. There may be one department that's fully DevOps and another area that isn't or whatever. When you're going into organizations' relatively greenfields, let's say, what's the chicken and what's the egg in this? I mean, do you start by saying first thing we need to do is sort out your team structures or how you manage your teams, or do you start by saying “let's just see what problem we're trying to solve here and let's do DevOps right versus wrong” from a flow perspective? Where does team management really start kicking in?
Yeah, that's a great question. There is a period where, depending at what stage you are in your DevOps journey, let's say the initial period, that you need more basic things in place, so before the Team Topologies book, we had an online catalog of DevOps topologies which became quite well-known because it's a simple, very approachable way to think about team structures just to get the first idea of well, we can have a dev team, an ops team collaborating closely, or we can have cross-functional teams that actually have ops expertise inside them, different ways of organizing teams. I think that's a good start just to get an overall idea of what kind of teams do we have now, how are they collaborating right now, and where do we want to go? At the same time, if you're in a large organization and like I said before, there's just this goal to ‘do DevOps’ without a very clear idea of what you actually want to achieve, then I like to start by doing team-focused assessments. Let's focus on a team level. What does this team need to do to improve the way they work and the way they deliver?
You start with one team, then a second team, and a third team, and the reason why this is important for me is because one of the keys for a long-lasting DevOps adoption is ownership. If you just have a top-down approach where teams are being told “you need to do this and that and you need to be DevOps,” they won't feel the ownership of that. You need also the bottom-up ownership where teams understand why they're doing things and they have a say in what works, what doesn't. You start from the team perspective: what are the practices and where do we want to go? Over time, you can see what you can extract: common things and common ways of doing the work that make sense for multiple teams. That's, I think, the starting point that I recommend in organizations which are, let's say, further behind on the DevOps journey. Then at some point–so we in DevOps, we talk also a lot about CAMS: culture, automation, measurement and sharing. All those things need to be considered and also the team structure.
For me, I think the missing point is how important the organization of teams is, and thinking also about Conway's Law, for example. Those things need some kind of maturity to be able to factor in the team topologies and Conway's Law and things like that. Depending on where you are as an organization, you might need to first focus at the team level. Inside each team, what are the practices? If you're not doing configuration as code, automated deployments, then those are basic pieces you need to have the teams working on.
Then you get to a point where the benefits of DevOps are–if you limit that to just those practices, you're not achieving wider benefits of having faster flow and being able to deliver more quickly and with more quality. That's where the team structures and thinking about how teams interact and explicitly considering responsibilities of teams and communication paths between teams, it's where the value comes in.
Cool. Everyone knows what Conway's Law is, and I did not just google it. (laughter) Maybe for our audience who may be less quick on the Google button than I am, if you could just...
Sure, so Conway's Law, essentially Mel Conway wrote this paper, I think it was in–
1967.
The key idea that was taken from that paper is your software architecture, not what's on the paper, but what actually comes out at the end when you deliver, will reflect the communication structures inside your organization. That very old battle between ‘we wanted this architecture, but in the end it was totally different’–one of the main factors is because of Conway's Law. It's because in practice, when you're doing the work, the way that teams are structured and the informal communication paths between them shape the final software architecture much more than whatever blueprint you had in the beginning.
If you've got a siloed organization, you're going to have a siloed software architecture.
Exactly.
It's a bit like ‘real programmers can write FORTRAN in any language,’ which also comes from back then. What you're saying with that is a very communicative organization is automatically going to fit in with this more collaborative approach. Then everyone else is going to need a bit of work to get up to speed on more collaborative team approaches.
Yeah, and also when you think about Conway's Law, the consequence is that if we want a given architecture for systems, we need to consider ‘how is the structure of the teams going to match that architecture?’ because they have this kind of mirroring effect. We want to think about how teams interact, which parts of the system each team's responsible for, or which services, if you're thinking about a microservices architecture. So in fact, when there's discussion between monolith and microservices, what are the advantages and disadvantages?
One key idea is to think about ‘okay, what is the right size of software for one team to own?’ Rather than thinking specifically about microservices, think about this team of five to nine people, what size of software can they own effectively where they're able to deliver it, operate it, and provide a good service around it, rather than say okay, this team owns this number of microservices and that team owns that other number. It doesn't matter as much how many microservices. What matters is that the size of the overall software they're responsible for matches their capacity to do a good job of building, operating, monitoring, etc.
That's really interesting. You mentioned seven. I think back in my DSDM, Dynamic Systems Development Method, days, I think teams of seven were seen as a good number for that cross-functional team. Equally, I've heard the number 15 bandied around as far as the small organizations. You never want to be bigger than that.
I think that's coming from Dunbar's number, isn't it, where it says–
I'll have to google again. (laughter)
Essentially, the core idea that we've taken also in the book is–what he says is that you can have deep trust relationships with up to five people on a team, so you have a strong bond between everyone. Then you can have a good—how do you put it? I think you can have still strong relationship with up to 15 people, so you're able to know what the other people are doing and understand the dependencies. Beyond 15 people, it just becomes much harder.
Got you.
Yeah, that's also something we mention in the book when you’re thinking about not only the size of teams, but also then how you organize teams into groups or departments, etc. You should consider these numbers because they help guide how many teams should be in a group so that the group is still effective–everyone knows what the other teams are working on, not deeply, but they have an idea of what's going on as a group of teams, rather than just ad hoc way of we need more teams because we have more products or more features, so let's just add teams and add people to teams without considering okay, but how much—how is the relationship between the people inside a team and between different teams? How is that going to work? If I have so many teams around me that I don't even know what they're doing, that's not very effective for the business.
Which brings us to The Mythical Man-Month, and adding people to the team will just make the project later and all that kind of stuff. This stuff goes way back. Interestingly, speaking of way back, I was literally just writing about–I probably haven't invented this but for the moment, I coined the term ‘the Goldilocks principle of module size,’ so it needs to be not too big and not too small but just right. And I think if you're talking about microservices, for example, we may end up with all those kind of platform-y microservices of a certain type, [that] can be managed by a certain team, and there is going to be some tolerances on that team level, so you may need nine people to do it, or you may only need five people to do it.
Ultimately, it's about what you're saying; it is about mapping that need space. For example, you mentioned test automation as an enabling team. Maybe we could take three people right now to just help push that test automation thing through and then that team will dissipate over time. Other places, you may actually have a platform team of 11, which is a bit clunky given what we're trying to do, but ultimately that's just the size of the job. We've got to work within tolerances on this stuff.
Definitely. That's evolution over time. What do we need now? Do we need, like you said, test automation expertise? We need maybe an enabling team that's going to help product teams or stream aligned teams to achieve the right level of test automation and maybe in a year or two years, we don't need this enabling team anymore. That team can actually become part of a product team. We don't see enough thought about this kind of evolution in most organizations. They just have an org chart or structure which is considered and it remains the same for several years until they want to do big bang reorganization where they're saying “oh, no, for DevOps, we're going to have DevOps teams,” for example. Then you're going through this continual cycle of reorganization which is typically quite painful for the people in the teams and you're recreating new teams and losing the effectiveness of the teams you had before. Then in a few years, you're going to do this again because there's a new hype or ‘what have you.’ That's not very effective.
It's more important to think on a regular basis what do we need now and what's coming next; what do we need to adopt: new technology or–are we going into a new market? What do we need to do? When do we need to be more collaborative and discover more and deal with more uncertainty and when do we need to execute? We know what we need to do; we just need the capabilities to have these end-to-end teams deliver and have as fast flow as possible. That's something similar to—that's being phrased in different terms by different people. For example, in the Lean Enterprise book, they talk about Horizon 1, 2, and 3, which essentially maps to this: where you have the horizon where you are executing. It's what's making money for the organization already and you just deliver, keep executing on that and provide a good service. Then you have some second horizon where there's more uncertainty; it's more blurry what exactly you need to do. That's where you probably need more collaboration [among] some teams.
Then you have a third horizon, which is really things that are only going to actually come eventually to the market in some years from now, but you're trying to do an early exploration of where do we need to go and what kind of larger trends we need to think about.
If we look at the recent past in IT, things like artificial intelligence and the internet of things, which if you're an organization that has not done anything around that, you might want to start thinking ‘Does this make sense for us? Where can we apply it?’ and that kind of horizon where you're just considering ideas so you're not suddenly behind everyone else and you're like ‘oh, we need to do AI or DevOps,’ like we're saying. Then it's more of a reactive thing where you're just following trends rather than actually understanding how that fits with your business and what you provide to your customers.
We're in a really interesting juncture, I think. You can see I'm hedging; I'm stumbling how to say this because I don't want to say we're at a point of a paradigm shift or any of those waffly terms that I've literally heard people say for decades. What's really interesting about now is the fact that over the past three, four, five decades, we've seen a lot of IT best practices. It's fascinating how you reference things from 1967 and so on. Where we're shifting to right now is actually fully embracing IT and terms like ‘transformation’ are being used a lot at the moment. Five years ago, we were talking about ‘consumerization’ as this kind of problem to be solved, like suddenly technology's all around us and we don't like it; whereas now technology is all around us and we've got to accept it. DevOps is a symptom of that just as it's a symptom of other things.
This is a massively long preamble. (I'm going to get to a question in a second). I'm thinking about some of the things that you're saying are essentially applying new principles and old principles to this changing context, and how once you've adopted them, once you have fully embraced technology being all around you, once you have fully embraced lean and agile and so on, it's a very binary thing. There isn't another step to happen. You either are not embracing it; then you're embracing it. There isn't a third step. I wonder how much of this whole conversation is about being on that cusp, being on that shift, and whether or not we'll still be having these same conversations in two years' time or whether the kinds of team structures you're talking about in the book will just become the norm rather than the exception.
The first question is: what do you think? The second question is: where are things going from your perspective, and how do you see the future evolving?
In terms of thinking about these team structures as the norm, I would like that to happen. I'm not sure that's going to be the case.
Yep, everyone read your book, by the way. It's going to be a great book, I can tell...
I think there is more and more awareness that the way to do work effectively and with quality and providing good service has a lot of factors and also, it's not—it's clear that we cannot achieve that with this kind of waterfall model where we expect to plan everything in advance. Now it's clear we need to accept uncertainty and certainly Agile helps with that a lot. The fact that things are not static, so you need to–technology is changing all the time. Start-ups are disrupting existing markets all the time, so uncertainty is key. What I find is that we now accept that more within the delivery and software development teams. Sometimes at the organization level, more strategic level, not always accepted yet that there's a lot of uncertainty and we need to deal with that.
I think the team structures and interactions between teams, thinking about that comes in line with accepting uncertainty and accepting that you need to evolve and what's working now—it's not going to work in two years. You need to keep evolving your organization and be more lean if you like, more agile organization and in terms of-
It does come down to being organizationally agile, which—I'm trying to balance the two things in my head which I don't fully understand. One is I think this model of every single process we have should be unique and different, and there's no such thing as a standardized workflow like we had before anymore. That, to me, is a bit of a problem because we're spending so much time reinventing all the time and that's distracting from the efficiency of actually doing in some ways. At the same time, we've got to have this dynamic organization where every six months, every three months, every moment that it's the right time, these teams aren't working for us and people won't feel offended or ‘oh, my God, here we go again.’ It's just a reorganization.
We hear this a lot in the NHS in the UK, for example, where it's just constant states of reorganization but it's never right, whereas what you're talking about is a constant potential for reorganization in order to make it always right. That's a very different perspective.
Yeah, team stability is very important, as well, for people and for the flow of work as well. We're not saying change your teams all the time, but first of all, think about what their responsibilities are. What is their–what should they be working on that makes sense for a team to be able to own effectively end-to-end, and then think about what other teams you need and what kind of collaboration or interaction modes you need with those other teams and evolve that over time. I can have the same team which needs to collaborate for a period of time with the platform team on how do we adopt Kubernetes, for example, if we made the case that this is really going to be helpful to adopt this technology. Then after that period of time, we say okay, now we don't need to collaborate as much anymore. We're back to using this as a service because we have clearly defined APIs and usage for this technology, and we understand it within the context of our own organization.
Keeping teams stable is important so that you don't have that constant feeling of always reorganizing, but understanding that the way they work with each other is going to vary over time and should be explicitly addressed. I think that's the key to be able to, like you were saying, manage and address all the constant influx of new technology and new ways of working. We need to adopt on that because otherwise, we'll be behind other organizations who are able to go faster, but at the same time, we need to be explicit about it.
Which is why I guess the notion of team topologies—this is going to sound like a pitch for your book, but it genuinely isn't. The notion of topologies, ie. patterns which we can apply–you're always the platform team. Don't worry about it; don't panic. We're not going to restructure you out of existence in two weeks' time. The platform changes and therefore what happens in the platform also changes, but at least we've got fixed on the models that we're applying and that creates that stability even though we know that the work–no one would deny—as you say, it's an uncertain business environment and an uncertain technology environment. We've got to deal with that whether we like it or not.
Exactly, and I think what you said before, we now have a good set of practices that we know–at the engineering level, continuous integration, continuous delivery, deployment automation, configuration as code, etc. We have a good set of practices. We have a huge ecosystem of tools around them, which can help the teams a lot to be more effective and execute and deliver faster like you were saying. We have that even though in many organizations, the teams don't have those skills yet, so you still need to help them out with some expertise around that. The practices are there. If the teams adopt these mature engineering practices, then they will be able to execute faster. Then you can think about the topologies in order to give you the tools or the capability to deal with uncertainty and changing requirements at the business level and technology level as well.
Thank you so much for this. I'm going to leave you with the last word, if I may. Time has shot past. Apart from reading your book, which is out in September, so in the interim while people are waiting for that to come out, what would you advise any organization that they're doing DevOps to an extent; they're looking to expand their use of DevOps, but they're finding that it's hitting more and more bottlenecks and it feels like it's becoming harder rather than becoming easier? What would be the first conversation that you would look to have with an organization that's struggling with scaling up their DevOps efforts?
First thing would be to, like we said before, forget about DevOps for a moment. Think about what you actually want to achieve. When you know what you want to achieve, then you can come up with metrics that help you see if you're making progress. I know it's easier said than done, but just start somewhere. Start with some kind of metrics around quality, speed, operability, serviceability that can help at least provide you the visibility of where you are now and how you are progressing over time. Then you need to think long-term. How do I improve the culture? If there is a culture of blame in my organization or people are afraid to try things because they get punished if they make mistakes, then how do I change that long-term?
Also, try things and start–depending on your journey but allow teams to take ownership of their DevOps journey and allow them to try things, see what works, what doesn't, before you jump into a framework or whatever you have that gives you the idea that you will be able to be DevOps in X months or years. It doesn't really work like that. It needs to be much more organic and that you let teams try and have ownership, so bottom-up approach, while at the same time you are looking at what are the things that work and don't. Then expand that to the whole organization.
It's a long process, so don't expect immediate results. Allow things to evolve but measure at the same time and think about how can you improve the culture and sharing of good practices. A lot of times, there's too much focus on the tooling and automation, which is helpful but when you look at the whole stream of work, the bottlenecks are often not there or are in other parts of the organization.
That's amazing. Don't lock things—often we see the solution, got to lock it down, got to put a framework in place. We've got to simplify and structure and so on and so forth. What you're saying is: take responsibility, add visibility, if I can use the word ‘observability.’ Make sure you understand what it is, where you are, and then move from there, but allow people to develop it in such a way that adds the most value rather than trying to tell people what to do.
Exactly.
Well, that's fantastic. Thank you so much, or should I say obrigado, Manuel, and thank you so much for your time. We'll be tweeting your Twitter handle the same time as this, so if anyone's got any questions for myself or for Manuel, mostly for Manuel, then please let us know and speak to you next time. Thank you, Manuel.
Thank you very much.