Use Engineering Strategy to Reduce Friction and Improve Developer Experience

Summary

Will Larson discusses what problems engineering strategy solves, examples of real engineering strategies, how to roll out engineering strategy, and how to troubleshoot a strategy rollout that is not working.

Bio

Will Larson is CTO at Carta, and has been a software engineering leader at Calm, Stripe, and Uber. He is the author of An Elegant Puzzle and Staff Engineer.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Larson: What I really want to talk about is this idea of engineering strategy, and what I would term almost a crisis of engineering strategy. I say it's a crisis because everywhere you go, you get internal feedback from a Culture Amp survey or something like that, and people just lay into you. They're like, we don't have an engineering strategy, we have these idiots that are supposed to be leading the org, but they don't have a single line of strategy written down anywhere. People are pissed. A lot of times they've been pissed at me. That's not a whole lot of fun. It's not just the companies I work at, where normally it's my fault. I also get people writing in emails all the time, like, we don't have a single line of engineering strategy written. It's like, we can't make decisions about picking our database. We can't make decisions about cloud migrations, monoliths, microservices. What is this? Where is the strategy in the entire industry? That's a pretty interesting topic. I want to basically do five things. I want to try to convince you that engineering strategy is pretty simple. Give you a definition for it. I want to make the challenging-to-defend claim that it might even be useful, even if you don't see a lot of it. I want to convince you that strategy is actually everywhere. Even if you're writing mean feedback to your staff engineer group or your architecture committee at your company or something, I'm going to make the argument that you do have a strategy, even if you don't see it. A written strategy is a lot more useful than an unwritten strategy, and maybe we can do something better. Then I want to end with the idea of how each of you, not as "executives" or "important people," but as a practitioner, a staff engineer, a senior engineer, an engineering manager, can actually advance and push forward engineering strategy at your company. I have a couple ideas for that.

Engineering Strategy is Honest Diagnosis + Practical Approach

First, this is a classic book. This is Richard Rumelt, "Good Strategy, Bad Strategy," a foundational text on what strategy is. He basically argues that strategy is just three things. There's a diagnosis: what is actually going on right now? A diagnosis might be like, our revenues are going down, engineering costs are going up, and that's really bad. That's not good. But the diagnosis itself is just a factual description of reality. There's no inherent goodness or badness. It's just a description of reality. Then there's a guiding policy: ok, so you're starting to run out of money, your costs are going up, your revenue is going down, what do you want to do about it? Maybe you're going to turn off a lot of your servers. That's pretty simple. A guiding policy is generally, what do you want to do? Finally, Rumelt has this idea of coherent actions. Rumelt's really worried about this idea of, how do you make sure strategies aren't just words? How do you make sure it's not just some executives putting some words together? You go talk to the board of directors and you talk about how smart you are, and you go home and have a coffee or something, and just hope no one notices you haven't done anything. That's the biggest concern for Richard Rumelt: how do you make sure strategy is real and not just words on a paper that someone might write?

I want to argue, engineering strategy is a little bit different in the definition I'm going to offer. It's pretty similar to the first two that Rumelt offers, but it's a little bit different. The reason it's different is that, fundamentally, a lot of engineering strategy is about, how do we shape future decisions? It's not necessarily just about, how do we make sure we don't write bogus stuff? It's more about like, how do we shape the response to incoming requirements from the product teams we work with, the other engineering teams we work with, the business, the broader macroeconomic environment around us? To me, when I think about engineering strategy, the most interesting question is, how do we shape the future decisions ahead of us? The next time someone comes and wants to set up a new service, how do we react to that? When someone says they want to introduce a new programming language into our stack, how do we make that decision? That's why I think engineering strategy is really interesting. Let's get into a little bit more of this.

To make this concrete, let's use an example scenario. It's a pretty contrived scenario, but I'm going to argue that probably at least a third of you, maybe half of you, have worked in literally the contrived scenario I'm about to lay out here. Fake company, Widget & Hammer company. You join it. It's fantastic. You find the heart emoji, you throw it on there, feels good. You work on a team, you're working in a Python monolith, and you're building this widget product. It's going pretty well. You feel pretty good. Then the company hires a new CTO, and the CTO is really opinionated and hates monoliths for some reason. The CTO is like, the biggest thing when I come in is, I'm going to prove my value at this company by getting rid of the monolith. Fantastic. I literally bet at least half of you have gone through this exact sequence of events. You switch to a new team. You make a new product, the Hammer product, and it's great. It's a new product. There was no code in the monolith. You're really happy. You're just typing away. People on the monolith are maybe unhappy, but you don't know. You're just building a new product, it's going great. Two years later, you talk to your old team, and it turns out that they're still in the monolith. They haven't made any progress getting out. You had a great time in this decoupled service, but the people who have been in the monolith are still there. Also, two years later, they've written some code in the last year that's pretty useful, and you have literally no idea how to share code between the services that have spun out since and the original monolith where the majority of your product lives. How can engineering strategy help us make decisions like these? Let's talk it through.

Honest diagnosis is the first piece. I want to start with the opposite of that: what is not honest diagnosis? What is dishonest diagnosis? I think, again, these will be pretty familiar claims you've probably all heard. "We can migrate to services in three months" is something that I hear a lot and is never true. We talked about the example where, two years in, you still haven't moved the original product out of the monolith. That I've heard a lot. That's true. Moving to services in three months is almost certainly not true. Next: "We've actually moved complex components out of the original monolith and proven them running as an independent service." In this case, you built a new service outside of it, but you didn't actually move anything out. Obviously, it's a lot easier to build something net-new than to actually migrate something out. You haven't proven anything in this case, so you can make this argument, but it would be dishonest to make it. Next: sometimes it's true, but often engineering will decide on its own that product and the business are ok making an investment in technical debt or whatnot that will slow down product execution. In 2023, and almost always any other time, but especially in 2023, when the macroeconomic climate is so challenging around us, almost no company is actually willing to make this compromise right now to meaningfully slow down to support a migration. This could be true, but often it's not. Finally, if you say you're going to staff up a team that's going to do all this work and support both the existing monolith and the new service migration, that could be true. Usually, when I see these migrations, everyone on infrastructure moves to support the new service architecture while every product engineer is still working in the monolith, and everything grinds to a halt. People are like, I don't know why this is happening. It's because people have failed to acknowledge the reality happening here.

Let's think about an honest diagnosis. We're going to pick up a trend really quickly. It could be true that you can migrate in three months. If you're a small company, maybe you don't have many customers, maybe you don't have much data in the production databases, it could be true, you can migrate in three months. It also could be true that you've derisked moving an important component out of the monolith. It could be true that your company is willing to slow down. It could be true that you're willing to actually fund both. The really key idea here is that honest diagnosis is not about universal truths. This is where you see people come from a previous company and just get into so much trouble. Because what was true at a previous company, maybe your previous company was willing to slow down product velocity to make major technical investments. Maybe your previous company was willing to actually take something out of the monolith and verify it works. A lot of companies aren't. This is the biggest thing when you think about honest diagnosis. It's not there's like a preordained list of honest things out there that describe how companies work. It's that you have to actually understand the reality of the company you work in, and your ability to influence that company to write a good diagnosis. If you can't do that, every other piece of strategy work you do is wrong because it's built on a foundation that's simply untrue. If you aren't willing to honestly describe how things work, you cannot do strategy, point blank. Of course, describing how things work in reality is not sufficient to solve the problem, just gives you a foundation to build on.

Practical approaches acknowledge tradeoffs. Here's a really disappointing, but very good approach. You're like, we want to move to services but we don't have enough staffing, and so we're going to delay for 12 months. You don't come to a conference and do a big conference talk about how you delayed the services migration for 12 months to build tooling. That's not the super exciting, energizing thing that you get to brag about. But this is going to work. This is a real thing you could do, and it would work. I think this is one of the awkward things: if there's no tradeoff in how you're approaching a problem, I don't think you've said anything. If you have a strategy that's like, you're going to go build this thing really quickly, with no tradeoff, there's no strategy. It's not real. Similarly, here's another strategy you can't go brag about, although maybe in 2023 you can brag about this in a tech talk: generally speaking, we can't support new programming languages. That doesn't sound super exciting. That's not the exciting thing to brag about, but this works. This means you get to concentrate your investment. Again, I really push this idea: when you think about guiding policies, when you think about how you want to navigate a problem, if there's no tradeoff, there is no strategy.

Let's think about how an approach might work for the Widget & Hammer company. First, let's say they did get buy-in for adding two engineers. It's not a ton of engineers, but you can probably get buy-in for two. Those engineers are only going to focus on supporting the new service migration. Every other engineer is going to focus on supporting the monolith, where all the product code for the production services that are driving revenue, that are actually doing something useful for the broader business, lives. These two get to focus on smoothing the path to the new future you're trying to build. Maybe you're going to actually ensure that you migrate an existing product out from the monolith to services, and you're going to only do that one piece. That's the only thing you're going to move out, and you're going to operate it as a service and get some feedback. What if it's not actually better outside of the monolith? I have this joke I tell, which isn't very funny, which is: VPs don't miss targets, they redefine the targets so that they always hit them. A lot of times technical migrations work the same way, where you move something out, and you're like, actually, this is worse, but we really don't want to admit it, because then we'd have to migrate it back and that's really embarrassing. You just pretend it's working, and you don't measure anything. You don't actually check if it's better. You're just like, "Mission accomplished, we did a great job. We migrated out." Then you do a tech talk, and then you convince literally the entire industry to spend a decade doing this until we all realize collectively in 2022 that this was probably a waste of a literal decade spent working on this stuff. If you don't have a way to actually evaluate, how would you know? You can just decide that for this project, you're going to have a way to evaluate it. You can say, finally, and again this is controversial, that we're only going to move one thing out, and nothing else is allowed to move to services until we have a clear point of view. Moving more things out doesn't help you get a better signal. It means that if it's actually a bad decision, you're trapped. At that point it doesn't matter, because you're not actually evaluating decisions at all. Context is essential. These guiding policies, these approaches, could be really good for your company. They also might not fit. Maybe you're not willing to staff even two engineers, because you're having a really hard time financially as a business. That's ok. What's important is acknowledging the reality around you.

It's Useful (Increased Dev Velocity, Decreased Friction)

Then you could say, so we make really boring, obvious decisions. Great, we already do that. Pretty comfortable doing that. Is this actually a useful idea? Is there anything useful to this concept of strategy that I've described? I want to talk through three different example strategies that I've run into in my work so far, starting with one from Stripe, which is: we run a Ruby monolith. I think this is a core strategy that I saw there in the 2016 era when I first got started. Start with the diagnosis. Stripe has an interesting challenge in that they have so many external forces that they don't have much control over. They have regulators in every single country they go into. At that time, they were in, I think, 20 or 22 countries, and were expanding to more countries every quarter. Those regulators just change stuff. That's what regulators do. I don't know what they're there for, if they don't change things occasionally. That's their job. Then in financial tech, you have to respond to those regulatory changes. They also have banking partners that change things. They have enterprise customers that change things. There were a ton of things changing around them, and Stripe couldn't control that. Stripe had to plan in a world where the majority of the roadmap was dictated by external forces they had no influence over. I felt like fintech's rough, because fintech is literally built on thousands of financial institutions that are mostly just humans pretending to be software. There's an SFTP interface hacked on top of it, or an API that's actually just an SFTP transfer underneath, built to look a little bit more modern, and there are tons of bugs. You think about building beautiful, reliable, elegant software, and then you work in a system that's mostly just a bunch of humans bundled up as an API. It's really hard to build good software this way. Then, finally, there's a really complicated platform for money movement: again, an API on top of thousands of financial institutions, which are really just a bunch of humans shoved together in a trench coat. Then they had really complicated products on top of that as well, like Stripe Connect, which is basically how billing works for Lyft, for example, in terms of cutting money movement. Stripe thought about, how do we deal with this pretty challenging diagnosis, where we almost can't plan because external forces control so much of our roadmap? The approach: we need our entire risk budget just to deal with external forces. We can't take a single bit of risk on changing databases or changing programming languages. There's literally no tolerance. We're already negative on our available risk budget just from these external changes. We reduce technology risk by running a Ruby monolith. We have expert engineers who work in primarily one technology and in primarily one code base. That's all we do. We don't support anything else, because this is how we think we can manage risk the best. Also, it means we've narrowed the technology landscape so much that we can make pretty significant investments into the Ruby stack, the monolith we're working on, because we don't have a wide variety of things we need to support. Then exceptions to this are narrow and rare. Like any company, there were some: in Stripe data engineering there was some Scala, and the PCI-compliant tokenization environment was in Go. But ninety-nine percent of the code was Ruby in a monolith.

Was this strategy useful? My argument's yes. Innovation budget. First, it meant that we got to actually control the innovation budget and shift it into product. Stripe wasn't a company that was trying to build the best new infrastructure for general software computing. Stripe was trying to build the best financial infrastructure it possibly could, and we wanted to spend our innovation budget on building the better product. It meant we got to skip this detour into services, which was a bit lucky. Because we had such a narrow landscape, we could actually make investments like Sorbet. Sorbet was the static Ruby typing project that we did at Stripe. It ran in, I think, about 8 seconds to type-check the entire monolith code base. This is the sort of investment we could never have made if we had three programming languages we were trying to support with wide-scale adoption. At the size we were at that time, we could only get the number of engineers needed to build something like this for one single programming language. I think it was about 150 engineers at that point. This was only made possible because of the strategy we were operating with.

Flipping to a second one. Calm: we're a product engineering company. Again, the diagnosis is not necessarily about something you're proud of, or something you're not proud of. It's just a statement of reality. When I joined Calm in 2020, we spent a lot of time arguing about new technologies. We spent more time arguing about new technologies than really doing anything else, which wasn't a super exciting moment to be in. The inevitable "Calm doesn't sound calm" joke is accepted. It felt like we were adopting technologies for the wrong reasons. It felt like we were adopting technologies because people wanted to try them, that there was this sense that the goal of your job was to provide enrichment and entertainment for you. That sounds pretty cool, but I'm not sure that's exactly what we wanted. That was just where we were at that moment. We were one year into a service migration and we had not a single line of product code moved in. We had a bunch of platform infrastructure stuff. The stated policy was that we were moving to services and a new programming language, and nothing had moved in a year plus. Calm was about 25 engineers at that point, so our infrastructure team was like 3 people, and they were stuck supporting this monolith, which held the vast majority of our product code, plus a wide tail of random stuff we had built for this service migration.

Looking at that, what did we decide to do? We came up with this idea: we're a product engineering company. Where we're going to invest, where we're going to differentiate ourselves, is not in innovative infrastructure. It's by building the best possible product we can for our users, and delighting on the frontends and the mobile applications. We only adopt technology to support the product and our users directly. We don't introduce new technologies for any other reason. We write code in the monolith, barring a really powerful functional requirement. By which I mean, typically our servers were maybe doing like 10 to 20 requests per second. If you needed to do 10,000 requests per second per server, sure, let's talk about doing it outside of the monolith. That's an interesting requirement. If you just want to do something in the product code, and you're doing 10 to 20 requests per second, probably do it in the monolith. Or if you're writing the iOS app, probably not in the TypeScript monolith, but that's a small constraint. Exceptions were only approved by the CTO and in writing. I think this in-writing concept is interesting. I think even once you have a strategy, if you aren't clear about decisions, people will pretend you've approved things all the time. You have to make sure you are explicit about how people can know that exceptions have been granted.

What was the impact? I think some people were pissed. Like, we had a pretty good thing going, and now you're telling us we can't adopt new technology, but we should. How do we learn? How do we get better as engineers, if we don't get to adopt new technology in production at work? That was an interesting question. I was like, that's a super valid point of view, but that's not what we're doing here. Again, strategy isn't necessarily about making everyone happy. Strategy is about articulating how you are going to move forward as a company. Then, people can choose how they want to react to that. That's ok. People want different things. That's beautiful for them. But you can't operate as a company if everyone's moving in different directions. If you don't make decisions like this, you're just telling everyone they have to deal with conflict all the time. Some people felt pretty upset about it. For me, and I think for a lot of folks, it felt like we got to stop arguing, and we got to focus on what really mattered to us, which was the mission. We argued a lot less. We got to consolidate all of our tooling: instead of this wide span of the monolith and the services, we got to consolidate on just a simple TypeScript monolith. We spent our innovation budget on things like ML-powered recommendations for content, based on which sleep stories and meditations had resonated the best with a given individual. A few people left. A few people hated this strategy so much. They were like, "I don't even want to work with you anymore. I don't want to be part of this company." That's ok. Again, the point of strategy is to be clear about how you want to navigate the realities out there, so you can move forward together. If some people don't like it, that's ok. They weren't bad people. They're good engineers. They're good people, people that I still think of quite highly, but they just didn't want to be part of the strategy we had. That's ok. The beauty of strategy is, how do you move forward together? It's not making everyone happy.

Third and final example. At Uber: we run our own hardware. This dates to 2014. First, the diagnosis. Uber was expanding very rapidly geographically, into what felt like a new country every week. They had a city model where they expanded city by city rather than country by country, so really just adding cities constantly everywhere. Many of those geographies didn't have a meaningful cloud presence in 2014. In China, for example, AWS was just starting to spin up in 2014. It was a little bit rocky and early there. It wasn't a well-developed, seamless cloud experience yet. The capacity was pretty low. If you think about going to Thailand or something like that, Thailand did not have much of an AWS presence or whatnot. It wasn't like it is today. There was a lot less cloud availability then. Uber operated at a scale of tens of thousands of servers, where the typical back-of-the-napkin math for how much you can save by self-hosting versus running on the cloud is like 20% to 30%. For a small number of servers, the fixed cost of hiring the team to support this just doesn't make sense. If you have a couple hundred servers, there's no way to make the math work. If you have 10, 20, 30, 40, 50, 60 thousand servers, you can make the math work. That's actually quite a bit of hardware. You're replacing the hardware every three years. It adds up. You can actually make that case at a company that was operating at that scale. Finally, Uber was a relatively rare company in that it was willing to tolerate the pain of not having cloud infrastructure. Think about Kinesis, think about SQS, think about all these tools. Uber was willing to tolerate the pain of not having access to any of those because they thought these other pieces could be valuable. The approach: Uber decided to run exclusively in their own dedicated colo space using their own hardware. They decided not to store any data or run any compute in any cloud. They did use a very narrow slice of networking, essentially using the cloud points of presence to reduce the number of POPs that we had to pull together ourselves. Any cloud integration beyond that had to go through the CTO for approval, which was never going to happen.
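
The back-of-the-napkin math in this example can be sketched in a few lines of Python. This is only an illustration: the 20% to 30% savings range comes from the talk, but the per-server cloud cost, the team cost, and the function names are all hypothetical assumptions chosen to show the shape of the argument, not figures from Uber.

```python
# Hypothetical back-of-the-napkin model: self-hosting vs. cloud.
# All dollar figures below are made-up assumptions for illustration.

def annual_net_savings(servers, cloud_cost_per_server, savings_rate, team_cost):
    """Yearly savings from self-hosting minus the fixed cost of the team.

    A negative result means staying on the cloud is cheaper.
    """
    return servers * cloud_cost_per_server * savings_rate - team_cost


def break_even_servers(cloud_cost_per_server, savings_rate, team_cost):
    """Fleet size at which self-hosting savings just cover the team cost."""
    return team_cost / (cloud_cost_per_server * savings_rate)


CLOUD_COST_PER_SERVER = 5_000  # assumed USD per server per year
SAVINGS_RATE = 0.25            # midpoint of the 20% to 30% range from the talk
TEAM_COST = 3_000_000          # assumed USD per year for the hardware team

if __name__ == "__main__":
    for fleet in (200, 2_000, 30_000):
        net = annual_net_savings(
            fleet, CLOUD_COST_PER_SERVER, SAVINGS_RATE, TEAM_COST
        )
        print(f"{fleet:>6} servers: net {net:+,.0f} USD/year")
    threshold = break_even_servers(CLOUD_COST_PER_SERVER, SAVINGS_RATE, TEAM_COST)
    print(f"break-even at about {threshold:,.0f} servers")
```

Under these assumed inputs, a couple hundred servers lands well below break-even and tens of thousands lands far above it. The three-year hardware refresh is not modeled separately here; a more careful version would amortize hardware purchases explicitly.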

What was the impact here? I think the impact was that we were able to operate in areas with shifting geopolitical data locality regulation, in a way that someone dependent on cloud was not. Lyft, for example, really never left the United States. One of the reasons that we were able to at Uber was the ability to run our own infrastructure. It meant that we were able to spin up two data centers in China in about six months, which was probably the worst six months of my life, but still a very impressive technical accomplishment. We did it without colocating any data from anywhere else within it. We wouldn't have been able to get the commit from the cloud, for example. I think, again, "the cloud is other people's computers" is true at the scale of 10, or 20, or 30,000 servers in a way that it's not at 10 or 20 servers. There was simply no other way to accomplish this in the timeframe that we set. It also meant that we had a ton of Not Invented Here stuff. It's no surprise to me that there are so many ex-Uber infrastructure startups out there, because Uber literally built everything from scratch because there was no cloud availability for us. That's ok. It was something we were willing to tolerate. Again, to the last point, a lot of people who joined Uber were like, there's a lot of Not Invented Here going on. They were right. Again, the point of strategy is not to make everyone happy. The point of strategy is to make explicit tradeoffs, and live with the consequences of those tradeoffs. If you're not making tradeoffs, you're not doing strategy. It's not about making people happy. It's about finding a path forward that makes someone happy, hopefully.

Why these strategies work. The first, and I think this is the most important principle of strategy, is that many interesting properties of your software are only accessible if you apply an approach consistently. Take Uber's point on running in our own data centers. If you run half your compute in data centers and half of it on the cloud, it's really hard to actually take advantage of the ability to do data locality, using the dedicated colos to manage where data lives. It's really hard if a lot of your compute is on the cloud. You can make it possible theoretically, but you don't get the availability advantages. You don't get the data regulatory advantages. You don't get a lot of it. I think many things are not possible at most companies because you don't make enough tradeoffs. I think this is the most interesting property here. Two, it led us to concentrate tooling investment. Think about companies where you join, you go through onboarding, and your experience is phenomenal. Almost all of them have really narrow standard stacks, golden paths. This is only possible with a clear strategy holding people to a consistent approach. Stripe had one of the best onboarding experiences that I've ever had. Really, I think it came down to having the Ruby monolith. Three, we spent less time arguing about stuff. There are some arguments that are worth having. There are a lot of arguments that are just people disagreeing. That's fine, but let's not talk about it anymore. With a clear strategy where you actually tell people what you want, you can get to the bottom of it and just move on to arguing about whatever's next in this game.

Controlling your innovation budget. One of my personal beliefs is that most companies don't actually own their innovation budget. Most companies have an innovation budget that gets set by teams doing random stuff without telling anyone, and then they spend their innovation budget supporting these emergent things. That's not necessarily bad. A lot of times, creativity comes from teams doing things that you would never approve if you thought about it holistically. It's not a terrible thing. I think we could probably as an industry benefit a lot from being able to pick where we spend our innovation budget a little bit more deliberately, rather than haphazardly and in the shadows. Then another one: I'm sure a lot of you have worked with new hires who are really smart, really capable, and really unwilling to understand how the existing strategy works at your company. These folks, I think, can be very disruptive to teams around them, but also disruptive to themselves. They typically push hard and don't accomplish a lot. They get annoyed with you, you get annoyed with them, and then they leave. They could have spent that 6 months, that 12 months, that couple of years doing something really valuable for themselves and those around them. I think sometimes we villainize new hires who come in and make a mess. I think we're doing them a disservice by not giving them clarity about how things actually work and how we want them to work. With clear strategy, we can.

The counter case is also easy to make. You can think of a lot of things that went really badly without strategy. I'll just give you a couple quick ones. When I joined Digg, Digg was three years into a migration to a total rewrite, using Cassandra instead of MySQL, using Python instead of PHP, literally everything was brand new. It never worked. We went bankrupt, and the company failed. It wasn't a super good approach, you could argue, since that culminated in us going bankrupt and failing. The diagnosis was actually right. Contributing to the old code base, incredibly challenging at that point. It was like a decade of PHP, not super well maintained. People couldn't add new features anymore. Diagnosis is right, we just picked the wrong approach within our strategy. Stripe tried to introduce Java. I think Java is a really capable programming language. We use Java at Carta a bit. A lot of positive things to say about it. The first programming language I learned in college as well, some time back. We just didn't quite have a clear diagnosis that actually backed into our decision. I think it was a reasonable approach at many companies the decision to move to Java. It's something that I think would make a ton of sense. When we think about the challenges that we're confronting at Stripe, in terms of just the sheer unpredictability and chaos of the external forces they had to navigate, Java didn't really do a whole lot. It's like regulators weren't popping up at our door and like, you're using Java instead of Ruby, so you're approved. No one cared. It didn't really help with the problems we had. It could have helped a lot of companies with the problems they had, it just wasn't quite applicable to what we had on our roadmap. Then, finally, I think when I was at Uber, we had an interesting long running debate between two teams, different routing technologies, one was HAProxy, our good old faithful friend, which was a little bit frustrating to operate. Another was brand new. 
Both of these teams had really reasonable problems they were trying to solve, and they had good diagnoses. They also had reasonable approaches; they were both trying to solve things in a super reasonable way. The company lacked a strategy to actually figure out, how do we have this debate? How do we actually resolve this conflict across these teams, both of which were making very reasonable decisions, when we had no way to pull them together?

Strategy Is Everywhere, Although Rarely Written

Next, where is strategy if I'm making these examples? I think it's an interesting question like, have you ever worked at a company that has engineering strategy? Often, people tell me no, but I think the answer is like you absolutely have. All three of the examples I talked about, Stripe, Uber, Calm, when I joined, none of them had any of this written down. For a couple of them, when I left, none of them had anything written down. They still had really clear strategies that were doing a lot of work for them, they just weren't written down anywhere. I think almost every company, even if you're writing an angry review for your manager tonight in performance cycle, saying there's no strategy, I think you probably do have a strategy. I think this is a much more interesting question like, have you actually seen a written strategy that does any of the things that we're describing here? I certainly haven't joined a company that does. At Calm, we were able to pull one together. When I joined Carta, they did actually have a strategy process already coming together into a document shaped like this, that I think is quite good. There are very few companies that actually have these things written down. That's probably a problem. Because I think when we talk about this engineering strategy crisis, when we talk about the fact that we keep making poor decisions at a company level, and aren't clear how to get strategy guidance from leadership, I think the first half of this is just like, if we just wrote this stuff down, our life would be a lot easier.

Written Strategy Is Much More Effective

Written strategies can actually make things easier to find for new hires. Instead of coming into a company and wondering why everyone's upset with you and your suggestions for six months, just read the document that's going to tell you why people will dislike your ideas initially. That'd be helpful, set you up for success, set them up for success. It's a lot easier to get feedback. I think one of the challenges with implicit strategies is that people have subtly different understandings of what the actual strategy is. For example, on the Stripe Ruby monolith idea, the tokenization environment was in Go, why? Why was it in Go? Why wasn't it in Ruby? People had reasoned backwards into a number of different explanations of why that decision was made. There wasn't a written version of it where we could actually say, here's why it was made, so everyone could agree, and then argue about how the policy should be changed. If it's not written down, it's really hard to create feedback and improve. It's also hard to explain why you made changes. I think because companies change so frequently, the most interesting thing for understanding the company, and the archive of decisions that have been made, is understanding why the strategy has changed. I think any well-run company is going to change its strategy repeatedly over time, and being able to see the snapshots of reasons is really valuable. Otherwise, you get like mythology. Usually, mythology has strong individuals. It's like we hired a new VP who hates Ruby and migrated us to Java or something. These are often fictional and not that helpful to understand. They just kind of create a villain or a hero or whatnot. Sure, myths are fun, but they aren't actually going to help us improve the quality of our decision making there. Also, sometimes there's a strategy that you think is pretty clear that people are consistently confused about, and again, a written strategy is way easier to help with there.
The final thought I give on this one is that you simply can't hold people accountable to implicit strategy. One of the most destructive forces in our industry is well-meaning new hires coming to companies not having clear diagnosis and pushing an impractical approach. Again, it's easy to villainize them, like they should be more thoughtful. They should understand the context better. We could just tell them and then hold them accountable for this. Again, it's easy to find the villain in these things, but often, there are many villains, and it's almost always we are a part of the villains.

Advancing Strategy at a Company

Then, the last piece, I think, the most interesting piece is like, how can all of you, not as like CTOs, or like managers, or the architect of the company, push strategy forward at your company? How can we all walk out of here and make our companies a little bit better run? The first thing to accept when you want to improve strategy is that, if it's not working, one of two things is wrong. Either the diagnosis is wrong, like sometimes it's wishful thinking. Sometimes you just haven't talked to enough people to understand. Or the approach just doesn't make sense. Sometimes the approach is wrong, because it's literally incoherent. Often, it's wrong because it has just misassessed what people are willing to enforce. I think of this a lot of times with, for example, you must write in the monolith. That's a great approach to the extent that you're willing to enforce it. It's not a good approach to the extent that you're not. People are just unrealistic sometimes about what is actually going to be enforceable at their company.

The two different strategies I want to talk about, first, it's like borrowing authority. I think obvious, but I think there's a few ideas to make it work. Then, second, when you simply can't make progress any other way, writing things down and getting to the right document is actually remarkably effective. First, borrowing authority. Again, there's basically two different pieces of this. Usually, when top-down authority needs to work on a strategy, the CTO, the VP, like whoever the responsible person is, has a really strong opinion that's wrong about the current state of play, and the challenge for executives for taking on larger companies. Even in small companies, you have the founding CTO who knows the code base better than anyone else. This is actually why they can't have a good perspective on how to make a strategy because they know the code base too well. It's like, of course, you SSH into that machine and you restart that process at 12:02 every Thursday. It makes sense to them. They've been restarting this thing every Tuesday at 12:02 a.m. for the last 6 years, but it doesn't make sense to anyone who joins the company subsequently. Or it could be the flip side, which is like the non-founding CTO who is operating at a slightly higher level. What you can do there is like pulling in better context.

Then, two, the approach. Even if you have a great diagnosis, like people assume enforceability about things that CTOs or leadership will never enforce. How do we get into these a little bit? I'm going to give you the simplest roadmap of how to do this, and then I'll talk about the hard stuff. First, literally just start by writing a diagnosis. It doesn't have to be very good. There's this idea of the blank page problem, where if you write a diagnosis that's bad, people will come get angry at you and tell you what the right diagnosis is. If you don't write anything, often, no one is willing to write the first word. The best way to get a great diagnosis for your organization is to write down something pretty mediocre. Then they'll be pretty upset, and people will come out of the woodwork to get angry with you. That's great. I think when people are frustrated, like friction, when they disagree with you, that's actually a sign of communication going well. You're actually talking about something real in many of these cases. Write the diagnosis. Rile up the company. Get some feedback. It doesn't always have to be quite that messy. It can also be less messy. That's ok too. Second, draft the approaches. Again, very literal, kind of next step. Get validation again. Find the people who are upset and try to figure out why they're upset. Share it widely as a draft. Use the CTO to enforce it. Iterate periodically. This is the most idealistic version of borrowing authority ever. You're like, that sounds utterly delusional, knowing this will never work. Let's talk about that.

The number one reason that I see people try to do what I just described, and have it fail, is they don't write the strategy that the executive wants. They write the strategy they want the executive to want, and then they're confused why the executive doesn't want it. Let's dig into this one. Oftentimes, I see this conflict where someone's trying to write a strategy, and the CTO or the head of product or whoever the key stakeholders are, keeps disagreeing with them. They're writing the strategy and they're like, my goal here is to convince the executives or to convince the chief architect to agree with me. Your goal when you're writing the strategy is not to convince anyone to agree with you. Your goal is to understand why they disagree with you, and to keep mining that idea. Something I found true in my career is like, usually there's not like this way or that way. There's like, if you understand deeply the perspective of the person you're talking to, there's a unified view that solves what you care about and what they care about. We can only do that when you get to the bottom of it. I think one of the realities of working with senior stakeholders is like, they're often pretty busy, and they often don't have enough context on the ground to do that unification. You have to be the one to come to them and try to figure out why they're disagreeing with you and how to find the alignment between what they care about, which is probably narrow and specific, and what you care about, which is probably narrow and specific. Find the unifying lens. You got to be reliably curious. Again, often I've had people come up to me, and they're like, I want to lead this strategy. I know this person, when they talk to others, consistently frustrates them, or consistently comes from a pure frontend perspective and frustrates the infrastructure engineers, or a pure SRE perspective, and the product engineers can't align with them on anything, or a pure security perspective. 
If you want to lead a strategy, if you want to actually try to get your company aligned on a path forward, you have to be known as someone who's reliably curious. Where you think, we should move to a Ruby monolith, maybe not in 2023, and someone disagrees with you. Instead of trying to argue with them, again, try to understand their point of view. Why do they feel so strongly that you're wrong? Dig into that.

Be pragmatic rather than dogmatic. A lot of strategies want to be dogmatic because it's like powerful. It's the thing you brag about at a tech conference. It makes you feel like you've accomplished something. Most things you're actually going to get enforcement on are just super practical. Dogma is just not that valuable in terms of making real tradeoffs that people will actually follow. Building buy-in is the work. I think another version of this is that someone will write a strategy document. People don't like it, and they bring it to me, and they're like, "I'm done. Now you just need to get everyone aligned behind it." I'm like, that's the entire job. Writing a strategy document that I like is not that hard. It's very easy for me to write a strategy document that I like. Same for you, you can write a strategy that you like. The hard part is actually going and figuring out how to get people aligned with it. How do you make sure security, and SRE, and frontend, and mobile, and product, and customer success are willing to actually buy into it, and understanding the tradeoffs? If you don't do that, if you just write the document you like, you haven't really done anything. I can do that already.

Then, finally, this is my number one organizational hack. I talk about everything as an experiment. Like, we're going to try this for three months, and then review how it goes. It really helps people not have this like do or die moment, even if you end up just finalizing almost every experiment. It avoids this kind of, "If I don't stop this terrible strategy document from going out," like all those doomed responses that I think a lot of people tend to have. My CTO will never let me run the strategy process. Seven out of 10 times, I promise you, you're simply not writing the strategy they want, and you will never get them to approve a strategy they don't want. The other 3 out of 10 times, though, there's some real conflict the CTO is not dealing with for some reason. Maybe there's two different product teams, both of whom they need revenue from to actually ship this quarter, who hate each other and they can't agree on a technology path forward. Sometimes they're just not going to make a choice. What do you do, then? What do you do when you've actually done the work, you've gotten the input, and the CTO is unwilling to actually pick a clear path forward on some of these things? That could be a good strategy. Like our strategy is, we're unopinionated on this topic. It's not, again, like not something to brag about, but it's useful, it's clear. It means both of these teams can keep doing what they're doing.

Then just figure out how to write the document down. I call this the write 5 then synthesize model. It's, again, pretty straightforward. Write five design documents. You might think about, how do we make this decision about what frontend architecture to use for this new product we're building? Take those five design documents that you write for frontend and turn them into a narrow strategy, for example, like a frontend engineering strategy. All this does is it describes the diagnosis for frontend engineering at your company. How do you make choices? How do you decide to use React or not React? How do you have a design system? What is the implementation of that design system? How do you incorporate updates to that design system? Stuff like that. You and a peer group, like other senior engineers, staff engineers at the company, do this five times, and maybe you have a narrow data engineering strategy. Maybe you have a narrow frontend strategy, data science strategy, product engineering team A and B strategies, whatnot. Then you shove them together, synthesize that into a broad strategy. What you've done, again, feels a little bit anticlimactic. What you wanted to do was to say, like I wrote the novel engineering strategy and got to brag about it on LinkedIn. Instead, you spent like a year writing 25 design documents, but they actually describe how things work today. You don't have to worry about enforcement, or whether it's like, is this a real diagnosis? You're like a historian now, you've just gone and documented how decisions have worked at the company for the last year, 18 months, whatnot. It can't be wrong, no one can argue with you. This is factual. This is the diagnosis, and this is how we make decisions. At that point, you have a written thing that, when you think back to the advantages of having a written strategy, is something you can disagree with.
If that written strategy that you've created here is really embarrassing, if it's like we avoid conflict, and we don't resolve disagreements, that's embarrassing as a policy. A lot of companies, that's literally how they make decisions. Writing that down, embarrassing people is part of the job sometimes to get things done. That's valuable. You've made that possible by writing it down. If people disagree with what you've written down, even better. That's the entire goal. That's the point of this, is you want to facilitate these conversations in a level of detail that simply isn't possible with implicit or inconsistent use of how the company makes decisions.

Recap

We talked a little bit about engineering strategy, just two things: honest diagnosis, practical approach. Talked through some examples at Stripe, Calm, Uber, where I've seen strategy really clarify how we make approaches. Then the twisted bit is like, none of those companies wrote those strategies down at all initially, although Calm did, in terms of our revised strategy there. It's a lot easier if companies actually do write these things down. Then we talked about a couple ways you can actually do this at your company. Write it down. Find some way to get this document out, where it's not just what you want, instead, it's a historical lens on what's actually happening across the company. We can't argue with a diagnosis if it's simply factual events that have happened over the last year. Borrow authority when you can.

 


 

Recorded at:

May 15, 2024
