How to Design Teams, Unbiased

Nilendu Misra
5 min read · Oct 31, 2023

You ship your org chart. Therefore, to fix shipping, fix your org first. This is often called the “Inverse Conway maneuver”. Or, more verbosely: “Developing software is a complex socio-technical problem. We often naively focus on the technical part and only reactively look into the org part much later in the process. This should be turned around. Start with building the right org. In fact, the right org should be the very FIRST deliverable of a system architecture.” All good, except that there is no existing framework, empirical literature or large-scale study of “technical org design”. The result is biased, reactive and unnecessarily nested, if not random, engineering orgs. To work out what we ought to produce, we lean on what we had in the past, and that primarily results in huge communication overhead.

This book, “Team Topologies” by Matthew Skelton and Manuel Pais, methodically shows how to think about and act on org design. It lays out four primary team types: Stream-aligned (say, feature teams), Enabling, Complicated Subsystem (say, databases or network), and Platform teams. It also describes three interaction modes through which these teams work with each other: Collaboration, X-as-a-Service and Facilitating. The resulting 4x3 framework is powerful in itself. The book also offers deep insights into composing, evolving and improving teams, and therefore the outcomes of engineering orgs.
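
One way to make the 4x3 framework concrete is to model it as data. The sketch below is only an illustration: the class names, the `Interaction` record and the example teams (`checkout`, `internal-platform`) are assumptions made here, not something the book prescribes.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class TeamType(Enum):
    STREAM_ALIGNED = "stream-aligned"
    ENABLING = "enabling"
    COMPLICATED_SUBSYSTEM = "complicated-subsystem"
    PLATFORM = "platform"


class InteractionMode(Enum):
    COLLABORATION = "collaboration"
    X_AS_A_SERVICE = "x-as-a-service"
    FACILITATING = "facilitating"


@dataclass(frozen=True)
class Team:
    name: str  # hypothetical team name
    type: TeamType


@dataclass(frozen=True)
class Interaction:
    source: Team
    target: Team
    mode: InteractionMode


def framework_cells():
    """Return every (team type, interaction mode) pair: the 4x3 grid."""
    return list(product(TeamType, InteractionMode))


# Hypothetical example: a feature team consuming a platform as a service.
checkout = Team("checkout", TeamType.STREAM_ALIGNED)
platform = Team("internal-platform", TeamType.PLATFORM)
link = Interaction(checkout, platform, InteractionMode.X_AS_A_SERVICE)
```

Writing interactions down explicitly like this is one possible way to review whether each of them is intended, rather than an accident of the org chart.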

Some key takeaways:

— Treat people & technology as a single carbon/silicon system.

— The hallmark of good org design is that the communication pathways converge with the org chart. Org charts are top-down constructs, whereas most human communication at the workplace is “lateral”, i.e., with peers. Pay attention to this when making an org-chart-driven decision.

— When a team’s “cognitive capacity” is exceeded, the team becomes a delivery bottleneck.

— Conway’s big idea really was this question: “Is there a better design that is not available to us because of our org?” An org has a better chance of success if it is reflectively designed.

— Orgs arranged in “functional silos” (e.g., QA, DBA, Security) are unlikely to produce a well-architected end-to-end flow.

— “Team assignments are the first draft of the architecture”.

— “Real gains in performance can often be achieved by adopting designs that adhere to a dis-aggregated model”.

High-performing teams, by definition, are long-lived. Regular reorgs for “management reasons” should be a thing of the past.

A single team is typically upper-bounded at 15 people. That is the ceiling on the number of people with whom “we can experience deep trust”.

There are three types of cognitive load: Intrinsic, Extraneous and Germane. Germane load is the domain-specific kind, where value is added. Our goal therefore is to eliminate “extraneous cognitive load”; e.g., a worry-free, reliable build with automated tests should be a promise fulfilled.

— A leader’s objective function, therefore, should be not to choose the architecture structure (e.g., monolith) but to manage the cognitive load of every team so as to achieve safe and rapid software delivery.

High-trust team management is essentially “eyes on, hands off”.

SRE teams are optional, not essential. The single most important “business metric” for SREs is the “error budget”.
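
The review does not define the error budget, so the sketch below assumes the common SRE definition: the budget is the fraction of a time window during which a service may be unavailable without breaching its availability target (SLO). The 99.9% target, the 30-day window and the function names are illustrative assumptions.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo) * window_minutes


def budget_remaining(slo: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Budget left after downtime already spent; negative means overspent."""
    return error_budget_minutes(slo, window_days) - downtime_minutes


# Assumed example: a 99.9% SLO over 30 days allows roughly 43.2 minutes
# of downtime in total.
print(round(error_budget_minutes(0.999), 1))

# After a hypothetical 30-minute outage, roughly 13.2 minutes remain.
print(round(budget_remaining(0.999, 30), 1))
```

A budget that is close to zero or negative is a signal to favor reliability work over new releases; how strictly to enforce that is a policy choice for each org.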

Designing platform team(s) is one of the hardest decisions. Left to engineers alone, the platform will be overbuilt. A 2x2 framework of “Engineering Maturity” (High-Low) vs. “Org Size or Software Scale” (High-Low) is a good conceptual model for choosing the right platform team model. For example, highly mature teams operating at a lower scale/size could subsume platform work themselves, with the right collaboration. Less mature orgs at a large size should lean toward the “Platform-as-a-service” model.
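
The two quadrants described above can be written as a small lookup. This is a sketch of the review's summary only: the function name and return strings are made up for illustration, and the two quadrants the text does not cover deliberately return `None` rather than a guess.

```python
from typing import Optional


def suggest_platform_model(high_maturity: bool, large_scale: bool) -> Optional[str]:
    """Map a (maturity, scale) quadrant to the platform model named in the text.

    Returns None for the quadrants the text does not describe.
    """
    if high_maturity and not large_scale:
        # Mature teams at small scale absorb platform work themselves.
        return "teams subsume platform work, with the right collaboration"
    if not high_maturity and large_scale:
        # Less mature orgs at large scale lean on a platform offered as a service.
        return "platform-as-a-service"
    return None  # not specified in the review


print(suggest_platform_model(high_maturity=True, large_scale=False))
print(suggest_platform_model(high_maturity=False, large_scale=True))
```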

There are three different kinds of dependency between teams: Knowledge, Task and Resource. The right talent covers the knowledge gap, the right team design covers the task gap, and the right processes and tools (e.g., DevEx, API versioning, self-provisioning of environments) cover the resource gap.

— Only around 1-in-7 to 1-in-10 teams should be “non-stream-aligned”, i.e., one of the other three types: Platform, Complicated Subsystem or Enabling.

— “When code doesn’t work…the problem starts in how teams are organized and how people interact.”

— While we intuitively think of a monolith as the “codebase”, this mental model can be expanded to include: Monolithic Builds (ONE giant CI build, even with smaller services); Monolithic Releases (different services still relying on one shared environment for testing, say); Monolithic Standards (e.g., tight governance of tool/language/framework); and the Monolithic Workplace (e.g., an open-plan office).

— “Fracture Plane” (aka seam) is a good metaphorical framework for thinking about how to divide teams. Such planes can follow a “bounded context” (i.e., business domain); Regulatory Compliance (e.g., PCI DSS/Payments); Change Cadence (e.g., annual tax vs. accounting modules); Risk (e.g., money movement or loans vs. dashboards); Performance (e.g., highly sensitive components vs. others); Technology (e.g., Rust, Golang); or Geography (e.g., offshore vs. HQ). Traditionally, and unfortunately, we divide teams more by technology than by other, often more valid, dimensions.

— “If you have microservices but you wait and do end-to-end testing of a combination of them before a release, you have a distributed monolith”.
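
As a rough sanity check on the earlier guideline that only about 1-in-7 to 1-in-10 teams should be non-stream-aligned, the share can be computed from a list of team types. The function names, the string labels and the example org below are hypothetical.

```python
from typing import Sequence

STREAM_ALIGNED = "stream-aligned"


def non_stream_share(team_types: Sequence[str]) -> float:
    """Fraction of teams that are not stream-aligned."""
    if not team_types:
        raise ValueError("need at least one team")
    non_stream = sum(1 for t in team_types if t != STREAM_ALIGNED)
    return non_stream / len(team_types)


def within_guideline(team_types: Sequence[str],
                     low: float = 1 / 10, high: float = 1 / 7) -> bool:
    """True if the non-stream share falls between 1-in-10 and 1-in-7."""
    return low <= non_stream_share(team_types) <= high


# Hypothetical org: seven stream-aligned teams plus one platform team (1 in 8).
org = [STREAM_ALIGNED] * 7 + ["platform"]
print(non_stream_share(org), within_guideline(org))
```

The bounds are treated as a guideline, not a rule; a result outside the range is a prompt to look closer, not proof that the org is wrong.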

Intermittent collaboration is better than constant interaction. Collaboration leads to innovation, so do not suppress all human-to-human collaboration in naturally aligned teams, but pay careful attention to whether the communication cost far exceeds the collaborative benefits. The collaboration tax is worth paying if the org wants to innovate very rapidly. Increased collaboration != Increased communication.

Interaction modes between teams should be habits, i.e., a targeted outcome of org design. This is essentially what Jeff Bezos’ famous “Teams only interact with each other via APIs, or please leave” memo achieves.

— Be alert for the white space between the roles, gaps that nobody feels responsible for.

— The biggest change in the last decade: historically, “develop” and “operate” were two distinct, serialized phases with a one-way arrow from develop to operate. The best orgs now have a tight feedback loop from operate back to develop. Customers know best! The “operate” phase emits “bottom-up” signals from customers via logs, metrics and other data, and development teams must pay equal, if not more, attention to these as they do to “top-down” directives (say, from PMs). These signals are essentially “synthetic sense organs for the outside”.

Software is less of a “product for” and more of an “ongoing conversation with” users. Responding to users MUST therefore be an integral part of the development team’s responsibility. It should not be handed off to a separate, isolated “BAU or Service Maintenance team”.

An excellent book with wonderful research- and experience-backed frameworks. A must-read if you are leading teams that are growing.
