In my earlier blog post on incremental design, I suggested that we need to allow for failure. So how do we limit the impact of failure, how do we tell when our design choices don't work, and how do we organize our world so that we can build on what we learn from our experiments?
It may surprise you that I begin this post on design with a section on team structure...
Team Structure and Architecture
Dev teams work best in small groups; teams of up to about eight people seem optimal. So organizing development groups into many small teams of eight or fewer people is a common and effective strategy. Conway's law, the observation that organizations tend to produce systems that mirror their own communication structures, tells us that this has implications for the design of our systems. If we want our teams to be autonomous, self-organizing and self-directing, but each made up of eight or fewer people, then the ways in which we can allocate work to them are seriously constrained. The other thing that really matters for a continuous flow of valuable software is that each team should be able to deliver end-to-end user value without having to wait for another team to complete its work. We want to eliminate cross-team dependencies for any given feature.
I think that this is one of the strengths of the microservice architectures that people are currently excited about. Different flavors of this idea have been around for a long time, but implementing a system as a collection of loosely-coupled, independent services is a pretty good approach. The same approach is reflected in the team structure of companies like Amazon, which organize their entire development effort as small, independent "two-pizza" teams, each responsible for a specific function of the business.
A good mental model for this kind of architecture is to imagine an organization without any computers. Each department has its own office, is wholly responsible for the parts of the business that it looks after, and decides for itself how best to accomplish that. Within each office, people can work however they like; that is wholly the department's decision and its responsibility. Communication between departments takes the form of memos. Now replace each office with a service and the memos with asynchronous messages, and you have a microservice architecture.
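To make the office-and-memo picture concrete, here is a minimal, in-process sketch. It assumes two hypothetical services, `Orders` and `Shipping`, each with private state and its own inbox (an `asyncio.Queue`). The only way they interact is by posting `Memo` objects into each other's inbox, and the sender never waits for the receiver to finish its work. A real microservice system would put a network transport or message broker between the services rather than an in-memory queue; this only illustrates the shape of the idea.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Memo:
    """A message between services: the only way they interact."""
    kind: str
    payload: dict


class Service:
    """A hypothetical 'department': private state plus a single inbox."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.inbox: asyncio.Queue = asyncio.Queue()
        self.log: list[str] = []  # private state; no other service touches it

    async def handle(self, memo: Memo) -> None:
        raise NotImplementedError

    async def run(self) -> None:
        # Process memos one at a time until told to stop.
        while True:
            memo = await self.inbox.get()
            if memo.kind == "stop":
                return
            await self.handle(memo)


class Shipping(Service):
    async def handle(self, memo: Memo) -> None:
        self.log.append(f"shipping {memo.payload['order_id']}")


class Orders(Service):
    def __init__(self, name: str, shipping_inbox: asyncio.Queue) -> None:
        super().__init__(name)
        self.shipping_inbox = shipping_inbox  # knows where to send memos, nothing more

    async def handle(self, memo: Memo) -> None:
        order_id = memo.payload["order_id"]
        self.log.append(f"accepted {order_id}")
        # Post a memo and carry on; Orders does not wait for Shipping to act on it.
        await self.shipping_inbox.put(Memo("ship", {"order_id": order_id}))


async def main() -> tuple[list[str], list[str]]:
    shipping = Shipping("shipping")
    orders = Orders("orders", shipping.inbox)
    orders_task = asyncio.create_task(orders.run())
    shipping_task = asyncio.create_task(shipping.run())

    await orders.inbox.put(Memo("place_order", {"order_id": 42}))
    await orders.inbox.put(Memo("stop", {}))
    await orders_task  # Orders has drained its inbox
    await shipping.inbox.put(Memo("stop", {}))
    await shipping_task  # Shipping has handled everything sent before "stop"
    return orders.log, shipping.log


orders_log, shipping_log = asyncio.run(main())
print(orders_log, shipping_log)  # ['accepted 42'] ['shipping 42']
```

Because `Orders` only knows the address of the shipping inbox, not the `Shipping` object or its data, either service could change its internals, or be replaced entirely, without the other noticing. That independence is what the office-and-memo picture is meant to capture.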
A great tool for designing such an organization, and so encouraging this kind of architecture, is the idea of Bounded Contexts from Eric Evans' book "Domain-Driven Design". A Bounded Context is the scope within which a domain model applies. Any complex problem domain will be composed of several Bounded Contexts, within each of which a domain model has a consistent meaning. This gives you coherent scopes within your problem domain that map sensibly to business value.
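As a small illustration of what "consistent meaning within a context" can look like in code, the sketch below assumes two hypothetical contexts, Sales and Support, that both talk about a customer but model it differently. The class names and attributes are invented for this example. The point is that each context owns its own model, and the two share nothing but an identifier.

```python
from dataclasses import dataclass


# Sales context (hypothetical): a customer is an account we can extend credit to.
@dataclass(frozen=True)
class SalesCustomer:
    customer_id: str
    credit_limit: int  # whole currency units

    def can_afford(self, amount: int) -> bool:
        return amount <= self.credit_limit


# Support context (hypothetical): a customer is someone with support tickets.
@dataclass(frozen=True)
class SupportCustomer:
    customer_id: str
    open_tickets: tuple[str, ...]

    def needs_attention(self) -> bool:
        return len(self.open_tickets) > 0


# The same real-world customer, seen through each context's own model.
# Only the identifier is shared between the two.
sales_view = SalesCustomer("c-1", credit_limit=500)
support_view = SupportCustomer("c-1", open_tickets=("T-7",))
print(sales_view.can_afford(200), support_view.needs_attention())  # True True
```

Forcing these into a single, shared `Customer` class would couple the two contexts, and with them the teams that own them.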
Bounded Contexts are a great tool to help you organize your development efforts. Look for the Bounded Contexts in your problem domain and use that model to organize your teams. This rarely means one team per service, or even one per context, but it is a powerful way to decide how to group contexts and services and allocate them to teams, so that each team can autonomously deliver end-to-end value to the business.
A common concern of people new to iterative design is the loss of the "big picture". Years ago I learned a trick for maintaining a big-picture, architectural view: I keep what I call a "Whiteboard Model" of the system that I am working on. The nature of the diagram doesn't matter much, but its existence does. The Whiteboard Model is a high-level, abstract picture of the organization of the system. It is high-level enough that any senior member of the team should be able to recreate it on a whiteboard, from memory, in a few minutes. That puts a limit on the level of detail that it can contain.
In a Continuous Delivery (CD) context, the more detailed description of the model should exist as automated tests. These tests should assert anything and everything important about your system: that it performs correctly from a functional perspective; that it exhibits the desired performance characteristics; that it is secure, scalable and resilient and conforms to your architectural constraints; whatever matters in your application. These tests are our executable specifications of the behavior of the system.
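As one example of a test that asserts an architectural constraint, here is a minimal sketch using Python's standard `ast` module. It assumes a hypothetical rule that domain code must not import persistence or networking modules (`sqlite3`, `http` and `socket` are stand-ins), and it checks source text passed in as a string. A real test would read the project's own module files and run under whatever test framework you use.

```python
import ast

# Hypothetical architectural rule: domain code must not depend on
# persistence or networking modules. These names are stand-ins.
FORBIDDEN_FOR_DOMAIN = {"sqlite3", "http", "socket"}


def imported_modules(source: str) -> set[str]:
    """Return the top-level package name of every module `source` imports."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) stay inside the project, so skip them.
            names.add(node.module.split(".")[0])
    return names


def violations(source: str, forbidden: set[str]) -> set[str]:
    """Modules imported by `source` that the rule forbids."""
    return imported_modules(source) & forbidden


# The constraint applied to a sample domain module that breaks it.
domain_module = """
from dataclasses import dataclass
from decimal import Decimal
import sqlite3  # persistence leaking into the domain
"""
print(violations(domain_module, FORBIDDEN_FOR_DOMAIN))  # {'sqlite3'}
```

In a real suite, the assertion would be that `violations(...)` is empty for every domain module, so the build fails as soon as someone breaks the rule.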
In a big system, the Whiteboard Model probably identifies the principal services that collaborate to form the system. If it is too detailed, it won't serve its purpose. This model needs to be in the heads of the development team. It needs to be a working tool, not an academic exercise in system design.
I also like creating little throwaway models for tricky parts of the system that I am working on. These can be simple notes on scraps of paper that last a few minutes while you explore ideas with your pair, or longer-lived models that help you organize a collection of stories that, over time, build up a more complex set of behaviors. I once worked as part of a team that designed and built a complex, high-performance, reliable messaging system using these incremental techniques. We created a model of the scenarios that we thought our reliable messaging system would need to cope with. It took several of us several hours around a whiteboard to come up with the first version. It looked a bit like a couple of robots, so it was ever after referred to as the "Dancing Robot Model". In stand-ups you would often hear comments like "I am working on Robot 1's left leg today" ;-)
Over the years I think I have seen two common failure modes with modeling. The first is over-modeling. For me, trying to express the detail of an algorithm, or all of the methods and attributes of a class, is a waste of time and effort. Code is a much more efficient and much more readable model of structure at this level of detail. Models should be high-level abstractions of ideas that inform decision-making; striving for detail or some level of formal completeness is a big mistake.
The second failure is lack of modeling. I think that the widespread adoption of agile approaches to development has made this failure mode even more common. When I interview developers, I commonly ask them to describe a system that they have worked on. It is surprising how many of them describe systems with no obvious organizing principles, certainly nothing like a Whiteboard Model.
There is a third failure, but it is so heinous that I shudder to mention it: lack of tests! Of course, once our models exist in code, they should be validated by automated tests!
The Quality of Design
Personally, I believe that an iterative approach is a more natural way to solve problems. The key is to think about the qualities of good design. Good designs have a clean separation of concerns, are modular, abstract the problem and are loosely coupled. All of these properties are independent of the technology. This is just as true of the design of a database schema or an Ant build script as it is of code written in an OO or functional language. These things matter more than the technology; they are more fundamental.
One of the reasons these things are so important is that they allow us to make mistakes. When you write code that has a clean separation of concerns, it is easier to change your mind when you learn something new. If the code that places orders or manages accounts is tangled up with the code that stores them in a database, communicates them across a network or presents them in a user interface, then it will be tough to change things when your assumptions change.
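Here is a minimal sketch of that separation, assuming hypothetical `Order`, `OrderStore` and `OrderPlacement` names. The order-placing logic depends only on a small storage interface (a `typing.Protocol`), so an in-memory store can stand in today and a database-backed one can replace it later without touching the logic. The quantity limit is an invented business rule, there only to give the logic something to do.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Order:
    order_id: int
    account: str
    quantity: int


class OrderStore(Protocol):
    """The persistence concern, reduced to what the logic actually needs."""

    def save(self, order: Order) -> None: ...

    def orders_for(self, account: str) -> list[Order]: ...


class InMemoryOrderStore:
    """Simplest store that works; a database-backed class could replace it later."""

    def __init__(self) -> None:
        self._orders: list[Order] = []

    def save(self, order: Order) -> None:
        self._orders.append(order)

    def orders_for(self, account: str) -> list[Order]:
        return [o for o in self._orders if o.account == account]


class OrderPlacement:
    """Business logic: validates and places orders. Knows nothing of SQL, networks or UIs."""

    def __init__(self, store: OrderStore, max_quantity: int = 1_000) -> None:
        self._store = store
        self._max_quantity = max_quantity  # hypothetical business rule
        self._next_id = 1

    def place(self, account: str, quantity: int) -> Order:
        if not 0 < quantity <= self._max_quantity:
            raise ValueError(f"quantity {quantity} is out of range")
        order = Order(self._next_id, account, quantity)
        self._next_id += 1
        self._store.save(order)
        return order


store = InMemoryOrderStore()
placement = OrderPlacement(store)
placement.place("alice", 10)
print(store.orders_for("alice"))  # [Order(order_id=1, account='alice', quantity=10)]
```

Changing how orders are stored means writing another class with `save` and `orders_for`; `OrderPlacement`, and the tests that exercise it, stay as they are.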
I once worked on a project to implement a high-performance financial exchange. We had a pretty clean codebase, with very good separation of concerns. In my time there we changed the messaging system by which services communicated multiple times, growing it from a trivial first version that sent XML over HTTP into a world-class, reliable, asynchronous binary messaging system. We started with XML over HTTP not because we thought it would ever suffice, but because I already had some code that did this from a pet project of mine. Its performance was poor, but its separation of concerns was what we needed: transport, messaging, persistence and logic were all independent and pluggable. So we could start with something simple and ready to go, get the bare bones of our services in place and communicating, and then evolve the messaging system as we needed to, replacing the transport at one point, the message protocol at another, and so on.
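The sketch below illustrates the kind of pluggability described above; it is not the exchange's actual code. It assumes a hypothetical `Codec` interface (the message protocol) and a `Transport` interface with a single `send()`, and shows a `Publisher` that works unchanged with either a slow, readable XML codec or a compact binary one. `LoopbackTransport` is an invented in-memory stand-in for a real HTTP or socket transport.

```python
import struct
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class Trade:
    trade_id: int
    price_cents: int
    quantity: int


class Codec(Protocol):
    """The message-protocol concern: turning messages into bytes and back."""

    def encode(self, trade: Trade) -> bytes: ...

    def decode(self, data: bytes) -> Trade: ...


class Transport(Protocol):
    """The transport concern: moving bytes from one service to another."""

    def send(self, data: bytes) -> None: ...


class XmlCodec:
    """Readable but verbose: good enough to get services talking."""

    def encode(self, trade: Trade) -> bytes:
        element = ET.Element("trade", id=str(trade.trade_id),
                             price=str(trade.price_cents), qty=str(trade.quantity))
        return ET.tostring(element)

    def decode(self, data: bytes) -> Trade:
        element = ET.fromstring(data)
        return Trade(int(element.get("id")), int(element.get("price")), int(element.get("qty")))


class BinaryCodec:
    """Compact fixed layout: three big-endian signed 64-bit integers (24 bytes)."""

    _layout = struct.Struct(">qqq")

    def encode(self, trade: Trade) -> bytes:
        return self._layout.pack(trade.trade_id, trade.price_cents, trade.quantity)

    def decode(self, data: bytes) -> Trade:
        return Trade(*self._layout.unpack(data))


class LoopbackTransport:
    """In-memory stand-in for an HTTP or socket transport with the same send()."""

    def __init__(self, deliver: Callable[[bytes], None]) -> None:
        self._deliver = deliver

    def send(self, data: bytes) -> None:
        self._deliver(data)


class Publisher:
    """What the service code uses; unaware of the wire format or the network."""

    def __init__(self, codec: Codec, transport: Transport) -> None:
        self._codec = codec
        self._transport = transport

    def publish(self, trade: Trade) -> None:
        self._transport.send(self._codec.encode(trade))


def wire_up(codec: Codec) -> tuple[Publisher, list[Trade]]:
    """Connect a publisher to a receiver that decodes with the same codec."""
    received: list[Trade] = []
    transport = LoopbackTransport(lambda data: received.append(codec.decode(data)))
    return Publisher(codec, transport), received


for codec in (XmlCodec(), BinaryCodec()):
    publisher, received = wire_up(codec)
    publisher.publish(Trade(7, 10_050, 3))
    print(type(codec).__name__, received)
```

Swapping `XmlCodec` for `BinaryCodec`, or the loopback for a real network transport, changes one constructor argument; the service code that calls `publish()` stays the same.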
The messaging wasn't the only thing that evolved; every aspect of the system evolved over time, at any given point in its life doing just enough to fulfill all of its requirements. You can't make this kind of evolutionary change without good-quality automated testing, but when you have that insurance, any aspect of the system is amenable to change. As well as the messaging, we replaced our relational database vendor and changed the technology that rendered our UI several times. At one point we even made a dramatic change to the presentation of our UI, moving from a tabular, data-entry style interface to a graph-based, point-and-click trading interface. All of this happened while we kept to our regular release schedule, released other new features in parallel and, vitally, kept all the tests running and passing.
Worrying about good separation of concerns, modularity and isolation is important.
The secret to incremental design... it is simply good design!