Introducing the RIG Model - the Puzzle of Designing Guaranteed Data-Consistent Microservice Systems

Key Takeaways

  • This article introduces the novel RIG model, which supports the design of guaranteed data-consistent microservices systems from a business perspective.
  • RIG is an abbreviation of Reversible, Irreversible, and Guaranteed, and it categorizes microservices behavior within a sequence of local transactions, also known as a saga.
  • The RIG model formulates three rules for a saga call chain, guaranteeing eventual data consistency.
  • The article proposes a gamified RIG tool. The tool consists of three main RIG pieces and can be used by teams to model a guaranteed data-consistent microservice system as a puzzle.
     

Introduction

Whenever a microservice architecture is proposed, you should ask this question:

Are we sure that our data will stay consistent?

You could easily ignore this question and only focus on the promises that microservices offer: "Microservices enable teams to work more independently, accelerate the time to market, develop new features, fix bugs, etc.," according to Sam Newman in Building Microservices.

And indeed, you will probably not get into trouble in the "happy scenario" cases. However, as we all know, the devil is in the details – what will happen when things start to go wrong? Well, you will probably hear, "We will use sagas with compensating transactions, so you are not to worry. The sagas will eventually reach consistency."

The question is – is it true? Will sagas, with compensating transactions, guarantee eventual consistency?

  • The short answer is "No."
  • The longer answer is "No – unless..."

It turns out that if the sagas comply with specific rules, you can guarantee eventual consistency.

This article is about clarifying these rules and about providing a tool for teams to design sagas that, with a guarantee, can lead to eventual consistency in the "not so happy scenarios" – or, if this is not possible, give the basis for choosing a solution where you know the consequences.

This article has a theoretical foundation in the CAP theorem initially described and later updated by Eric Brewer, and the work of Sam Newman, Chris Richardson, and Kaj Steen Bromose and Ronni Laursen.

The transformation of the theoretical work into a tool for practitioners has been done primarily in collaboration with a major Danish fintech company, and it is currently being "test run" both within the company and in a college course on microservices. The initial tests show that the tool works as we had hoped.

As with EventStorming, our tool gives the business and developers an effective setting for discussion. In our case, it is about data consistency. It turns out that problems with eventual consistency often must be resolved in collaboration with the business – either by changing the business’s processes or taking a calculated risk. The RIG tool facilitates this discussion in a nontechnical way, focusing on the business processes by prototyping sagas in a gamified way.

But first, let’s set the ground. According to Newman, if microservices are to deliver their benefits, they must be individually deployable. The consequence of designing for individual deployability is that you leave behind the single database system with transactional ACID (Atomicity, Consistency, Isolation, and Durability) properties, where the rollback function of ACID transactions handles situations where the business rules of one or more participating entities are violated. An individually deployable microservice architecture calls for one database per service without distributed transaction management.

Consequently, you lose the possibility of strict consistency based on transactions with ACID behavior. You are left with eventual consistency as your only option for achieving consistency in transactions spanning multiple microservices. This shift from strict to eventual data consistency is one of the major complications of the "one database per microservice" paradigm. A popular way of achieving eventual data consistency is using the Saga pattern. Richardson describes a saga as "a sequence of local transactions within each participating microservice." A saga’s goal is to obtain a systemwide eventual data consistency of a sequence of local transactions in a chain of loosely coupled microservice calls. That means all databases should eventually reach a consistent state representation of the data involved in the "transaction."

The Saga pattern can ensure eventual data consistency through compensating transactions, but not in all cases. As described by Bromose and Laursen in Ensuring Eventual Consistency in a Microservices Architecture, some constraints must be observed in the microservice call chain to guarantee eventual data consistency. In Microservice Patterns, Richardson points out that even if the constraints are fulfilled, using Saga patterns is no guarantee of data consistency, as interleaved sagas potentially compromise the data consistency due to the lack of the "Isolation" property provided by an ACID transaction, leading to issues like dirty read, lost update, etc.

For many business applications, the lack of a guarantee of eventual data consistency is critical and can lead to rejecting the microservice architecture as a possible solution. To address this, this article introduces the novel RIG model and an initial gamified tool. RIG is an abbreviation of Reversible, Irreversible, and Guaranteed, and it categorizes microservice behavior within a saga scope. The RIG model tool allows for designing sagas for microservices systems where guaranteed eventual data consistency is needed. The RIG tool is inspired by the EventStorming method created by Alberto Brandolini and is meant to be a business-focused, interactive, fast-track tool to get the sagas right before going into implementation.

The RIG model and tool

The RIG model

The RIG model sets the foundation for the saga design. It is founded in the CAP theorem and the work of Bromose and Laursen. The theoretical work results in a set of microservice categories and rules that the sagas must comply with if we are to guarantee data consistency.

The RIG model divides microservices behavior within a saga into three categories:

  • Guaranteed microservices: Local transactions will always be successful. No business constraints will invalidate the transaction.
  • Reversible microservices: Local transactions can always be undone and successfully rolled back with the help of compensating transactions.
  • Irreversible microservices: Local transactions cannot be undone.

You must think of transactions from a business perspective (i.e., business logic constraints determine the category, not technical ones). From a technical perspective, it is assumed that the system is robust and resilient enough to preserve messages in case of technical problems, such as network failures, microservice crashes, etc. This is a nontrivial requirement, but it can be handled if you separate the concern into an external and an internal perspective. From an external perspective, one way to help accomplish this resilience is to use a message queue system with guaranteed delivery. From an internal perspective, you could use a transaction mechanism combined with the outbox pattern to ensure consistency between the microservice database and the outgoing messaging system used for publishing messages. For incoming messages, a transactional read from the input queue could be used, where the message is removed from the in-queue as part of the commit of the internal work of the microservice. That is, the write to the application data, the write to the outbox data, and the removal of the message from the incoming queue are handled as "a unit of work" within the same transaction scope.
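The "unit of work" idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that the incoming queue, the application data, and the outbox all live in one SQLite database; the table names, the `process_next` function, and the message layout are hypothetical and not part of the RIG model. A separate relay that publishes the outbox rows to a real message broker is left out.

```python
import json
import sqlite3

# Hypothetical schema: the in-queue, the application data, and the outbox
# share one database so that one local transaction can cover all three.
SCHEMA = """
CREATE TABLE in_queue (id INTEGER PRIMARY KEY, payload TEXT NOT NULL);
CREATE TABLE stock    (item TEXT PRIMARY KEY, quantity INTEGER NOT NULL);
CREATE TABLE outbox   (id INTEGER PRIMARY KEY, payload TEXT NOT NULL);
"""


def process_next(conn: sqlite3.Connection) -> bool:
    """Consume the oldest incoming message as a single unit of work.

    Returns False when the queue is empty. If any step raises, the
    transaction is rolled back and the message stays in the queue.
    """
    with conn:  # commits on success, rolls back on an exception
        row = conn.execute(
            "SELECT id, payload FROM in_queue ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return False
        message_id, payload = row
        order = json.loads(payload)
        # 1. Write to the application data.
        conn.execute(
            "UPDATE stock SET quantity = quantity - ? WHERE item = ?",
            (order["amount"], order["item"]),
        )
        # 2. Write the outgoing message to the outbox.
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"event": "stock_reserved", **order}),),
        )
        # 3. Remove the message from the incoming queue.
        conn.execute("DELETE FROM in_queue WHERE id = ?", (message_id,))
    return True


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO stock VALUES ('widget', 10)")
conn.execute(
    "INSERT INTO in_queue (payload) VALUES (?)",
    (json.dumps({"item": "widget", "amount": 3}),),
)
conn.commit()

processed = process_next(conn)
```

Because all three writes share one transaction, a crash between steps leaves the message in the queue and no half-finished state behind, which is the technical robustness the RIG categories take for granted.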

The tool

The RIG modeling tool is intended to provide development teams with a collaborative tool for prototyping and identifying areas of compromise in data consistency.

The tool consists of three pieces representing a Reversible (R), Irreversible (I), and Guaranteed (G) microservice. The RIG pieces use traffic light colors: red for irreversible, yellow for reversible, and green for guaranteed microservices. The solid arrows indicate the intended business logic flow; the dashed arrows represent rollback flow.

The pieces shown in Fig. 1 are in Sticky Note size. The top three are 3D printed in PLA with three layers. The middle layer is black and embeds a magnet to use the pieces on magnetic whiteboards. The bottom three are made in acrylic with a printed foil. The CAD file can be downloaded. The following subsection describes the individual pieces in more detail.

Fig. 1. RIG pieces.

Reversible

Reversible microservices have business rules that may invalidate the local transaction. For example, you may have a business rule that the number of an item in stock must be positive; that is, we will not accept an order if we are out of stock. If you try to place an order that would take the item out of stock, the service will reject the stock update transaction and send a compensating message back through the saga with the reason "out of stock".

If a business rule invalidates the saga transaction, the microservice must roll back any ongoing local transaction and send a "cancel transaction" message.

A reversible microservice must include support for a compensating transaction and be able to handle an incoming "cancel transaction" message. When receiving a "cancel transaction" request, the microservice must "roll back" to the state before the saga.

When handling compensating transactions, a reversible microservice must behave as a "Guaranteed" service. This can be tricky; you may consider the "invoice/credit note" pattern to solve this problem. For example, when you change the stock value of an item, you could have an "item event" table where you write the amount you remove from stock as a negative value. If you need to do a compensating transaction, you write the same amount as a positive number in the "item event" table. The stock amount is then the sum of the amounts in the "item event" table – much like the way your bank account is handled with an account table supported by a posting table.
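The "item event" idea can be sketched as an append-only ledger. This is a minimal in-memory illustration, not a reference implementation: the class and method names are hypothetical, and the business rule is interpreted here as "stock may not drop below zero".

```python
from dataclasses import dataclass, field


@dataclass
class ItemLedger:
    """Append-only "item event" table for one item; stock is the sum of entries."""

    events: list[tuple[str, int]] = field(default_factory=list)  # (source id, amount)

    def stock(self) -> int:
        return sum(amount for _, amount in self.events)

    def receive(self, amount: int) -> None:
        """Goods received: a positive entry."""
        self.events.append(("goods-receipt", amount))

    def reserve(self, saga_id: str, amount: int) -> bool:
        """Forward transaction: a business rule may reject it."""
        if self.stock() - amount < 0:  # assumed rule: never below zero
            return False  # the caller would send "out of stock" back to the saga
        self.events.append((saga_id, -amount))
        return True

    def compensate(self, saga_id: str) -> None:
        """Compensation: append the opposite entry; no business rule can reject it."""
        net = sum(amount for sid, amount in self.events if sid == saga_id)
        if net != 0:  # a repeated "cancel transaction" message changes nothing
            self.events.append((saga_id, -net))


ledger = ItemLedger()
ledger.receive(5)
accepted = ledger.reserve("saga-1", 3)   # stock goes from 5 to 2
rejected = ledger.reserve("saga-2", 4)   # would go below zero, so it is refused
ledger.compensate("saga-1")              # stock is back at 5
ledger.compensate("saga-1")              # duplicate cancel, no further effect
```

The compensation only appends an entry and never checks a business rule, which is what lets the backward path behave like a guaranteed service.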

The yellow reversible piece has triangular connectors, indicating that a rollback via a compensating transaction is possible. The backward flow arrow is green because the handling of compensating transactions must behave as a "Guaranteed" service.

Irreversible

The irreversible microservice does not support compensating transactions. Once the local transaction in the microservice is done, it cannot be canceled or rolled back. However, it can issue a compensating request to the saga if its local transaction fails because it violates a business rule in the microservice. That is, it is like the reversible category but lacks the support for handling "cancel transaction" messages. For example, if an ATM has handed out money, then you cannot go back. The irreversible piece is red and has a round connector.

Guaranteed

In a guaranteed microservice, local transactions are always successful because no business logic can invalidate them. Therefore, local transactions will eventually be performed, even in case of technical problems. Compensating transactions are supported and guaranteed. Both sides of a guaranteed piece are shown in Fig. 2. The piece is a connector-width longer and can be used on both sides. The side shown to the left has triangular connectors at the top, which should be used before an irreversible microservice, whereas the other side with round connectors is to be used afterward, and therefore does not have an arrow depicting a compensating transaction.

RIG rules and gamification

This section will describe the three RIG rules, starting with the two internal saga constraints followed by the external saga one.

Internal saga constraints

There are only two internal rules when using the RIG tool to obtain guaranteed consistency in a saga (a call chain):

  • Rule 1: Only one irreversible microservice is allowed.
  • Rule 2: Only guaranteed microservices can be used after an irreversible microservice.

The RIG pieces will only fit together if the saga results in data consistency, and an example is shown in Fig. 2.

Fig. 2. RIG data consistent flow

If the saga does not comply with the RIG rules and hence cannot guarantee consistency, the pieces will not fit together. In the example shown in Fig. 3, the saga has a reversible service after an irreversible one. This means that the saga cannot guarantee consistency, and this is visualized by the pieces that do not fit together.

Fig. 3. RIG data inconsistent flow

The examples show only a linear chain. However, parallel flows within the chain are possible if they comply with the RIG constraints. The "Fork"/"Merge" in the saga orchestration will handle this.
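For a linear chain, the two internal rules can also be checked mechanically. The sketch below is one possible encoding, with hypothetical names (`Category`, `rig_violations`); it mirrors what the physical pieces enforce and does not cover parallel flows.

```python
from enum import Enum


class Category(Enum):
    REVERSIBLE = "R"
    IRREVERSIBLE = "I"
    GUARANTEED = "G"


def rig_violations(chain: list[Category]) -> list[str]:
    """Return the Rule 1 and Rule 2 violations found in a linear saga chain."""
    violations = []
    seen_irreversible = False
    for step, category in enumerate(chain, start=1):
        if category is Category.IRREVERSIBLE:
            if seen_irreversible:
                # A second irreversible service also breaks Rule 2;
                # it is reported once, under Rule 1.
                violations.append(f"step {step}: second irreversible service (Rule 1)")
            seen_irreversible = True
        elif category is Category.REVERSIBLE and seen_irreversible:
            violations.append(
                f"step {step}: reversible service after an irreversible one (Rule 2)"
            )
    return violations


def parse(chain: str) -> list[Category]:
    """Turn a string such as "RRIG" into a list of categories."""
    return [Category(letter) for letter in chain]


fits = rig_violations(parse("RRIG"))        # the pieces fit together
does_not_fit = rig_violations(parse("RIR"))  # reversible after irreversible
```

An empty list corresponds to pieces that fit together; any entry corresponds to a joint where the connectors do not match.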

External saga constraints

  • Rule 3: Sagas must not be interleaved.

As described in the introduction, a saga does not have a transactional isolation mechanism like the isolation levels from ACID. This leads to well-known problems when interleaving transactions with a "read uncommitted" isolation level, such as dirty read, non-repeatable read, or phantom read.

To avoid these problems, sagas must not interleave. Therefore, a saga must be isolated from other sagas. Suppose another saga changes the data of the current saga. In that case, eventual consistency cannot be guaranteed because the compensating transaction (rollback) may fail due to the changes done by the interleaving saga.
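To make the idea of isolation concrete, here is a deliberately simple sketch of one possible approach: a saga claims the business entities it will touch, and a second saga that needs any of them is refused until the first saga has completed or fully compensated. This is an assumption for illustration only, not necessarily one of the mechanisms Bromose and Laursen describe. The `SagaLockRegistry` class and the entity keys are hypothetical, and a real system would need a shared, durable registry rather than an in-memory dictionary.

```python
class SagaLockRegistry:
    """Tracks which saga currently owns which business entity."""

    def __init__(self) -> None:
        self._owner: dict[str, str] = {}  # entity key -> owning saga id

    def try_acquire(self, saga_id: str, entities: set[str]) -> bool:
        """Claim all entities for the saga, or none of them."""
        for entity in entities:
            # An unowned entity defaults to the caller, so it is not a conflict.
            if self._owner.get(entity, saga_id) != saga_id:
                return False  # another saga is active on this entity
        for entity in entities:
            self._owner[entity] = saga_id
        return True

    def release(self, saga_id: str) -> None:
        """Called when the saga has completed or has fully compensated."""
        self._owner = {e: s for e, s in self._owner.items() if s != saga_id}


locks = SagaLockRegistry()
first = locks.try_acquire("saga-1", {"item:42", "account:7"})
second = locks.try_acquire("saga-2", {"item:42"})   # refused: would interleave
locks.release("saga-1")
third = locks.try_acquire("saga-2", {"item:42"})    # accepted after release
```

The trade-off is reduced concurrency: sagas that share an entity are serialized, which is the price of keeping compensations safe from interleaving changes.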

Bromose and Laursen describe ways to establish isolation of sagas, which this article will not further describe.

How will the use of the RIG tool play out?

Imagine that you have been participating in an EventStorming session. You and your team have identified business boundaries and are now discussing which microservices the system should be split into. Taking one business event at a time, you and your team must ensure that the bespoke microservices architecture will remain data-consistent.

First, you and your team discuss whether the business event requires transactional behavior regarding the microservices involved. Is a saga needed? If so, you and your team discuss the category of each microservice in the saga: is it R, I, or G? Then you start to prototype the saga flow using the RIG pieces – maybe on a whiteboard – and you and your team collaborate on finding a flow where the pieces fit together. There are two possible outcomes:

  • The pieces fit together. Congratulations! You have a data-consistent saga.
  • The pieces don’t fit together. This is a bad situation that must be resolved, and you need to refactor your microservices to make the pieces fit.

These are some suggestions for this refactoring:

  • Merge some of the microservices into a modular micro monolith.
  • Change the business requirements for the business event to loosen the constraints for one or more microservices in the saga, thereby getting the pieces to fit.

At the end of the session, you have a saga where the pieces fit nicely together. 

During the process, you also get clarification about transactional requirements for the microservices – scoping one business event at a time.

  • Reversible microservices must implement a non-failing compensating transaction for the business event.
  • Guaranteed microservices must implement both a non-failing transaction and a non-failing compensation for the business event. 

The RIG system design flow model

The RIG tool that has just been described is a central part of the puzzle of designing guaranteed data-consistent microservice systems. However, it does not cover the process "from business to microservices." The RIG system design flow model gives you a framework around the RIG tool and guides you from business to a system. The system will not always be microservice-based, but in the case of a microservice system, the process will include the RIG tool.

So, what is "the RIG system design flow model"? It is a problem domain-focused process that helps you design microservices systems from a business perspective. The RIG system design flow model gives you a path from analyzing business requirements to an architecture sketch of the IT system supporting the business requirements. The focus point is data consistency and, from a business perspective, making well-founded risk decisions when accepting the possibility of not reaching eventual consistency. In short:

  • The RIG system design flow model is a decision support tool that helps you design microservices systems from a business perspective.
  • The RIG model is a design tool that allows you to validate that your saga flows can always reach eventual consistency.
  • The RIG model lives within the scope of the RIG system design flow model.

Figure 4 shows the RIG system design flow model as a flowchart. A flow could consist of the following steps:

a)    Identify business boundaries and rules.

  • When all important business boundaries and rules are identified, go to step b.
  • We recommend that you use EventStorming and the concept of Bounded Contexts from Domain-Driven Design in combination with the concepts of dark matter and dark energy in this step.

b)    Split into microservices.

  • Examine if sagas are involved in the flow. If yes, go to step c.

c)     Create a flow where the RIG pieces fit together.

  • A consistent system is obtained when the flow complies with the three RIG rules.

d)    Revisit business boundaries and rules.

e)    Handle interleaved sagas and establish isolation.

Fig. 4. Designing maintainable software systems – Problem domain flowchart

It turns out that when you break down your system into microservices, you should try to avoid the red irreversible services. They will damage your business agility when you start to model the sagas because they significantly constrain your sagas: "there can be only one red piece on the table at a time." When you model a saga, the first rule of RIG kicks in and may make the work of designing the saga cumbersome. In general, green and yellow are fine, and red is bad.

Dealing with a lack of guaranteed eventual consistency

Sometimes, you need to ask yourself – do I really need guaranteed eventual consistency?

It may be a reasonable business decision to take a risk on a given transaction. Therefore, if you cannot comply with the RIG rules, you can use the CAP theorem to decide the CAP position of your saga based on the degree of risk you are willing to take. This is what we call "Dynamic CAP."

Dynamic CAP

The idea of dynamic CAP is that you do not need to model your entire system within one CAP position – for example, the AP position, which is the most common position for a microservice system. You can model your system in a way where different parts have different CAP positions. You may even consider a policy-driven CAP positioning using a data-driven approach. For example, you could dynamically shift your CAP position depending on the amount of money involved in the transaction and your willingness to take risk.
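A policy-driven choice of this kind could be as small as the following sketch. The function name, the single `risk_limit` threshold, and the idea of returning a position per transaction are assumptions made for illustration; a real policy would be agreed with the business and would likely weigh more than the amount.

```python
from enum import Enum


class CapPosition(Enum):
    AP = "AP"  # favor availability and accept the risk of inconsistency
    CP = "CP"  # favor consistency, e.g. by refusing or deferring the transaction


def choose_cap_position(amount: float, risk_limit: float) -> CapPosition:
    """Pick a CAP position for one transaction from a hypothetical risk policy.

    Transactions at or below the limit run in the AP position; larger ones
    are moved to the CP position.
    """
    return CapPosition.AP if amount <= risk_limit else CapPosition.CP


small = choose_cap_position(amount=50.0, risk_limit=1_000.0)
large = choose_cap_position(amount=5_000.0, risk_limit=1_000.0)
```

Here `small` is handled in the AP position and `large` in the CP position, so the risk the business accepts is bounded by the limit it chose.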

The modular monolith

If you do not want to take on any business risk, consider whether the microservice architecture is worth the trouble. It takes a lot of work to do microservices right, and it takes a lot of work to keep microservices running (due to the need to monitor the system). Therefore, if you don’t need microservices, you should probably look into "the new kid on the block" – the modular monolith.

Conclusion and future work

This article proposes the RIG model, which can assist teams in achieving guaranteed data consistency in microservice systems using the RIG pieces to visualize the sagas in the system. The RIG pieces enforce the first two RIG rules, which are implied when the pieces fit together. It also pinpoints that achieving guaranteed consistent data may require rethinking business processes, e.g., changing the order in which activities are executed. Therefore, the product owner plays a central role in designing sagas with guaranteed data consistency.

Future work

We are in the testing phase with a well-established fintech company. The preliminary results show that the RIG method and tools facilitate constructive and efficient discussions about business rules, microservices, and sagas. The next step will be testing concrete business problems in full-scale workshops.

Further down the road, we will introduce a framework for a "Consistency Ping," where it is possible to issue a consistency ping in a saga. The ping will build a graph of the participating services and their category (R, I, or G). The ping graph can then be automatically checked for consistency breaches. To enable the ping, we propose embedding metadata in the AsyncAPI standard where the services state their category (R, I, or G). In this way, it will be possible to add "Consistency Ping" to API-first designed systems where the AsyncAPI standard describes the services.
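Since the ping framework is future work, the following is only a speculative sketch of what the automatic check on a ping graph might look like. It assumes the graph has already been collected as two plain dictionaries (service to category, and service to the services it calls); how the category is published, for example through an AsyncAPI specification extension field, is left open. The function name is hypothetical, and parallel branches are treated conservatively: at most one irreversible service in the whole saga, and everything reachable from it must be guaranteed.

```python
def check_ping_graph(
    categories: dict[str, str], calls: dict[str, list[str]]
) -> list[str]:
    """Return the consistency breaches found in a saga call graph."""
    problems = []
    irreversible = [name for name, cat in categories.items() if cat == "I"]
    if len(irreversible) > 1:
        problems.append(
            f"Rule 1: more than one irreversible service: {sorted(irreversible)}"
        )
    for start in irreversible:
        # Walk everything that is called after the irreversible service.
        stack, seen = list(calls.get(start, [])), set()
        while stack:
            service = stack.pop()
            if service in seen:
                continue
            seen.add(service)
            if categories[service] != "G":
                problems.append(
                    f"Rule 2: {service} ({categories[service]}) "
                    f"is called after irreversible {start}"
                )
            stack.extend(calls.get(service, []))
    return problems


# Hypothetical saga: order -> payment -> (shipping, loyalty)
categories = {"order": "R", "payment": "I", "shipping": "G", "loyalty": "R"}
calls = {"order": ["payment"], "payment": ["shipping", "loyalty"]}
breaches = check_ping_graph(categories, calls)
```

In this example the check flags the reversible loyalty service because it runs after the irreversible payment service; making it guaranteed, or moving it before the payment, would clear the breach.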

Another area for future work is formulating a framework for handling interleaving sagas.

So – stay tuned – more to come.
