Article Preview
TopIntroduction
State machine replication is one of the fundamental approaches that is used to ensure fault tolerance for a distributed system. Recently with the advances in cloud computing and serverless computing, state machine replication started to gain more importance. In state machine replication, the process to be run distributed is implemented as a state machine, keeping instances in separate locations/machines to handle requests in a distributed way. During running time, each replica handles different requests and executes them as if they are being orderly processed by a main state machine.
Challenges in running replicated state machines is handling the requests arriving at replicas and applying the effects of the incoming requests to replicas in a consistent way. To be more specific, requests that are arriving in a distributed way should be processed by replicas in the same order, causing each machine to visit the same states in same order and produce the same outputs in the same order. Different techniques can be applied to ensure this property and provide a better throughput during replica runs.
An example state machine replication can be seen in Figure 1 where a master state machine on top is replicated over three replicas. State machines transit between defined states such as A, B and C with incoming events such as E1, E2 and E3. An important remark is that the replicas can be deployed in proximate locations as well as in geographically distinct locations according to the context of usage. Another important point is using a master replica (or state machine) is also dependent on the context of usage. Even though no master is used replicas are expected to eventually be orchestrated to reflect a single logical state machine such as the master state machine present in Figure 1.
Figure 1. State machine replication
A possible way to achieve this might be to use a consensus protocol among the running instances. This consensus protocol might utilize a leader, such as a main state machine replica, and provide consistency with the guidance of the leader. Using a leader has its own merits and shortcomings such as being a single point of failure; a different alternative might be to use a leaderless approach. For leaderless approaches a lightweight protocol, such as two-phase commit protocol (2PC), that provides a smaller number of messaging among replicas to alleviate the cost to provide consistency is required. This requirement is more prominent in cloud systems since it is possible for replicas to run in distant locations.
In this paper, 2PC protocol's performance is improved by using state machine model's characteristics to detect write/read conflicts among the state machines in an early phase. Proposed approach detects the specific states where a specific variable is accessed and fail early if an already read variable is about to be overwritten by a replica elsewhere. This way, instead of checking just before writing, a specific replica would be able to sense if some variable becomes dirty before arriving the actual writing state.
The applicability of the proposed approach is demonstrated over two specific examples: a simple state machine that consists of a single sequential path of execution. A branching state machine contains two different paths of execution emerging from a state with two different possible transitions. Experiments on Amazon Web Services (AWS) show that the proposed approach has improved the number of state executions that are wasted because of a write conflict and increase throughput with respect to the 2PC. The main objectives and contributions of the presented study can be summarized as below:
- •
To pinpoint possible opportunities to enhance the performance of replicated state machines that are running with strict consistency using two phase commit protocol. Main contribution to this objective is the introduction of early conflict checking (and resolution) approach.
- •
To decrease the wasted state executions that are caused by write conflicts that arise during the execution of distributed state machines. Main contribution to this objective is the design and application of Enhanced 2PC protocol (E2PC).
- •
Exploring the effect of enhancing 2PC by taking advantage of early conflict detection, on cloud environments in terms of throughput and wasted state executions. Main contribution to this objective is the experimental evaluation of E2PC on a cloud environment.