Downloads‎ > ‎

Infinispan version employing a Total Order based commit protocol

DRAFT

Question? should I put the extended statistics + top-key? maybe in another page?

Total Order based commit protocol

Replicated Mode (a.k.a Full Replication)

Like the current Infinispan's Two-Phase-Commit (2PC) based replication mechanism, also this is a multi-master scheme in which transactions are executed locally by any node and without any coordination among replicas till the commit phase.

Rather than using 2PC, however, this scheme scheme relies on a single Atomic Broadcast (AB, a.k.a. Total Order Broadcast, TOB) to disseminate transactions' write-sets while ensuring totally ordered delivery of messages across the various nodes.

The two main advantages of using Total Order based protocol for certifying transactions are the elimination of deadlock between replicas and it does not need two phases to commit a transaction. The first one is obvious, because the transactions are received by a single thread and in the same order on all replicas and the single thread deliver allow us to avoid lock acquisition during certification of a transaction. This guarantees that only one transaction is committing in an instant of time. Additional, it is only needed one phase to commit a transaction. The order of deliver is the same, and all replicas executes a deterministic certification, so all replicas have the same outcome for a transaction.

The description above has a problem when the write skew check is enabled. The lack of lock acquisition in certifying transactions will originate inconsistency, because the value of the key can change even if a local transactions has locked that key. To solve this problem, it must be done a write skew check when the transactions is received. So, the key version, in the moment of writing, is sent on prepare message. The write skew check is done by comparing the actual key version with the key version sent in the prepare message. If it is the same, the transaction is committed. Otherwise, the transactions is aborted.

The single thread deliver give us a easy way apply the modifications, because it is the only thread that change the values in data container. But it has some performance issues, because no conflicting transactions can be certified and their update applied in parallel. So We have implemented a multi-thread certification scheme that relies on total order deliver transactions. Before putting into a thread pool, the transactions knows what are the transactions that precedes it, i.e. that has conflicting key, in other words, the intersection of the write-set of both transactions is not empty. In the thread pool, the transactions waits for all preceding transactions completes before applying its modifications.

The one phase commit is used when the write skew check is disabled or when Infinispan is enlisted as Synchronization (see JTA specification). The figure below show a simple scheme of how it works, assuming that transactions tx1 and tx2 are trying to commit at the same time. If the write skew check is enabled (and Synchronization), the validation is performed by all cache members (validate(tx)) and they reach the same outcome (commit(tx) / rollback(tx)). When no validation is needed, the only possible outcome is the commit(tx).

<FIGURE>

When the write skew check is enabled and Infinispan is executing distributed transactions (via XaResource, see JTA specification), then two phases are needed to commit a transaction in order to be compliant with the specificatino. In this case, the originator will perform the write skew check in prepare phase and the outcome is returned to the Transaction Manager that will later commit or rollback the transaction in the second phase. The figure below shows a scenario where two transactions (tx1 and tx2) are trying to commit at the same time. In this case, when a transaction is delivered in total order, it collects all the previously conflicting transactions, before submit the transaction in the thread pool. In the thread pool, a transaction will wait until all previously conflicting transactions collected has finished (wait(tx)). After this waiting period, it performs the write skew check (validate(tx)) and sends a reliable message with commit or rollback to all cache members (commit(tx) / rollback(tx)).

<FIGURE>

Distributed Mode (a.k.a Partial Replication)

In the distributed mode, when the write skew is disabled, the behavior is the same, with exception of the total order primitive in use. In this case, it is used a primitive that ensures total order when a destination of a message is a subset of cache members.

When the write skew is enabled, we use two phases to commit, even if Infinispan is used standalone. The key owners are the only one who can perform the write skew check and the originator needs to collect all the validation outcome. All the key owners perform the write skew check in parallel, but the originator only waits for the outcome of the first one, for each key modified. As soon as a negative outcome is received, the transaction is immediately aborted.

Passive Replication based commit protocol (Full Replication only)

Passive Replication based commit protocol allows read-only transactions to be executed on top of any node. Instead, write transactions are processed exclusively by a primary (or master) node, which serializes them via a local concurrency control mechanism, and propagates the updates synchronously towards the remaining replicas (typically called the backups).

This protocol, compared to multi-master approaches (like the native two-phase commit of Infinispan), has the potential to lead to better performances in high contention scenarios. Although implementing a multi-master scheme allows every node to process both read and write transactions, resulting in a higher computational power serving write transactions, two-phase commit based replication protocols are prone to thrashing at high contention rate. On the other hand, the multi-master protocol exhibits higher scalability in scenarios in which the write-sets of the transactions rarely conflict; conversely, in this case the primary node of the Passive Replication based protocol acts as the bottleneck of the system.

Performance / Benchmarking

<Plot of Switch Paper gives all we need =) >

Where can I find the source code?

The code is under the Git, a distributed version control system. You can download the source code from our GitHub account in here: https://github.com/pruivo/infinispan/tree/pb_to_2pc_stats (TODO: move the code to Cloud-TM's github account!!)

To download it follow this small steps

#it creates a folder named infinispan with the code
$ git clone git://github.com/pruivo/infinispan.git
$ cd infinispan
$ git checkout pb_to_2pc_stats


#or if you want a different folder name
$ git clone git://github.com/pruivo/infinispan.git <folder name>
$ cd <folder name>
$
git checkout pb_to_2pc_stats

How can I install it and use it in my application?

Requirements: Java, Maven and Internet Connection.

To install it, follow the simple steps:

$ cd <infinispan code folder>
$ mvn clean install [-DskipTests=true]


The option -DskipTests=true will avoid to run all the tests cases. After this, to use it in you application, you just need to do a last step, depending of your application

Maven Application: add Infinispan as a dependency to your pom.xml

<dependencies>
  ...
  <dependency>
    <groupId>org.infinispan</groupId>
    <artifactId>infinispan-core</artifactId>
    <version>5.2.0-SNAPSHOT</version>
  </dependency>
  ...          

</dependencies>

Other: add the Infinispan Core jar to your application classpath. The jar can be found in:

<infinispan code folder>/core/target/infinispan-core.jar

How can I configure it?

Infinispan Configuration


<?xml version="1.0" encoding="UTF-8"?>
  <infinispan
    xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"" >> ${DEST_FILE}
    xsi:schemaLocation="urn:infinispan:config:5.1 http://www.infinispan.org/schemas/infinispan-config-5.1.xsd"
    xmlns="urn:infinispan:config:5.1\>

    <global>
      ...
      <transport clusterName="infinispan-cluster">
        <properties>
          <property name="configurationFile" value="jgroups.xml" />
        </properties>
      </transport>

      <totalOrderExecutor>
        <properties>
          <property name="minThreads" value="8"/>
          <property name="maxThreads" value="64"/>
          <property name="queueSize" value="1000"/>
          <property name="keepAliveTime" value="60000"/>
        </properties>
      </totalOrderExecutor>
      ...
    </global>


    <default> <!-- or <namedCache name="..."> -->
      ...
      <transaction

        ...
        useSynchronization="true"
        transactionProtocol="TOTAL_ORDER"

        ... />
      ...
    </default> <!-- or </namedCache> -->
</infinispan>

Transport: configures the configuration file name to use in JGroups

Total Order Executor (Total Order based only): configures the thread pool for the multi-thread certification

Parameter
 Value Description
 minThreads integer lower bound of number of certification threads
 maxThreads integer upper bound of number of certification threads
 queueSize integer queue size where the transaction will be en-queue if all threads are busy
 keepAliveTime integer a thread will be terminated if idle for this amount of time (in milliseconds)

Transaction: configures the commit protocol to use

 Parameter Value Description
 transactionProtocol TOTAL_ORDER /
TWO_PHASE_COMMIT /
PASSIVE_REPLICATION
 selects the commit protocol to use.
If the value is TOTAL_ORDER, Infinispan will use the Total Order based protocol.
If the value is TWO_PHASE_COMMIT, it will use the two-phase-commit based protocol.
if the value is PASSIVE_REPLICATION, it will use the passive replication based protocol.
 useSynchronization true / false
 when it is true, it registers Infinispan as a Synchronization resource, otherwise, it registers it as Xa resource

Jgroups configuration


<config
      xmlns="urn:org:jgroups"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-3.0.xsd">
  ...
  <SEQUENCER/>
  <tom.TOA />
  ...
</config>

In the JGroups configuration, you must enable the SEQUENCER (for full replication) or tom.TOA (for partial replication) protocol to ensure the total order properties for the transaction prepare command.


Comments