Data Platform

The Data Platform is responsible for storing, retrieving and manipulating data across a dynamic set of distributed nodes, elastically acquired from the underlying IaaS Cloud provider(s).

The Data Platform Programming APIs have been designed to simplify the development of large scale data centric applications deployed on cloud infrastructure. They include the Object Grid Mapper, the Search API and the Distributed Execution Framework.

To this end, the programmatic interfaces offered by the Cloud-TM Data Platform allow developers to:

  • store and query data in the Data Platform using the familiar and convenient abstractions provided by the object-oriented paradigm, such as inheritance, polymorphism and associations;
  • take full advantage of the processing power of the Cloud-TM Platform via a set of simple abstractions that hide the complexity associated with parallel/distributed programming, such as thread synchronization and scheduling, and fault-tolerance;
  • enable the joint achievement of high scalability and strong consistency via fully-decentralized multiversioning distributed data management protocols, genuine partial replication techniques and locality aware load balancing mechanisms.

Lower in the stack we find a highly scalable, adaptive In-memory Distributed Transactional Key-Value Store / Distributed Transactional Memory (DTM), which represents the backbone of the Cloud-TM Data Platform. In order to maximize the visibility, impact and future exploitation of the results of the Cloud-TM project, the consortium agreed to use Red Hat's Infinispan as the starting point for developing this essential component of the Cloud-TM Platform. Throughout the project, Infinispan has been extended with innovative data management algorithms (in particular concerning data replication and distribution), as well as with real-time self-reconfiguration schemes aimed at guaranteeing optimal performance even in highly dynamic cloud environments.

Data Platform Programming APIs 

The APIs exposed by the Cloud-TM Data Platform can be classified into three main categories: the Object Grid Data Mapper, Query Support, and the Distributed Execution Framework.


Object Grid Data Mapper: Hibernate OGM and Fénix Framework

Hibernate OGM Download - Fénix Framework Download

The Object Grid Data Mapper (OGDM) module is responsible for handling the mapping from the object-oriented data model, which is used by the programmer, to the <key, value> pair data model, which is employed by the underlying Distributed Transactional Data Grid platform. 

One of the earliest design decisions for the Cloud-TM Platform was to base the API of the OGDM module on the well-known, industry-standard, and widely accepted Java Persistence API (JPA). The expected advantages of this approach were to ease the adoption of the Cloud-TM Platform by the large community of programmers already familiar with JPA, and to provide a smoother migration path for applications already built on top of JPA.

 

Hibernate OGM builds on the mature and robust infrastructure provided by Hibernate ORM to provide such a JPA-compatible implementation for the Cloud-TM OGDM module.

 

Yet, there are significant new challenges to be tackled when mapping to a distributed transactional data grid, such as being able to exploit data locality in a highly distributed environment. Not knowing beforehand what the best solutions for these challenges were, nor whether they would require changes to the programming model, enforcing strict compliance with JPA seemed too limiting to start with. Moreover, OGM, being tightly dependent on Hibernate ORM, is a complex software project with a very large code base. Thus, a strategic design decision for the Cloud-TM Platform was to also provide an alternative API for the OGDM module: a simpler API that allowed us to rapidly prototype and validate the platform's capabilities, with a lower time risk compared to developing the same features within the more complex and comprehensive Hibernate OGM. This simpler API is the one provided by the Fénix Framework.
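As an illustration, the following is a minimal, hypothetical sketch of how an application entity could be written against the JPA-based API of the OGDM module (the entity, its fields and the persistence unit name are invented for this example, and transaction demarcation details depend on how the persistence unit is configured):

import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Persistence;

// Hypothetical domain entity: plain JPA annotations, no data-grid specifics.
@Entity
public class Customer {

    @Id @GeneratedValue
    private Long id;

    private String name;

    public Long getId() { return id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

// Persisting the entity through the standard JPA interfaces; the persistence
// unit name "cloudtm-pu" is an assumption made for this example.
class CustomerDao {
    void saveCustomer(String name) {
        EntityManagerFactory emf = Persistence.createEntityManagerFactory("cloudtm-pu");
        EntityManager em = emf.createEntityManager();
        em.getTransaction().begin();
        Customer customer = new Customer();
        customer.setName(name);
        em.persist(customer);
        em.getTransaction().commit();
        em.close();
        emf.close();
    }
}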

Query Support: Hibernate Search

Hibernate Search Download

Hibernate Search is an Object/Index Mapper: it binds the full-text index technology implemented by Apache Lucene to the Object Oriented world. 

Hibernate Search consists of an indexing component and an index search component, both backed by Apache Lucene. Its core engine depends only on Apache Lucene, so that the Object/Index mapping engine can be reused in different projects and environments, in particular Infinispan Query. Hibernate Search also provides a module that integrates it with Hibernate ORM, and the Fénix Framework offers an integration with Hibernate Search as well.
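As a minimal sketch (the entity, its fields and the search term are invented for this example, and package/annotation names follow the Hibernate Search releases contemporary to the project), an entity can be marked for full-text indexing and then queried through the Lucene-backed query DSL via the JPA integration:

import java.util.List;
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.Id;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.jpa.FullTextEntityManager;
import org.hibernate.search.jpa.Search;
import org.hibernate.search.query.dsl.QueryBuilder;

// Hypothetical entity mapped to the data grid (JPA) and to a Lucene index.
@Entity
@Indexed
class Book {
    @Id
    Long id;

    @Field          // indexed, full-text searchable field
    String title;
}

class BookSearch {
    // Full-text lookup of the books whose title matches the given term.
    List<Book> findByTitle(EntityManager em, String term) {
        FullTextEntityManager ftem = Search.getFullTextEntityManager(em);
        QueryBuilder qb = ftem.getSearchFactory()
                              .buildQueryBuilder()
                              .forEntity(Book.class)
                              .get();
        org.apache.lucene.search.Query luceneQuery =
                qb.keyword().onField("title").matching(term).createQuery();
        return ftem.createFullTextQuery(luceneQuery, Book.class).getResultList();
    }
}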


Distributed Execution Framework (DEF)

DEF Download (as part of Infinispan)

The Distributed Execution Framework aims at providing a set of abstractions that simplify the development of parallel applications, allowing ordinary programmers to take full advantage of the processing power made available by the set of distributed nodes of the Cloud-TM Platform without having to deal with low-level issues such as load distribution, thread synchronization and scheduling, and fault-tolerance.

Unlike most other distributed frameworks, the Cloud-TM Distributed Execution Framework uses data from the transactional data platform's nodes as input for execution tasks (a usage sketch follows the list below). This provides a number of relevant advantages:

  • The availability of the transactional abstraction for the atomic, thread-safe manipulation of data drastically simplifies the development of higher-level abstractions such as concurrent data collections or synchronization primitives.

  • The Distributed Execution Framework capitalizes on the fact that input data in the transactional data grid is already evenly distributed. Execution tasks are thus likely to be automatically distributed in a balanced fashion as well, and users do not have to explicitly assign work tasks to specific platform nodes. The framework nevertheless allows users to specify arbitrary subsets of the platform's data as input for distributed execution tasks.

  • The mechanisms used to ensure the fault-tolerance of the data platform, such as redistributing data across the platform's nodes upon failures/leaves of nodes, can be exploited to achieve failover of uncompleted tasks.

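The sketch below is a minimal, hypothetical example (the task, the cache name and the word-count logic are invented for this illustration) of how a task can be submitted through Infinispan's Distributed Execution Framework so that it runs on the nodes owning the input data:

import java.io.Serializable;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Future;
import org.infinispan.Cache;
import org.infinispan.distexec.DefaultExecutorService;
import org.infinispan.distexec.DistributedCallable;
import org.infinispan.distexec.DistributedExecutorService;
import org.infinispan.manager.EmbeddedCacheManager;

// Hypothetical task: counts the words of the documents stored on each node.
class WordCountTask implements DistributedCallable<String, String, Integer>, Serializable {

    private transient Cache<String, String> cache;
    private transient Set<String> keys;   // input keys assigned to this node by the framework

    @Override
    public void setEnvironment(Cache<String, String> cache, Set<String> keys) {
        this.cache = cache;
        this.keys = keys;
    }

    @Override
    public Integer call() {
        int words = 0;
        for (String key : keys) {
            words += cache.get(key).split("\\s+").length;
        }
        return words;
    }
}

class WordCounter {
    // Submits the task to every node of the grid and aggregates the partial results.
    int countAllWords(EmbeddedCacheManager cacheManager) throws Exception {
        Cache<String, String> documents = cacheManager.getCache("documents"); // hypothetical cache name
        DistributedExecutorService des = new DefaultExecutorService(documents);
        List<Future<Integer>> partials = des.submitEverywhere(new WordCountTask());
        int total = 0;
        for (Future<Integer> partial : partials) {
            total += partial.get();
        }
        return total;
    }
}
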
The Distributed In-memory Transactional Data Grid

Infinispan Download

The backbone of the Cloud-TM Data Platform is represented by Infinispan, a highly scalable, in-memory distributed key-value store with support for transactions. Born as an open-source project sponsored by Red Hat, Infinispan has been selected as the reference Distributed Transactional Data Grid platform for the Cloud-TM project.

Throughout the project, Infinispan has been extended with innovative data management algorithms (in particular concerning data replication and distribution), as well as with real-time self-reconfiguration schemes aimed at guaranteeing optimal performance even in highly dynamic cloud environments.

The architecture of Infinispan is extremely flexible and supports two operational modes: a standalone (non-clustered) mode and a distributed mode. In standalone mode, it works as a shared-memory (centralized) Software Transactional Memory (STM), providing support for transactions and offering a key/value store interface.
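As a minimal sketch, assuming a cache configured with transactional support (the configuration file, cache name and account keys are invented for this example), data can be manipulated atomically through the key/value interface and JTA transactions:

import javax.transaction.TransactionManager;
import org.infinispan.Cache;
import org.infinispan.manager.DefaultCacheManager;
import org.infinispan.manager.EmbeddedCacheManager;

class TransferExample {
    // Atomically moves an amount between two accounts stored in the grid:
    // either both updates are applied or none is.
    void transfer(String from, String to, int amount) throws Exception {
        // "infinispan.xml" is a hypothetical configuration file enabling transactions.
        EmbeddedCacheManager cacheManager = new DefaultCacheManager("infinispan.xml");
        Cache<String, Integer> accounts = cacheManager.getCache("accounts");
        TransactionManager tm = accounts.getAdvancedCache().getTransactionManager();

        tm.begin();
        try {
            accounts.put(from, accounts.get(from) - amount);   // assumes both accounts exist
            accounts.put(to, accounts.get(to) + amount);
            tm.commit();
        } catch (Exception e) {
            tm.rollback();
            throw e;
        } finally {
            cacheManager.stop();
        }
    }
}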

In the following, we overview the key components that have been developed or significantly extended during the Cloud-TM project.


Consistency and Replication: the GMU protocol

The mechanisms for enforcing data consistency in the presence of concurrent data manipulations and possible failures are of paramount importance for a distributed in-memory transactional data platform. In the scope of the project, Infinispan has been extended with a novel, highly scalable distributed multi-versioning scheme, called GMU, which has the following unique characteristics:

  • Strong consistency. GMU's consistency semantics abides by the Extended Update Serializability criterion, which ensures the most stringent of the standard ISO/ANSI SQL isolation levels, namely the SERIALIZABLE level, in Infinispan. Beyond that, it ensures that the snapshot observed by transactions, even those that need to be aborted, is equivalent to the one generated by some serial execution of transactions. By preventing transactions from observing non-serializable states, application developers are spared from the complexity of dealing explicitly with concurrency anomalies that may lead to abnormal executions.
  • Wait-free read-only transactions. GMU ensures that read-only transactions can be committed without the need for any validation at commit time. Further, it guarantees that read-only transactions are never blocked or aborted (an illustrative sketch of the underlying multi-versioned read mechanism follows this list). These properties are extremely relevant in practice, as most real-life workloads are dominated by read-only transactions.

  • Genuine partial replication. GMU is designed to achieve high scalability even in the presence of write-intensive workloads, by ensuring that update transactions commit by contacting exclusively the subset of nodes that maintain the data they read or wrote. Hence, unlike other multi-versioning distributed consistency protocols, GMU can commit update transactions without either relying on a centralized service (which is doomed to become the bottleneck of the system and to ultimately limit scalability) or flooding all nodes in the system (which would generate a huge amount of network traffic in large-scale systems).
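To give an intuition of the multi-versioning idea underlying these properties, the following is a heavily simplified, illustrative sketch (not GMU's actual implementation; all class and method names are invented): every key keeps multiple versions tagged with the vector clock of the committing transaction, and a read-only transaction is served, for each key, the most recent version visible in its snapshot vector clock, without locks or commit-time validation.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative multi-versioned store: committed versions are never overwritten,
// so readers never block writers and vice versa.
class VersionedStore {

    // One committed version of a value, tagged with the vector clock of the
    // transaction that wrote it.
    record Version(int[] commitVC, String value) {}

    private final Map<String, List<Version>> versions = new HashMap<>();

    // Returns true if every entry of 'a' is <= the corresponding entry of 'b'.
    private static boolean visibleIn(int[] a, int[] b) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) return false;
        }
        return true;
    }

    // Wait-free read: pick the newest version of 'key' that is visible in the
    // reader's snapshot vector clock.
    String read(String key, int[] snapshotVC) {
        List<Version> history = versions.getOrDefault(key, new ArrayList<>());
        for (int i = history.size() - 1; i >= 0; i--) {
            Version v = history.get(i);
            if (visibleIn(v.commitVC(), snapshotVC)) {
                return v.value();
            }
        }
        return null;   // no version of this key is visible in the snapshot
    }

    // Committing a write appends a new version instead of overwriting the old one.
    void commitWrite(String key, int[] commitVC, String value) {
        versions.computeIfAbsent(key, k -> new ArrayList<>()).add(new Version(commitVC, value));
    }
}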

The GMU protocol has been shown to scale up to more than 150 nodes, as can be seen in the following plot, obtained using the Vacation-Mix benchmark.


Dynamic Adaptation of the Replication Strategy: Polymorphic Replication

Another unique feature incorporated in the Cloud-TM Data Grid is the ability to dynamically adapt the distributed protocol employed to ensure data consistency, a feature that we call polymorphic replication. The key observation motivating this feature is that, as the vast literature on transactional data consistency protocols demonstrates, no one-size-fits-all solution exists that guarantees optimal performance across all possible application workloads. In order to achieve optimal efficiency even in the presence of dynamically changing workloads, the Data Platform:

  1. includes three different data replication strategies, obtained as variants of the GMU protocol, which exhibit different trade-offs and are, consequently, optimized for different workloads;

  2. supports on-line switching between the above replication protocols, leveraging innovative non-blocking schemes that minimize the performance penalty during the system's reconfiguration.

The three replication strategies implemented are the following:

  • 2PC. The original GMU protocol is a multi-master scheme (i.e., each node can process both read and write transactions) that relies on a variant of the classic Two-Phase Commit scheme to certify transactions and determine the vector clock to assign to a committing transaction. This protocol excels in update-intensive workloads that exhibit limited contention. In such a scenario, in fact, all nodes can actively contribute to processing update transactions without incurring an excessive number of aborts due to contention. The main drawback of this protocol is that it is prone to distributed deadlocks as contention grows.
  • PB. In this variant of the GMU protocol, a single node, called the primary or master, is allowed to process update transactions, whereas the remaining ones are used exclusively for processing read-only transactions. The primary regulates concurrency among local update transactions, using a deadlock-free commit-time locking strategy, and synchronously propagates its updates to the backup nodes. Read-only transactions can be processed in a non-blocking fashion on the backups, regulated by Infinispan's native multiversion concurrency control algorithm. In this protocol, the primary is prone to become a bottleneck in write-dominated workloads. On the other hand, by intrinsically limiting the number of concurrently active update transactions, it alleviates the load on the network and reduces the commit latency of update transactions. Further, it is less subject to thrashing due to lock contention in high-conflict scenarios.

  • TO. Similarly to 2PC, this protocol is a multi-master scheme that processes transactions without any synchronization during their execution phase. Unlike 2PC, however, the transaction serialization order is not determined by means of lock acquisition, but rather by relying on a Total Order Multicast service (TOM) to establish a total order among committing transactions. More precisely, with this variant of the GMU scheme, the prepare message is disseminated via the TOM primitive. Upon delivery of the prepare message by TOM, locks for the data accessed by the corresponding transaction are atomically acquired in the same delivery order established by TOM. The usage of TOM ensures agreement on the lock acquisition order among the nodes involved in the transaction commit phase, and, consequently, avoids the possibility of incurring distributed deadlocks. This protocol has a higher scalability potential than PB in write-dominated workloads, but it is also more prone to high abort rates in conflict-intensive scenarios.

Self-tuning Data Placement: the AUTOPLACER

Another highly innovative feature developed within the context of the Cloud-TM project is the so-called AUTOPLACER scheme. In a distributed data platform such as Cloud-TM, processing applications' requests can imply accessing data that is stored remotely, i.e. on different nodes of the platform. Hence, the performance and scalability achievable by Cloud-TM applications can be affected by the quality of the algorithms used to distribute data among the nodes of the platform. These should be accurate enough to guarantee high data locality (and minimize remote data accesses), as well as sufficiently lightweight and scalable to cope with large-scale applications. AUTOPLACER addresses this problem by automatically identifying the data items having a sub-optimal placement on the platform and relocating them automatically to maximize access locality. Scalability and practical viability of AUTOPLACER are achieved via innovative probabilistic algorithms, which exploit stream-analysis and machine-learning techniques.
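To give a flavour of the kind of lightweight stream-analysis technique involved, the sketch below shows a simplified top-k tracker in the style of the Space-Saving algorithm (class and method names are invented; this is not AUTOPLACER's actual code): it keeps approximate counters for the k most frequently accessed keys using a bounded amount of memory, regardless of the length of the access stream, and can thus be used to spot candidate hotspots whose placement may be sub-optimal.

import java.util.HashMap;
import java.util.Map;

// Simplified Space-Saving top-k sketch: tracks the (approximately) k most
// frequently accessed keys using at most k counters.
class TopKTracker {

    private final int k;
    private final Map<String, Long> counters = new HashMap<>();

    TopKTracker(int k) {
        this.k = k;
    }

    // Records one (remote) access to 'key'.
    void recordAccess(String key) {
        Long count = counters.get(key);
        if (count != null) {
            counters.put(key, count + 1);            // already monitored: increment
        } else if (counters.size() < k) {
            counters.put(key, 1L);                   // room left: start monitoring
        } else {
            // Evict the entry with the smallest counter and take over its count,
            // which bounds the estimation error of the algorithm.
            String victim = null;
            long min = Long.MAX_VALUE;
            for (Map.Entry<String, Long> entry : counters.entrySet()) {
                if (entry.getValue() < min) {
                    min = entry.getValue();
                    victim = entry.getKey();
                }
            }
            counters.remove(victim);
            counters.put(key, min + 1);
        }
    }

    // Current set of candidate hotspots with their estimated access counts.
    Map<String, Long> hotspots() {
        return new HashMap<>(counters);
    }
}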


Changing the Scale and Moving Data: the Non-Blocking State Transfer

Another relevant feature that has been recently introduced in Infinispan is the so-called Non-Blocking State Transfer (NBST). This feature allows Infinispan to keep processing transactions even while the scale of the platform is changing, i.e. whenever nodes are leaving/joining the system, while the placement of data is being rearranged, e.g. by the AUTOPLACER mechanism, or when the Autonomic Manager requests to alter the replication degree of data to optimize performance. This feature is extremely relevant as it minimizes the impact of platform reconfigurations on applications' performance and availability. Also, NBST is offered for all the GMU variants, which allows for minimally intrusive reconfiguration independently of the distributed protocol that is currently used to coordinate data management within the platform.

