Transactional, Object oriented, Self-tuning Cloud Data Store

Cloud-TM is a highly innovative data-centric middleware platform aimed at facilitating development and abating operational and administration costs of cloud applications.

Designed from the ground up to meet the scalability and dynamicity requirements of cloud infrastructures, Cloud-TM provides intuitive yet powerful abstractions that mask complexity and allow ordinary programmers to unleash the potential of large-scale cloud platforms.

Further, Cloud-TM integrates pervasive self-tuning schemes that combine diverse methodologies, such as analytical modelling, simulation and machine learning, to pursue optimal efficiency at any scale and for any workload.

The Challenge 

The appearance of the first commercial Cloud Computing platforms has represented a significant step towards the materialization of the vision of utility-computing.

However, the promise of infinite scalability that has catalyzed much of the recent interest in Cloud Computing is still threatened by one major pitfall: the lack of programming paradigms and abstractions capable of bringing the power of distributed programming into the hands of ordinary programmers, sheltering them from the complexity of developing systems deployed over large-scale, elastic cloud platforms.

A crucial issue that we have tackled in the Cloud-TM project has been developing innovative mechanisms and abstractions aimed at ensuring adequate consistency levels while being:

1. simple and familiar for the programmers

2. highly efficient and scalable

3. fault-tolerant and highly available.

Decades of research and field experience in this area have led to the development of a plethora of different approaches to ensuring state consistency in distributed platforms, and have taught a fundamental, general lesson: no universal, one-size-fits-all solution exists, as the efficiency of individual state management approaches is strongly affected by both:

1. the characteristics of the incoming workload, such as the ratio of read/write operations, as well as the spatial/temporal locality in the data access patterns, and

2. the scale of the system (e.g. low vs high number of nodes, local vs geographical distribution) on which these mechanisms are deployed.

The problem is therefore exacerbated in cloud computing platforms by the very feature that is regarded as one of the key advantages of the cloud: its ability to elastically acquire or release resources, varying the scale of the platform in real time to meet the demands of varying workloads.


The Cloud-TM approach

The Cloud-TM project addressed these issues by building a highly innovative data-centric middleware platform. The Cloud-TM platform is designed from the ground up to meet the scalability and dynamicity requirements of cloud infrastructures, while providing intuitive yet powerful abstractions that mask complexity and allow ordinary programmers to unleash the potential of large-scale cloud platforms.

 


Most cloud computing infrastructures embrace weak consistency models that achieve scalability at the cost of an increase of complexity for the programmers. This leads to a significant growth of software development costs and of the time to market, ultimately hindering competitiveness.

Conversely, Cloud-TM adopts an intuitive, yet scalable programming paradigm. The Cloud-TM programming paradigm integrates the friendly abstraction of the atomic transaction as a first-class programming construct, sheltering programmers from having to deal with the idiosyncrasies of weak consistency models. Strong consistency and scalability, two properties often seen as antagonists, are reconciled thanks to innovative transactional consistency schemes designed precisely to meet the scalability and elasticity requirements of typical cloud infrastructures.
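To make the idea of the atomic transaction as a programming construct more concrete, here is a minimal single-process sketch in Python. It is not the Cloud-TM API: the names ToyStore, ToyTransaction, ConflictError and transfer are hypothetical, and the optimistic version check used at commit time is just one simple way to obtain atomicity, chosen for brevity rather than taken from the project.

```python
import threading


class ConflictError(Exception):
    """Raised when a transaction cannot commit because of a concurrent update."""


class ToyStore:
    """Hypothetical in-memory store; each key maps to a (version, value) pair."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def begin(self):
        return ToyTransaction(self)


class ToyTransaction:
    def __init__(self, store):
        self._store = store
        self._read_versions = {}  # key -> version observed at first read
        self._writes = {}         # key -> buffered new value

    def get(self, key, default=None):
        if key in self._writes:   # a transaction sees its own writes
            return self._writes[key]
        with self._store._lock:
            version, value = self._store._data.get(key, (0, default))
        self._read_versions.setdefault(key, version)
        return value

    def put(self, key, value):
        self._writes[key] = value  # buffered until commit

    def commit(self):
        with self._store._lock:
            # Validate: every key read must still be at the version we saw.
            for key, seen in self._read_versions.items():
                current, _ = self._store._data.get(key, (0, None))
                if current != seen:
                    raise ConflictError(key)
            # Apply all buffered writes in one step, bumping each version.
            for key, value in self._writes.items():
                current, _ = self._store._data.get(key, (0, None))
                self._store._data[key] = (current + 1, value)


def transfer(store, src, dst, amount):
    """Move `amount` between two accounts, retrying when a conflict occurs."""
    while True:
        tx = store.begin()
        balance = tx.get(src, 0)
        if balance < amount:
            return False
        tx.put(src, balance - amount)
        tx.put(dst, tx.get(dst, 0) + amount)
        try:
            tx.commit()
            return True
        except ConflictError:
            continue  # another transaction changed what we read; retry
```

The application code in transfer contains no locks and no reasoning about partial updates: either both balances change or neither does, which is the property the transactional paradigm is meant to give programmers.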

Beyond transactional consistency, the Cloud-TM programming model provides transparent support for object orientation and queries, concurrency-friendly data structures and frameworks to control distributed execution of tasks, hiding issues such as fault-tolerance, load distribution and data placement.


Finally, Cloud-TM pursues the minimization of the other major source of costs for cloud-based applications, namely operational costs, in a twofold way:

  • Automating the provisioning of resources from the cloud based on user specified target criteria in terms of both Quality of Service and budget constraints. This allows guaranteeing that applications only use the minimum amount of necessary resources to withstand the current load pressure, minimizing both administration and operational costs.

  • Maximizing efficiency (i.e. the costs/benefits ratio in the Cloud Computing usage-based pricing model) via pervasive self-tuning schemes that adapt the middleware's internals to ensure optimal performance at any scale, and for any workload. This means making the most effective use of the currently allocated resources, leading to a reduction of the amount of required resources, and, consequently, of the operational costs.

 

Overview of the Cloud-TM Platform

The high-level architecture of the Cloud-TM Platform is depicted in the following figure. It consists of two main parts: the Data Platform and the Autonomic Manager.




Data Platform. The Data Platform is responsible for storing, retrieving and manipulating data across a dynamic set of distributed nodes, elastically acquired from the underlying IaaS Cloud provider(s).


The Data Platform Programming APIs have been designed to simplify the development of large scale data centric applications deployed on cloud infrastructure. They include the Object Grid Mapper, the Search API and the Distributed Execution Framework.

To this end, the programmatic interfaces offered by the Cloud-TM Data Platform allow developers to:

  • store data in, and query data from, the Data Platform using the familiar and convenient abstractions provided by the object-oriented paradigm, such as inheritance, polymorphism and associations;
  • take full advantage of the processing power of the Cloud-TM Platform via a set of simple abstractions that hide the complexity associated with parallel/distributed programming, such as thread synchronization and scheduling, and fault-tolerance;
  • jointly achieve high scalability and strong consistency via fully-decentralized multi-versioning consistency schemes, genuine partial replication techniques and locality-aware load balancing mechanisms.
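As a rough illustration of the multi-versioning idea mentioned in the list above, the following Python sketch keeps every committed version of a key so that a reader can keep working on a stable snapshot while writers proceed. It is a deliberately simplified, single-node toy: the class MultiVersionStore and its single global counter are assumptions made for this example, whereas the schemes described for Cloud-TM are fully decentralized and work over partially replicated data.

```python
import bisect


class MultiVersionStore:
    """Toy store that keeps every committed version of each key."""

    def __init__(self):
        self._clock = 0       # last commit timestamp issued (single node only)
        self._versions = {}   # key -> ([timestamps], [values]), ascending order

    def snapshot(self):
        """Return the timestamp that a new reader should read at."""
        return self._clock

    def write(self, key, value):
        """Commit a new version of `key` and return its timestamp."""
        self._clock += 1
        stamps, values = self._versions.setdefault(key, ([], []))
        stamps.append(self._clock)
        values.append(value)
        return self._clock

    def read(self, key, snapshot_ts):
        """Return the newest value of `key` committed at or before snapshot_ts."""
        stamps, values = self._versions.get(key, ([], []))
        idx = bisect.bisect_right(stamps, snapshot_ts)
        return values[idx - 1] if idx else None
```

A reader that took its snapshot before a later write still sees the older value, so read-only work never has to block or abort because of concurrent updates in this toy model.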

Lower in the stack we find a highly scalable, adaptive In-memory Distributed Transactional Key-Value Store/Distributed Transactional Memory (DTM), which represents the backbone of the Cloud-TM Data Platform. In order to maximize the visibility, impact and future exploitation of the results of the Cloud-TM project, the consortium agreed to use Red Hat's Infinispan as the starting point for developing this essential component of the Cloud-TM Platform. Throughout the project, Infinispan has been extended with innovative data management algorithms (in particular for data replication and distribution), as well as with real-time self-reconfiguration schemes aimed at guaranteeing optimal performance even in highly dynamic cloud environments.


Autonomic Manager. The Autonomic Manager is the component in charge of the self-tuning of the Data Platform. In the Cloud-TM Platform, self-optimization is a pervasive property that is pursued across multiple layers of the platform.


Specifically, the Cloud-TM Platform leverages a number of complementary self-tuning mechanisms that aim to automatically optimize, on the basis of user-specified Quality of Service (QoS) levels and cost constraints, the following functionalities/parameters:

  • the scale of the underlying platform, i.e., the number and type of machines over which the Data Platform is deployed;
  • the data replication degree, i.e., the number of replicas of each datum stored in the platform;
  • the transactional data consistency protocol;
  • the data placement strategies and request distribution policies, with the ultimate goal of maximizing the data access locality of Cloud-TM applications.
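The kind of decision listed above can be pictured as a search over candidate configurations, guided by a performance model and bounded by a QoS target and a cost limit. The Python sketch below illustrates only that general shape. Everything in it is an assumption made for the example: the Config fields, the two protocol names, the numbers inside predicted_throughput and the flat per-node price are invented, and the real Autonomic Manager relies on analytical modelling, simulation and machine learning rather than on a fixed formula.

```python
from dataclasses import dataclass
from itertools import product

PROTOCOLS = ("two-phase-commit", "primary-backup")  # illustrative names only


@dataclass(frozen=True)
class Config:
    nodes: int
    replication_degree: int
    protocol: str


def predicted_throughput(cfg, write_ratio):
    """Hypothetical performance model, in transactions per second.

    Capacity grows with the number of nodes; each write pays a penalty that
    grows with the replication degree and a made-up per-protocol factor.
    """
    protocol_write_cost = {"two-phase-commit": 1.0, "primary-backup": 0.6}
    penalty = 1.0 + write_ratio * cfg.replication_degree * protocol_write_cost[cfg.protocol]
    return cfg.nodes * 1000.0 / penalty


def hourly_cost(cfg, price_per_node=10):
    """Cost in arbitrary integer units, assuming one flat price per node."""
    return cfg.nodes * price_per_node


def choose_config(write_ratio, min_throughput, max_hourly_cost, min_replicas=2):
    """Return the cheapest configuration meeting the QoS target, or None."""
    candidates = (
        Config(n, r, p)
        for n, r, p in product(range(1, 17), range(min_replicas, 4), PROTOCOLS)
        if r <= n  # cannot keep more replicas than there are nodes
    )
    feasible = [
        c for c in candidates
        if predicted_throughput(c, write_ratio) >= min_throughput
        and hourly_cost(c) <= max_hourly_cost
    ]
    if not feasible:
        return None
    # Cheapest first; among equally cheap options prefer higher throughput.
    return min(feasible, key=lambda c: (hourly_cost(c), -predicted_throughput(c, write_ratio)))
```

Re-running choose_config whenever the observed write ratio changes mimics, in a very reduced form, how a self-tuning loop could move between configurations as the workload evolves.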

The following figure illustrates an example scenario highlighting the self-optimizing capabilities of the Cloud-TM platform. Depending on the current workload characteristics, Cloud-TM can autonomously acquire or release resources from the Cloud, and adjust, in a transparent manner, its internal consistency mechanisms to maximize performance and efficiency.



Videos


The YouTube channel of the project contains several videos demonstrating a number of features of the Cloud-TM platform.

Below are two of these videos, demonstrating two innovative self-tuning features of Cloud-TM, namely AutoPlacer and Polymorphic Replication.


The Open Source Way


Since the early stages of the project, academic partners have worked in close collaboration with the leading company in the open-source software arena, Red Hat. This has allowed a number of innovative solutions to be integrated into highly visible open source projects, like Infinispan, JGroups, Hibernate Search and Hibernate OGM.


The choice of embracing open source, and the integration of the best-of-breed research results in popular Red Hat projects, have strongly amplified the impact and visibility of the project's achievements, and paved the way for their immediate industrial exploitation.


The choice of open source also means that the Cloud-TM platform is freely available to the broad community of SMEs that find in cloud computing a highly attractive model, not only from the economic perspective (thanks to its advantageous pay-only-for-what-you-use billing scheme), but also due to its simplicity and scalability.


News

Cloud-TM successfully reviewed by Ocean!
October 2014

New press release about the Cloud-TM Platform prototype!
November 2013

The Final Cloud-TM Platform is out!
September 2013

A new factsheet is out!
September 2013

NETYS Best Paper Award
May 2013

Cloud-TM Data Platform and Autonomic Manager are out!
February 2013

FutuGrid Project Challenge Award
January 2013