The Autonomic Manager is the component in charge of the self-tuning of the Data Platform. In the Cloud-TM Platform, self-optimization is a pervasive property that is pursued across multiple layers of the platform. Specifically, the Cloud-TM Platform leverages a number of complementary self-tuning mechanisms that aim to automatically optimize, on the basis of user-specified Quality of Service (QoS) levels and cost constraints, the following functionalities/parameters:
The following figure provides a high level overview of the architecture of the Autonomic Manager (AM), and reports the set of self-tuning mechanisms that it supports.
The self-optimization mechanisms supported by the Cloud-TM Autonomic Manager can be classified in two main types:
Mechanisms aimed at optimizing the data access locality of Cloud-TM applications, i.e., at maximizing the co-location of the application code and the data it accesses.
The Workload and Performance Monitor (WPM)
The Workload and Performance Monitor (WPM) is the subsystem in charge of gathering statistical information on the workload and on the performance/efficiency of the various components/layers of the Cloud-TM Platform, and of conveying it towards the Workload Analyzer (WA) and the Adaptation Manager (AdM). The WA exploits the monitoring-data streams produced by the WPM to automatically detect shifts of the workload that may give rise to QoS violations and/or lead the Cloud-TM Data Platform to operate in suboptimal configurations. This information is exploited, in turn, by the AdM, which can react by triggering corrective actions that alter the scale and/or configuration of the Data Platform.
The core component that has been used for designing/developing the WPM is Lattice, a monitoring framework (developed in the context of recent EU projects such as RESERVOIR) that is designed from the ground up to meet the requirements of large-scale, virtualized cloud infrastructures, and that represents the backbone data dissemination infrastructure of the Cloud-TM WPM.
The Workload Analyzer (WA)
Sitting between the Workload and Performance Monitor (WPM) and the Adaptation Manager (AdM), the WA bears the following responsibilities in the Cloud-TM Platform:
The Adaptation Manager (AdM)
The Adaptation Manager (AdM) is the key component of the Autonomic Manager, and its actual “brain”. As already mentioned, this module is in charge of driving the self-tuning of a number of mechanisms of the Cloud-TM Data Platform, as well as of automating its QoS-based resource provisioning process (by transparently acquiring/releasing resources from IaaS providers).
The following figure depicts the internal architecture of the Adaptation Manager, highlighting its main building blocks and how it interacts with the other modules of the Cloud-TM Platform. The Adaptation Manager comprises two main subcomponents: the Performance Prediction Service and the Platform Optimizer.
Performance Prediction Service
The Performance Prediction Service encapsulates diverse performance forecasting mechanisms that rely on alternative predictive methodologies working in synergy to maximize the accuracy of the prediction system, and, consequently, of the whole self-tuning process. In more detail, the Performance Prediction Service exploits the notion of model diversity: it combines white-box (e.g., analytical models) and black-box (e.g., machine-learning techniques) approaches with complementary strengths and weaknesses, in order to get the best of both:
The Performance Prediction Service can leverage diverse prediction methodologies, which include analytical methods, machine-learning techniques, and simulation techniques.
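The idea of model diversity can be illustrated with a minimal sketch (the class below is hypothetical and not part of the Cloud-TM codebase): an ensemble holds several predictors, tracks each one's recent error against observed measurements, and weights predictions by the inverse of that error, so the combined forecast leans towards whichever model is currently more accurate.

```python
class DiversityEnsemble:
    """Illustrative ensemble combining white-box and black-box predictors,
    weighting each by the inverse of its mean recent absolute error."""

    def __init__(self, predictors, window=50):
        self.predictors = predictors            # callables: features -> prediction
        self.errors = [[] for _ in predictors]  # sliding error window per model
        self.window = window

    def predict(self, features):
        preds = [p(features) for p in self.predictors]
        weights = []
        for errs in self.errors:
            mean_err = sum(errs) / len(errs) if errs else 1.0
            weights.append(1.0 / (mean_err + 1e-9))  # lower error -> higher weight
        total = sum(weights)
        return sum(w * p for w, p in zip(weights, preds)) / total

    def observe(self, features, actual):
        # Feed back the measured value to update each model's error window.
        for i, p in enumerate(self.predictors):
            self.errors[i].append(abs(p(features) - actual))
            self.errors[i] = self.errors[i][-self.window:]
```

Before any feedback the models are weighted equally; as measurements arrive, the ensemble converges towards the model that tracks the real system best.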
This predictor relies on an analytical model (based on queueing theory arguments) to capture data contention dynamics. In more detail, the analytical model at the basis of this predictor uses mean-value analysis techniques to forecast the probability of transaction commit, the mean transaction response time, and the maximum system throughput. This allows supporting what-if analysis on parameters like the degree of parallelism (number of nodes) in the system, or shifts of workload characteristics, such as changes of the transactions’ data access patterns. One key element of the modeling approach is that it does not rely on the classic, strongly limiting assumption of uniform transaction accesses over the whole data set. Instead, it introduces a powerful abstraction, called the Application Contention Factor (ACF), that allows the on-line characterization of the application data access pattern in a lightweight and pragmatic manner.
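As a rough, much-simplified sketch of the kind of mean-value analysis such a model performs (the actual Cloud-TM model is considerably more elaborate; formula and parameter names here are illustrative assumptions): if locks are requested at rate λ, held for a mean time T_h, and the access skew is summarized by the ACF, the probability that a single access contends can be approximated as p ≈ λ·T_h·ACF, and a transaction issuing N accesses commits with probability (1−p)^N. Since aborted transactions retry and inflate λ, the model iterates to a fixed point:

```python
def commit_probability(arrival_rate, n_accesses, t_hold, acf, n_iter=100):
    """Toy mean-value-analysis fixed point (illustrative only).

    arrival_rate: transactions/sec entering the system
    n_accesses:   lock requests per transaction
    t_hold:       mean lock hold time (seconds)
    acf:          Application Contention Factor summarizing access skew
    """
    p_commit = 1.0
    for _ in range(n_iter):
        # Aborted transactions retry, so the effective lock-request rate
        # grows as the commit probability shrinks.
        lock_rate = arrival_rate * n_accesses / max(p_commit, 1e-6)
        # Probability that a single access hits a busy lock.
        p_cont = min(lock_rate * t_hold * acf, 1.0)
        # A transaction commits only if none of its accesses contend.
        p_commit = (1.0 - p_cont) ** n_accesses
    return p_commit
```

The sketch reproduces the qualitative behavior described above: a more skewed access pattern (higher ACF) yields a lower commit probability at the same load.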
This predictor is based on Discrete Event Simulation techniques, and includes a set of discrete-event models able to simulate the behavior of the different operating modes supported by the Cloud-TM Data Platform. The architecture of the simulation component is highly modular, since it is based on skeleton models that allow the instantiation of actual models capturing the dynamics of differentiated distributed data management schemes, as well as of differentiated platform scales. It is also highly configurable, since it offers a suite of embedded data access models relying on a wide set of parameterizable distributions, along with the possibility of simulating data access patterns based on access traces, as provided by the tool chain formed by the WPM and the WA.
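A minimal discrete-event simulation skeleton of the kind this predictor builds on can be sketched as follows (this is a toy stand-in, not the Cloud-TM simulator: exponential arrivals, a fixed service time, and round-robin dispatching to single-server nodes; all parameters are illustrative):

```python
import heapq
import random

def simulate(n_nodes, arrival_rate, service_time, horizon, seed=0):
    """Toy discrete-event simulation: returns (throughput, mean response time)."""
    rng = random.Random(seed)
    events = []  # priority queue of (time, seq, kind, arrival_time)
    seq = 0      # tie-breaker so heap entries never compare by payload
    heapq.heappush(events, (rng.expovariate(arrival_rate), seq, "arrival", None))
    busy_until = [0.0] * n_nodes
    next_node = 0
    completed, total_rt = 0, 0.0
    while events:
        t, _, kind, arrived_at = heapq.heappop(events)
        if t > horizon:
            break
        if kind == "arrival":
            # Dispatch round-robin; queue behind any in-progress work.
            start = max(t, busy_until[next_node])
            busy_until[next_node] = start + service_time
            seq += 1
            heapq.heappush(events, (start + service_time, seq, "complete", t))
            next_node = (next_node + 1) % n_nodes
            # Schedule the next arrival.
            seq += 1
            heapq.heappush(events,
                           (t + rng.expovariate(arrival_rate), seq, "arrival", None))
        else:  # "complete"
            completed += 1
            total_rt += t - arrived_at
    return completed / horizon, total_rt / max(completed, 1)
```

The real simulator replaces the fixed service time and round-robin dispatching with models of the platform's replication protocols and data access distributions (or with WPM/WA-provided traces), but the event-queue core is the same.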
Machine Learning based predictors
This predictor relies on pure black-box machine-learning techniques to forecast the throughput, abort rate, and mean execution time (or its x-th percentile) of the transaction classes composing the input workload. This is a multiple-input-multiple-output (MIMO) regression problem, in which, for each of the above parameters, we aim at identifying a corresponding function that captures its dynamics over an input space composed of a rich set of features characterizing the workload and the scale/configuration of the platform.
During the project we experimented with several machine-learning techniques, including Artificial Neural Networks, Decision Tree algorithms for regression and classification problems, as well as on-line reinforcement learning algorithms. The prototype of the Cloud-TM Platform ships with a performance predictor based on Cubist, a decision tree regressor that has been widely tested during the project to build performance predictors of various subcomponents of the platform.
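To make the decision-tree regression idea concrete without assuming access to Cubist itself, here is a toy depth-1 regression tree (a "stump") fitted on a single feature; it picks the split threshold minimizing squared error and predicts the mean of each side. A real predictor would use deeper trees, many features (one model per output of the MIMO problem), and a library such as Cubist, but the splitting principle is the same:

```python
def fit_stump(xs, ys):
    """Fit a depth-1 regression tree on one feature (illustrative stand-in
    for a full decision-tree regressor such as Cubist)."""
    pairs = sorted(zip(xs, ys))
    best = None  # (sse, threshold, left_mean, right_mean)
    for i in range(1, len(pairs)):
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= thr]
        right = [y for x, y in pairs if x > thr]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        # Sum of squared errors of predicting each side's mean.
        sse = (sum((y - ml) ** 2 for y in left)
               + sum((y - mr) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, thr, ml, mr)
    _, thr, ml, mr = best
    return lambda x: ml if x <= thr else mr
```

For instance, trained on (number-of-nodes, throughput) samples, the stump learns a threshold on the node count and predicts a throughput level on each side of it.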
The Platform Optimizer is the component in charge of defining the reconfiguration strategy of the various self-tuning schemes embedded in the Cloud-TM Platform. This module has a flexible and extensible software architecture, which allows specifying a chain of optimizers aimed at tuning different parameters/behaviours of the Data Platform, namely:
the number of data replicas or, in short, the replication degree;
its replication protocol;
the placement of data across the nodes of the platform;
the policy used to distribute requests among the nodes of the platform.
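The "chain of optimizers" structure described above can be sketched as follows (class names, configuration keys, and tuning rules here are hypothetical, chosen only to illustrate the pattern): each optimizer inspects the current configuration and the latest workload statistics, refines one tuning knob, and hands the result to the next optimizer in the chain.

```python
class ScaleOptimizer:
    """Toy rule: grow the platform when CPU is saturated, shrink when idle."""
    def optimize(self, config, stats):
        if stats["cpu"] > 0.8:
            config["nodes"] += 1
        elif stats["cpu"] < 0.2 and config["nodes"] > 1:
            config["nodes"] -= 1
        return config

class ReplicationDegreeOptimizer:
    """Toy rule: the replication degree can never exceed the node count."""
    def optimize(self, config, stats):
        config["replication_degree"] = min(config["replication_degree"],
                                           config["nodes"])
        return config

def run_chain(optimizers, config, stats):
    """Apply each optimizer in order, threading the configuration through."""
    for opt in optimizers:
        config = opt.optimize(config, stats)
    return config
```

Ordering matters in such a chain: placing the replication-degree optimizer after the scale optimizer lets it react to the node count the previous step just chose, which is why an extensible, explicitly ordered chain is a natural fit for tuning interdependent knobs.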