Autonomic Manager

The Autonomic Manager is the component in charge of the self-tuning of the Data Platform. In the Cloud-TM Platform, self-optimization is a pervasive property that is pursued across multiple layers of the platform. Specifically, the Cloud-TM Platform leverages a number of complementary self-tuning mechanisms that aim to automatically optimize, on the basis of user-specified Quality of Service (QoS) levels and cost constraints, the following functionalities/parameters:

  • the scale of the underlying platform, i.e., the number and type of machines over which the Data Platform is deployed;
  • the data replication degree, i.e., the number of replicas of each datum stored in the platform;
  • the protocol used to ensure transactional data consistency;
  • the data placement strategies and request distribution policies, with the ultimate goal of maximizing the data access locality of Cloud-TM applications.

The following figure provides a high-level overview of the architecture of the Autonomic Manager (AM), and reports the set of self-tuning mechanisms that it supports.


The self-optimization mechanisms supported by the Cloud-TM Autonomic Manager can be classified into two main types:

  • Solutions aimed at identifying the optimal values of a set of key configuration parameters/mechanisms of the Cloud-TM Data Platform, namely the scale of the underlying platform, the number of replicas of each datum stored in the platform, and the replication protocol.

  • Mechanisms aimed at optimizing the data access locality of Cloud-TM applications, i.e., at maximizing the collocation between the application code and the data it accesses.


The Workload and Performance Monitor (WPM) WPM Download

The Workload and Performance Monitor (WPM) is the subsystem in charge of gathering statistical information on the workload and performance/efficiency of the various components/layers of the Cloud-TM Platform, and of conveying it to the Workload Analyzer (WA) and the Adaptation Manager (AdM). The WA exploits the monitoring-data streams produced by the WPM to automatically detect shifts of the workload that may give rise to QoS violations and/or lead the Cloud-TM Data Platform to operate in suboptimal configurations. This information is exploited, in turn, by the AdM, which can react by triggering corrective actions that alter the scale and/or configuration of the Data Platform.

The core component used for designing and developing the WPM is Lattice (download here), a monitoring framework (developed in the context of recent EU projects such as RESERVOIR) that is designed from the ground up to meet the requirements of large-scale, virtualized cloud infrastructures, and that represents the backbone data dissemination infrastructure of the Cloud-TM WPM.

 



The Workload Analyzer (WA)

Sitting between the Workload and Performance Monitor (WPM) and the Adaptation Manager (AdM), the WA bears the following responsibilities in the Cloud-TM Platform:

  • Data aggregation. The streams of monitoring data produced by the distributed nodes of the Cloud-TM Platform via the WPM are gathered by the WA, which exposes programmatic APIs and web-based GUIs allowing for aggregating statistics originated by different software layers and/or groups of nodes.

  • Data filtering and workload/KPI change detectors. The WA integrates algorithms aimed at detecting statistically relevant variations of the platform’s KPIs and/or workload characteristics. These techniques filter out unavoidable statistical fluctuations and enhance the stability and robustness of the self-tuning mechanisms integrated in the Adaptation Manager.

  • Workload and resource demand prediction. The WA includes algorithms for time-series forecasting, which predict future workload trends and allow the Adaptation Manager to enact proactive self-tuning schemes. This functionality represents a fundamental building block for any proactive adaptation scheme, i.e., a scheme that triggers reconfigurations of the platform in anticipation of imminent workload changes. Such schemes are particularly desirable when the platform’s reconfiguration (as in the case of elastic scaling) can have non-negligible latencies.

  • Integration with RHQ and R. Plug-ins have been built to interconnect the WPM with RHQ, a popular open-source suite for the management and administration of systems deployed on large-scale, distributed platforms. This allows benefitting from the advanced graphing, analysis and ruleset-based alert notification mechanisms integrated in RHQ. Further, scripts were developed showing how to interface RHQ, via its RESTful APIs, with the R [73] statistical engine. This opens the possibility to run a wide range of time series analysis methods (such as moving averages, ARIMAX models, Kalman filters) aimed at forecasting future trends of the workload fluctuations.
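
As a rough illustration of the change detection and forecasting responsibilities listed above, the following minimal Python sketch pairs a two-sided CUSUM detector with simple exponential smoothing. Both techniques are stand-ins chosen for brevity: the text does not state which detectors or forecasters the WA actually implements, and the function names and parameters (target, slack, threshold, alpha) are assumptions made for this example.

```python
from typing import Iterable, List, Optional


def cusum_change_point(samples: Iterable[float], target: float,
                       slack: float, threshold: float) -> Optional[int]:
    """Return the index of the first sample at which a two-sided CUSUM
    detector signals a shift away from `target`, or None if it never does.

    `slack` absorbs ordinary fluctuations around the target; `threshold`
    is the amount of accumulated deviation required to raise an alarm.
    """
    upper = 0.0  # accumulated evidence of an upward shift
    lower = 0.0  # accumulated evidence of a downward shift
    for i, x in enumerate(samples):
        upper = max(0.0, upper + (x - target - slack))
        lower = max(0.0, lower + (target - x - slack))
        if upper > threshold or lower > threshold:
            return i
    return None


def exp_smoothing_forecast(samples: List[float], alpha: float) -> float:
    """One-step-ahead forecast using simple exponential smoothing.

    `alpha` in (0, 1] controls how quickly old observations are forgotten.
    """
    if not samples:
        raise ValueError("need at least one sample")
    level = samples[0]
    for x in samples[1:]:
        level = alpha * x + (1.0 - alpha) * level
    return level
```

A detector of this kind ignores small oscillations of a KPI around its expected value and raises an alarm only once deviations accumulate, which is the filtering behavior described above; the forecast could then feed a proactive reconfiguration decision.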


The Adaptation Manager (AdM) AdM Download

The Adaptation Manager (AdM) is the key component of the Autonomic Manager, and its actual “brain”. As already mentioned, this module is in charge of driving the self-tuning of a number of mechanisms of the Cloud-TM Data Platform, as well as of automating its QoS-based resource provisioning process (by transparently acquiring/releasing resources from IaaS providers).

The following figure depicts the internal architecture of the Adaptation Manager, highlighting its main building blocks and how it interacts with the other modules of the Cloud-TM Platform. The Adaptation Manager is formed by two main subcomponents, the Performance Prediction Service and the Platform Optimizer.



Performance Prediction Service

The Performance Prediction Service encapsulates diverse performance forecasting mechanisms that rely on alternative predictive methodologies working in synergy to maximize the accuracy of the prediction system, and, consequently, of the whole self-tuning process. In more detail, the Performance Prediction Service exploits the notion of model diversity, i.e. it combines white-box (e.g., analytical models) and black-box (e.g., machine-learning techniques) approaches with complementary strengths and weaknesses in order to take the best of the two approaches, i.e.:

  • the high accuracy of black-box statistical methods when faced with workloads similar to those witnessed during their training phase;

  • the minimal training phase of white-box methods, and their high extrapolation power, i.e. their ability to achieve good accuracy even when providing forecasts concerning previously unexplored regions of the workloads’ parameter space.

The Performance Prediction Service can leverage diverse prediction methodologies, which include analytical methods, machine learning techniques, and simulation techniques.
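
To make the idea of model diversity more concrete, the sketch below shows one possible way of combining the two families of predictors. It answers with a black-box estimate when the queried workload is close to something seen during training, and falls back to a white-box model otherwise. The selection rule, the nearest-neighbor stand-in for the black-box learner, and all names (HybridPredictor, radius, the feature tuples) are assumptions made for illustration; the text does not specify how the Performance Prediction Service arbitrates between models.

```python
import math
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]


class HybridPredictor:
    """Hypothetical combiner: black-box near training data, white-box elsewhere."""

    def __init__(self, white_box: Callable[[Features], float],
                 samples: List[Tuple[Features, float]], radius: float) -> None:
        self.white_box = white_box  # analytical model, needs no training
        self.samples = samples      # (workload features, observed KPI) pairs
        self.radius = radius        # how far a query may be from a sample

    def predict(self, features: Features) -> Tuple[str, float]:
        """Return (name of the model used, predicted KPI value)."""
        best = None
        for seen, value in self.samples:
            distance = math.dist(seen, features)
            if best is None or distance < best[0]:
                best = (distance, value)
        # Inside the explored region: trust the data-driven estimate
        # (a 1-nearest-neighbor lookup stands in for a real learner).
        if best is not None and best[0] <= self.radius:
            return "black-box", best[1]
        # Outside it: rely on the extrapolation power of the white-box model.
        return "white-box", self.white_box(features)
```

With samples collected at 2 and 4 nodes, a query for 10 nodes would fall outside the explored region and be answered by the analytical model, which mirrors the complementary strengths listed above.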


Analytical model based predictor      Transaction Auto Scaler (TAS)   -  Download TAS Predictor for AdM

This predictor relies on an analytical model (based on queueing theory arguments) to capture data contention dynamics. In more detail, the analytical model at the basis of this predictor uses mean-value analysis techniques to forecast the probability of transaction commit, the mean transaction response time, and the maximum system throughput. This allows supporting what-if analysis on parameters like the degree of parallelism (number of nodes) in the system or shifts of workload characteristics, such as changes of the transactions’ data access patterns. One key element of the modeling approach is that it does not rely on classic, strongly limiting assumptions on the uniformity of transactions’ accesses over the whole data-set. Instead, it introduces a powerful abstraction, called Application Contention Factor (ACF), that allows the on-line characterization of the application data access pattern in a lightweight and pragmatic manner.
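
For readers unfamiliar with mean-value analysis, the following sketch implements the textbook exact MVA recursion for a closed, single-class queueing network. It is not the TAS model: it ignores data contention, transaction aborts, and the ACF, none of which are specified in enough detail here to be reproduced. The function name and its parameters (per-station service demands, think time, number of customers) are assumptions made for this example.

```python
from typing import List, Tuple


def mean_value_analysis(demands: List[float], think_time: float,
                        customers: int) -> Tuple[float, float]:
    """Exact MVA for a closed, single-class network of queueing stations.

    demands[k] is the total service demand of one job at station k.
    Returns (throughput, response_time) with `customers` jobs circulating.
    """
    queue = [0.0] * len(demands)  # mean queue lengths with zero jobs
    throughput, response = 0.0, 0.0
    for n in range(1, customers + 1):
        # Arrival theorem: an arriving job sees the queue lengths of the
        # same network with one job fewer.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        response = sum(residence)
        throughput = n / (think_time + response)  # Little's law, whole system
        queue = [throughput * r for r in residence]  # Little's law, per station
    return throughput, response
```

Evaluating the function for increasing values of `customers` gives a simple form of what-if analysis on the degree of parallelism: throughput saturates at the reciprocal of the largest demand while response time keeps growing.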


Simulation based predictor     DAGS framework - Download the DAGS Simulator and the DAGS Connector for AdM

This predictor is based on Discrete Event Simulation techniques, and includes a set of discrete event models which are able to simulate the behavior of the different operating modes supported by the Cloud-TM Data Platform. The whole architecture of the simulation component is highly modular, since it is based on skeleton models, which allow the instantiation of actual models able to capture the dynamics of differentiated distributed data management schemes, as well as of differentiated platform scales. Also, it is highly configurable, since it offers a suite of different embedded data access models, relying on a wide set of parameterizable distributions, and also offers the possibility to simulate data access patterns based on traces of the accesses, as provided by the tool chain formed by WPM and WA.
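
The general shape of a trace-driven discrete event simulation can be sketched in a few lines of Python. The example below simulates a single FIFO server fed by a trace of arrival times and service times, using a priority queue of timestamped events. It is a toy model written for this page, not part of the DAGS framework, and the function name and event encoding are assumptions.

```python
import heapq
from collections import deque
from typing import Deque, List, Tuple


def simulate_fifo_server(arrivals: List[float],
                         service_times: List[float]) -> List[float]:
    """Trace-driven simulation of one FIFO server.

    arrivals[i] and service_times[i] describe request i.
    Returns the response time (waiting + service) of each request.
    """
    # Event tuples are (time, priority, kind, request id); departures get
    # priority 0 so they are handled before arrivals with the same timestamp.
    events: List[Tuple[float, int, str, int]] = []
    for i, t in enumerate(arrivals):
        heapq.heappush(events, (t, 1, "arrival", i))

    waiting: Deque[int] = deque()
    busy = False
    response = [0.0] * len(arrivals)

    def start_service(now: float, req: int) -> None:
        heapq.heappush(events, (now + service_times[req], 0, "departure", req))

    while events:
        now, _, kind, req = heapq.heappop(events)
        if kind == "arrival":
            if busy:
                waiting.append(req)
            else:
                busy = True
                start_service(now, req)
        else:  # departure
            response[req] = now - arrivals[req]
            if waiting:
                start_service(now, waiting.popleft())
            else:
                busy = False
    return response
```

A modular simulator of the kind described above would replace the single server with pluggable models of the platform's operating modes, while the event loop itself stays the same.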


Machine Learning based predictors 

Artificial Neural Network predictor  -  Download ANN for AdM

MorphR predictor  -  Download MorphR for AdM

These predictors rely on pure black-box machine learning techniques to forecast the throughput, abort rate and mean execution time (or its x-th percentile) of the transaction classes composing the input workload. This is a multiple-input-multiple-output (MIMO) regression problem in which, for each of the above parameters, we aim at identifying a corresponding function that captures its dynamics over an input space composed of a rich set of features characterizing the workload and the scale/configuration of the platform.

During the project we experimented with several machine-learning techniques, including Artificial Neural Networks, Decision Tree algorithms for regression and classification problems, as well as on-line reinforcement learning algorithms. The prototype of the Cloud-TM Platform ships with a performance predictor based on Cubist, a decision tree regressor that has been widely tested during the project to build performance predictors of various subcomponents of the platform.



Platform Optimizer

The Platform Optimizer is the component in charge of defining the reconfiguration strategy of the various self-tuning schemes embedded in the Cloud-TM Platform. This module has a flexible and extensible software architecture, which allows specifying a chain of optimizers aimed at tuning different parameters/behaviours of the Data Platform, namely:

  1. its scale, i.e., the number and type of nodes over which the Cloud-TM Data Platform is deployed;

  2. the number of data replicas, or, in short, the replication degree;

  3. its replication protocol;

  4. the placement of data across the nodes of the platform;

  5. the policy used to distribute requests among the nodes of the platform.
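
The notion of a chain of optimizers can be pictured as follows. In this minimal sketch each optimizer is a function that receives the current configuration and returns a possibly modified one, and the chain applies them in order. The PlatformConfig fields, the two example optimizers, and the placeholder protocol name are hypothetical; the actual interfaces of the Platform Optimizer are not described on this page.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class PlatformConfig:
    """Hypothetical snapshot of the tunable parameters of the Data Platform."""
    nodes: int
    replication_degree: int
    replication_protocol: str


Optimizer = Callable[[PlatformConfig], PlatformConfig]


def run_chain(config: PlatformConfig, chain: List[Optimizer]) -> PlatformConfig:
    """Apply each optimizer in order, feeding it the previous one's output."""
    for optimizer in chain:
        config = optimizer(config)
    return config


def scale_to(target_nodes: int) -> Optimizer:
    """Build an optimizer that sets the platform scale to `target_nodes`."""
    def optimizer(config: PlatformConfig) -> PlatformConfig:
        return replace(config, nodes=target_nodes)
    return optimizer


def cap_replication(config: PlatformConfig) -> PlatformConfig:
    """Keep the replication degree no larger than the number of nodes."""
    degree = min(config.replication_degree, config.nodes)
    return replace(config, replication_degree=degree)
```

Ordering matters in such a chain: placing `cap_replication` after `scale_to(2)` ensures the replication degree is adjusted against the new scale rather than the old one, and further optimizers (for data placement or request distribution) could be appended without touching the existing ones.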

