Introduction
Apps for smartphones are rapidly gaining attention, as they are considered the core of the next generation of Internet applications [1]. A key distinguishing feature of smartphone apps is that they can be used from almost any location and can report their users' position, obtained through a GPS device or by triangulation on Wi-Fi access points and cellular towers. It is therefore not surprising that many smartphone apps include Location-based Mobile Social Networking (LMSN), aka geo-social networking, facilities. The basic idea behind LMSN is to provide a second generation of social networks (a killer application of the web) that exploits the position of users to offer innovative services. Notably, LMSN is backed by a sizeable market: ABI Research estimates that LMSN will generate global revenues of $3.3 billion by 2013 [2]. The LMSN market is already very rich and features a wide spectrum of products developed both by well-known corporations and by independent players (see [3] for an exhaustive list of products). For example, Google and Microsoft have developed their own LMSNs, called Latitude [4] and Vine [5], respectively, while Nokia recently acquired a similar product from an independent player, called Plazes [6]. In parallel, many independent companies are developing their own innovative products [7–12]. Reduced to their core, mobile social networks can be thought of as client-server architectures, where:
GeoGraph may experience sudden load peaks, due to the exponential growth typically exhibited by social networks. Moreover, LMSN applications may experience flash crowds triggered by social events, such as concerts or conferences, that cause hot-spots in specific geographical regions. Indeed, flash crowds often overload web systems to the point where their services are degraded or disrupted entirely [13]. Being a generic framework, GeoGraph supports the development of a wide range of LMSN applications characterized by highly heterogeneous and dynamic workloads exhibiting diverse data access patterns and conflict rates. This will provide Cloud-TM with realistic use cases to assess the effectiveness of its self-tuning mechanisms. On the one hand, GeoGraph's clients frequently update the state of the graph nodes (i.e., the position of users). On the other hand, the server side of the application uses this information to dynamically compute the topology of the graph, i.e., adding edges between nodes (users) that get close to each other and deleting edges between nodes that move away from each other. To enhance scalability and ensure timely updates of the graph, these updates are performed in parallel by multiple threads, possibly residing on different nodes. Thus, the data access pattern of GeoGraph (or, more precisely, of the LMSN applications developed using the GeoGraph framework) appears prone to generating high contention levels. In the following, we describe the workload profiles generated by some of the existing LMSN applications:
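The contention argument above can be illustrated with a toy optimistic concurrency scheme: each shared cell carries a version number, a transaction remembers the version of every cell it reads, and its commit fails if any of those versions has changed in the meantime. This is a hypothetical, single-threaded sketch of that general technique; `Cell`, `Tx` and `Conflict` are illustrative names, and the code does not reproduce the protocol actually used by the Cloud-TM DSTM. It replays the stale-snapshot scenario discussed in the Data Contention section: one transaction observes a neighbor list, a second one commits a change to that list, and the first one is forced to abort.

```ruby
# Toy optimistic concurrency control (hypothetical sketch, not Cloud-TM's
# DSTM): cells are versioned, transactions buffer their writes and validate
# the versions of everything they read when they try to commit.
class Conflict < StandardError; end

class Cell
  attr_reader :value, :version

  def initialize(value)
    @value = value
    @version = 0
  end

  # Installs a committed value and bumps the version.
  def install(value)
    @value = value
    @version += 1
  end
end

class Tx
  def initialize
    @reads = {}   # cell => version seen at first read
    @writes = {}  # cell => buffered value, invisible to others until commit
  end

  def read(cell)
    return @writes[cell] if @writes.key?(cell)
    @reads[cell] ||= cell.version
    cell.value
  end

  def write(cell, value)
    @writes[cell] = value
  end

  # Aborts if any cell read by this transaction was changed by another
  # commit in the meantime; otherwise applies the buffered writes.
  # (Single-threaded demo: a real STM makes validation + install atomic.)
  def commit
    raise Conflict, "stale snapshot" if @reads.any? { |cell, seen| cell.version != seen }
    @writes.each { |cell, value| cell.install(value) }
  end
end

neighbors_of_my_node = Cell.new([:n])

t = Tx.new
t.read(neighbors_of_my_node)                # T sees n as a neighbor of myNode

t_star = Tx.new                             # T* runs because n moved away
t_star.write(neighbors_of_my_node, t_star.read(neighbors_of_my_node) - [:n])
t_star.commit                               # T* commits first

outcome =
  begin
    t.write(neighbors_of_my_node, [:n, :m]) # T acts on its stale view
    t.commit
    :committed
  rescue Conflict
    :aborted
  end

puts outcome # prints "aborted"
```

Buffering writes until commit keeps aborted transactions from ever exposing partial updates, which is why the only cost of a conflict here is re-executing the aborted transaction.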
MADMASS
The MAssively Distributed Multi Agent System Simulator (MADMASS) is an open-source framework for developing rich-client web applications that require scalability and feature (real-time) interactions among users. Target applications include, but are not limited to, Multi-Player Online Games, Transaction Processing Systems, Location-based Mobile Social Networks (or geo-social networks) and cooperative systems (e.g., crowd-sourcing apps). MADMASS has been developed in the context of the Cloud-TM project and constitutes the core of the two Cloud-TM pilots. MADMASS is available at https://github.com/algorithmica/madmass.
MAssively Distributed. MADMASS is at the core of the Cloud-TM pilots and, as such, it is designed for the Cloud. MADMASS relies on best practices for developing Cloud apps and integrates seamlessly with the Cloud-TM platform.
Multi Agent System Simulator. MADMASS has its roots in Artificial Intelligence and Multi-Agent Systems research. These disciplines provide the foundations of the MADMASS programming paradigm, making it a versatile and intuitive framework for developing complex applications that feature a high degree of interaction among users.
The MADMASS project stands on the shoulders of a number of existing open-source projects. MADMASS is a Ruby on Rails Engine; thus, whatever you can do with Rails, you can also do with MADMASS. It supports open-source (Socky, Stilts) and commercial (Pusher) WebSockets implementations for enabling real-time interactions. Rich browser GUIs are possible thanks to HTML5 and to JavaScript frameworks such as MooTools and jQuery. We leverage JBoss technology to deliver MADMASS apps as enterprise apps. We rely on TorqueBox to extend the footprint of Rails applications and to enable functionalities such as clustering, load balancing and high availability out-of-the-box.
TorqueBox provides an all-in-one environment built upon the JBoss AS Java application server [17]. A MADMASS app is composed of many (intelligent) agents that offer one or more services. When a user contacts a MADMASS app, an agent is assigned to that user. The role of this agent is to give the user access to a virtual environment (e.g., a virtual world, a data repository, a social network). Some examples of such apps include, but are not limited to:
Figure 1: MADMASS Architecture
A MADMASS application is composed of three distinct communities of Agents (see Figure 1):
For example, in GeoGraph, the domain model is a graph where nodes are geo-localized entities and edges represent proximity relationships. The graph is persisted in the DSTM, and the Domain Agents are responsible for maintaining this data structure on behalf of the users. A GeoGraph client may, for example, want to post a new geo-localized comment at some location. To do so, it sends a “micro-blog post” request to the actions queue. When the request is received, an agent is put in charge of handling it. In particular, the agent will:
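The request flow described above (a client request placed on the actions queue and handled by a Domain Agent) can be sketched in-process with Ruby's built-in thread-safe `Queue`. This is a hypothetical illustration only: `Request`, the `:micro_blog_post` kind and the in-memory `posts` array stand in for the real MADMASS messaging infrastructure and the DSTM-backed graph, and the `percepts` queue stands in for the percepts topic.

```ruby
# Hypothetical in-process sketch of the MADMASS request flow; names are
# illustrative and do not correspond to the MADMASS API.
Request = Struct.new(:user, :kind, :payload)

actions_queue = Queue.new   # stand-in for the actions queue
percepts = Queue.new        # stand-in for the percepts topic
posts = []                  # stand-in for the graph persisted in the DSTM

# A Domain Agent: takes requests off the queue and handles them one by one.
domain_agent = Thread.new do
  loop do
    req = actions_queue.pop
    break if req == :shutdown

    case req.kind
    when :micro_blog_post
      post = { author: req.user, text: req.payload[:text], at: req.payload[:at] }
      posts << post                                # update the domain model
      percepts << { type: :new_post }.merge(post)  # notify interested clients
    else
      percepts << { type: :error, reason: "unknown action #{req.kind}" }
    end
  end
end

# A client posts a geo-localized comment; then the demo stops the agent.
actions_queue << Request.new("alice", :micro_blog_post,
                             { text: "Great concert!", at: [41.89, 12.49] })
actions_queue << :shutdown
domain_agent.join
```

Decoupling clients from agents through a queue means that several Domain Agent threads could pop from the same queue to scale out, which is the property the MADMASS architecture relies on.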
The third community of agents, Autonomous Agents, can be used to perform several tasks. For the purpose of GeoGraph, Autonomous Agents are used to simulate users, for example to benchmark the application or, in our case, the underlying Cloud platform. Autonomous Agents interact with the domain in the same way human agents do: they send commands to the actions queue and get updates on the environment state through the percepts topic. Moreover, Autonomous Agents offer Human Agents an interface for managing the Autonomous Agent community. In particular, authorized users can create groups of agents, possibly of different types, set a simulation speed, and pause, start, stop and destroy each group. A more concrete example will be provided when describing the GeoGraph workload generator in Section 3.2.
Why Transactions?
The MADMASS architecture scales naturally and is well suited to delivering Software as a Service (SaaS) in a Cloud. Nevertheless, the presence of many concurrent agents may lead to conflicts that can jeopardize the correctness of the shared domain state. To avoid this problem, all actions are performed within a transactional context. The basic idea is that any operation on the data model must be defined in terms of actions. Actions have a simple, yet powerful, interface composed of the following three methods:
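Listing 1 below calls the three methods `applicable?`, `execute` and `build_percepts`. A hypothetical action implementing that interface might look as follows; `MoveAction` and the plain hash of positions are assumptions made for illustration and are not part of MADMASS or GeoGraph.

```ruby
# Hypothetical action implementing the applicable?/execute/build_percepts
# interface; the class and the positions hash are illustrative only.
class MoveAction
  def initialize(positions, user, to)
    @positions = positions   # user => [lat, lon], stand-in for the domain
    @user = user
    @to = to
  end

  # Precondition on the current state: only known users can move.
  def applicable?
    @positions.key?(@user)
  end

  # Applies the change to the domain model.
  def execute
    @positions[@user] = @to
  end

  # Describes the change so that it can be pushed to interested clients.
  def build_percepts
    [{ type: :moved, user: @user, position: @to }]
  end
end

positions = { "alice" => [41.90, 12.50] }
act = MoveAction.new(positions, "alice", [41.91, 12.51])
percepts = act.applicable? ? (act.execute; act.build_percepts) : []
```

Keeping the precondition, the state change and the notification in three separate methods lets the framework, rather than the action author, decide where the transaction boundaries go.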
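Listing 1: Pseudo-code for Action Execution
```ruby
percepts = nil                      # declared outside so it survives the block
transaction do                      # all reads and writes below are atomic
  if act.applicable?                # precondition on the current state
    act.execute                     # mutate the domain model
    percepts = act.build_percepts   # describe the resulting changes
  end
end
send(percepts) if percepts          # publish only after the transaction ends
```
Thus, depending on the type of action and the state of the environment, we can have very different types of transactions: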
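To see how the same action can give rise to different kinds of transactions, the sketch below runs the control flow of Listing 1 against a store that counts reads and writes. It is an assumption-laden illustration: `Store`, its Mutex-based `transaction` (a stand-in that merely serializes blocks, not a DSTM) and `MoveAction` are hypothetical. When the precondition fails, only reads occur and the transaction is read-only; otherwise it is an update transaction.

```ruby
# Hypothetical sketch: the control flow of Listing 1 run against a store
# that records reads and writes. The Mutex merely serializes transactions;
# it is a stand-in for, not a model of, the Cloud-TM DSTM.
class Store
  def initialize(data)
    @data = data
    @lock = Mutex.new
    @log = nil
  end

  # Runs the block atomically and reports the kind of transaction it was.
  def transaction
    @lock.synchronize do
      @log = { reads: 0, writes: 0 }
      yield
      @log[:writes].zero? ? :read_only : :update
    end
  end

  def [](key)
    @log[:reads] += 1
    @data[key]
  end

  def []=(key, value)
    @log[:writes] += 1
    @data[key] = value
  end
end

class MoveAction
  def initialize(store, user, to)
    @store, @user, @to = store, user, to
  end

  def applicable?
    !@store[@user].nil?      # read: the user must exist
  end

  def execute
    @store[@user] = @to      # write: new position
  end

  def build_percepts
    [{ type: :moved, user: @user, position: @to }]
  end
end

# Same control flow as Listing 1; percepts leave only after the block ends.
def run_action(store, act)
  percepts = []
  kind = store.transaction do
    if act.applicable?
      act.execute
      percepts = act.build_percepts
    end
  end
  [kind, percepts]
end

store = Store.new({ "alice" => [41.90, 12.50] })
moved = run_action(store, MoveAction.new(store, "alice", [41.91, 12.51]))
skipped = run_action(store, MoveAction.new(store, "bob", [0.0, 0.0]))
```

Read-only transactions cannot invalidate anyone else's reads, so a platform that can tell the two kinds apart is free to treat them differently, which is one reason the mix of action types matters for self-tuning.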
The GeoGraph Pilot
GeoGraph is a geo-social MADMASS app. Being built on top of MADMASS, GeoGraph is extremely flexible and can be used as a basis for developing any geo-social app, as it implements, in terms of actions, a set of services commonly used in many geo-social apps (e.g., position tracking and micro-blogging). As a result, implementing a new geo-social app just amounts to developing a new client with the MADMASS GUI. Extending the services provided by the Domain Agents is also fairly simple, as it suffices to define new actions. As a key feature, GeoGraph comes with a load generator that simulates users and can be used to benchmark applications.
GeoGraph is composed of two components:
GeoGraph Domain Agents
The GeoGraph domain is a graph where nodes are GeoObjects and edges represent proximity relations. GeoObjects can be of several types. For the time being, there are two main types of GeoObjects: moving objects (such as pedestrians, bikers and drivers) and still objects (such as micro-blog posts). GeoObjects are associated with Users, and a User can have many GeoObjects.
The domain model has been described using the Cloud-TM DML [19], which allows us to specify the domain model once and generate two different implementations (i.e., Hibernate OGM and Fénix) that can then be benchmarked against each other. GeoGraph Domain Agents offer the following set of geo-social services:
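The domain model described above (Users owning many GeoObjects, which are either moving or still and are linked by proximity edges) can be sketched with plain Ruby classes. This is not the DML specification or the code generated from it: the class names, the planar coordinates, the brute-force pairwise check and the `THRESHOLD_K` value are all hypothetical.

```ruby
# Illustrative domain model; names, planar coordinates and THRESHOLD_K are
# assumptions, not the Cloud-TM DML model of GeoGraph.
require "set"

class GeoObject
  attr_reader :user, :x, :y, :neighbors

  def initialize(user, x, y)
    @user, @x, @y = user, x, y
    @neighbors = Set.new   # proximity edges
  end

  def distance_to(other)
    Math.hypot(x - other.x, y - other.y)
  end
end

class MovingObject < GeoObject   # pedestrians, bikers, drivers
  attr_reader :mode

  def initialize(user, x, y, mode)
    super(user, x, y)
    @mode = mode
  end

  def move_to(x, y)
    @x, @y = x, y
  end
end

class Post < GeoObject           # a still object
  attr_reader :text

  def initialize(user, x, y, text)
    super(user, x, y)
    @text = text
  end
end

class User
  attr_reader :name, :geo_objects

  def initialize(name)
    @name = name
    @geo_objects = []            # one User, many GeoObjects
  end
end

THRESHOLD_K = 100.0              # hypothetical proximity threshold

# Rebuilds all proximity edges by checking every pair (O(n^2)).
def link_close_objects(objects)
  objects.each { |o| o.neighbors.clear }
  objects.combination(2) do |a, b|
    next unless a.distance_to(b) < THRESHOLD_K
    a.neighbors << b
    b.neighbors << a
  end
end

alice = User.new("alice")
walker = MovingObject.new(alice, 0.0, 0.0, :pedestrian)
post = Post.new(alice, 30.0, 40.0, "Nice view")   # 50 units from the walker
alice.geo_objects.push(walker, post)
far = MovingObject.new(User.new("bob"), 500.0, 0.0, :driver)

link_close_objects([walker, post, far])
linked_before = walker.neighbors.include?(post)
walker.move_to(450.0, 0.0)                          # walks towards the driver
link_close_objects([walker, post, far])
```

The full pairwise rebuild is only for clarity; the same O(n^2) scan is what makes the naive loop of the Data Contention pseudo-code expensive.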
The GeoGraph Domain Agents app offers an interface (see Figure 2) to monitor the evolution of the system. Using the Google Maps API, it shows the current set of GeoObjects as markers on a map. Thanks to the MADMASS architecture, map updates are pushed directly to the client; as a result, the map elements are animated and move on the map. The monitoring interface also shows the graph structure by depicting the edges that connect the GeoObjects. Finally, GeoObjects can be inspected by clicking on their markers, which opens an info window with all the relevant information (e.g., the text of the post, if the object is a post; its coordinates; the type of object).
Figure 3: Agent Farm Interface
Figure 4: (a) GeoGraph update strategy selection; (b) creation of a new Agent Group
GeoGraph Agent Farm
The GeoGraph Agent Farm is a set of MADMASS Autonomous Agents used to simulate users in order to benchmark GeoGraph and the underlying Cloud-TM platform. Figure 3 shows an overview of the web interface to the GeoGraph Agent Farm. Figure 4a depicts the interface for selecting the strategy used to update the edges within GeoGraph. Figure 4b, instead, shows the interface for creating new groups of simulated users (via the new group button). Upon the creation of a new group, one can specify the following parameters:
Three types of simulated users are currently implemented:
Data Contention
In GeoGraph, the data structure maintaining the locations of users and the relations among them (i.e., a graph) is stored server-side, in the Distributed Software Transactional Memory platform at the core of Cloud-TM. This data structure is concurrently updated by a variable number of processing threads (physically distributed across a dynamically variable number of machines) to reflect changes in the users' geographical positions.
As mentioned previously, there are currently two methods for updating the edge data structure. In the following, we describe one of these approaches to highlight data contention issues; similar considerations also apply to the other method. Using simplified pseudo-code, a transaction used in GeoGraph to alter the graph topology could look as follows:
Listing 2: Example pseudo-code for Graph Update
```
    # upon reception of a new position of some user "u"
 1. on_update {
      # atomic transaction
 2.   transaction {
        # retrieve graph node associated with user u
 3.     myNode = Graph.get_node_of_user(u);
        # update (i.e. write) position of myNode
 4.     myNode.update_position();
        # remove edges with current neighbor nodes
        # that are now farther away than some threshold K
 5.     for_each n in neighbors(myNode) {
          # read position of node "n"
          # read list of neighbor nodes of "myNode"
 6.       if (distance(n, myNode) >= someThresholdK)
            # update (i.e. write) list of neighbor nodes of
            # "myNode" and "n"
 7.         remove_edge(n, myNode)
 8.     }
        # add edges with graph nodes that are
        # within some threshold K
 9.     for_each_other_node n in Graph {
          # read position of node "n"
10.       if (distance(n, myNode) < someThresholdK)
            # update (i.e. write) list of
            # neighbor nodes of "n" and "myNode"
11.         add_edge(n, myNode)
12.     }
13.   }
14. }
```
This code block is executed whenever a client updates its position, with a frequency that depends on the actual mobility patterns of users, ranging from very slow (e.g., users strolling around the city) to very fast (e.g., users traveling by car or train). As a consequence of the parallel manipulation of the graph, conflicts will arise on the data structures (e.g., lists) maintaining the neighbors of each node. For instance, assume that a transaction T executes line 5 and determines that node n is currently a neighbor of myNode. If, before T commits, n moves away and a transaction T⋆ removes n from the list of neighbors of myNode, T will have to be aborted, since it has executed on a stale snapshot. Other read/write conflicts may arise between line 4 of one transaction, which updates the position of a node, and lines 6 and 10 of a concurrent transaction, which read this position to determine whether the graph topology should be altered.
As a final remark, note that line 9 of the pseudo-code adopts a naive approach that will be analyzed and improved in the coming months. For example, in GeoGraph only a subset of the graph's nodes will be considered in this "for" loop. To this end, GeoGraph could adopt heuristics that restrict the analysis to the nodes in the same "geographic area" as myNode and/or exploit the indexing provided by the Cloud-TM search API. This will help enhance the scalability of the algorithm.
Conclusions and Future Work
The current prototype of GeoGraph already includes all the core functionalities of many geo-social applications (i.e., position tracking, micro-blogging, social tracking).
The prototype has been integrated with the Object Grid Mapper of the Cloud-TM Data Platform Programming API and can be used for preliminary benchmarking of the Cloud-TM Platform.
The GeoGraph Agent Farm allows for simulating several synthetic workloads that span the spectrum of profiles of typical geo-social apps. At this stage, we have already implemented a set of Autonomous Agents in the GeoGraph Agent Farm that exhibit either a read-dominated or a write-dominated workload profile, depending on their type. The Agent Farm allows for dynamically varying the workload profile along two distinct dimensions: 1) read/write ratio and 2) intensity. Indeed, by deploying multiple groups with different numbers and types of agents, we can vary the read/write ratio. Moreover, by changing the number of agents and the speed at which they perform actions, we can change the intensity of the workload. Notice that, as agent groups can be edited at run-time, the workload profile can evolve dynamically during the execution of experiments. Let us consider the scenarios and workload profiles described in Section 1, and how they can be replicated using the current implementation of the GeoGraph Agent Farm. Consistent with the description provided above, in the first three scenarios the Graph Update services will be switched off and thus there will be low data contention:
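The two knobs described above can be made concrete with a back-of-the-envelope model: each agent group contributes `agents × actions_per_sec` actions per second to the intensity, and a fraction of those actions are reads. The `AgentGroup` fields and the per-group read fractions below are hypothetical parameters chosen for illustration, not the actual settings of the GeoGraph Agent Farm.

```ruby
# Hypothetical model of a workload profile built from agent groups.
# read_fraction and actions_per_sec are illustrative parameters.
AgentGroup = Struct.new(:agents, :read_fraction, :actions_per_sec)

# Returns total actions per second and the fraction of them that are reads.
def workload_profile(groups)
  rate = ->(g) { g.agents * g.actions_per_sec }
  total = groups.sum { |g| rate.(g) }
  reads = groups.sum { |g| rate.(g) * g.read_fraction }
  { intensity: total, read_ratio: total.zero? ? 0.0 : reads / total }
end

groups = [
  AgentGroup.new(100, 0.9, 1.0),  # read-dominated agents
  AgentGroup.new(50, 0.2, 2.0)    # write-dominated agents, acting faster
]
base = workload_profile(groups)   # 200 actions/s, 55% reads

# Doubling every group's speed at run time doubles the intensity but
# leaves the read/write mix unchanged.
faster = groups.map { |g| AgentGroup.new(g.agents, g.read_fraction, g.actions_per_sec * 2) }
boosted = workload_profile(faster)
```

Changing only the speeds moves the workload along the intensity axis, while adding or resizing groups of different types moves it along the read/write axis, matching the two dimensions listed above.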
There can potentially be a huge number of computational tasks running in parallel, both in the GeoGraph Agent Farm and in GeoGraph itself. To date, these tasks are clustered and load-balanced using HornetQ. However, this approach does not take data locality into account (which is of crucial importance in geographical applications), and it can introduce performance overheads as the number of nodes grows. To address this issue, we plan to integrate with the Cloud-TM Distributed Execution Framework (DEF), as it allows for placing computational tasks close to the data they will access. Finally, DEF will allow us to synchronize the execution of tasks (through forks and joins), a feature that would be greatly beneficial for a more accurate control of the GeoGraph Agent Farm.
Getting Started
In the following, we provide a brief quick-start guide for running GeoGraph and the GeoGraph Agent Farm. It is highly recommended to follow the latest instructions available online at the bottom of the following pages:
First, run GeoGraph by performing the following steps:
Then, to run the GeoGraph Agent Farm, do as follows:
References