Many high-performance computing (HPC) sites extend their clusters to support Hadoop MapReduce for a variety of applications. However, HPC clusters differ from Hadoop clusters in the configuration of their storage resources. In the Hadoop Distributed File System (HDFS), data resides on the compute nodes, while in an HPC cluster, data is stored on separate nodes dedicated to storage. Dedicated storage offloads...
MapReduce is a popular computing model for parallel data processing on large-scale datasets, which can range from gigabytes to terabytes and petabytes. Though Hadoop MapReduce normally uses the Hadoop Distributed File System (HDFS) as its local file system, it can be configured to use a remote file system. This raises an interesting question: for a given application, which is the best running platform among...
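The remote-file-system configuration mentioned above is typically expressed in Hadoop's core-site.xml via the fs.defaultFS property. A minimal sketch, assuming a Hadoop-compatible file system is reachable from the cluster (the hostnames and URIs below are illustrative placeholders, not values from the cited work):

```xml
<!-- core-site.xml: select the default file system Hadoop MapReduce reads from.
     The URIs below are illustrative; substitute the site's actual endpoints. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <!-- Default Hadoop setup: an HDFS NameNode on the cluster itself, e.g.
           hdfs://namenode.example.org:9000
         Remote/shared storage alternative: a parallel file system mounted
         locally on every compute node, addressed via the local scheme: -->
    <value>file:///</value>
  </property>
</configuration>
```

Switching this single property is what lets the same MapReduce job run against either co-located HDFS storage or a dedicated storage tier, which is the comparison the question above concerns.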
Service assurance for the telecom cloud is a challenging task and is continuously being addressed by academics and industry. One promising approach is to utilize machine learning to predict service quality in order to take early mitigation actions. In previous work we have shown how to predict service-level metrics, such as frame rate for a video application on the client side, from operational data...
Migrating resources is a useful tool for balancing load in a distributed system, but it is difficult to determine when to move resources, where to move resources, and how much of them to move. We look at resource migration for file system metadata and show how CephFS's dynamic subtree partitioning approach can exploit varying degrees of locality and balance because it can partition the namespace into...
Hadoop is an open-source tool that enables distributed storage and processing of big data sets using commodity cluster computing. With Hadoop occupying a core position in the current data-processing landscape, its performance optimization is being heavily studied. This paper introduces one such method to improve Hadoop cluster performance by using the rpcbind Remote Procedure Call (RPC) service of the Linux...
This research analyzes the performance of a distributed cloud broker in a live traffic scenario, utilizing a government-owned private federated cloud. The paper explores the use case in which a cloud broker assists the provisioning of services to geographically distributed data centers that have volunteered to federate and expose their utilization metrics to each other through the cloud broker. The experimental...
The research analyzes the functioning of an Autonomic Cloud Broker (ACB) in a real-life scenario utilizing a Federated Cloud Infrastructure. The paper analyzes two important use cases that arise when a cloud broker enables the provisioning of services from multiple cloud providers to simultaneously demanding cloud users. Varying, real-time load conditions are generated in a live use case on a private...
In this paper, we investigate the scalability of three communication architectures for advanced metering infrastructure (AMI) in the smart grid. AMI in the smart grid is a typical example of a cyber-physical system (CPS), in which large amounts of data from hundreds of thousands of smart meters are collected and processed through an AMI communication infrastructure. Scalability is one of the most important issues...
Replica management has become a hot research topic in data grids; it improves the availability and fault tolerance of data in grid computing environments. Based on in-depth study of mainstream replica management techniques in traditional data grids, we propose a dynamic replica management strategy. Our strategy consists of a dynamic replica creation method that can automatically...
Data grids provide geographically distributed storage resources for large-scale data-intensive applications that generate large data sets. Because data is the key resource in data grids, efficient management is needed to minimize the response time of applications. Replication is typically used in data grids to improve access time and to reduce bandwidth consumption. In this paper, we...