In this paper, we describe an algorithm to improve dictionary-based lossless data compression on GPGPUs. The presented algorithm uses bit-wise computations and leverages bit parallelism for its core part, the longest prefix match calculation. Using bit parallelism, also known as the bit-vector approach, is a fundamentally new approach for data compression and promising in performance...
In many scientific and computational domains, graphs are used to represent and analyze data. Such graphs often exhibit the characteristics of small-world networks: a few high-degree vertices connect many low-degree vertices. Despite the randomness in a graph search, it is possible to capitalize on this characteristic and cache relevant information in high-degree vertices. We applied this idea by caching...
Scientific simulations are moving away from using centralized persistent storage for intermediate data between workflow steps towards an all online model. This shift is motivated by the relatively slow IO bandwidth growth compared with compute speed increases. The challenges presented by this shift to Integrated Application Workflows are motivated by the loss of persistent storage semantics for node-to-node...
The underlying storage of hybrid parallel file systems (PFS) is composed of both SSD-based file servers (SServer) and HDD-based file servers (HServer). Unlike a traditional HServer, an SServer consistently provides improved storage performance but lacks storage space. However, most current data layout schemes do not consider the differences in performance and space between heterogeneous servers, and...
This paper presents dispel4py, a new Python framework for describing abstract stream-based workflows for distributed data-intensive applications. The main aim of dispel4py is to enable scientists to focus on their computation instead of being distracted by details of the computing infrastructure they use. Therefore, special care has been taken to provide dispel4py with the ability to map abstract...
In today's "Big Data" era, developers have adopted I/O techniques such as MPI-IO, Parallel NetCDF and HDF5 to garner enough performance to manage the vast amount of data that scientific applications require. These I/O techniques offer parallel access to shared datasets, together with a set of optimizations such as data sieving and two-phase I/O, to boost I/O throughput. While most of these...
Multipath routing has been studied in diverse contexts such as wide-area networks and wireless networks in order to minimize the finish time of data transfer or the latency of message sending. The fast adoption of cloud computing for various applications including high-performance computing applications has drawn more attention to efficient network utilization through adaptive or multipath routing...
Compared with a hash table, a Bloom Filter (BF) is more space-efficient for supporting fast matching, although it incurs a controllable and acceptable false positive probability. The size of a basic BF is predetermined based on the expected number of elements to be stored. However, we cannot predict the required size of a BF for dynamic sets. The two existing solutions for supporting dynamic...
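To illustrate the point above about a basic BF being sized in advance, here is a minimal Python sketch of a fixed-size Bloom filter. It is not the method from the paper: the class name `BloomFilter`, its parameters, and the use of SHA-256 with double hashing are assumptions made for illustration. The sizing uses the standard textbook formulas m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions.

```python
import hashlib
import math


class BloomFilter:
    """Basic fixed-size Bloom filter (illustrative sketch, hypothetical names)."""

    def __init__(self, expected_items, false_positive_rate):
        # The bit-array size is fixed up front from the expected element count n
        # and the target false positive probability p.
        self.num_bits = max(1, math.ceil(
            -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)
        ))
        # Number of hash functions that minimizes the false positive rate.
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from two 64-bit values (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so it is never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )
```

For example, `BloomFilter(1000, 0.01)` allocates a bit array sized for 1000 elements; inserting far more elements than that raises the false positive rate, which is the limitation for dynamic sets that the abstract refers to.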
For systems executing a mixture of different data-intensive applications in parallel, there is always the question of the impact that each application has on the storage subsystem. From the perspective of storage, I/O is typically anonymous as it does not contain user identifiers or similar information. This paper focuses on the analysis of performance data collected on shared system components...
Key-Value Stores (KVStore) are widely used as the storage system for large-scale Internet services and cloud storage systems. However, they are rarely used in HPC systems, where parallel file systems (PFS) are the dominant storage systems. In this study, we carefully examine the architectural differences and performance characteristics of PFS and KVStore. We propose that it is valuable to utilize...