Applied Mathematics & Information Sciences




Emerging heterogeneous multi-core systems, such as the IBM Cell BE, deploy multiple hardware accelerators to improve system performance. In these systems, each accelerator has its own local memory, and software-controlled DMA transfers are provided to exploit the memory bandwidth. Two important software-controlled management methods, direct buffering and software-controlled cache (SCC), are applied to regular and irregular references, respectively. Run-time coherence maintenance is performed when the same global memory location is referenced through both the software-controlled cache and a buffer. This paper proposes the BCDR framework to exploit data reuse for buffers and the software cache. The framework comprises buffer2buffer data reuse optimizations, buffer2cache/cache2buffer data reuse optimizations, and buffered array identification. For buffer2buffer data reuse, the Retaining Buffered Data technique and a pipelining optimization are applied to optimize the critical region after a basic data reuse optimization. To exploit the opportunities created by the buffer2buffer optimizations, the buffer2cache/cache2buffer data reuse optimizations are presented to improve the performance of applications with irregular accesses. Furthermore, a buffered data identification algorithm is presented to increase the precision of the global data-flow analysis used for coherence maintenance between the SCC and buffers. Experimental results show that our optimizations expose many reuse opportunities for both buffers and the cache. The amount of data transferred between the local store and global memory is reduced by 16.35% on average across all cases, and the average execution time is further reduced by 19.7%. In addition, the run-time coherence maintenance overhead is reduced significantly.
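To make the idea of reusing buffered data concrete, the following is a minimal sketch in Python that simulates direct buffering for a three-point stencil and counts the elements moved from global memory into a local buffer. It is not the BCDR implementation: the helper `dma_get`, the functions `stencil_naive` and `stencil_reuse`, and the tile size are all hypothetical names and values chosen for illustration, and a Python list stands in for global memory.

```python
# Illustrative simulation only. All names and sizes are hypothetical and
# are not taken from the paper; a list stands in for global memory.

def dma_get(global_mem, start, count, stats):
    """Model a DMA transfer of `count` elements into a local buffer."""
    stats["elements"] += count
    return global_mem[start:start + count]

def stencil_naive(global_mem, tile):
    """Every tile fetches its full halo, so boundary elements move twice."""
    stats = {"elements": 0}
    n = len(global_mem)
    out = []
    for base in range(1, n - 1, tile):
        end = min(base + tile, n - 1)
        # Outputs base..end-1 need inputs base-1..end (inclusive).
        buf = dma_get(global_mem, base - 1, end - base + 2, stats)
        for i in range(1, len(buf) - 1):
            out.append(buf[i - 1] + buf[i] + buf[i + 1])
    return out, stats["elements"]

def stencil_reuse(global_mem, tile):
    """Retain the last two buffered elements and fetch only the new ones."""
    stats = {"elements": 0}
    n = len(global_mem)
    out = []
    carried = []  # data kept in the local store from the previous tile
    for base in range(1, n - 1, tile):
        end = min(base + tile, n - 1)
        if not carried:
            buf = dma_get(global_mem, base - 1, end - base + 2, stats)
        else:
            # carried already holds inputs base-1 and base.
            buf = carried + dma_get(global_mem, base + 1, end - base, stats)
        for i in range(1, len(buf) - 1):
            out.append(buf[i - 1] + buf[i] + buf[i + 1])
        carried = buf[-2:]
    return out, stats["elements"]
```

Under these assumptions, both functions produce the same results, while the second transfers each input element only once. The sketch covers reuse between consecutive buffers only; it does not model the software cache or the coherence maintenance discussed above.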
