TY - GEN
T1 - Experimental analysis of SMP scalability in the presence of coherence traffic and snoop filtering
AU - Al-Mouhamed, Mayez A.
AU - Daud, Khaled A.
PY - 2012
Y1 - 2012
N2 - Commodity multi-core SMPs may generate an enormous amount of coherency traffic. However, the impact of coherence traffic and snoop filtering on parallel program scalability has not attracted sufficient attention. We experimentally analyze the shared data access patterns of four typical applications having different memory layouts. An OpenMP optimized execution model is derived for each application, with emphasis on data dependencies and implied coherence messages. Using an 8-core SMP, we present the obtained speedups versus the number of cores and the problem scale. A discussion of potential limitations on scalability due to the application or the SMP is presented. To assess the coherence behavior and its impact on the scalability of parallel programs, a synthetic benchmark that alternates data block ownership between two cores of the same or different processors is presented. It is found that coherence overheads, including snoop filtering, are responsible for significant limitations on parallel program scalability. For 8-core SMPs, speedup can be reduced by factors of 2.5 and 5 for row-major and column-major access patterns, respectively, as compared to the use of private data. A truly parallel coherence protocol implementation is needed to provide a truly scalable shared-memory model.
AB - Commodity multi-core SMPs may generate an enormous amount of coherency traffic. However, the impact of coherence traffic and snoop filtering on parallel program scalability has not attracted sufficient attention. We experimentally analyze the shared data access patterns of four typical applications having different memory layouts. An OpenMP optimized execution model is derived for each application, with emphasis on data dependencies and implied coherence messages. Using an 8-core SMP, we present the obtained speedups versus the number of cores and the problem scale. A discussion of potential limitations on scalability due to the application or the SMP is presented. To assess the coherence behavior and its impact on the scalability of parallel programs, a synthetic benchmark that alternates data block ownership between two cores of the same or different processors is presented. It is found that coherence overheads, including snoop filtering, are responsible for significant limitations on parallel program scalability. For 8-core SMPs, speedup can be reduced by factors of 2.5 and 5 for row-major and column-major access patterns, respectively, as compared to the use of private data. A truly parallel coherence protocol implementation is needed to provide a truly scalable shared-memory model.
KW - HPC
KW - distributed-memory
KW - parallel programming
KW - performance evaluation and speedup
KW - shared-memory system
UR - http://www.scopus.com/inward/record.url?scp=84870446835&partnerID=8YFLogxK
U2 - 10.1109/HPCC.2012.21
DO - 10.1109/HPCC.2012.21
M3 - Conference contribution
AN - SCOPUS:84870446835
SN - 9780769547497
T3 - Proceedings of the 14th IEEE International Conference on High Performance Computing and Communications, HPCC-2012 - 9th IEEE International Conference on Embedded Software and Systems, ICESS-2012
SP - 81
EP - 88
BT - Proceedings of the 14th IEEE International Conference on High Performance Computing and Communications, HPCC-2012 - 9th IEEE International Conference on Embedded Software and Systems, ICESS-2012
ER -