Hidden Serialization in Library Functions

From HPCBugBase

Fault Description

Many library functions were originally designed for sequential programs. To make them safe to call from parallel code, their implementations sometimes add internal serialization, such as a lock around shared internal state. When such a function is called inside a parallel region, it can become a performance bottleneck, because only one caller can execute it at a time. The problem tends to surface more often on shared-memory (multi-threaded) systems than on distributed-memory systems.

The rand() function in glibc is protected by an internal lock, because all callers in a process share one hidden generator state; threads of the same process that call it concurrently therefore enter the function one at a time. As a result, a call to rand() inside a parallel loop can cause significant performance degradation. This kind of bottleneck is hard for novices to identify, because understanding its cause requires knowledge of the library's implementation.
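To make the mechanism concrete, here is a simplified model of a lock-protected generator. It is not glibc's actual source: the names shared_rand, shared_seed and seed_lock are hypothetical, and the arithmetic is the portable sample rand() given in the C standard. The point is the structure: one global seed shared by every thread, so every call must acquire the same mutex before updating it.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical model of hidden serialization (NOT glibc's real code).
 * One global seed is shared by every thread in the process. */
static uint32_t shared_seed = 1;                 /* rand()'s default seed */
static pthread_mutex_t seed_lock = PTHREAD_MUTEX_INITIALIZER;

/* Lock-protected stand-in for rand(), using the LCG from the C
 * standard's sample implementation (RAND_MAX assumed to be 32767). */
int shared_rand(void)
{
    int result;
    pthread_mutex_lock(&seed_lock);              /* every caller queues here */
    shared_seed = shared_seed * 1103515245u + 12345u;
    result = (int)((shared_seed / 65536u) % 32768u);
    pthread_mutex_unlock(&seed_lock);
    return result;
}
```

Calling shared_rand() from many threads gives correct results, but the generator itself never runs in parallel: the mutex admits one caller at a time, and with a cheap loop body like the one in the example below, threads spend much of their time waiting for that lock.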

Here is an example in MPI, taken from code that approximates the number Pi by random sampling:

for (i=0; i<n; i+=size) {
  x = rand() / (double)RAND_MAX;
  y = rand() / (double)RAND_MAX;
  if (x*x + y*y < 1) count = count + 1;
}
MPI_Reduce (..., MPI_SUM, ...);

Although this is an "embarrassingly parallel" problem (the main loop needs no communication), the calls to rand() can, depending on the execution environment, prevent the program from scaling.
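One common way to remove the hidden lock is to give each worker its own generator state, passed explicitly, so no two callers ever touch shared data. The sketch below assumes this approach; it swaps rand() for xorshift64* seeded through splitmix64 (two well-known small generators chosen here for brevity), and the function count_hits is a hypothetical replacement for the loop body above, not part of any library.

```c
#include <stdint.h>

/* splitmix64: mixes an arbitrary 64-bit seed into a well-spread state. */
static uint64_t splitmix64(uint64_t *x)
{
    uint64_t z = (*x += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* xorshift64*: the whole state is one caller-owned word, so no lock. */
static uint64_t xorshift64star(uint64_t *s)
{
    uint64_t x = *s;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    *s = x;
    return x * 0x2545F4914F6CDD1DULL;
}

/* Uniform double in [0, 1) built from the top 53 bits. */
static double uniform01(uint64_t *s)
{
    return (double)(xorshift64star(s) >> 11) * 0x1.0p-53;
}

/* Hypothetical lock-free version of the Pi loop: counts samples inside
 * the unit quarter circle using only a private generator state. */
long count_hits(uint64_t seed, long samples)
{
    uint64_t mix = seed;
    uint64_t state = splitmix64(&mix);
    if (state == 0)
        state = 1;                  /* xorshift state must be nonzero */

    long count = 0;
    for (long i = 0; i < samples; i++) {
        double x = uniform01(&state);
        double y = uniform01(&state);
        if (x * x + y * y < 1.0)
            count++;
    }
    return count;
}
```

In the MPI program, each rank could call count_hits(base_seed + rank, n / size) and combine the counts with MPI_Reduce as before; in a threaded program, each thread passes its own seed. POSIX rand_r() follows the same idea of caller-owned state, though its state is a single unsigned int. Deriving per-worker seeds this way is a simple heuristic, not a guarantee of statistically independent streams; libraries designed for parallel streams offer stronger guarantees.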

Statistics (Frequency)

Other Findings and Contexts

Because the traditional Unix rand() implementation, based on the linear congruential method, is known to be unsuitable for serious use, real HPC applications should choose a random number library that meets their needs. Developers should examine the characteristics of the PRNG routine they use, including both its statistical quality and whether it serializes concurrent callers.
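As one example of a characteristic worth checking: in a linear congruential generator with a power-of-two modulus and an odd multiplier and increment, the low-order bits of the state have very short periods, and the lowest bit simply alternates between 0 and 1. The C standard's sample rand() hides this by returning only bits 16 to 30 of the state, but any code that uses the raw state inherits the pattern. The sketch below, with hypothetical helper names, checks the lowest bit using the sample rand()'s constants and a modulus of 2^32.

```c
#include <stdint.h>

/* One raw LCG step, modulus 2^32 (unsigned wraparound), using the
 * multiplier and increment from the C standard's sample rand(). */
static uint32_t lcg_next(uint32_t state)
{
    return state * 1103515245u + 12345u;
}

/* Returns 1 if the lowest bit of the raw state flips on every one of
 * `steps` consecutive steps (period 2), 0 otherwise. */
int low_bit_alternates(uint32_t seed, int steps)
{
    uint32_t s = seed;
    for (int i = 0; i < steps; i++) {
        uint32_t next = lcg_next(s);
        if ((next & 1u) == (s & 1u))
            return 0;
        s = next;
    }
    return 1;
}
```

The result follows from arithmetic modulo 2: an odd multiplier leaves the parity of the state unchanged and an odd increment flips it, so every step flips the lowest bit regardless of the seed.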