What are HPC defects?
The primary purpose of HPCBugBase is to accumulate knowledge about HPC defects.
HPC defects vs. non-HPC defects
The first question is what differentiates HPC defects from non-HPC ones. We make this distinction in order to focus our efforts. Software defects in general have so many aspects that, if we broadened the scope too much, the content would be diluted into fragmented information. By concentrating on one software domain, HPC, we hope to provide knowledge that is actually useful to the community.
Therefore, we want to document defects if they would interest the HPC community. Since such a judgment is subjective, we roughly divide software defects into three classes as follows:
- Those which are clearly characteristic of HPC development, e.g., misuse of HPC language features. We consider the defects in this class to be HPC defects.
- Those which are clearly seen in any type of software development, e.g., simple syntax errors. We do not consider the defects in this class to be HPC defects.
- Those in between: important in HPC development, but also seen in other kinds of software, e.g., problems with concurrency. We consider the defects in this class to be HPC defects, with the emphasis placed on the aspects strongly related to HPC.
Unfortunately, the above is not yet a completely objective definition. Our current approach is that, when we are not sure, we put an example description in the experience base and ask for experts' opinions. We will update the definition once we have enough such cases to determine which should and should not be considered HPC defects.
Generic vs. specific
- HPC generic
- Language specific
- Machine specific
- Compiler specific
- Application specific
Algorithmic vs. coding
Our observations show that scientists tend to distinguish between a defect in the underlying scientific models and algorithms and a defect in the implementation.
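The distinction can be illustrated with a minimal sketch. It is not taken from HPCBugBase; the decay problem, the function names and the parameter values are all assumptions chosen for illustration. The first function implements forward Euler correctly, yet gives a diverging result when the step size exceeds the method's stability limit, which we treat here as an algorithmic defect. The second function uses a stable step size but contains an off-by-one loop bound, which is a coding defect.

```python
import math


def euler_decay(rate, y0, t_end, steps):
    """Forward Euler for y' = -rate * y on [0, t_end] using `steps` steps."""
    h = t_end / steps
    y = y0
    for _ in range(steps):
        y += h * (-rate * y)
    return y


def euler_decay_off_by_one(rate, y0, t_end, steps):
    """Same method with a coding defect: the loop runs one step short."""
    h = t_end / steps
    y = y0
    for _ in range(steps - 1):  # defect: should be range(steps)
        y += h * (-rate * y)
    return y


# Hypothetical test problem: exponential decay with a known exact solution.
RATE, Y0, T_END = 10.0, 1.0, 1.0
exact = Y0 * math.exp(-RATE * T_END)

# Algorithmic defect: the code is a faithful forward Euler, but the step
# h = 0.25 exceeds the stability limit 2 / RATE = 0.2, so the numerical
# solution grows in magnitude instead of decaying.
diverged = euler_decay(RATE, Y0, T_END, steps=4)

# Coding defect: the step size is stable, but one step is missing, so the
# result is slightly too large and the error is easy to overlook.
reference = euler_decay(RATE, Y0, T_END, steps=1000)
short = euler_decay_off_by_one(RATE, Y0, T_END, steps=1000)
```

Fixing the first case requires revisiting the numerical method or its parameters, while fixing the second only requires correcting the loop bound.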
Correctness vs. performance
When software defects (or bugs) are discussed, one question that often arises is whether a performance problem is a kind of defect. In general-purpose programming, a program that runs slowly but returns the right output is usually not considered defective; it may not be optimized, but it is a correct program. In HPC, however, speed is a critical factor. A scientist's primary interest is to obtain a scientific result within the allocated resources. If a program is too slow, or does not scale to a large number of processors, they have to reduce the size of the problem or, in the worst case, give up.
Since the criteria for whether a program is "fast enough" depend on various context variables, providing a clear definition of performance defects is a difficult task. Intuitively, we consider a program to contain a performance defect when a simple alternative solution is known to improve the execution speed significantly. Such a solution can be the use of different language features, a restructuring of the data, or even the replacement of the algorithm if that is relatively trivial.
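As a hedged illustration of the "restructuring of data" case, the sketch below shows two functions that return the same answer. The function names and the particle and ghost-cell identifiers are hypothetical, not drawn from any entry in the experience base. The first version scans a list for every lookup; the second builds a set once, so each lookup takes constant time on average. Because the fix is simple and well known, the first version would count as a performance defect under the intuitive definition above.

```python
def count_shared_slow(particle_ids, ghost_ids):
    """Correct but slow: each membership test scans the whole list."""
    return sum(1 for pid in particle_ids if pid in ghost_ids)


def count_shared_fast(particle_ids, ghost_ids):
    """Same result: a set makes each membership test O(1) on average."""
    ghost_set = set(ghost_ids)
    return sum(1 for pid in particle_ids if pid in ghost_set)


# Hypothetical identifiers: even numbers and multiples of three below 3000.
particle_ids = list(range(0, 3000, 2))
ghost_ids = list(range(0, 3000, 3))

slow_count = count_shared_slow(particle_ids, ghost_ids)
fast_count = count_shared_fast(particle_ids, ghost_ids)
```

Both calls return the same count, so the two versions are indistinguishable by correctness tests alone. The difference only shows up in running time as the inputs grow.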
