Difficulties in finding defects for developers
From HPCBugBase
There are several factors that make defects harder to identify from a developer's point of view.
- Defects are dependent on environment: Since HPC code is often required to work in various hardware and software environments, portability is important. If a defect does not cause a failure in the specific environment the developer is working in, testing may not detect the problem at all.
- Defects are dependent on executions: If a failure occurs only under specific runtime conditions such as input data, parameters, and the number of processes/threads, extensive testing is necessary to detect the existence of a defect. Similarly, when the behavior of the program is non-deterministic due to concurrency, finding a defect may require many executions.
- Validation is hard: Testing and validating the code is a means to know whether something is wrong with it. In a simple situation, correctness of the program is validated by checking the output against known correct values. When the correct answer is not known, validating correctness becomes harder. In some application areas such as numerical analysis and random simulation, the output always contains some numerical error, and it is hard to verify whether that error is reasonable.
- Performance defects are hard to detect: While a performance problem in HPC code is considered a defect even if the output is correct, in practice few programs are optimized to squeeze the last drop of performance out of a particular architecture. For many HPC stakeholders, performance does not matter as long as the code runs fast enough to provide useful output within time and resource constraints. Even if the program is unacceptably slow, some problems are just inherently hard to parallelize. Therefore, it is difficult to judge whether the code has a performance problem that can be fixed with a known, reasonable method.
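
The execution-dependence and validation points above can be illustrated with a small sketch. It is not taken from HPCBugBase: the function names `chunked_sum` and `within_tolerance`, the test data, and the tolerance `1e-9` are all hypothetical choices made for illustration. The sketch imitates a parallel reduction in plain Python by splitting a sum into chunks, as if each chunk belonged to one process. Because floating-point addition is not associative, the result can change with the number of chunks, so an exact comparison against a single "known correct value" may be too strict, and a tolerance-based check is one possible alternative.

```python
import math


def chunked_sum(values, num_chunks):
    """Sum values as num_chunks partial sums, then add the partials.

    This only imitates how a reduction over num_chunks processes groups
    the additions; no real parallelism is involved (illustrative helper).
    """
    n = len(values)
    total = 0.0
    for i in range(num_chunks):
        start = i * n // num_chunks
        end = (i + 1) * n // num_chunks
        partial = 0.0
        for v in values[start:end]:
            partial += v
        total += partial
    return total


def within_tolerance(result, reference, rel_tol=1e-9):
    """Tolerance-based validation; rel_tol is an assumed, problem-specific value."""
    return math.isclose(result, reference, rel_tol=rel_tol)


# Three values are enough to show order dependence: the groupings
# (0.1 + 0.2) + 0.3 and 0.1 + (0.2 + 0.3) give different doubles.
small = [0.1, 0.2, 0.3]
one_chunk = chunked_sum(small, 1)
two_chunks = chunked_sum(small, 2)
print("bitwise equal:", one_chunk == two_chunks)
print("equal within tolerance:", within_tolerance(one_chunk, two_chunks))

# A larger, well-conditioned sum (all terms positive), checked against a
# reference computed with math.fsum, which tracks partial sums to avoid
# the rounding error of a naive loop.
values = [1.0 / (i + 1) for i in range(100_000)]
reference = math.fsum(values)
for p in (1, 2, 4, 8):
    result = chunked_sum(values, p)
    print(p, repr(result), within_tolerance(result, reference))
```

A check like `within_tolerance` only moves the difficulty: someone still has to decide what tolerance is reasonable for the application, and for ill-conditioned sums (for example, terms of mixed sign that nearly cancel) a fixed relative tolerance may reject correct results or accept wrong ones.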
