RadioImagerGPU 1.0
This section presents a performance analysis of the UVW and imaging computations for the GPU implementation and for two CPU implementations (optimized and non-optimized). The plots below compare computation times across different numbers of elements, using a log scale for the UVW, imaging, and total computation times. The right-hand plots give a linear-scale view of the range of element counts where the GPU starts to outperform the optimized CPU implementation.
The performance comparison for 1 direction shows that the GPU implementation starts to outperform the optimized CPU implementation as the number of elements increases. For small numbers of elements, however, the overhead of data transfer makes the GPU slower.
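The crossover point described above can be read off the timing data directly. The following is a minimal sketch, not part of the project: the helper name `find_crossover` is hypothetical, and the only values used are the 10-element, 1-direction sample rows quoted in the CSV format section of this document (times in ms).

```python
from typing import Mapping, Optional


def find_crossover(gpu_ms: Mapping[int, float],
                   cpu_ms: Mapping[int, float]) -> Optional[int]:
    """Return the smallest element count at which the GPU total time is
    lower than the CPU total time, or None if the GPU never wins."""
    # Only compare element counts that were measured on both sides.
    for n in sorted(set(gpu_ms) & set(cpu_ms)):
        if gpu_ms[n] < cpu_ms[n]:
            return n
    return None


# Total time = uvw_time + imaging_time, for 10 elements and 1 direction.
gpu_total = {10: 111 + 36}          # sample row from gpu_timings.csv
cpu_opt_total = {10: 0.15 + 12.61}  # optimized row from cpu_timings.csv

# Prints None: at 10 elements the GPU is still slower than the optimized CPU.
print(find_crossover(gpu_total, cpu_opt_total))
```

Feeding in the totals for every measured element count would give the point at which the right-hand plots show the GPU taking the lead.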
For 5 directions, the GPU shows more consistent improvements over the CPU implementations. The parallelization benefits of the GPU become more pronounced, making it significantly faster for larger numbers of elements.
The results follow the same trend as for 5 directions, but the GPU advantage is even more pronounced.
Below is a video showing how the number of array elements affects the resulting image (images are shown for different directions). The GPU-accelerated software makes it quick to test PSFs for different array configurations and observation directions, and to identify potential artifacts.
The NVIDIA profiler was used on the GPU-accelerated software with the following command:
This run was performed with 2000 elements and 5 directions. The resulting image below shows the total duration of each GPU activity:
The performance data is provided in two CSV files, `gpu_timings.csv` and `cpu_timings.csv`. Below is the format of these files (all times are in ms):

`gpu_timings.csv`:

num_elements | num_directions | uvw_time | imaging_time |
---|---|---|---|
10 | 1 | 111 | 36 |
25 | 1 | 91 | 37 |
... | ... | ... | ... |

`cpu_timings.csv`:

num_elements | num_directions | optimization | uvw_time | imaging_time |
---|---|---|---|---|
10 | 1 | optimized | 0.15 | 12.61 |
10 | 1 | non-optimized | 0.64 | 12.70 |
... | ... | ... | ... | ... |
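As a sketch of how these files could be compared, the snippet below sums `uvw_time` and `imaging_time` per configuration and prints the GPU-to-CPU time ratio. It assumes the files are comma-separated with a header row using the column names shown above; the function name `load_totals` is hypothetical, and the inline strings contain only the sample rows from the tables.

```python
import csv
import io


def load_totals(text, optimization=None):
    """Map (num_elements, num_directions) to uvw_time + imaging_time in ms.

    If `optimization` is given, keep only rows whose `optimization`
    column matches it (this applies to the CPU file only).
    """
    totals = {}
    for row in csv.DictReader(io.StringIO(text)):
        if optimization is not None and row["optimization"] != optimization:
            continue
        key = (int(row["num_elements"]), int(row["num_directions"]))
        totals[key] = float(row["uvw_time"]) + float(row["imaging_time"])
    return totals


# Sample rows copied from the tables above (assumed comma-separated on disk).
GPU_SAMPLE = """num_elements,num_directions,uvw_time,imaging_time
10,1,111,36
25,1,91,37
"""
CPU_SAMPLE = """num_elements,num_directions,optimization,uvw_time,imaging_time
10,1,optimized,0.15,12.61
10,1,non-optimized,0.64,12.70
"""

gpu = load_totals(GPU_SAMPLE)
cpu = load_totals(CPU_SAMPLE, optimization="optimized")

# Ratio > 1 means the GPU is slower than the optimized CPU for that row.
for key in sorted(gpu.keys() & cpu.keys()):
    print(key, round(gpu[key] / cpu[key], 1))  # (10, 1) 11.5
```

With the real files, the same function could be called as `load_totals(open("gpu_timings.csv").read())`, provided the on-disk layout matches the assumption above.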