Page 107 - Kaleidoscope Academic Conference Proceedings 2024


Innovation and Digital Transformation for a Sustainable World

The coefficient a in the case of the GPU depends on the matrix size and on the depth; at each particular depth, the coefficient is different. With n the number of qubits, d the depth, and T the runtime:

From 0 to 8 qubits:

$$T = \left(5.44 \times 10^{-8} \cdot 2^{3n} + 2^{2n}\,(2^{n} - 1)\right) \cdot d \qquad (5)$$

From 9 to 16 qubits:

$$T = \left(4.76 \times 10^{-15} \cdot 2^{3n} + 2^{2n}\,(2^{n} - 1)\right) \cdot d + 1.8545 \qquad (6)$$

Figure 9 – Performance of GPU

$$T = \left(1.4093 \times 10^{-10} \cdot 2^{3n} + 2^{2n}\,(2^{n} - 1)\right) \cdot d \qquad (7)$$

In the case of Alveo, the coefficient a is likewise independent of the matrix size and depth.

In Table 1, we give the performance sequence of all three architectures. The table compares the runtimes at a particular matrix size (i.e., at a given number of qubits) at depth 1. Here, depth means the number of times the matrices are multiplied together. We observe that the GPU is better in most cases and the Alveo is better at a few sizes (i.e., 8, 9, and 10). It is also worth mentioning that at lower sizes the CPU runtime is almost equal to the GPU runtime.

Table 1 – Performance of CPU, GPU, and Alveo with increasing qubit size for the fixed depth of 1

Qubits (n)    Performance sequence
1             GPU, CPU, Alveo
2             GPU, CPU, Alveo
3             GPU, CPU, Alveo
4             GPU, CPU, Alveo
5             GPU, CPU, Alveo
6             CPU ≈ GPU, Alveo
7             GPU, CPU, Alveo
8             Alveo, CPU, GPU
9             Alveo, CPU, GPU
10            Alveo, GPU, CPU
11            GPU, Alveo, CPU
12            GPU, Alveo, CPU
13 onwards    GPU, Alveo, CPU

In Figure 11, we have plotted the computational runtime vs increasing matrix size, using the same data used in Table 1 to establish the performance sequence. Similarly, Figure 12 shows the computational runtime for each matrix size with increasing depth (from 1 to 21).

Table 2 gives the performance sequence with increasing depth. In most cases (i.e., sizes), the GPU performs better than the other two architectures (CPU and Alveo). The behaviour at some sizes is nevertheless worth noting: for example, at qubit size 8 (Table 2), the Alveo performed best for the first time and the CPU performed better than the GPU.
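The performance sequences reported in Tables 1 and 2 amount to ordering the three architectures by measured runtime at a given qubit count and depth. A minimal sketch of such a ranking is given below. The function name, the relative tolerance used to report near-ties such as "CPU ≈ GPU", and the runtime values are illustrative assumptions and not the measurements behind the tables.

```python
def performance_sequence(runtimes, rel_tol=0.05):
    """Order architectures from fastest to slowest.

    runtimes: dict mapping architecture name -> runtime (any consistent unit).
    rel_tol:  hypothetical relative tolerance; two neighbouring runtimes that
              differ by no more than this fraction are joined with "≈".
    """
    ordered = sorted(runtimes.items(), key=lambda item: item[1])
    parts = [ordered[0][0]]
    for (_, prev_t), (name, t) in zip(ordered, ordered[1:]):
        near_tie = (t - prev_t) <= rel_tol * max(t, prev_t)
        parts.append((" ≈ " if near_tie else ", ") + name)
    return "".join(parts)


# Made-up runtimes, chosen only to exercise the function.
example = {"CPU": 1.00, "GPU": 0.98, "Alveo": 3.50}
print(performance_sequence(example))
```

The tolerance is a design choice: with `rel_tol=0` every pair is strictly ordered, whereas a small positive value reports runtimes that are practically indistinguishable as approximately equal.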
Figure 13 shows the computational runtime versus depth for the GPU and the CPU (the Alveo runtime had to be excluded because it lies in a different range). At this qubit size (i.e., 7), both architectures performed equally well (note that the y-axis, i.e., the time axis, has been scaled because the runtime range is of order $10^{6}$).
In Figure 14, we have plotted the performance of the architectures vs depth. The Alveo is again better than the CPU and the GPU, and the GPU is better than the CPU after a depth of 7. However, this trend will not be followed as the depth is increased further, which is evident if one extrapolates the lines.
Similarly, in Figure 15 we have plotted the performance of the architectures (only GPU and Alveo) vs depth for 10 qubits. The CPU data had to be excluded from this plot to show the crossover between Alveo and GPU clearly. The performance crossover occurred after a depth of 5.
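A crossover of this kind can be located directly from two runtime-versus-depth series by finding the first depth at which the faster architecture changes. The sketch below assumes the runtimes are sampled at the same depths for both architectures; the function name and the sample data are hypothetical and were chosen so that the two lines meet at depth 5.

```python
def crossover_depth(depths, runtime_a, runtime_b):
    """Return the first depth at which the faster series changes, or None.

    depths, runtime_a, runtime_b: equally long sequences sampled at the
    same depths. Depths where both runtimes are equal are skipped.
    """
    previous_sign = None
    for depth, t_a, t_b in zip(depths, runtime_a, runtime_b):
        diff = t_a - t_b
        sign = (diff > 0) - (diff < 0)  # +1, 0 or -1
        if sign == 0:
            continue
        if previous_sign is not None and sign != previous_sign:
            return depth
        previous_sign = sign
    return None


# Hypothetical linear trends in arbitrary units (not measured data).
depths = list(range(1, 22))             # depth 1 to 21
series_a = [20 + 1 * d for d in depths]  # slow start, shallow slope
series_b = [5 + 4 * d for d in depths]   # fast start, steep slope
print(crossover_depth(depths, series_a, series_b))
```

With these made-up series the two runtimes are equal at depth 5, so the ordering first flips at depth 6, i.e., the crossover occurs after a depth of 5.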

Figure 10 – Performance of Alveo card
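The runtime expressions in Equations (5) to (7) share the operation count of a naive product of two 2^n × 2^n matrices: 2^{3n} multiplications and 2^{2n}(2^n − 1) additions, repeated once per depth step. The sketch below evaluates those counts and the expressions as printed. It assumes that the exponents are 3n and 2n for n qubits and that the trailing factor is the depth d; the function names are hypothetical, and the unit of the result is not specified here.

```python
def matmul_op_counts(n_qubits, depth=1):
    """Multiplications and additions for `depth` naive products of
    two (2**n_qubits x 2**n_qubits) matrices."""
    size = 2 ** n_qubits
    multiplications = size ** 3 * depth
    additions = size ** 2 * (size - 1) * depth
    return multiplications, additions


def runtime_model(n_qubits, depth, a, offset=0.0):
    """Evaluate (a * 2^(3n) + 2^(2n) * (2^n - 1)) * d + offset.

    a and offset are the fitted constants; this follows the form of
    Equations (5) to (7) under the assumptions stated above.
    """
    size = 2 ** n_qubits
    return (a * size ** 3 + size ** 2 * (size - 1)) * depth + offset


def gpu_runtime_model(n_qubits, depth):
    """Piecewise GPU expression: Equation (5) for 0-8 qubits,
    Equation (6) for 9-16 qubits."""
    if 0 <= n_qubits <= 8:
        return runtime_model(n_qubits, depth, a=5.44e-8)
    if 9 <= n_qubits <= 16:
        return runtime_model(n_qubits, depth, a=4.76e-15, offset=1.8545)
    raise ValueError("the GPU expressions cover 0 to 16 qubits only")


def alveo_runtime_model(n_qubits, depth):
    """Alveo expression, Equation (7)."""
    return runtime_model(n_qubits, depth, a=1.4093e-10)


print(matmul_op_counts(3, depth=2))
```

Keeping the constants as arguments of `runtime_model` makes it straightforward to substitute a different coefficient a for a particular depth, as the GPU discussion requires.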
