Page 107 - Kaleidoscope Academic Conference Proceedings 2024

Innovation and Digital Transformation for a Sustainable World




For the GPU, the coefficient a depends on both the matrix size and the depth, and its value differs from one depth to another. Because of the dependence on matrix size, the GPU model is given separately for two qubit ranges.

For 0 to 8 qubits:

$$T = \left(5.44 \times 10^{-8} \cdot 2^{3n} + 2^{2n}\left(2^{n} - 1\right)\right) \cdot d \tag{5}$$

For 9 to 16 qubits:

$$T = \left(4.76 \times 10^{-15} \cdot 2^{3n} + 2^{2n}\left(2^{n} - 1\right)\right) \cdot d + 1.8545 \tag{6}$$

Figure 9 – Performance of GPU

For the Alveo card:

$$T = \left(1.4093 \times 10^{-10} \cdot 2^{3n} + 2^{2n}\left(2^{n} - 1\right)\right) \cdot d \tag{7}$$

For Alveo as well, the coefficient a is independent of the matrix size and the depth.

Table 1 gives the performance sequence of all three architectures, comparing their runtimes at a given matrix size (i.e., a given number of qubits) at depth 1. Here, depth is the number of times the matrices are multiplied together. The GPU is the fastest in most cases, while Alveo is the fastest at a few sizes (8, 9, and 10 qubits). At lower sizes, the CPU runtime is almost equal to that of the GPU. Figure 11 plots the computational runtime against increasing matrix size, using the same data that was used to establish the performance sequence in Table 1. Similarly, Figure 12 shows the computational runtime for each matrix size with increasing depth (from 1 to 21).

Table 1 – Performance of CPU, GPU, and Alveo with increasing qubit size for the fixed depth of 1

Qubits (n)    Performance sequence (fastest first)
1             GPU, CPU, Alveo
2             GPU, CPU, Alveo
3             GPU, CPU, Alveo
4             GPU, CPU, Alveo
5             GPU, CPU, Alveo
6             CPU ≈ GPU, Alveo
7             GPU, CPU, Alveo
8             Alveo, CPU, GPU
9             Alveo, CPU, GPU
10            Alveo, GPU, CPU
11            GPU, Alveo, CPU
12            GPU, Alveo, CPU
13 onwards    GPU, Alveo, CPU

Table 2 gives the performance sequence with increasing depth. For most sizes, the GPU outperforms the other two architectures (CPU and Alveo). The behaviour at some sizes is nonetheless worth noting: at a qubit size of 8 (Table 2), for example, Alveo performed best for the first time, and the CPU performed better than the GPU.
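Read against the matrix sizes used here, the bracketed terms in Eqs. (5)–(7) have the form of the operation count of a textbook product of two 2^n × 2^n matrices (n qubits): 2^{3n} scalar multiplications and 2^{2n}(2^n − 1) additions. That reading, and modelling depth d as d successive products, are our assumptions rather than statements made on this page. The minimal Python sketch below counts the operations of a triple-loop product and checks them against those closed forms; the function names are hypothetical.

```python
def naive_matmul_op_count(n_qubits: int) -> tuple[int, int]:
    """Multiply two 2^n x 2^n matrices with the textbook triple loop and
    count the scalar multiplications and additions actually performed."""
    size = 2 ** n_qubits
    # Illustrative inputs (all ones); only the operation counts matter here.
    a = [[1.0] * size for _ in range(size)]
    b = [[1.0] * size for _ in range(size)]
    mults = adds = 0
    for i in range(size):
        for j in range(size):
            acc = a[i][0] * b[0][j]  # first product of a dot product needs no addition
            mults += 1
            for k in range(1, size):
                acc += a[i][k] * b[k][j]  # one multiplication and one addition
                mults += 1
                adds += 1
    return mults, adds


def closed_form_counts(n_qubits: int, depth: int = 1) -> tuple[int, int]:
    """Closed forms matching the bracketed terms of Eqs. (5)-(7), scaled by an
    assumed `depth` of successive products (our reading of depth d)."""
    mults = 2 ** (3 * n_qubits)
    adds = 2 ** (2 * n_qubits) * (2 ** n_qubits - 1)
    return mults * depth, adds * depth


if __name__ == "__main__":
    for n in range(0, 5):
        counted = naive_matmul_op_count(n)
        assert counted == closed_form_counts(n), (n, counted)
        print(n, counted)
```

For n = 3 (8 × 8 matrices), each product takes 512 multiplications and 448 additions, so a depth of 5 would involve 2560 and 2240 under the same assumption.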
                                                               Figure 13 shows the computational runtime versus depth for
                                                               the GPU and CPU at a qubit size of 7 (the Alveo runtime is
                                                               excluded because it lies in a different range). At this
                                                               qubit size, both architectures performed equally well. Note
                                                               that the y-axis (the time axis) is scaled because the
                                                               runtimes are of the order of 10^6.
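Statements such as "both architectures performed equally well" here, or the "CPU ≈ GPU" entry in Table 1, imply a tolerance below which two runtimes count as equal. The text does not state one, so the sketch below assumes a relative tolerance (the hypothetical parameter `rel_tol`, 5% by default) and builds a performance-sequence string in the style of Table 1 from measured runtimes.

```python
import math


def performance_sequence(runtimes: dict[str, float], rel_tol: float = 0.05) -> str:
    """Order architectures from fastest (smallest runtime) to slowest.

    Adjacent entries whose runtimes agree within rel_tol (a hypothetical
    threshold, not taken from the paper) are joined with '≈'."""
    ordered = sorted(runtimes.items(), key=lambda item: item[1])
    groups: list[list[str]] = []
    prev_time = None
    for name, t in ordered:
        if prev_time is not None and math.isclose(t, prev_time, rel_tol=rel_tol):
            groups[-1].append(name)  # near-tie with the previous architecture
        else:
            groups.append([name])  # clearly slower: start a new rank
        prev_time = t
    return ", ".join(" ≈ ".join(group) for group in groups)
```

For example, `performance_sequence({"CPU": 1.00, "GPU": 1.02, "Alveo": 3.5})` returns `"CPU ≈ GPU, Alveo"`, the form of the qubit-6 row of Table 1 (these runtimes are invented). Because each runtime is compared only with its neighbour, a chain of near-ties can group architectures whose extremes differ by more than `rel_tol`.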
                                                               Figure 14 plots the performance of the architectures versus
                                                               depth. Alveo is again better than both the CPU and the GPU,
                                                               and the GPU is better than the CPU beyond a depth of 7.
                                                               However, this trend will not continue as the depth is
                                                               increased further, as becomes evident when the lines are
                                                               extrapolated.
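The extrapolation argument can be made quantitative by fitting a straight line, runtime ≈ slope · depth + intercept, to each architecture's runtime-versus-depth series and solving for the depth at which two fitted lines meet. The linear form and the function names are assumptions for illustration, not the authors' procedure, and the example data are synthetic.

```python
from typing import Optional, Sequence


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extrapolated_crossing(
    depths: Sequence[float],
    times_a: Sequence[float],
    times_b: Sequence[float],
) -> Optional[float]:
    """Depth at which the fitted runtime lines of two architectures intersect,
    or None if the fitted lines are parallel (no crossing)."""
    slope_a, icpt_a = fit_line(depths, times_a)
    slope_b, icpt_b = fit_line(depths, times_b)
    if slope_a == slope_b:
        return None
    # Solve slope_a * d + icpt_a == slope_b * d + icpt_b for d.
    return (icpt_b - icpt_a) / (slope_a - slope_b)
```

With synthetic data, `extrapolated_crossing([1, 2, 3, 4, 5], [10, 11, 12, 13, 14], [3, 5, 7, 9, 11])` returns `8.0`: the second series is faster at every measured depth, yet the fitted lines cross at depth 8, beyond the measured range.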
                                                               Similarly, Figure 15 plots the performance of the GPU and
                                                               Alveo versus depth for 10 qubits; the CPU data are excluded
                                                               from this plot to show the crossover between Alveo and the
                                                               GPU clearly. The performance crossover occurred after a
                                                               depth of 5.
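Locating a crossover such as the one in Figure 15 from measured data amounts to finding the first depth at which the sign of the runtime difference between two architectures flips. The sketch below is a hypothetical helper, not the authors' code; it takes the leader from the first depth with a strict difference and ignores exact ties. The runtimes in the example are synthetic.

```python
from typing import Optional, Sequence


def first_crossover_depth(
    depths: Sequence[int],
    times_a: Sequence[float],
    times_b: Sequence[float],
) -> Optional[int]:
    """Return the first depth at which the faster of two architectures changes.

    The leader is taken from the first depth where the runtimes differ; exact
    ties never count as a crossover. Returns None if the leader never changes.
    """
    if not depths or not (len(depths) == len(times_a) == len(times_b)):
        raise ValueError("series must be non-empty and of equal length")
    reference = 0.0  # sign of (times_a - times_b) for the initial leader
    for depth, t_a, t_b in zip(depths, times_a, times_b):
        diff = t_a - t_b
        if reference == 0.0:
            reference = diff  # still looking for a depth with a strict leader
        elif diff * reference < 0:
            return depth  # strictly opposite sign: the leader has changed
    return None
```

With synthetic runtimes `gpu = [1.0, 2.0, ..., 8.0]` and `alveo = [3.0, 3.5, ..., 6.5]` over depths 1 to 8, the two are tied at depth 5 and `first_crossover_depth(list(range(1, 9)), alveo, gpu)` returns `6`, the first depth at which Alveo is strictly faster.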

                   Figure 10 – Performance of Alveo card



