Page 103 - Kaleidoscope Academic Conference Proceedings 2024
P. 103

BENCHMARKING MATRIX MULTIPLICATIONS FOR VARIABLE QUBIT SIZE AND
                                                        DEPTH

                                                   1
                                                            1
                                                                                1
                                    1
                                                                    1
                      Md Imam Mazhar ; Abhishek Tiwari ; Nasir Ali ; Britant ; Akshay Patil ; Rahul Kumar Neiwal 2
                      1
                       Embedded Systems Division, Centre for Development of Advanced Computing (C-DAC), Noida
                               2
                                Ministry of Electronics and Information Technology (MeitY), New Delhi
                              ABSTRACT
           To emulate a quantum computation on a classical computer
           i.e. the evolution of the unitary operations on the wave
           function of the particle in quantum mechanics, we have to
           perform unitary matrix and normalized vector multiplications
           in the high-level programming languages of Python, C++,
           Java, etc. Quantum Libraries already available perform the
           matrix-vector multiplication in the backend using the numpy
           libraries of Python like Qiskit or use a C++ wrapper to
           further optimize the runtime it as in Qiskit-Aer Simulators.
           Since a fully functioning fault-tolerant computer is decades
           away, it is in our best interest to design new quantum
           algorithms and develop accelerator test beds for Quantum
           Emulations. All the quantum computer operations can be
           emulated on a classical computer, with the only downside
                                                         3
           being that the matrix multiplications scale up as   (   )  Figure 1 – Cubic Exponential Curve for Runtime as the
           in runtime. In contrast, the quantum computer scales it  number of qubits is increased for Quantum Emulations
                        2
                    2
           up as   (            ), where N = 2 , where N is the matrix
                                         
           dimension, where n is the number of qubits, so the runtime  unitary matrices of 2x2 and 4x4 sizes respectively. Now
           for quantum emulations on the classical computer increases  it is up to the coder on how one will code the operations
           exponentially with increase in number of qubits and increases  of quantum emulations as, i.e it can be done as Kronecker
           linearly with increase in number of depths, complexity wise.  Product between the 2x2 unitary gates, 4x4 CNOT gates
           Though it is not possible to change the exponential index, it  matrices for all qubits at a particular depth and then multiply
           is possible to reduce the runtime for quantum emulations on  the resultant unitary with the input column state vector, or one
           classical computers by use of GPU and Alveo Accelerator  can also follow the approach of unitary matrix - Kronecker
           Cards, and also code optimization on the software side like  product and then state vector multiplication for each depth.
           using a C++ wrapper. In this paper, we will benchmark  But here we are following the prior approach of performing
           the matrix-matrix multiplications on HPC Accelerator Cards  the matrix-matrix multiplications first and then matrix-vector
           varying the qubit size and the depth of the quantum circuit and  multiplications.  Here the Kronecker product requires a
                                                              different time for each architecture, we will not perform the
           provide a universal mathematical equation for the runtime on
           the GPU and Alveo Vector Cards for two variables of qubit  Kronecker product on the CPU, GPU, and ALVEO for now,
           size and quantum circuit depth. So a theoretical limit on qubit  but rather perform and benchmark the time for matrix-matrix
           size and qubit depth exactly can be established for quantum  multiplications by varying the qubit size and the depth (1).
           emulations on present classical supercomputers.    Emulation of Quantum Algorithms using FPGAs has been
                                                              done in (2) but in our work, Alveo card combines three
            Keywords - Qiskit, Qubit, GPU, Alveo Accelerator Cards  essential things: a powerful FPGA for acceleration, which
                                                              has high-bandwidth DDR4 memory banks, and connectivity
                         1.  INTRODUCTION                     to a host server via a high-bandwidth PCIe Gen3x16 link.
                                                              This link is capable of transferring approximately 16 GiB of
           What is a Qubit? A Qubit is a quantum binary unit that can  data per second between the Alveo card and the host.
           take any normalized amplitude and the state for that can be  The qubit size relates to matrix size in emulations on a
           ’0’, ’1’, or ’0 + 1’ or ’0 - 1’, on a 0 and 1 basis, These basis can  classical processor using a quantum library (8) according
           be represented in the form of a column vector with the size as  to (2 ∗ 2 ) where n is the number of qubits, to represent
                                                                    
                                                                        
               
           [2 , 1], where n is the number of qubits. Now any quantum  the superposition on a qubit or a unitary gate operation on
           algorithm can be decomposed into a set of universal quantum  a qubit, we require a matrix size of 2x2 and vector size
           gates of   (  ,   ,   ) and CNOT, which can be represented as


            978-92-61-39091-4/CFP2268P @ITU 2024           – 59 –                                   Kaleidoscope
   98   99   100   101   102   103   104   105   106   107   108