Page 104 - Kaleidoscope Academic Conference Proceedings 2024
2024 ITU Kaleidoscope Academic Conference
Figure 2 – Runtime in units vs Quantum Circuit Depth Complexity
Figure 3 – Quantum Circuit and its Quantum Emulations are Matrix-Vector and Matrix-Matrix Multiplications
of $2 \times 1$, which scales up exponentially as we increase the
number of qubits $n$, i.e., the matrix size ($2^n \times 2^n$) and
the state vector size ($2^n \times 1$). Once the code has been
optimized on the backend, any further performance enhancement
requires changing the hardware architecture of the classical
computation, i.e., moving to the GPU and the ALVEO Cards, which
are the HPC Cards on our end (3). Other research works have
proposed CMOS circuit emulators for quantum computing (4),
but these hardware implementations are not yet commercially
available. In the near term, using classical computing
hardware to emulate quantum computation remains a viable
solution.
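The exponential growth described above can be illustrated with a minimal Python sketch. It is not the implementation benchmarked in this paper: the helper names (`emulation_sizes`, `kron`, `matvec`, `hadamard_layer`, `zero_state`) and the figure of 16 bytes per complex amplitude are assumptions made purely for illustration. The sketch reports the state-vector and gate-matrix sizes for a given number of qubits and applies a dense gate to a state vector as a matrix-vector multiplication.

```python
import math


def emulation_sizes(n_qubits, bytes_per_amplitude=16):
    """Shapes and memory of the dense objects for n_qubits.

    bytes_per_amplitude=16 assumes double-precision complex numbers
    (an illustrative assumption, not a figure from the paper).
    """
    dim = 2 ** n_qubits
    return {
        "vector_shape": (dim, 1),
        "matrix_shape": (dim, dim),
        "vector_bytes": dim * bytes_per_amplitude,
        "matrix_bytes": dim * dim * bytes_per_amplitude,
    }


def kron(a, b):
    """Kronecker product of two dense matrices stored as lists of rows."""
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]


def matvec(m, v):
    """Dense matrix-vector product: one gate applied to the state vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


# Single-qubit Hadamard gate (2 x 2).
_S = 1.0 / math.sqrt(2.0)
H = [[_S, _S], [_S, -_S]]


def hadamard_layer(n_qubits):
    """Dense 2^n x 2^n matrix applying a Hadamard to every qubit."""
    gate = H
    for _ in range(n_qubits - 1):
        gate = kron(gate, H)
    return gate


def zero_state(n_qubits):
    """State vector of length 2^n with all qubits in |0>."""
    state = [0.0] * (2 ** n_qubits)
    state[0] = 1.0
    return state


if __name__ == "__main__":
    for n in (1, 2, 3, 10, 20):
        sizes = emulation_sizes(n)
        print(n, sizes["matrix_shape"], sizes["matrix_bytes"], "bytes")
    # Apply a 3-qubit gate to |000> as one matrix-vector multiplication.
    print(matvec(hadamard_layer(3), zero_state(3)))
```

Under these assumptions, a dense gate matrix for 20 qubits would hold $2^{40}$ amplitudes, i.e. 16 TiB at 16 bytes each, which is why the choice of hardware architecture matters well before the qubit count becomes large.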
The depth of the quantum circuit is equivalent to the
number of matrices multiplied in the quantum emulation.
The runtime complexity of increasing the quantum circuit
depth, i.e., the matrix multiplication depth, is linear for CPU,
but how much smaller is the slope of the runtime with increasing
depth in the case of the GPU and ALVEO? Also, how
does the complexity of the runtime vary with an increasing
number of qubits $n$ on ALVEO and GPU, which is $O((2^n)^3)$
for CPU? Even if the complexity of the runtime remains
the same on the accelerator cards, the exact equation of the
complexity will yield lower values on GPU and ALVEO Cards,
owing to the customization of the hardware architecture to
the application on ALVEO Cards and to parallelism on
GPU Cards. This paper aims to benchmark how the runtime varies
and its exact mathematical equation for variable qubit size and
quantum circuit depth, which can be further used to establish
a bottleneck for the qubit size and the quantum circuit on a
present supercomputer for quantum emulations. We now
move on to the exact dataflow for matrix multiplications on
CPU, GPU, and ALVEO Cards (5).

A clear pictorial representation of the correspondence between
quantum emulations and actual quantum circuits is shown in Fig. 3.

2. MATRIX MULTIPLICATION ON CPU

In the CPU, the data (matrix elements) is first loaded into the
L1 cache from CPU RAM (Random Access Memory)
using multiple data buses for efficient parallel computation.

Figure 4 – Internal Architecture of CPU

Once the required matrix elements are in the cache, they
can be loaded into CPU registers, which are small, fast
storage locations directly accessible by the CPU cores. The
CPU’s instruction decoder and execution units handle the
loading of data from cache into registers. This process is
typically controlled by assembly-level instructions generated
by the compiler or software. Once the matrix elements are
loaded into CPU registers, the actual multiplication operation
can begin. The EPYC 7742 CPU features multiple cores,
each capable of executing instructions independently. These
cores can work in parallel, allowing for efficient processing
of matrix multiplication tasks. The CPU’s SIMD (Single
Instruction, Multiple Data) units can be leveraged for parallel
computation. SIMD instructions enable the execution of the
same operation on multiple data elements simultaneously,
which is beneficial for matrix multiplication. The CPU
executes instructions generated by the software or compiler to