
2024 ITU Kaleidoscope Academic Conference

Figure 5 – Average SSI for each mode of communication

Figure 6 – Original image

Figure 7 – Canny edge of the original image

Figure 8 – Line art of the original image

3.2 Data Reduction

Data reductions were computed for a single image across several
categories: caption only, caption + line art, caption + Canny edge,
and caption + depth map. The image measured 728 × 492 pixels, for a
total of 358,176 pixels. At 8 bits per pixel, each channel required
2,865,408 bits, or 8,596,224 bits for all three RGB channels. The
original image is shown in Figure 6.
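
The raw-size arithmetic above can be reproduced with a short sketch. The function name `raw_image_bits` and its parameters are illustrative assumptions, not part of the original work; the sketch only assumes an uncompressed image with a fixed number of bits per sample.

```python
def raw_image_bits(width: int, height: int,
                   bits_per_sample: int = 8, channels: int = 3) -> int:
    """Bits needed to store an uncompressed image (hypothetical helper)."""
    return width * height * bits_per_sample * channels


# Image dimensions as stated in the text.
WIDTH, HEIGHT = 728, 492

pixels = WIDTH * HEIGHT                                        # total pixel count
per_channel_bits = raw_image_bits(WIDTH, HEIGHT, channels=1)   # one 8-bit channel
rgb_bits = raw_image_bits(WIDTH, HEIGHT)                       # three 8-bit channels

print(pixels)            # 358176
print(per_channel_bits)  # 2865408
print(rgb_bits)          # 8596224
```

The single-channel case is kept as a separate call because the same figure applies to any representation stored at 8 bits per pixel.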

The caption generated by the model, "man in a hat holding two guns
in his hands," requires 312 bits of data. This calculation assumes
ASCII encoding with each character stored in one 8-bit byte. With
39 characters in the caption, the total number of bits needed is
39 characters × 8 bits/character = 312 bits.
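
As a rough illustration of the caption-size calculation, the sketch below uses the character count and the raw RGB size stated in the text. The helpers `caption_bits` and `reduction_factor` are hypothetical names introduced here, and the reduction factor is a derived quantity for illustration rather than a figure reported in the text.

```python
BITS_PER_CHAR = 8  # one byte per ASCII character, as assumed in the text


def caption_bits(num_chars: int, bits_per_char: int = BITS_PER_CHAR) -> int:
    """Bits needed to send a caption of num_chars characters (hypothetical helper)."""
    return num_chars * bits_per_char


def reduction_factor(original_bits: int, reduced_bits: int) -> float:
    """How many times smaller the reduced payload is than the original."""
    return original_bits / reduced_bits


CAPTION_CHARS = 39             # character count as stated in the text
ORIGINAL_RGB_BITS = 8_596_224  # 728 * 492 pixels * 8 bits * 3 channels

bits = caption_bits(CAPTION_CHARS)
factor = reduction_factor(ORIGINAL_RGB_BITS, bits)

print(bits)    # 312
print(factor)  # 27552.0
```

Under these assumptions, the caption-only payload is several orders of magnitude smaller than the raw RGB image.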

Canny edge detection applied to the 728 × 492 pixel image requires
358,176 bits, which corresponds to one bit per pixel. This compact
representation is enough to depict the detected edges, a critical
aspect of image analysis. The Canny edge map of the original image
is shown in Figure 7.

Producing line art for the 728 × 492 pixel image requires 2,865,408
bits of data: 358,176 pixels at 8 bits per pixel, enough to convey
the line art details. Figure 8 shows the resulting line art derived
from the original image.

Generating a depth map for the 728 × 492 pixel image likewise
requires 2,865,408 bits of data: 358,176 pixels at 8 bits per pixel,
representing the depth information for each pixel. Figure 9 displays
the depth map of the original image.

Figure 9 – Depth map of the original image

Figure 10 illustrates the data requirements for the various modes of
end-to-end image data transfer. This visualization underscores the
potential sustainability benefit of semantic communication over
conventional methods. By selectively transmitting only meaningful
data, semantic communication reduces the overall quantity of data
required for communication. Additionally, incorporating line art as
a second mode yields improved performance metrics such as MSE, PSNR,
and SSI, while also contributing to a significant decrease in the
data used for communication. Our results suggest that line art
achieves an ideal equilibrium
