Page 211 - Kaleidoscope Academic Conference Proceedings 2024

Innovation and Digital Transformation for a Sustainable World




Figure 10 – Data requirement for end-to-end image transfer

… between reducing data volume and preserving image fidelity. The improvements in performance and resource usage are among the most promising outcomes of integrating semantic communication into future networks.

4. CONCLUSION AND FUTURE WORK

In conclusion, our paper advances SC by incorporating multi-modality, leveraging sophisticated deep learning models such as BLIP and T2I adapters. Our investigation, which paired concise captions with varied second modes such as line art, canny edge, and depth map, has shown line art to be the most effective second mode for image communication. This study contributes to the exploration of multi-modal approaches for enhancing SC systems. Initially, captions served as the primary mode to optimize data transmission and convey image content. However, recognizing the structural information inherent in images, we introduced line art as a supplementary mode. Looking ahead, we plan to integrate a third mode to address the crucial aspect of color consistency, thereby ensuring faithful representation and minimizing disparities between generated and original images.

REFERENCES

[1] X. Luo, H.-H. Chen, and Q. Guo, “Semantic communications: Overview, open issues, and future research directions,” IEEE Wireless Communications, vol. 29, no. 1, pp. 210–219, 2022.

[2] D. Gündüz, Z. Qin, I. E. Aguerri, H. S. Dhillon, Z. Yang, A. Yener, K. K. Wong, and C.-B. Chae, “Beyond transmitting bits: Context, semantics, and task-oriented communications,” IEEE Journal on Selected Areas in Communications, vol. 41, no. 1, pp. 5–41, Jan. 2023.

[3] W. Yang, H. Du, Z. Q. Liew, W. Y. B. Lim, Z. Xiong, D. Niyato, X. Chi, X. Shen, and C. Miao, “Semantic communications for future internet: Fundamentals, applications, and challenges,” IEEE Communications Surveys & Tutorials, vol. 25, no. 1, pp. 213–250, 2023.

[4] G. Yin, B. Liu, L. Sheng, N. Yu, X. Wang, and J. Shao, “Semantics disentangling for text-to-image generation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 2327–2336.

[5] M. U. Lokumarambage, V. S. S. Gowrisetty, H. Rezaei, T. Sivalingam, N. Rajatheva, and A. Fernando, “Wireless end-to-end image transmission system using semantic communications,” IEEE Access, 2023.

[6] J. Li, D. Li, C. Xiong, and S. Hoi, “BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation,” in International Conference on Machine Learning. PMLR, 2022, pp. 12888–12900.

[7] C. Mou, X. Wang, L. Xie, Y. Wu, J. Zhang, Z. Qi, and Y. Shan, “T2I-Adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 5, 2024, pp. 4296–4304.

[8] M. Z. Hossain, F. Sohel, M. F. Shiratuddin, and H. Laga, “A comprehensive survey of deep learning for image captioning,” ACM Computing Surveys (CSUR), vol. 51, no. 6, pp. 1–36, 2019.

[9] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.

[10] A. Hertzmann, “Why do line drawings work? A realism hypothesis,” Journal of Vision, vol. 21, no. 9, pp. 2029–2029, 2021.



