2020 ITU Kaleidoscope Academic Conference Proceedings, p. 190




Most recent DCNN works [8], [13] and [14] tend to maximize objective or subjective image quality through various network architecture designs or different training strategies. However, they tend to ignore the inherent features of each input image and do not consider texture variance. The trained model will be biased if the input samples have imbalanced statistical characteristics [15], [16], [17].

As illustrated in Figure 1 (a)–(d), we calculate the gradient distribution of random samples from the DIV2K [19] data set and partition the entire distribution into 16 evenly spaced intervals. Although the exact distributions of ×2 and ×4 scaling with 48×48 and 60×60 patch sizes differ, the same observation holds: the gradients of DIV2K are not evenly distributed, and patches with smaller gradients account for the majority. In the training process, the input patch size is mainly set to 48 or 60. This is partially because DIV2K mainly contains 2K images (resolution of 1920×1080 and above) featuring plain textures such as sky, clouds, and monochromatic objects. Figure 1 (e) shows typical patches at gradient interval values of 1, 4, 8, and 12, where 8 and 12 represent complex and repetitive artificial patterns. Therefore, a neural network trained with too many samples from this "biased" data set will inevitably produce fuzzier HR images than a network trained with samples full of texture details.

Another disadvantage of traditional DCNNs for SR is the lack of multiple feature scales. As neural networks get deeper, higher-level abstraction features become dominant, which leads to blurred or even unpleasant textures in the final reconstructed images. In essence, low-level vision tasks such as SR differ from high-level tasks such as classification or object detection, where higher abstraction features are preferable for a final decision. Therefore, LR images, which contain most of the low-frequency information, should be forwarded and leveraged to generate the final HR outputs.

Finally, in most DCNNs the optimization target is typically the minimization of the mean squared error (MSE) between the recovered and ground-truth HR images, which helps to maximize PSNR, as the goal of SR is to output a scaled-up image as close to the HR image as possible. However, commonly used objective functions pay more attention to pixels with larger absolute differences, as they introduce larger variance into the final loss calculation. It is a consensus that the L1 norm leads to sharper edges than the L2 norm, since L1 incurs a bigger loss than L2 when the difference is small [20], [21], [22].

In this work, we propose a balanced super-resolution framework (BSR); to our knowledge, this is the first work in the literature to study the imbalance problem in SR DCNNs. The main contributions are three-fold: (1) we propose an efficient random filter sampling method to form balanced training batches; (2) we propose a multi-scale feature map network architecture with a spatial attention mechanism, which consists of around 240 convolution layers; (3) we adopt a hybrid L1/L2/Lp objective function and study its effectiveness. As shown in Figure 2, on a test image from the Urban100 [23] data set, our BSR achieves better visual results than state-of-the-art methods.

The rest of the paper is organized as follows. In Section 2, relevant background and literature on SR are reviewed. We describe the proposed BSR in Section 3. Experimental results are presented and analyzed in Section 4, and we conclude the paper in Section 5.

2.  RELATED WORK

The sole purpose of SISR is to find an accurate mapping between LR and HR images. Numerous SR methods have been studied in the computer vision community; they can be classified into three categories [24]: interpolation-based, reconstruction-based, and learning-based.

As pioneering work among deep-learning-based methods, SRCNN [5] learns the mapping from LR to HR images in an end-to-end manner and achieves superior performance against previous interpolation- and reconstruction-based works.

EDSR [25] won the 2018 New Trends in Image Restoration and Enhancement (NTIRE) [26] competition mainly due to its removal of batch normalization (BN) layers: BN restrains the scale of the feature space via trainable scale and translation factors, which needs to be avoided in SR mapping.

[Figure 2 – Visual results comparison with bi-cubic degradation. Img_074 (Urban100), PSNR/SSIM: HR –/–; Bi-cubic 21.50/.3123; EDSR 21.87/.4291; SRMDNF 21.84/.3738; RCAN 23.23/.5723; BSR 23.42/.6072]

SRGAN [27], a pioneering piece of work applying a generative adversarial network (GAN) and perceptual loss to SR, targets recovering photo-realistic

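As a concrete illustration of the patch-gradient analysis behind Figure 1, the sketch below scores each training patch by its mean absolute finite-difference gradient and bins the scores into 16 evenly spaced intervals. The gradient operator and the synthetic patches are assumptions for illustration; the paper does not specify its exact measure.

```python
import numpy as np

def patch_gradient(patch):
    """Mean absolute finite-difference gradient of a 2-D patch."""
    gy = np.abs(np.diff(patch, axis=0)).mean()  # vertical differences
    gx = np.abs(np.diff(patch, axis=1)).mean()  # horizontal differences
    return (gy + gx) / 2.0

def gradient_histogram(patches, n_bins=16):
    """Count patches per gradient interval, over n_bins evenly spaced bins."""
    scores = np.array([patch_gradient(p) for p in patches])
    edges = np.linspace(scores.min(), scores.max(), n_bins + 1)
    counts, _ = np.histogram(scores, bins=edges)
    return counts

# Synthetic stand-in for a DIV2K-like skew: many near-flat 48x48
# patches (sky, monochromatic objects), few highly textured ones.
rng = np.random.default_rng(0)
flat = [0.5 + 0.01 * rng.standard_normal((48, 48)) for _ in range(90)]
textured = [rng.random((48, 48)) for _ in range(10)]
counts = gradient_histogram(flat + textured)
print(counts)  # the lowest-gradient bins hold the vast majority
```

Uniformly sampling patches from such a distribution yields batches dominated by low-gradient content, which is exactly the bias that the balanced sampling of contribution (1) is meant to counter.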

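The random filter sampling method of contribution (1) is not detailed in this excerpt. As a hypothetical sketch of the general idea, the sampler below cycles round-robin through the non-empty gradient bins so that low-gradient patches cannot dominate a training batch; it is an illustrative stand-in, not the paper's exact method.

```python
import numpy as np

def balanced_batch(bin_ids, batch_size, rng=None):
    """Pick patch indices spread evenly over gradient bins.

    bin_ids[i] is the gradient-bin index of patch i (e.g. 0..15).
    Bins are cycled round-robin; within a bin the pick is random.
    Illustrative sampler only, not the paper's exact method.
    """
    rng = rng or np.random.default_rng()
    bins = {}
    for idx, b in enumerate(bin_ids):
        bins.setdefault(b, []).append(idx)
    order = sorted(bins)  # non-empty bins only
    return [int(rng.choice(bins[order[i % len(order)]]))
            for i in range(batch_size)]

# 90 low-gradient patches (bin 0) and 10 textured ones (bin 15):
picks = balanced_batch([0] * 90 + [15] * 10, batch_size=16,
                       rng=np.random.default_rng(0))
print(picks)  # half the batch comes from the rare textured bin
```

Even though textured patches are only 10% of the pool, they fill half of every batch here, so the statistics seen by the optimizer are balanced rather than mirroring the skewed data set.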

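The L1-versus-L2 remark in the introduction rests on the fact that |e| > e² whenever |e| < 1, so the L1 term keeps penalizing small residuals that L2 nearly ignores. A hybrid L1/L2/Lp objective of the kind named in contribution (3) can be sketched as a weighted sum; the weights and the exponent p below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def hybrid_loss(pred, target, w1=1.0, w2=0.1, wp=0.1, p=0.5):
    """Weighted sum of L1, L2 and Lp terms over pixel residuals.

    Weights and exponent p are placeholders, not the paper's values.
    """
    diff = np.abs(pred - target)
    l1 = diff.mean()          # strong on small residuals, sharper edges
    l2 = (diff ** 2).mean()   # MSE term, aligned with PSNR
    lp = (diff ** p).mean()   # p < 1 penalizes small residuals hardest
    return w1 * l1 + w2 * l2 + wp * lp

# A uniform residual of 0.1 per pixel: the L1 term contributes 0.1
# while the L2 term contributes only 0.01 -- the small-difference
# regime where L1 dominates.
pred = np.full((4, 4), 0.1)
target = np.zeros((4, 4))
loss = hybrid_loss(pred, target)
```

One design caveat: with p < 1 the Lp term has an unbounded gradient at zero residual, so in practice such a term is usually kept small (or smoothed) relative to the L1 and L2 components.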

