2020 ITU Kaleidoscope Academic Conference Proceedings, p. 190




Most recent DCNN works [8], [13] and [14] tend to maximize objective or subjective image quality through various network architecture designs or different training strategies. However, they tend to ignore the inherent features of each input image and do not consider texture variance. The trained model will be biased if the input samples have imbalanced statistical characteristics [15], [16], [17].

As illustrated in Figure 1 (a)–(d), we calculate the gradient distribution of random samples from the DIV2K [19] data set and partition the entire distribution into 16 evenly spaced intervals. Although the exact distributions of ×2 and ×4 scaling with 48×48 and 60×60 patch sizes differ, the same observation holds: the gradients of DIV2K are not evenly distributed, and patches with smaller gradients account for the majority. In the training process, the input patch size is mainly set to 48 or 60. This is partially because DIV2K mainly contains 2K images (resolution of 1920×1080 and above) featuring plain textures such as sky, clouds, and monochromatic objects. Figure 1 (e) shows typical patches at gradient interval values of 1, 4, 8, and 12, where 8 and 12 represent complex and repetitive artificial patterns. Therefore, a neural network trained with too many samples from this "biased" data set will inevitably produce fuzzier HR images than a network trained with samples full of texture details.

Another disadvantage of traditional DCNNs for SR is the lack of multiple feature scales. As neural networks get deeper, higher-level abstraction features become dominant, which leads to blurred or even unpleasant textures in the final reconstructed images. In essence, low-level vision tasks such as SR differ from high-level tasks such as classification or object detection, where higher abstraction features are preferable for a final decision. Therefore, LR images, which contain most of the low-frequency information, should be forwarded and leveraged to generate the final HR outputs.

Finally, in most DCNNs the optimization target is typically the minimization of the mean squared error (MSE) between the recovered and ground-truth HR images, which helps to maximize PSNR, as the goal of SR is to output a scaled-up image as close to the HR image as possible. However, commonly used objective functions pay more attention to pixels with larger absolute differences, as they introduce larger variance into the final loss calculation. It is a consensus that the L1 norm leads to sharper edges than the L2 norm, since L1 incurs a bigger loss than L2 when the difference is small [20], [21], [22].

In this work, we propose a balanced super-resolution framework (BSR); to our knowledge, this is the first work in the literature to study the imbalance problem in SR DCNNs. The main contributions are three-fold: (1) we propose an efficient random filter sampling method to form balanced training batches; (2) we propose a multi-scale feature map network architecture with a spatial attention mechanism, which consists of around 240 convolution layers; (3) we adopt a hybrid L1/L2/Lp objective function and study its effectiveness. As shown in Figure 2, on a test image from the Urban100 [23] data set, our BSR achieves better visual results than state-of-the-art methods.

The rest of the paper is organized as follows. In Section 2, relevant background and literature on SR are reviewed. We describe the proposed BSR in Section 3. Experimental results are presented and analyzed in Section 4, and we conclude the paper in Section 5.

2.  RELATED WORK

The sole purpose of SISR is to find an accurate mapping between LR and HR images. Numerous SR methods have been studied in the computer vision community; they can be classified into three categories [24]: interpolation-based, reconstruction-based, and learning-based.

As pioneering work among deep-learning-based methods, SRCNN [5] learns the mapping from LR to HR images in an end-to-end manner and achieves superior performance against previous interpolation- and reconstruction-based works.

EDSR [25] won the 2018 New Trends in Image Restoration and Enhancement (NTIRE) [26] competition mainly due to its removal of batch normalization (BN) layers: BN restrains the scale of the feature space via trainable scale and translation factors, which needs to be avoided in SR mapping.

[Figure 2 – Visual results comparison with bi-cubic degradation. Img_074 (Urban100), PSNR/SSIM: HR –/–; Bi-cubic 21.50/.3123; EDSR 21.87/.4291; SRMDNF 21.84/.3738; RCAN 23.23/.5723; BSR 23.42/.6072]

SRGAN [27], a pioneering piece of work applying a generative adversarial network (GAN) and perceptual loss to SR, targets recovering photo-realistic

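As a concrete illustration of the patch-gradient analysis behind Figure 1, the sketch below scores each training patch by its mean absolute finite-difference gradient and bins the scores into 16 evenly spaced intervals. The gradient operator and the synthetic patches are assumptions for illustration; the paper does not specify its exact measure.

```python
import numpy as np

def patch_gradient(patch):
    """Mean absolute finite-difference gradient of a 2-D patch."""
    gy = np.abs(np.diff(patch, axis=0)).mean()  # vertical differences
    gx = np.abs(np.diff(patch, axis=1)).mean()  # horizontal differences
    return (gy + gx) / 2.0

def gradient_histogram(patches, n_bins=16):
    """Count patches per gradient interval, over n_bins evenly spaced bins."""
    scores = np.array([patch_gradient(p) for p in patches])
    edges = np.linspace(scores.min(), scores.max(), n_bins + 1)
    counts, _ = np.histogram(scores, bins=edges)
    return counts

# Synthetic stand-in for a DIV2K-like skew: many near-flat 48x48
# patches (sky, monochromatic objects), few highly textured ones.
rng = np.random.default_rng(0)
flat = [0.5 + 0.01 * rng.standard_normal((48, 48)) for _ in range(90)]
textured = [rng.random((48, 48)) for _ in range(10)]
counts = gradient_histogram(flat + textured)
print(counts)  # the lowest-gradient bins hold the vast majority
```

Uniformly sampling patches from such a distribution yields batches dominated by low-gradient content, which is exactly the bias that the balanced sampling of contribution (1) is meant to counter.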

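The random filter sampling method of contribution (1) is not detailed in this excerpt. As a hypothetical sketch of the general idea, the sampler below cycles round-robin through the non-empty gradient bins so that low-gradient patches cannot dominate a training batch; it is an illustrative stand-in, not the paper's exact method.

```python
import numpy as np

def balanced_batch(bin_ids, batch_size, rng=None):
    """Pick patch indices spread evenly over gradient bins.

    bin_ids[i] is the gradient-bin index of patch i (e.g. 0..15).
    Bins are cycled round-robin; within a bin the pick is random.
    Illustrative sampler only, not the paper's exact method.
    """
    rng = rng or np.random.default_rng()
    bins = {}
    for idx, b in enumerate(bin_ids):
        bins.setdefault(b, []).append(idx)
    order = sorted(bins)  # non-empty bins only
    return [int(rng.choice(bins[order[i % len(order)]]))
            for i in range(batch_size)]

# 90 low-gradient patches (bin 0) and 10 textured ones (bin 15):
picks = balanced_batch([0] * 90 + [15] * 10, batch_size=16,
                       rng=np.random.default_rng(0))
print(picks)  # half the batch comes from the rare textured bin
```

Even though textured patches are only 10% of the pool, they fill half of every batch here, so the statistics seen by the optimizer are balanced rather than mirroring the skewed data set.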

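The L1-versus-L2 remark in the introduction rests on the fact that |e| > e² whenever |e| < 1, so the L1 term keeps penalizing small residuals that L2 nearly ignores. A hybrid L1/L2/Lp objective of the kind named in contribution (3) can be sketched as a weighted sum; the weights and the exponent p below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def hybrid_loss(pred, target, w1=1.0, w2=0.1, wp=0.1, p=0.5):
    """Weighted sum of L1, L2 and Lp terms over pixel residuals.

    Weights and exponent p are placeholders, not the paper's values.
    """
    diff = np.abs(pred - target)
    l1 = diff.mean()          # strong on small residuals, sharper edges
    l2 = (diff ** 2).mean()   # MSE term, aligned with PSNR
    lp = (diff ** p).mean()   # p < 1 penalizes small residuals hardest
    return w1 * l1 + w2 * l2 + wp * lp

# A uniform residual of 0.1 per pixel: the L1 term contributes 0.1
# while the L2 term contributes only 0.01 -- the small-difference
# regime where L1 dominates.
pred = np.full((4, 4), 0.1)
target = np.zeros((4, 4))
loss = hybrid_loss(pred, target)
```

One design caveat: with p < 1 the Lp term has an unbounded gradient at zero residual, so in practice such a term is usually kept small (or smoothed) relative to the L1 and L2 components.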

