Page 192 - Kaleidoscope Academic Conference Proceedings 2020





where G_x and G_y are the vertical and horizontal derivative approximations, respectively, and G is the total gradient magnitude. The Sobel operator is a concise way to measure texture information content. We use the sum of absolute values to calculate G instead of the traditional square root, which reduces the effect of outliers and speeds up calculation. Gradients for all pixels from all three channels (red, green, blue) of the current patch are summed together to obtain the patch's PIC. Note that for each pixel of the input patch, three G values are calculated, one for each of the red, green, and blue channels. We select the maximum of these three as the input when computing the L1 norm of the entire patch, which lets us treat the PIC as channel-independent. We use the L1 norm instead of the L2 norm because of its robustness: since the L2 norm squares the error, it is more sensitive to outliers in the training data set.

During the batch construction process, suppose each batch contains N patches from M randomly selected LR candidate images, where M is usually smaller than or equal to N. To obtain an evenly distributed probability of texture information, measured in PIC, for the current batch, the entire PIC distribution is split into k intervals K = [K_1, ..., K_k], which refer to the statistical results in Figure 1. The variable V represents the number of patches currently selected per interval. The PIC of the N balanced patches should be uniformly distributed.

The entire RFS can be summarized in Table 1.

Table 1 – Flow chart of RFS

Input: M – number of randomly selected LR candidate images
       N – number of patches for each batch
       K – set of PIC distribution intervals
       T – training data set
1. Initialize the sampled vector V = [NULL ... NULL]_(1×k). Randomly select M low-resolution images from the training data set T.
2. While V ≠ K:
   2.1) Randomly crop each input image to generate one patch p_0, p_1, ..., p_(m-1).
   2.2) Compute the PIC for each patch as in equation (1), and output the PIC vector [PIC_0, PIC_1, ..., PIC_(m-1)].
   2.3) For i = 0 : k-1:
        If PIC_i falls into interval i and V_i < K_i:
            Put the current patch into the batch.
Output: Batch

3.3    Network Architecture

The overall network architecture is illustrated in Figure 4. It consists of three main building blocks: shallow feature extraction, feature mapping groups (FMG), and reconstruction.

Figure 4 – Network architecture (LR → Conv → FMG_0 → FMG_1 → ... → FMG_N, with scale factors α_0 ... α_n and spatial attention → Conv → Upsampling → Conv → HR)

The workhorse of the proposed BSR is its multiple FMGs. Each FMG consists of five residual blocks (RB), while each RB includes several layers, as shown in Figure 5.

Figure 5 – Left: one residual block (RB: Conv → Activation → Conv → Scale, with a skip connection). Right: one FMG (five cascaded RBs)

1)  Shallow feature extraction

The shallow feature extraction block is a single convolution layer that generates the low-level features F_0:

F_0 = H_SF(I_LR)    (4)

where H_SF represents the convolution layer applied to the LR input I_LR.

2)  Feature mapping group

As shown in Figure 4, FMGs are cascaded to obtain LR features at different scales:

F_i = α_i · H_FMG_i(F_{i-1}),  i = 1, ..., n    (5)

where α_i and H_FMG_i are the feature scale factor and the FMG mapping function, respectively.

The basic idea of the FMGs is to provide balanced input for the up-sampling block of the last stage. The lowest level of feature information, F_0, is derived from the input of FMG_0, which is the direct output of the shallow feature extraction block, while the highest level of feature information, F_n, is the input of the reconstruction block.

As a DCNN goes deeper, low-level information usually gets diminished, which is why most recent DCNNs for SISR have a long skip connection between the low-level feature stage and the final reconstruction stage. We call it scale imbalance when the texture information from the LR input gradually fades out as the number of convolution layers grows. The output of very deep convolution layers contains abstract information, which is good for high-level vision tasks such as object detection and classification, but less important for low-level vision tasks such as SR, where texture reconstruction plays a vital role.

Many DCNNs, such as [6], [28], [29], [30], can recover the contours of objects but lose the details around them, which causes unpleasant visual effects such as blur. Works such as RCAN [6] solve the imbalance among feature channels within the same layer via a channel attention mechanism. The proposed BSR focuses on the imbalance in the full-scale space and forwards all the previous lower-level information to the final stage.

Each FMG consists of residual blocks and skip connections (SC), as shown in Figure 5. Each residual block includes two
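The residual block of Figure 5 (Conv → Activation → Conv → Scale, with the skip connection adding the scaled residual back onto the input) can be sketched numerically. This is only an illustration: the two "conv" layers are stand-in linear maps, and the names `residual_block`, `w1`, `w2` are ours, not the paper's.

```python
import numpy as np

def residual_block(x, w1, w2, scale=0.1):
    """Figure 5, left: Conv -> Activation -> Conv -> Scale, then the
    skip connection adds the scaled residual back onto the input.
    w1/w2 stand in for the two convolution layers' weights."""
    h = np.maximum(w1 @ x, 0.0)     # first conv + ReLU activation
    r = w2 @ h                      # second conv (no activation)
    return x + scale * r            # residual scaling + skip connection

# With identity "convolutions", the block adds a scaled copy of the input.
y = residual_block(np.ones(3), np.eye(3), np.eye(3), scale=0.1)
```

Residual scaling of this kind (as popularized by EDSR-style blocks) keeps very deep cascades of RBs numerically stable during training.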

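Equations (4) and (5) describe a simple cascade: shallow features F_0 feed FMG_0, and each FMG maps the previous scale to the next while every intermediate F_i is kept for the reconstruction stage. The sketch below wires this up with stand-in mapping functions; the real H_SF and H_FMG_i are convolutional blocks, and all names here are illustrative assumptions.

```python
import numpy as np

def fmg_cascade(i_lr, h_sf, h_fmgs, alphas):
    """Eq. (4): F_0 = H_SF(I_LR).  Eq. (5): F_i = alpha_i * H_FMG_i(F_{i-1}).
    Returns every scale F_0 .. F_n so the reconstruction stage can see all
    lower-level features, as BSR forwards them to the final stage."""
    feats = [h_sf(i_lr)]                    # F_0: shallow features
    for h, a in zip(h_fmgs, alphas):
        feats.append(a * h(feats[-1]))      # F_i computed from F_{i-1}
    return feats

# Toy stand-ins: identity "shallow extraction" and doubling "FMGs".
feats = fmg_cascade(np.ones(4), h_sf=lambda t: t,
                    h_fmgs=[lambda t: 2 * t] * 3,
                    alphas=[1.0, 1.0, 0.5])
```

Returning the whole list of scales, rather than only F_n, is what lets the final stage counteract the scale imbalance described above.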


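The RFS loop of Table 1 can be sketched as rejection sampling: keep drawing random crops until every PIC interval holds its target count of patches. `crop` and `pic` below stand in for the cropping and PIC routines above; all names and the interval representation are illustrative assumptions, not the paper's code.

```python
import random

def rfs_batch(images, crop, pic, k_targets, intervals):
    """Table 1 sketch: fill each PIC interval i up to k_targets[i] patches.
    `intervals` is a list of (lo, hi) half-open PIC ranges."""
    counts = [0] * len(k_targets)           # V in Table 1, one slot per interval
    batch = []
    while counts != k_targets:              # step 2: while V != K
        img = random.choice(images)
        patch = crop(img)                   # step 2.1: random crop
        p = pic(patch)                      # step 2.2: compute the patch PIC
        for i, (lo, hi) in enumerate(intervals):   # step 2.3: place the patch
            if lo <= p < hi and counts[i] < k_targets[i]:
                batch.append(patch)
                counts[i] += 1
                break                       # a patch falls in one interval only
    return batch

# Toy demo: the "image" value itself serves as crop and PIC.
batch = rfs_batch(list(range(100)), crop=lambda im: im, pic=float,
                  k_targets=[2, 2], intervals=[(0, 50), (50, 100)])
```

Patches landing in an already-full interval are simply discarded, which is how the batch's PIC histogram ends up uniform across the k intervals.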
