Page 159 - Proceedings of the 2018 ITU Kaleidoscope

Machine learning for a 5G future

case. In particular, the winning solution of the SpaceNet
Round 2 competition, created by Kohei Ozaki, was chosen
[12]. This neural network model takes two image inputs:
the images mentioned in the previous section (in
particular, panchromatic images) and, to improve detection
accuracy, OpenStreetMap² maps, that is, free and editable
maps with geographic information that are distributed
under an open license. The final input to the neural
network is then the concatenation of both sources
(Figure 5).
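
The concatenation of both sources can be sketched as a stack along the channel axis. This is only an illustrative sketch, assuming NumPy and channels-last arrays; the function name `stack_inputs`, the 256x256 tile size and the channel counts (8 image bands, 3 rasterized OpenStreetMap layers) are hypothetical placeholders, not values taken from the winning solution.

```python
import numpy as np


def stack_inputs(image_bands, osm_layers):
    """Concatenate image bands and rasterized OSM layers along the channel axis.

    Both arrays are assumed to be (height, width, channels) and co-registered,
    i.e. pixel (i, j) covers the same ground location in both sources.
    """
    if image_bands.shape[:2] != osm_layers.shape[:2]:
        raise ValueError("both sources must share the same height and width")
    return np.concatenate([image_bands, osm_layers], axis=-1)


# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(0)
image_bands = rng.random((256, 256, 8))   # stand-in for a satellite image tile
osm_layers = np.zeros((256, 256, 3))      # stand-in for rasterized map layers

network_input = stack_inputs(image_bands, osm_layers)
print(network_input.shape)  # (256, 256, 11)
```

The network then sees a single tensor whose first channels come from the imagery and whose last channels come from the map layers.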

Figure 5 – The final input combines OpenStreetMap and
pan-sharpened multispectral images in the same stack.

Given this input, the next step was to decide the layer
structure of the neural network. First, it is necessary
to introduce the U-Net neural network architecture. U-Net
is a convolutional network for fast and precise
segmentation of images, which makes it particularly useful
for the processing of satellite images [13].

The architecture of U-Net consists, like that of any other
convolutional network, of a large number of different
operations, illustrated by the model in Figure 6. The 'input
image tile' represents the input image; the data then
propagates through the network along all possible paths
and, in the end, the finished segmentation map comes out.

Figure 6 – Architecture of U-Net: a multi-channel feature
map

The blue boxes correspond to multi-channel feature maps
and the white boxes are copied feature maps. Since each
blue box represents a multi-channel feature map, the number
of channels is denoted on top of the box and the dimensions
are given at its bottom left edge. The arrow between two
blue boxes represents a convolution followed by its
activation function.

Continuing with Ozaki's architecture, the model in
Figure 7 is an adaptation of the U-Net architecture for
image segmentation. Basically, each layer represents two
convolutional operations with a 3x3 kernel, followed by a
nonlinear function. After that, the data moves on to the
next layer.

In the architecture, progressive subsampling is performed
until a 3x3@512 kernel is reached (that is, a 3x3 kernel is
applied in the operation and 512 filters are obtained at the
output of the convolution). Then, progressive upsampling
is performed until the data reaches the output layer. This
layer yields an image with the same dimensions as the
input image, with the segmentation made. Moreover, each
layer in the upsampling stage takes as input not only the
upsampled output of the previous layer but also the output
of the layer that has the analogous kernel and image
dimensions.
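
The subsampling, upsampling and reuse of same-sized layer outputs can be illustrated with a one-level toy example. This is a minimal sketch under stated assumptions, not Ozaki's implementation: the text does not specify the nonlinearity, the padding, or the subsampling and upsampling operators, so ReLU, zero padding, 2x2 max pooling and nearest-neighbour upsampling are assumed here. All function names, the 16x16 tile and the small channel counts (far below 512) are hypothetical, and the weights are random placeholders rather than trained values.

```python
import numpy as np


def conv3x3_relu(x, weights):
    """3x3 convolution (cross-correlation, zero padding) followed by ReLU.

    x: (height, width, in_channels); weights: (3, 3, in_channels, out_channels).
    """
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, w, weights.shape[-1]))
    for di in range(3):
        for dj in range(3):
            # Shifted window of the input times the kernel slice at (di, dj).
            out += padded[di:di + h, dj:dj + w, :] @ weights[di, dj]
    return np.maximum(out, 0.0)


def double_conv(x, w1, w2):
    """One layer: two 3x3 convolutions, each followed by ReLU (assumed)."""
    return conv3x3_relu(conv3x3_relu(x, w1), w2)


def downsample(x):
    """2x2 max pooling (assumes even height and width)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))


def upsample(x):
    """Nearest-neighbour upsampling by a factor of 2."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def init(rng, c_in, c_out):
    """Random placeholder weights; a trained model would learn these."""
    return rng.standard_normal((3, 3, c_in, c_out)) * 0.1


rng = np.random.default_rng(0)
x = rng.random((16, 16, 4))  # hypothetical 4-channel input tile

# Subsampling path: one layer at full resolution, one at half resolution.
enc = double_conv(x, init(rng, 4, 8), init(rng, 8, 8))           # (16, 16, 8)
bottom = double_conv(downsample(enc),
                     init(rng, 8, 16), init(rng, 16, 16))        # (8, 8, 16)

# Upsampling path: the upsampled deeper output is concatenated with the
# output of the earlier layer that has the same spatial dimensions.
merged = np.concatenate([upsample(bottom), enc], axis=-1)        # (16, 16, 24)
dec = double_conv(merged, init(rng, 24, 8), init(rng, 8, 8))     # (16, 16, 8)

print(enc.shape, bottom.shape, merged.shape, dec.shape)
```

The last feature map has the same height and width as the input tile, which is what allows the output layer to produce a per-pixel segmentation.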

2 OpenStreetMap (OSM) is a collaborative project to create
a free editable map of the world
