Page 131 - Kaleidoscope Academic Conference Proceedings 2021

P. 131

Connecting physical and virtual worlds

2.1 Pose estimation 2.2.1 Vector auto-regression

The pose estimation technique forecasts and tracks the A multivariate time series has numerous time-dependent
location of an individual person or object. This is carried out variable [9]. Each variable relies not only on its past values
by looking at a combination of the pose and therefore the but also has some dependency on other variables. This
orientation of a given person. It is typically performed by dependency is employed for forecasting future values.
identifying, locating, and tracking a number of key points on
the person’s skeleton estimated data. Considering each Each variable in a VAR model is a linear function of the past
skeleton with N joints, including head, neck, arms and legs, values of itself and old values of all the other variables. The
every joint position is interpreted in the image coordinate two-time dependent variables are image array data (y 1) and
with coordinate values of x and y, so there is a total of 2N key points (y 2). These variables influence each other. To
points for each skeleton. The Part Affinity Fields (PAFS) are compute y 1(t), the past values of y 1 and y 2 are used. Similarly,
used to associate the joints of an individual [7]. These values for calculating y 2(t), past values of both y 1 and y 2 are used.
are concatenated and used as the skeleton information of the
human subject for each frame processed. ∗ 1 ∗ 1

1 (3)

2.1.1 Centroid method
∗ 1 ∗ 1

A unique Identification (ID) is assigned to each centroid after 1 (4)

computation, then new centroids are computed in the next
frame. The Euclidean distance between the centroids of the where a 1 and a 2 are the constant terms, w i , w i , w i , and
21
12
11
current and former frames are correlated based on the w i for i = 1, 2, are the coefficients, and e 1(.) and e 2(.) are the
22
minimum distance [8]. If the correlation is found, the new error terms.
centroid is updated with the ID of the old centroid of the
same color. If the correlation isn’t found then the new 2.3 Bi-LSTM
centroid is given a unique ID and a different color. If the
person goes out of the frame for a set amount of frames, the In Bidirectional Long Short-Term Memory (Bi-LSTM), the
ID is removed. output at any time is not only dependent on the past frames
within the sequence, but also on the future frames. The two
In a video frame with multiple objects, tracking each LSTM are stacked on top of every other, where one LSTM
individual object requires a technique to distinguish between goes within the forward direction and another in the
different key points. The centroid method is used to backward direction. The combined output is then calculated
differentiate the key points of each individual. The centroids based on the hidden layers of both LSTMs [10]. The
are computed using the formula (1). architecture of a basic unit of LSTM used in Bi-LSTM is
shown in Figure 2.
, ⋯ / ,

⋯ / (1)

th
where x i and y i represents the x and y coordinates of the i x +
key point on joints of an individual.
tanh
2.2 Activity forecasting
Activity forecasting methods have been developed to cope x x
with the unavailable data for the analysis due to network
issues in the communication channel of the surveillance
system. During the streaming of the video, if the videos tanh
pause for a few seconds, then the future pose and motion of
the individual is predicted. It is trained using Vector Auto-
Regression (VAR) with the videos from human activity data
sets. The forecast takes the form as given in Equation (2)
⋯ (2)

x t
where b 0 is the intercept, and b 1, b 2,..., b n are coefficients
which represents the contribution of independent variables Figure 2 – A basic unit of LSTM used in Bi-LSTM
X 1, X 2 ,..., X n.
The general architecture of Bi-LSTM utilized in the
proposed method has the external structure of the training

– 69 –

126 127 128 129 130 131 132 133 134 135 136