Page 167 - Proceedings of the 2017 ITU Kaleidoscope


Challenges for a data-driven society

− Reputation DTM ( )

= + + ⋯ + (3)

where ( ) represents the reputation towards data source B by its previous users n. A mechanism that computes reputation based on the PageRank algorithm is presented in our previous research [2].

After computing the main DTMs, the next objective is to combine them to produce a final data trust value ( ) for each data source based on the DTAs, as below:

= + + (4)

where ρ, τ, and ω are weighting factors based on the trustor’s preference for each TM. Here, we suggest two mechanisms for combining the TMs: the ML approach we followed in [7], or the rule-based reasoning mechanism explained in [4].

4.3. Data Trust Prediction

Once the trust values based on the DTAs are collected, the next step is to find the trust relationship between data sources and trustors that have no prior encounters. For that, we use the well-known collaborative filtering (CF) technique to predict the unknown trust value between a user and a specific data source with respect to six data-centric features (completeness, uniqueness, timeliness, validity, accuracy and consistency). Because the prediction is now based solely on properties of the data, it is no longer necessary to rely on the trustworthiness of the data source, as traditional methods do. Among the various recommendation techniques, we choose a variant of a multifaceted CF model because its properties match our data trust model: it stresses the concept of social contribution, where everyone’s contribution matters; it can capture weak signals in the overall data; it can detect strong relationships between close items; and it avoids overfitting [33].

First, we define the inputs to our algorithm as the number of trustors or users ($n_u$), the number of trustees or DSs ($n_m$) and six features, as shown in Table 1. Users who already have a trust relationship with a DS are marked with the “◬” symbol, which represents a trust value in [0,1] calculated using equation (4); blank spaces denote missing information, which is to be predicted. Formally, $r(i,j)=1$ if user j and item i already have a trust relationship, and $r(i,j)=0$ otherwise. Moreover, the data trust value given by user j to DS i is denoted by $y^{(i,j)}$. The symbol “◈” represents the value of each of the six features, between 0 and 1.

Table 1. The users × items × features input matrix of the CF algorithm

Trustees (DS) | Trustors (Users): u1 u2 … u_nu | Features: T^cm T^uq T^tm T^vl T^ac T^cn
i1            | ◬ ◬                            | ◈ ◈ ◈ ◈ ◈ ◈
⁞             | ◬ ◬                            | ◈ ◈ ◈ ◈ ◈ ◈
i_nm          | ◬ ◬                            | ◈ ◈ ◈ ◈ ◈ ◈

The next step of our algorithm is to find a parameter that describes the profile of the users involved in a certain situation. For now, let’s assume this parameter is denoted by $\theta^{(j)}$ for a particular user j, and that the feature vector for DS i is denoted by $T^{(i)}$. Then the predicted data trust value $T^{dp}_{ij}$ between the trustor and the data can be calculated as in equation (5). The symbol $(\cdot)^{T}$ represents the transpose of the vector.

$$T^{dp}_{ij} = \left(\theta^{(j)}\right)^{T} T^{(i)} \qquad (5)$$

The basic but essential requirement of the predicted trust value is that it must be as close as possible to each trust value that has already been calculated by each user. With this assumption, we can use the mean square error (MSE) method to find the distance between the actual trust values and the predicted ones. The parameter $\theta^{(j)}$ that gives the minimum error yields our best predicted trust value. This idea is formulated as below for trustor j:

$$\min_{\theta^{(j)}} \; \frac{1}{2} \sum_{i:\,r(i,j)=1} \left( \left(\theta^{(j)}\right)^{T} T^{(i)} - y^{(i,j)} \right)^{2} + \frac{\lambda}{2} \sum_{k=1}^{6} \left(\theta^{(j)}_{k}\right)^{2} \qquad (6)$$

In the first part of the equation, the error is calculated over all the records where the trust value is already available through preliminary calculation. The second part of the equation, weighted by λ, regularizes the minimization process and thereby avoids overfitting. Here k indexes the six features. In a similar manner, we can find the best parameter for each trustor as below:

$$\min_{\theta^{(1)},\theta^{(2)},\ldots,\theta^{(n_u)}} J\left(\theta^{(1)},\theta^{(2)},\ldots,\theta^{(n_u)}\right) \qquad (7)$$

where J(.) denotes the cost function described in equation (6). To minimize the cost function, we simply adopt the gradient descent method and solve for the best parameter $\theta^{(j)}_{k}$ as below [34]:

$$\theta^{(j)}_{k} = \begin{cases} \theta^{(j)}_{k} - \alpha \sum_{i:\,r(i,j)=1} \left( \left(\theta^{(j)}\right)^{T} T^{(i)} - y^{(i,j)} \right) T^{(i)}_{k}, & k = 0 \\ \theta^{(j)}_{k} - \alpha \left( \sum_{i:\,r(i,j)=1} \left( \left(\theta^{(j)}\right)^{T} T^{(i)} - y^{(i,j)} \right) T^{(i)}_{k} + \lambda\, \theta^{(j)}_{k} \right), & k \neq 0 \end{cases} \qquad (8)$$

Here α denotes the step size of the gradient descent update. Once the parameter $\theta^{(j)}$ is estimated through equations (7) and (8), the predicted trust value between user j and item i is given by equation (5). Note that this process is iterative, and that more users with experience of similar DSs make the system more accurate and trustworthy.

5. IMPLEMENTATION MODEL

In this section, we propose a possible implementation scenario for our findings, based on an air pollution crowd sensing use case aimed at collecting and monitoring pollution data. Air pollution sensing requires active citizen participation: citizens carry wearable sensors as they traverse the city, based on an opportunistic crowd sensing application [35]. However, monitoring air pollution via crowd sensing requires that the data provided are trustworthy and can be relied upon by the city authority or government to make an immediate decision. The air pollution crowd sensing application will take advantage of citizens’ smartphones and the smart city’s air
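The estimation procedure of Section 4.3 (equations (5), (6) and (8)) can be illustrated with a minimal sketch for a single trustor j. It is not the authors' implementation: the function names, the feature and trust values, and the settings `lam`, `alpha` and `iters` are all hypothetical. The sketch also assumes that a constant intercept feature equal to 1 is prepended to each DS feature vector, so that the k = 0 case of equation (8) applies to it.

```python
from typing import List, Optional, Sequence, Tuple

Rated = List[Tuple[List[float], float]]


def predict(theta: Sequence[float], features: Sequence[float]) -> float:
    """Equation (5): inner product of the trustor profile and a DS feature vector."""
    return sum(t * x for t, x in zip(theta, features))


def cost(theta: Sequence[float], rated: Rated, lam: float) -> float:
    """Equation (6): squared error over known trust values plus regularization."""
    data_term = 0.5 * sum((predict(theta, x) - y) ** 2 for x, y in rated)
    reg_term = 0.5 * lam * sum(t * t for t in theta[1:])  # k = 0 is not regularized
    return data_term + reg_term


def fit_trustor(
    ds_features: Sequence[Sequence[float]],
    trust_values: Sequence[Optional[float]],
    lam: float = 0.01,     # hypothetical regularization weight
    alpha: float = 0.05,   # hypothetical step size
    iters: int = 5000,     # hypothetical number of iterations
) -> List[float]:
    """Estimate theta for one trustor with the update rule of equation (8)."""
    # Assumption: prepend the intercept feature T_0 = 1 to every DS.
    rows = [[1.0] + list(f) for f in ds_features]
    # Keep only the DSs with a known trust value, i.e. r(i, j) = 1.
    rated = [(x, y) for x, y in zip(rows, trust_values) if y is not None]
    theta = [0.0] * len(rows[0])
    for _ in range(iters):
        # Errors are computed once per step so all components update together.
        errors = [predict(theta, x) - y for x, y in rated]
        theta = [
            theta[k] - alpha * (
                sum(e * x[k] for e, (x, _) in zip(errors, rated))
                + (lam * theta[k] if k != 0 else 0.0)
            )
            for k in range(len(theta))
        ]
    return theta


# Hypothetical feature values per DS, in the order: completeness, uniqueness,
# timeliness, validity, accuracy, consistency.
ds_features = [
    [0.9, 0.8, 0.7, 0.9, 0.8, 0.9],  # i1
    [0.2, 0.3, 0.1, 0.2, 0.3, 0.2],  # i2
    [0.6, 0.5, 0.7, 0.6, 0.5, 0.6],  # i3
    [0.8, 0.9, 0.9, 0.7, 0.8, 0.8],  # i4, no prior encounter with trustor j
]
# Known trust values of trustor j; None marks a blank cell to be predicted.
known_trust = [0.85, 0.20, 0.55, None]

theta_j = fit_trustor(ds_features, known_trust)
predicted_i4 = predict(theta_j, [1.0] + ds_features[3])
print("predicted trust of trustor j in i4:", round(predicted_i4, 3))
```

The sketch fits one trustor at a time. Because the cost in equation (6) involves only $\theta^{(j)}$, minimizing it separately for each trustor is one way to approach the joint problem in equation (7). The prediction is not clipped, so a practical system may need to bound it to [0,1].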