Proceedings of the 2017 ITU Kaleidoscope

Challenges for a data-driven society





Table 1. The users × items × features input matrix of the CF algorithm

                 Trustors (users)      Features
Trustees (DSs)   u1   u2   …    u_nu   T_cm  T_uq  T_tm  T_vl  T_ac  T_cn
i1               ◬              ◬      ◈     ◈     ◈     ◈     ◈     ◈
⋮                     ◬         ◬      ◈     ◈     ◈     ◈     ◈     ◈
i_nm             ◬    ◬                ◈     ◈     ◈     ◈     ◈     ◈

−  Reputation DTM ($T_{rp}$):

$$T_{rp}^{B} = R_{1}^{B} + R_{2}^{B} + \cdots + R_{n}^{B} \tag{3}$$

where $R_{n}^{B}$ represents the reputation of data source B as given by its previous user n. A mechanism that computes reputation based on the PageRank algorithm is presented in our previous research [2].
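Equation (3) accumulates the reputation contributions of a data source's previous users. The following is a minimal Python sketch of that sum, assuming each contribution $R_n^B$ has already been computed elsewhere (for instance by the PageRank-based mechanism of [2], which is not reproduced here) and arrives as a (user, data source, reputation) record; the record layout and the function name `reputation_dtm` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (user_id, data_source_id, reputation value R_n^B).
Record = Tuple[str, str, float]


def reputation_dtm(records: Iterable[Record]) -> Dict[str, float]:
    """Sum the reputation contributions of previous users per data source (eq. (3)).

    Assumes one contribution per (user, data source) pair; how each R_n^B is
    obtained (e.g. the PageRank-based mechanism in [2]) is outside this sketch.
    """
    totals: Dict[str, float] = defaultdict(float)
    for _user, source, reputation in records:
        totals[source] += reputation  # T_rp^B = R_1^B + R_2^B + ... + R_n^B
    return dict(totals)


if __name__ == "__main__":
    sample = [("u1", "B", 0.25), ("u2", "B", 0.5), ("u1", "C", 0.125)]
    print(reputation_dtm(sample))  # {'B': 0.75, 'C': 0.125}
```

The sketch keeps the plain, unweighted sum of equation (3); any normalization of the contributions would have to happen before they are passed in.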
Having computed the main DTMs, the next objective is to combine them into a final data trust value ($T_{d}$) for each data source based on the DTAs, as follows:

$$T_{d} = \rho\,T_{a} + \tau\,T_{b} + \omega\,T_{c} \tag{4}$$

where $T_{a}$, $T_{b}$ and $T_{c}$ denote the three main DTMs (one of which is the reputation DTM of equation (3)), and $\rho$, $\tau$ and $\omega$ are weighting factors based on the trustor's preference for each TM. Here, we suggest two mechanisms for combining the TMs: the ML approach we followed in [7], or the rule-based reasoning mechanism explained in [4].

4.3. Data Trust Prediction

Once the trust values based on the DTAs are collected, the next step is to find the trust relationship between data sources and trustors who have had no prior encounters with them. For that, we use the concepts of the well-known collaborative filtering (CF) technique to predict the unknown trust values between a user and a specific data source with respect to six data-centric features (i.e., completeness, uniqueness, timeliness, validity, accuracy and consistency). Since the prediction is now based solely on the properties of the data, it is no longer necessary to rely on the trustworthiness of the data source, as traditional methods do. Among the various recommendation techniques, we choose a variant of a multifaceted CF model for our application because its properties match our data trust model: it stresses the concept of social contribution, where everyone's contribution matters; it can capture weak signals in the overall data; it can detect strong relationships between close items; and it avoids overfitting [33].

First, we define the inputs to our algorithm as the number of trustors or users ($n_u$), the number of trustees or DSs ($n_m$) and the six features ($T_{cm}$, $T_{uq}$, $T_{tm}$, $T_{vl}$, $T_{ac}$, $T_{cn}$), as shown in Table 1. Users who already have a trust relationship with a DS are marked with the "◬" symbol, which represents a trust value in [0,1] calculated using equation (4); blank cells denote missing information, which is to be predicted. Formally, $r(i,j)=1$ if user $j$ and item $i$ already have a trust relationship, and $r(i,j)=0$ otherwise. The data trust value given by user $j$ to DS $i$ is denoted by $y^{(i,j)}$. The symbol "◈" represents the value of each of the six features, which lies between 0 and 1.

The next step of our algorithm is to find a parameter that describes the profile of the users involved in a certain situation. Let this parameter be denoted by $\theta^{(j)}$ for a particular user $j$, and let the feature vector of DS $i$ be denoted by $T^{(i)}$. Then the predicted data trust value $T_{dp}^{ij}$ between trustor $j$ and the data of DS $i$ can be calculated as in equation (5), where $(\cdot)^{\mathsf T}$ denotes the transpose of a vector:

$$T_{dp}^{ij} = \left(\theta^{(j)}\right)^{\mathsf T} T^{(i)} \tag{5}$$

The basic but essential requirement of the predicted trust value is that it must provide the closest possible prediction of each trust value that each user has already calculated. With this assumption, we can use the mean square error (MSE) method to measure the distance between the actual and the predicted trust values. The parameter $\theta^{(j)}$ that gives the minimum error yields our best predicted trust values. This idea is formulated as follows for trustor $j$:

$$\min_{\theta^{(j)}} \; \frac{1}{2}\sum_{i:\,r(i,j)=1}\left(\left(\theta^{(j)}\right)^{\mathsf T}T^{(i)} - y^{(i,j)}\right)^{2} + \frac{\lambda}{2}\sum_{k=1}^{6}\left(\theta_{k}^{(j)}\right)^{2} \tag{6}$$

In the first part of the equation, the squared error is computed over all records for which a trust value is already available through the preliminary calculation. The second part regularizes the minimization process, thereby avoiding overfitting; $\lambda$ is the regularization parameter and $k$ indexes the six features. In a similar manner, we can find the best parameters for all trustors as follows:

$$\min_{\theta^{(1)},\dots,\theta^{(n_u)}} J\left(\theta^{(1)},\theta^{(2)},\dots,\theta^{(n_u)}\right) \tag{7}$$

where $J(\cdot)$ denotes the cost function of equation (6) summed over all $n_u$ trustors. To minimize the cost function, we simply adopt the gradient descent method and solve for the best parameters $\theta_{k}^{(j)}$ as follows [34]:

$$\theta_{k}^{(j)} := \begin{cases} \theta_{k}^{(j)} - \alpha \displaystyle\sum_{i:\,r(i,j)=1}\left(\left(\theta^{(j)}\right)^{\mathsf T}T^{(i)} - y^{(i,j)}\right)T_{k}^{(i)}, & k = 0\\[6pt] \theta_{k}^{(j)} - \alpha\left(\displaystyle\sum_{i:\,r(i,j)=1}\left(\left(\theta^{(j)}\right)^{\mathsf T}T^{(i)} - y^{(i,j)}\right)T_{k}^{(i)} + \lambda\,\theta_{k}^{(j)}\right), & k \neq 0 \end{cases} \tag{8}$$

where $\alpha$ is the learning rate and the $k=0$ component, which is excluded from the regularization term of equation (6), acts as a bias term. Once the parameter $\theta^{(j)}$ is estimated through equations (7) and (8), the predicted trust value between user $j$ and item $i$ is given by equation (5). Note that this is an iterative process, and that more users with experience of similar DSs make the system more accurate and trustworthy.

5. IMPLEMENTATION MODEL

In this section, we propose a possible implementation scenario for our findings based on an air pollution crowd sensing use case aimed at collecting and monitoring pollution data. Air pollution sensing requires active citizen participation: citizens carry wearable sensors as they traverse the city, using an opportunistic crowd sensing application [35]. However, monitoring air pollution via crowd sensing requires that the data provided be trustworthy, so that the city authority or government can rely on them to make immediate decisions. The air pollution crowd sensing application will take advantage of citizens' smartphones and the smart city's air