Page 53 - ITU Journal - ICT Discoveries - Volume 1, No. 2, December 2018 - Second special issue on Data for Good




data alone is not the same as knowledge or truth. Ultimately, data cannot replace moral judgement on which actions to take.

Another important challenge is that the data provided to advertisers come from proprietary black boxes, with no easy way for academics or others to audit their quality. Whereas some user attributes, such as age and gender, are most likely derived from self-declared information, other attributes, such as Facebook's "Ex-pats (Germany)", are based on a proprietary inference algorithm of unknown accuracy. However, as advertising is the main revenue source for Facebook and others, one can reasonably assume that they invest appropriate resources in satisfying the advertisers' needs, i.e., in accurately inferring user attributes. As mentioned previously, the audience estimates are also typically used not as a census, i.e., as an exhaustive count, but as an input signal for a regression task. In this setting, one may still obtain accurate predictions for variables of interest without full transparency on the precise specification of the input variables, since a model calibrated against ground-truth data can partly compensate for systematic biases in its inputs.

Last but not least, user privacy is an important issue to consider, particularly in light of the recent Cambridge Analytica scandal. Other researchers have shown that previous versions of Facebook's advertising platform leaked personally identifiable information [28]. The leak exploited the audience estimates for so-called "custom audiences", which target a particular set of users identified by name, email or phone number. In the current version of the platform, audience estimates can only be obtained for anonymous, aggregate user groups, such as male Germans living in Geneva. As such, most of the privacy concerns are similar to those surrounding population estimates for census tracts. At the same time, because it is possible (i) to dynamically target different sub-populations and (ii) to do so repeatedly, it cannot be ruled out that, despite the aggregation and rounding of the returned audience estimates, a sufficiently skilled attacker could abuse this data source to obtain attributes of individual users. However, none of the data collected in any of the described or proposed work contains individual-level information, nor could it be used to obtain such information.

Apart from privacy concerns for individual users, there is the harder-to-address issue of group-level profiling. For example, one could potentially abuse the platform to create density maps of users with Facebook interests such as "Jewish prayer" or "gay bar". Interests strongly correlated with protected attributes such as race can also be used to exclude certain parts of the population from seeing, say, ads related to housing, employment or financial services [29]. In our own research, we never use "custom audiences" and only perform secondary analysis of anonymous, aggregated data.

6. CONCLUSIONS

The case studies above demonstrate the value that online advertising audience estimates hold for complementing traditional data sources, such as surveys and censuses, to improve global development statistics. Note that all the data sources described here are publicly available free of cost, which reduces the latency of obtaining them and enables near real-time estimates. Furthermore, this helps to democratize data access since, traditionally, accessing similar types of data required personal contacts at big companies.

We do not see big data as a silver bullet for overcoming significant data gaps. However, we do believe that, when used (i) in combination with, not as a replacement for, existing data sources, and (ii) in an ethical and responsible manner, online advertising audience estimates can help fill data gaps on important topics such as gender gaps and international migration. Better data on these topics will hopefully support better policy making and lead to better resource allocation.

ACKNOWLEDGEMENT

Ridhi Kashyap and Ingmar Weber are supported by a Data2x "Big Data for Gender Challenge" grant. Details at http://data2x.org/big-data-challenge-awards/#digital.

REFERENCES

[1] Barbara Adams, Karen Judd: The Ups and Downs of Tiers: Measuring SDG Progress. Global Policy Watch, 2018. https://www.globalpolicywatch.org/blog/2018/04/26/tiers-measuring-sdg-progress/




                                             © International Telecommunication Union, 2018                    31