Page 53 - ITU Journal - ICT Discoveries - Volume 1, No. 2, December 2018 - Second special issue on Data for Good
data alone is not the same as knowledge or truth. Ultimately, data cannot replace moral judgement on which actions to take.

Another important challenge is the fact that the data provided to advertisers comes from proprietary black boxes without an easy way for academics or others to audit aspects related to data quality. Whereas some user attributes such as age and gender are most likely derived from self-declared information, other attributes such as Facebook’s “Ex-pats (Germany)” are based on a proprietary inference algorithm with an unknown accuracy. However, as advertising is the main revenue source for Facebook and others, one would assume that they invest appropriate resources in satisfying the advertisers’ needs, i.e. accurately inferring user attributes. As mentioned previously, the typical use of the audience estimates is also not as a census, i.e. as an exhaustive count, but as an input signal for a regression task. In this latter setting one might still be able to obtain accurate predictions for variables of interest, even without full transparency on the precise specification of the input variables.

Last but not least, the issue of user privacy is important to consider, in particular in light of the recent Cambridge Analytica scandal. Other researchers have shown that previous versions of Facebook’s advertising platform leaked personally identifiable information [28]. This leakage was possible through an exploitation of the audience estimates for so-called “custom audiences” that involve targeting a particular set of users, identified by name, email or phone number. In the platform’s current version, audience estimates can only be obtained for anonymous, aggregate user groups such as male Germans living in Geneva. As such, most of the privacy concerns are similar to those surrounding population estimates for census tracts. At the same time, due to the possibility of (i) dynamically targeting different sub-populations, and (ii) doing so in a repeated manner, it cannot be ruled out that, despite the aggregation and rounding of the returned audience estimates, a sufficiently skilled attacker could abuse this data source and obtain attributes for individual users. However, none of the data collected in any of the described or proposed work contains individual-level information, nor could it be used to obtain such information.

Apart from privacy concerns for individual users, there is the harder-to-address issue of group-level profiling. For example, one could potentially abuse the platform to create density maps of users with Facebook interests such as “Jewish prayer” or “gay bar”. Interests strongly correlated with protected attributes such as race can also be used to exclude certain parts of the population from viewing, say, ads related to housing, employment or financial services [29]. For our own research we never use the “custom audiences” and only perform secondary analysis of anonymous and aggregated data.

6. CONCLUSIONS

The case studies above demonstrate the value that online advertising audience estimates hold for complementing existing traditional data sources such as surveys and censuses for improving global development statistics. Note that all the data sources described here are publicly available free of cost, which helps to reduce latency for near real-time estimates. Furthermore, this helps to democratize data access as, traditionally, personal contacts at big companies would be required for accessing similar types of data.

We do not see big data as a silver bullet to overcome the challenges of significant data gaps. However, we do believe that when used (i) in combination with, not in replacement of, existing data sources, and (ii) in an ethical and responsible manner, online advertising audience estimates can help to fill data gaps on important topics such as gender gaps and international migration. Better data on these topics will hopefully support better policy making and lead to better resource allocation.

ACKNOWLEDGEMENT

Ridhi Kashyap and Ingmar Weber are supported by a Data2x “Big Data for Gender Challenge” grant. Details at http://data2x.org/big-data-challenge-awards/#digital.

REFERENCES

[1] Barbara Adams, Karen Judd: The Ups and Downs of Tiers: Measuring SDG Progress. Global Policy Watch, 2018. https://www.globalpolicywatch.org/blog/2018/04/26/tiers-measuring-sdg-progress/
© International Telecommunication Union, 2018