“any information relating to an identified or identifiable natural person,” 120 “special categories” of personal data are more specific. They relate to “racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person’s sex life or sexual orientation.” 121
Limiting processing of special categories of data
Automated decision-making based on special categories of personal data is only permitted under the GDPR with explicit consent from the user or if “necessary for reasons of substantial public interest, on the basis of Union or Member State law which shall be proportionate to the aim pursued, respect the essence of the right to data protection and provide for suitable and specific measures to safeguard the fundamental rights and the interests of the data subject.” 122
The purpose of such tighter restrictions on dealing with special categories is to provide practical means of reinforcing other laws prohibiting discrimination on the basis of such data, whether in the provision of public or private services or otherwise. The right to privacy seeks to prevent disclosures that may lead to discrimination and other irreversible harms. 123
In the era of big data, however, non-sensitive data can be used to infer sensitive data. For example, a name may be used to infer religion or place of birth, which in turn can be used to infer race and other personal data that belong to the special categories. Shopping data can reveal purchase history of medicine from which a health condition may be inferred, affecting decisions such as a person’s eligibility for health insurance. 124 Demographic and statistical data relating to wider groups may also be attributed to specific individuals. As a result, non-sensitive data may merit the same protections as sensitive data. 125 The result is that the distinction between sensitive and non-sensitive data becomes blurred and of questionable utility. 126
This is not a light matter of definitional strain. One of the basic objectives of data protection and privacy law and regulation is to ensure that data is not used to result in discrimination, particularly of protected groups that have been the subject of historic discrimination. The nature of big data and machine learning undermines this objective. As several scholars put it recently, “A significant concern about automated decision making is that it could potentially systematize and conceal discrimination.” 127
Where machine learning algorithms are trained on input data based on historical examples, they may result in disadvantages for certain historically disadvantaged population groups. They may therefore reflect past discrimination, regardless of the reasons for which it arose (e.g., prejudice or implicit bias). Where such previous decisions were themselves biased, the training data for machine learning processes may perpetuate or exacerbate that bias.
An individual’s creditworthiness may be evaluated based not only on their own attributes, but also on those of their social network. In 2015, Facebook secured a patent that, among other things, enables filtering of loan applications depending on whether the average credit rating of a loan applicant’s friends exceeds a prescribed minimum credit score. 128 This may risk discrimination, and even financial exclusion, if an applicant’s friends are predominantly members of a low-income population, even if the applicant’s own features should otherwise qualify him or her for the loan. 129 The risk is that, by relying on past data, such technologies will facilitate wealthier populations’ access to financial services and impede access for minority groups that lacked access in the past, thereby “automating inequality.” 130
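To make the mechanism concrete, the following is a minimal Python sketch of a network-based screening rule of the kind described above. The threshold, scores and function names are illustrative assumptions, not details taken from the patent.

```python
# Hypothetical sketch of a network-based screening rule: an application is
# filtered out when the average credit score of the applicant's friends falls
# below a prescribed minimum. The cut-off and example scores are invented
# purely for illustration.
MIN_AVG_FRIEND_SCORE = 600  # assumed cut-off, not taken from the patent

def average_friend_score(friend_scores):
    """Average credit score across an applicant's friends (0 if none are known)."""
    return sum(friend_scores) / len(friend_scores) if friend_scores else 0

def passes_network_filter(friend_scores, minimum=MIN_AVG_FRIEND_SCORE):
    """Return True only if the friends' average score meets the prescribed minimum."""
    return average_friend_score(friend_scores) >= minimum

# An applicant with strong personal attributes can still be screened out
# purely because of their network.
print(passes_network_filter([540, 580, 610]))  # False: average is about 576.7
print(passes_network_filter([700, 680, 650]))  # True: average is about 676.7
```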
Discrimination may also be built into machine learning models through “feature selection,” i.e., the choices made in their construction regarding which data should be considered. While a model might not explicitly consider membership of a protected class (e.g., gender, race, religion, ethnicity), particularly if doing so would be unlawful, it might nevertheless rely on inputs that are effectively proxies for membership of such a protected class. Postcodes are a commonly cited example, as some areas have a high percentage of the population from a particular ethnic or racial group.
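The proxy effect can be illustrated with a purely synthetic sketch: the scoring rule below never sees group membership, yet approval rates diverge across groups because postcode is correlated with group. All names, figures and correlations are invented for illustration.

```python
import random

random.seed(1)

# Synthetic applicants: postcode "A" is predominantly group 1, postcode "B"
# predominantly group 2. Group membership is never shown to the scoring rule.
def make_applicant():
    postcode = random.choice(["A", "B"])
    group = 1 if random.random() < (0.9 if postcode == "A" else 0.2) else 2
    income = random.gauss(50_000, 10_000)
    return {"postcode": postcode, "group": group, "income": income}

applicants = [make_applicant() for _ in range(10_000)]

# A scoring rule that uses only income and postcode, never `group`.
def approve(applicant):
    score = applicant["income"] / 1_000 + (10 if applicant["postcode"] == "A" else 0)
    return score > 55

# Approval rates still differ by group, because postcode acts as a proxy.
for g in (1, 2):
    members = [a for a in applicants if a["group"] == g]
    rate = sum(approve(a) for a in members) / len(members)
    print(f"approval rate, group {g}: {rate:.1%}")
```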
Another concern arises when the machine learning model fails to consider a wide enough set of factors to ensure that members of a protected group are assessed just as accurately as others. A model may have less credit data on members of a less advantaged group because fewer members of such a group have borrowed in the past. If algorithms are trained using more input data from one particular group than another, they may produce outputs disproportionately inclined towards the former group.
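The effect of such under-representation can likewise be sketched with synthetic data: a single model trained mostly on one group fits that group’s pattern and assesses the under-represented group less accurately. The groups, shifts and sample sizes below are assumptions chosen only to make the effect visible; NumPy and scikit-learn are assumed to be available.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_group(n, shift):
    # One informative feature; the feature-to-outcome relationship differs by
    # group (each group's true decision threshold sits at its own mean).
    x = rng.normal(loc=shift, scale=1.0, size=(n, 1))
    y = (x[:, 0] > shift).astype(int)
    return x, y

# Group A dominates the training data; group B is under-represented.
x_a, y_a = make_group(5_000, shift=0.0)
x_b, y_b = make_group(250, shift=1.5)

model = LogisticRegression().fit(np.vstack([x_a, x_b]), np.concatenate([y_a, y_b]))

# On fresh samples, the model is markedly less accurate for group B,
# whose pattern was barely represented in training.
xa_test, ya_test = make_group(2_000, shift=0.0)
xb_test, yb_test = make_group(2_000, shift=1.5)
print("accuracy, group A:", round(model.score(xa_test, ya_test), 3))
print("accuracy, group B:", round(model.score(xb_test, yb_test), 3))
```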
Additionally, machine learning models could potentially be used to mask discrimination intentionally. This could arise if the training data is intentionally distorted or if proxies for a protected class are