“any information relating to an identified or identifiable natural person,”[120] “special categories” of personal data are more specific. They relate to “racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person’s sex life or sexual orientation.”[121]

Limiting processing of special categories of data

Automated decision-making based on special categories of personal data is only permitted under the GDPR with explicit consent from the user or if “necessary for reasons of substantial public interest, on the basis of Union or Member State law which shall be proportionate to the aim pursued, respect the essence of the right to data protection and provide for suitable and specific measures to safeguard the fundamental rights and the interests of the data subject.”[122]

The purpose of such tighter restrictions on dealing with special categories is to provide practical means of reinforcing other laws prohibiting discrimination on the basis of such data, whether in the provision of public or private services or otherwise. The right to privacy seeks to prevent disclosures that may lead to discrimination and other irreversible harms.[123]

In the era of big data, however, non-sensitive data can be used to infer sensitive data. For example, a name may be used to infer religion or place of birth, which in turn can be used to infer race and other personal data that belong to the special categories. Shopping data can reveal a purchase history of medicines from which a health condition may be inferred, affecting decisions such as a person’s eligibility for health insurance.[124] Demographic and statistical data relating to wider groups may also be attributed to specific individuals. As a result, non-sensitive data may merit the same protections as sensitive data.[125] The distinction between sensitive and non-sensitive data thus becomes blurred and of questionable utility.[126]

This is not merely a matter of definitional strain. One of the basic objectives of data protection and privacy law and regulation is to ensure that data is not used in ways that result in discrimination, particularly against protected groups that have been the subject of historic discrimination. The nature of big data and machine learning undermines this objective. As several scholars recently put it, “A significant concern about automated decision making is that it could potentially systematize and conceal discrimination.”[127]

Where machine learning algorithms are trained on input data based on historical examples, they may produce disadvantages for certain historically disadvantaged population groups. They may therefore reflect past discrimination, regardless of the reasons for which it arose (e.g., prejudice or implicit bias). Where such previous decisions were themselves biased, the training data for machine learning processes may perpetuate or exacerbate that bias.

An individual’s creditworthiness may be evaluated based not only on their own attributes, but also on those of their social network. In 2015, Facebook secured a patent that, among other things, enables filtering of loan applications depending on whether the average credit rating of a loan applicant’s friends exceeds a prescribed minimum credit score.[128] This may risk discrimination, and even financial exclusion, if an applicant’s friends are predominantly members of a low-income population, even if the applicant’s own features should otherwise qualify him or her for the loan.[129] The risk is that, by relying on past data, such technologies will facilitate wealthier populations’ access to financial services and impede access for minority groups that lacked access in the past, thereby “automating inequality.”[130]

Discrimination may also be built into machine learning models through “feature selection,” i.e., the choices made in their construction regarding which data should be considered. While a model might not explicitly consider membership of a protected class (e.g., gender, race, religion, ethnicity), particularly if doing so would be unlawful, it might nevertheless rely on inputs that are effectively proxies for membership of such a protected class. Postcodes are a commonly cited example, as some areas have a high percentage of the population from a particular ethnic or racial group (a short illustrative sketch below makes this proxy effect concrete).

Another concern arises when the machine learning model fails to consider a wide enough set of factors to ensure that members of a protected group are assessed as accurately as others. A model may have less credit data on members of a less advantaged group because fewer members of that group have borrowed in the past. If algorithms are trained using more input data from one group than another, they may produce outputs disproportionately inclined towards the former group (a second sketch below illustrates this effect).

Additionally, machine learning models could potentially be used to mask discrimination intentionally. This could arise if the training data is intentionally distorted or if proxies for a protected class are
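To make the proxy mechanism concrete, the following illustrative sketch (Python with NumPy and scikit-learn, entirely synthetic data, hypothetical feature names such as postcode_band) trains a simple credit model on historically biased approval decisions. The protected attribute is never given to the model, yet a postcode feature correlated with group membership absorbs the historical bias, so two applicants with identical incomes receive different approval probabilities.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000

# Protected attribute -- never passed to the model.
group = rng.integers(0, 2, n)                  # 0 = majority, 1 = minority

# Postcode band correlates strongly with group membership (the proxy).
postcode_band = np.where(group == 1,
                         rng.normal(2.0, 0.5, n),
                         rng.normal(0.0, 0.5, n))

# Income is the legitimately predictive feature.
income = rng.normal(50, 10, n)

# Historical approvals were biased: otherwise identical members of group 1
# were rejected half the time, irrespective of income.
biased_reject = (group == 1) & (rng.random(n) < 0.5)
past_approved = ((income > 45) & ~biased_reject).astype(int)

# Train only on income and postcode band; group membership is excluded.
X = np.column_stack([income, postcode_band])
model = LogisticRegression(max_iter=1000).fit(X, past_approved)

# Two applicants with identical income but different postcode bands.
applicants = np.array([[55.0, 0.0],   # typical majority-area postcode
                       [55.0, 2.0]])  # typical minority-area postcode
print(model.predict_proba(applicants)[:, 1])
# The second applicant receives a markedly lower approval probability:
# the postcode band has absorbed the historical bias as a proxy for group.

The particular numbers depend on the synthetic assumptions; the point is that excluding the protected attribute from the feature set does not prevent the model from reproducing the bias encoded in its training labels.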
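A second minimal sketch (again synthetic and only illustrative) shows the under-representation problem: when a model is fitted mostly on data from one group, its decision boundary tracks that group’s relationship between income and repayment, and predictions for the smaller group are systematically less accurate.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)

def make_group(n, threshold):
    # Synthetic applicants whose repayment depends on income,
    # centred on a group-specific income threshold.
    income = rng.normal(50, 10, n)
    p_repay = 1 / (1 + np.exp(-0.3 * (income - threshold)))
    repaid = (rng.random(n) < p_repay).astype(int)
    return income.reshape(-1, 1), repaid

# Training data is dominated by group A; group B is under-represented
# and follows a different income-to-repayment relationship.
X_a, y_a = make_group(19_000, threshold=45)
X_b, y_b = make_group(1_000, threshold=60)
model = LogisticRegression(max_iter=1000).fit(
    np.vstack([X_a, X_b]), np.concatenate([y_a, y_b]))

# Evaluate on fresh samples drawn from each group.
X_a_test, y_a_test = make_group(5_000, threshold=45)
X_b_test, y_b_test = make_group(5_000, threshold=60)
print("accuracy, group A:", accuracy_score(y_a_test, model.predict(X_a_test)))
print("accuracy, group B:", accuracy_score(y_b_test, model.predict(X_b_test)))
# The fitted decision boundary sits close to group A's threshold, so
# members of group B are assessed less accurately than members of group A.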

