Ongoing monitoring, improvement and accountability of machine learning systems depend on documenting these objectives.215

Risk management may apply to both input and output data in machine learning models:216

On the input data side, risk mitigation will start with documenting the requirements of the model (e.g., data freshness, features and uses), the degree of dependence on data from surrounding systems, why and how personal data is included and how it is protected (e.g., through encryption or otherwise), as well as its traceability. Such documentation supports effective review and maintenance.217 It will include assessing the “completeness, accuracy, consistency, timeliness, duplication, validity, availability, and provenance” of the input data. Mechanisms to ensure that the model can be tested, updated and monitored over time may also be important.218

On the output data side, various processes may be instituted to reduce the risk of machine learning models producing adverse results. Bias detection mechanisms can be put in place to ensure that population groups are not discriminated against, or at least that bias is quantified and minimised. Sometimes it may be necessary to restrict certain types of data in the model. Output data can also be analysed to detect proxies for features that might be a basis for discrimination, such as gender, race or postal code. This requires guidance from lawyers regarding the types of features that would be an unlawful basis for discrimination. Constant monitoring of the statistical distribution of output data should also improve detection of anomalies, feedback loops and other misbehaviour. Again, documenting these measures and testing them on an ongoing basis will improve and widen understanding of a model’s risks.

Risk assessment extends both to the input and output data, and to the creation and operation of algorithms. The research institute AI Now has proposed that public agencies carry out “algorithmic impact assessments”, covering the procurement of data and software as well as the operation of automated decision-making processes, as part of a wider set of accountability measures.

Altogether, data processors need to define intended outcomes as well as unintended outcomes that should be avoided (working with legal and compliance teams), and be ready to correct or pull the model out of usage. If outputs risk breaching consumer protection, data privacy, antidiscrimination or other laws, firms should be ready with a strategy for dealing with authorities. For instance, California’s guidance on permits for autonomous vehicles has specific provisions addressing how a firm should
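The input- and output-side checks described above lend themselves to at least partial automation. The sketch below is a minimal illustration in Python, assuming a pandas DataFrame of credit decisions with hypothetical columns such as “gender”, “postal_code” and “approved”; the thresholds, column names and the use of Cramér’s V as a proxy-association measure are illustrative choices for this sketch, not requirements drawn from any of the frameworks cited here.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Illustrative thresholds -- assumptions for this sketch, not regulatory values.
MAX_MISSING_RATE = 0.02       # tolerated share of missing values per input column
MAX_DUPLICATE_RATE = 0.01     # tolerated share of exact duplicate rows
MAX_APPROVAL_GAP = 0.05       # tolerated gap in favourable-outcome rates between groups
MAX_PROXY_ASSOCIATION = 0.40  # Cramér's V above which a feature is flagged as a possible proxy


def check_input_quality(df: pd.DataFrame, required_columns: list[str]) -> list[str]:
    """Basic completeness and duplication checks on the model's input data."""
    issues = []
    for col in required_columns:
        missing = df[col].isna().mean()
        if missing > MAX_MISSING_RATE:
            issues.append(f"input column '{col}' is {missing:.1%} missing")
    duplicates = df.duplicated().mean()
    if duplicates > MAX_DUPLICATE_RATE:
        issues.append(f"{duplicates:.1%} of input rows are exact duplicates")
    return issues


def check_output_bias(df: pd.DataFrame, protected: str, outcome: str) -> list[str]:
    """Quantify gaps in favourable outcomes (e.g., loan approval) between population groups."""
    rates = df.groupby(protected)[outcome].mean()
    gap = rates.max() - rates.min()
    if gap > MAX_APPROVAL_GAP:
        return [f"gap of {gap:.1%} in '{outcome}' rates across '{protected}' groups"]
    return []


def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramér's V: association between two categorical variables (0 = none, 1 = perfect)."""
    table = pd.crosstab(x, y)
    if min(table.shape) < 2:
        return 0.0
    chi2 = chi2_contingency(table)[0]
    n = table.to_numpy().sum()
    return (chi2 / (n * (min(table.shape) - 1))) ** 0.5


def check_proxies(df: pd.DataFrame, protected: str, candidates: list[str]) -> list[str]:
    """Flag features (e.g., postal code) strongly associated with a protected attribute."""
    issues = []
    for col in candidates:
        v = cramers_v(df[col], df[protected])
        if v > MAX_PROXY_ASSOCIATION:
            issues.append(f"feature '{col}' may act as a proxy for '{protected}' (Cramér's V = {v:.2f})")
    return issues
```

Reports from checks of this kind could feed the documentation and ongoing testing the text describes, for example by logging every flagged issue alongside the version of the model that produced it.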
Monetary Authority of Singapore’s FEAT Principles
4. AIDA-driven decisions are regularly reviewed so that models behave as designed and intended.
5. Use of AIDA is aligned with the firm’s ethical standards, values and codes of conduct.
6. AIDA-driven decisions are held to at least the same ethical standards as human-driven decisions.
Smart Campaign’s draft Digital Credit Standards
Indicator 2.1.3.0
If the repayment capacity analysis is automated (e.g., through the use of an algorithm), the effective-
ness of the system in predicting the client's repayment capacity is reviewed by a unit of the organiza-
tion independent from the algorithm development team (e.g., internal audit, senior management, or
another department). The review provides recommendations to improve the algorithm's outcomes,
and these are promptly implemented.
Indicator 2.1.10.0
The provider has a rigorous internal control process to verify the uniform application of policies and
procedures around credit underwriting. This applies both where staff are involved and where the
process is automated.
Indicator 2.1.10.1
The rationale for an algorithm is documented, including the factors/types of variables used and the
justification for relying on those factors. An independent unit within the organization periodically
reviews alignment and compliance between the rationale, the algorithm, and its outputs. There is
documented evidence of tests run and corrective actions taken.
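Indicator 2.1.3.0 above asks a unit independent of the development team to review how effectively an automated system predicts repayment capacity. The following is a minimal sketch of what such a review might compute, assuming the reviewer receives a file of past disbursed loans with hypothetical columns “predicted_capacity_score” (higher = safer) and “repaid” (1 = repaid, 0 = defaulted); the metrics are common illustrations, not metrics mandated by the Smart Campaign standard.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score


def review_repayment_model(decisions: pd.DataFrame) -> dict:
    """Back-test an automated repayment-capacity score against observed outcomes.

    Expects one row per disbursed loan with:
      predicted_capacity_score -- the model's score at decision time (higher = safer)
      repaid                   -- 1 if the loan was repaid, 0 otherwise
    """
    # Discriminatory power: does the score rank repaid loans above defaulted ones?
    auc = roc_auc_score(decisions["repaid"], decisions["predicted_capacity_score"])

    # Calibration: the observed repayment rate should rise with the score decile.
    deciles = pd.qcut(decisions["predicted_capacity_score"], 10,
                      labels=False, duplicates="drop")
    repayment_by_decile = decisions.groupby(deciles)["repaid"].mean()

    return {
        "n_loans_reviewed": len(decisions),
        "auc": round(auc, 3),
        "repayment_rate_by_score_decile": repayment_by_decile.round(3).to_dict(),
    }
```

One known limitation of a back-test like this is that it only sees loans the system approved; rejected applicants never generate repayment data, so the reviewing unit would also need to consider such selection effects before recommending changes to the algorithm.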