Fit for Whom?
Women at the Table/FemTechnology
Session 184
Sex-Stratified Data and the Integrity of High-Risk AI
The Call
High-risk AI systems are being deployed on women who were never in the data. This session calls for sex-stratified data and disaggregated performance reporting to be established as minimum requirements — not best practices — for any AI system deployed in healthcare or criminal justice. With the inaugural Global Dialogue on AI Governance convening alongside WSIS Forum 2026 in Geneva this July, the window to embed these requirements at the multilateral standard-setting level is now.
Background
AI systems in healthcare and criminal justice are producing systematically worse outcomes for women — not because of model-level failures, but because the data on which they train was built by institutions that historically excluded women. In the United States, clinical research defaulted to male physiology until 1993, when the NIH Revitalization Act required the inclusion of women in federally funded clinical trials. Diagnostic guidelines were calibrated on male bodies. Judicial records were shaped by decades of gendered credibility assessments. When AI trains on this foundation, it does not correct these exclusions. It scales them.
The consequences are quantified and documented. AI systems trained on unrepresentative health data reduce diagnostic accuracy for women by 11.3 percentage points. Cardiac disease prediction algorithms are consistently less accurate for women even when trained on sex-balanced datasets. Judicial risk assessment tools systematically overpredict women's recidivism — women rated "high risk" reoffend at less than half the rate of men with the same score. State-of-the-art language models produce different clinical assessments from identical case notes depending on whether the patient is labelled male or female. These are not edge cases. They are the predictable output of structurally incomplete data — and aggregate accuracy metrics hide them.
This is a data integrity crisis, not only an equity concern. Systems that fail for half the population are not high-performing systems. They are systems whose failures are concealed by the metrics we use to evaluate them.
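To make the point about concealment concrete, the minimal Python sketch below computes accuracy once in aggregate and once per sex. The records are toy values invented for illustration, not results from any study cited in this session, and the helper names (`accuracy`, `accuracy_by_group`) are hypothetical.

```python
from collections import defaultdict

# Toy records invented for illustration only: (sex, true_label, predicted_label).
# Women are deliberately under-represented (40 of 140 rows), mirroring the data gap.
records = (
    [("M", 1, 1)] * 45 + [("M", 0, 0)] * 45 + [("M", 1, 0)] * 5 + [("M", 0, 1)] * 5
    + [("F", 1, 1)] * 14 + [("F", 0, 0)] * 16 + [("F", 1, 0)] * 6 + [("F", 0, 1)] * 4
)

def accuracy(rows):
    """Share of rows where the prediction matches the true label."""
    return sum(1 for _, y, y_hat in rows if y == y_hat) / len(rows)

def accuracy_by_group(rows):
    """Accuracy computed separately for each value of the first field (sex)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    return {sex: accuracy(group_rows) for sex, group_rows in groups.items()}

overall = accuracy(records)
per_sex = accuracy_by_group(records)
print(f"overall: {overall:.2f}")
for sex, acc in sorted(per_sex.items()):
    print(f"{sex}: {acc:.2f}")
```

Running it prints an overall accuracy of 0.86, against 0.75 for F and 0.90 for M. Because the toy data holds far fewer records for women, the single aggregate figure tracks the majority group, which is how a headline metric can look acceptable while one subgroup is failing.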
Relevant Projects
Two bodies of work anchor this session. Invisible by Design: Women's Health as the Blind Spot in AI and Medicine (Women At The Table & FemTechnology, 2025) traces the full six-layer cascade — from male-default clinical research through biased EHR documentation to distorted AI outputs — and documents how fairness patches applied at the model level cannot repair a data infrastructure problem. Gender Bias in Judicial Algorithms: A Global Analysis of Algorithmic Discrimination (Women At The Table, CSW70 Expert Paper, 2026) shows that the same structural pattern — missing upstream data, biased outputs, and aggregate metrics that conceal subgroup failure — operates identically in criminal justice AI. Together, these papers establish a cross-domain evidence base that reframes the gender data gap as a technical governance challenge, not a diversity aspiration.
Vision for WSIS Beyond 2025, Towards 2035
The WSIS Forum 2026 arrives at a pivotal moment: the inaugural Global Dialogue on AI Governance, co-located in Geneva in July 2026, is setting multilateral AI standards for the first time. If sex-stratified data requirements are not embedded at this stage, they will need to be retrofitted — at greater cost, and after further harm.
This session is designed to produce a concrete output in time for that process: a framework statement on minimum sex-stratified data and disaggregated performance reporting requirements for high-risk AI systems. The vision is not another declaration of principle. It is a submittable civil society document — ready to enter the Global Dialogue process, EU AI Act implementation consultations, and UN treaty body mechanisms including CEDAW General Recommendations 40 & 41 — that translates the WSIS commitment to people-centred, inclusive, and development-oriented digital infrastructure into an enforceable technical standard. What WSIS established as a vision, this session aims to advance as a requirement.
WSIS Action Lines
- C1. The role of governments and all stakeholders in the promotion of ICTs for development
- C4. Capacity building
- C6. Enabling environment
- C7. ICT applications: benefits in all aspects of life — E-government
- C7. ICT applications: benefits in all aspects of life — E-health
- C7. ICT applications: benefits in all aspects of life — E-employment
- C10. Ethical dimensions of the Information Society
- C11. International and regional cooperation
C1 — The Role of Governments and All Stakeholders
Governments are among the largest procurers of AI systems used in healthcare, criminal justice, and social services — the exact domains where the gender data gap produces the most consequential failures. This is true in high-income countries, and equally true across the Global South, where public sector AI deployment is accelerating and procurement frameworks are still being established. This session addresses the responsibility of all governments to require evidence of sex-stratified validation before deploying high-risk AI in public sector applications. The framework statement produced will identify concrete procurement and regulatory levers through which governments — including those with limited regulatory capacity — can make disaggregated performance reporting a condition of deployment, not an optional disclosure.
C4 — Capacity Building
Requiring sex-stratified data is only enforceable if regulators, procurement officers, and civil society know what to ask for and how to interpret what they receive. This capacity gap is acute in the Global South, where AI governance frameworks are newer and technical expertise in algorithmic audit is less established — yet where the consequences of deploying unvalidated systems are most severe. This session contributes directly to that capacity — naming the specific metrics, audit methodologies, and reporting formats that make disaggregated performance reporting meaningful in practice. The session's outputs are designed to be immediately usable by the actors who need to operationalise these requirements across diverse regulatory contexts.
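As an illustration of what a disaggregated reporting format might contain, the Python sketch below reports sensitivity and false positive rate per sex and flags any gap larger than a tolerance. It is a minimal sketch under stated assumptions: the `Counts` type, the `disaggregated_report` function, the choice of metrics, and the `MAX_GAP` value of 0.05 are all hypothetical, chosen for illustration rather than proposed as the session's standard, and the counts are invented.

```python
from dataclasses import dataclass

@dataclass
class Counts:
    """Confusion-matrix counts for one subgroup."""
    tp: int  # true positives
    fn: int  # false negatives
    fp: int  # false positives
    tn: int  # true negatives

    @property
    def sensitivity(self) -> float:
        # Share of actual positives the system correctly flags.
        return self.tp / (self.tp + self.fn)

    @property
    def false_positive_rate(self) -> float:
        # Share of actual negatives the system wrongly flags.
        return self.fp / (self.fp + self.tn)

# Hypothetical tolerance for the largest acceptable gap between subgroups.
# Illustrative assumption only, not a proposed standard.
MAX_GAP = 0.05

def disaggregated_report(counts_by_sex: dict[str, Counts]) -> dict:
    """Report each metric per subgroup, the largest gap, and whether it exceeds MAX_GAP."""
    report = {}
    for metric in ("sensitivity", "false_positive_rate"):
        values = {sex: getattr(c, metric) for sex, c in counts_by_sex.items()}
        gap = max(values.values()) - min(values.values())
        report[metric] = {"by_sex": values, "gap": gap, "exceeds_tolerance": gap > MAX_GAP}
    return report

# Toy counts invented for illustration only.
example = {
    "F": Counts(tp=60, fn=40, fp=10, tn=90),
    "M": Counts(tp=85, fn=15, fp=10, tn=90),
}
for metric, result in disaggregated_report(example).items():
    print(metric, result)
```

With these toy counts the false positive rates match, but sensitivity is 0.60 for F against 0.85 for M, so the sensitivity gap is flagged. Reporting several error rates separately matters because a system can be balanced on one metric while failing on another.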
C6 — Enabling Environment
Sex-stratified data and performance reporting cannot be required of AI systems if the underlying data infrastructure does not support them. Health data standards (HL7/FHIR, ICD, SNOMED) and judicial data architectures must include the structured fields on which sex-stratified analysis depends. In health data, these include biological variables such as menstrual cycle, pregnancy, and menopause, which can change the clinical significance of the other data points recorded alongside them. In much of the Global South, health data infrastructure is still being built: this is the moment to ensure it is built right, with sex-stratified fields embedded from the start rather than retrofitted. This session treats data infrastructure standards as a regulatory baseline, not a technical afterthought, and engages the standards bodies whose decisions determine what fields exist in the systems AI trains on globally.
C7 — E-health
The evidence base is clearest and most developed in healthcare AI. AI systems trained on male-default clinical data reduce diagnostic accuracy for women by 11.3 percentage points. Cardiac disease prediction algorithms underperform for women even on sex-balanced datasets. The missing variables are not obscure — they are the life-stage fields that electronic health records routinely omit. For women in the Global South, where diagnostic infrastructure is thinner and AI clinical tools are often imported without local validation, these failures are compounded: systems that perform poorly for women in the contexts where they were developed perform worse still when deployed across different populations without sex-disaggregated audit. This session addresses what sex-stratified data requirements need to look like in clinical AI globally, and what validation against women's data must mean before a system is considered fit for deployment.
C7 — E-government
Judicial risk assessment tools, welfare eligibility algorithms, and social care allocation systems are government-operated AI applications with direct consequences for women's lives. The systematic overprediction of women's recidivism and the gender-differentiated allocation of social care services documented in recent research reflect the same structural failure as healthcare AI: systems trained on historically biased institutional records, deployed without sex-disaggregated validation. Across the Global South, these tools are increasingly in use in judicial and welfare systems with limited oversight mechanisms and significant barriers to legal challenge — making pre-deployment validation requirements all the more essential. This session addresses the governance requirements for AI in public administration as directly as for clinical settings.
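One way a pre-deployment check could surface the overprediction of women's recidivism is a sex-disaggregated calibration check: within each risk band, compare the observed reoffence rate for women and for men. If a score means the same thing for both groups, those rates should be similar within a band. The Python sketch below assumes each record carries sex, an assigned risk band, and an observed outcome; the records are invented for illustration and the function name is hypothetical.

```python
from collections import defaultdict

# Toy records invented for illustration only: (sex, risk_band, reoffended).
records = (
    [("M", "high", True)] * 6 + [("M", "high", False)] * 4
    + [("F", "high", True)] * 2 + [("F", "high", False)] * 8
    + [("M", "low", True)] * 2 + [("M", "low", False)] * 8
    + [("F", "low", True)] * 1 + [("F", "low", False)] * 9
)

def observed_rate_by_band_and_sex(rows):
    """Observed reoffence rate for each (risk band, sex) pair."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for sex, band, reoffended in rows:
        totals[(band, sex)] += 1
        positives[(band, sex)] += int(reoffended)
    return {key: positives[key] / totals[key] for key in totals}

rates = observed_rate_by_band_and_sex(records)
for band in ("high", "low"):
    # A ratio well below 1 means women in this band reoffend less often than men
    # with the same label, i.e. the tool overpredicts risk for women.
    ratio = rates[(band, "F")] / rates[(band, "M")]
    print(f"{band}: F={rates[(band, 'F')]:.2f} M={rates[(band, 'M')]:.2f} F/M ratio={ratio:.2f}")
```

In this toy data the F/M ratio is 0.33 in the high band and 0.50 in the low band, so the same label carries a different meaning depending on sex, a pattern that an aggregate accuracy figure would not reveal.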
C7 — E-employment
Algorithmic hiring, performance management, and employment screening tools apply AI to labour market decisions at scale. Where training data reflects historically gendered employment patterns, these systems risk encoding and amplifying existing inequities. In Global South contexts, where informal employment and labour market exclusion already disproportionately affect women, the deployment of employment AI without sex-disaggregated validation adds a further layer of algorithmic disadvantage. This session's cross-domain framing explicitly includes employment AI, and the framework statement's minimum requirements for sex-stratified validation are designed to apply across high-risk application domains and across income contexts.
C10 — Ethical Dimensions of the Information Society
Aggregate accuracy metrics are not a sufficient ethical standard for high-risk AI. A system that performs well on average while systematically failing women — and failing them most severely in Global South contexts where redress mechanisms are weakest — is not a high-performing system. It is a system whose failures are hidden by the metrics used to evaluate it. This session reframes disaggregated reporting by sex and intersecting characteristics as the minimum expression of ethical AI commitment, moving the conversation from principles to enforceable requirements. The framework statement is designed to give the ethical dimension of the WSIS Forum 2026 a technical and operational form that travels beyond Geneva.
C11 — International and Regional Cooperation
The gender data gap in AI is a global problem with a multilateral solution pathway. CEDAW obligations, the Global Digital Compact, and the inaugural Global Dialogue on AI Governance all provide existing frameworks through which minimum sex-stratified data requirements can be advanced internationally. Women in the Global South face the compounded effects of data scarcity, weaker regulatory environments, and deployment of systems validated on populations that do not represent them — making international cooperation on minimum standards not only a governance question but an equity imperative. This session's output is explicitly positioned for submission to the Global Dialogue process, and its design centres Global South voices as co-authors of the requirements, not recipients of standards developed elsewhere. The WSIS Forum 2026 is the right moment, and Geneva the right place, to ensure that the first generation of multilateral AI governance standards is built for all women — not only those whose data was collected.
Sustainable Development Goals
- Goal 3: Ensure healthy lives and promote well-being for all
- Goal 5: Achieve gender equality and empower all women and girls
- Goal 8: Promote inclusive and sustainable economic growth, employment and decent work for all
- Goal 10: Reduce inequality within and among countries
- Goal 16: Promote just, peaceful and inclusive societies
- Goal 17: Revitalize the global partnership for sustainable development
SDG 3 — Good Health and Well-Being
Healthcare AI is the session's primary evidence domain, and the failures documented are health outcome failures: reduced diagnostic accuracy, missed cardiac events, delayed treatment, and clinical assessments that differ based on the patient's sex label rather than their clinical presentation. For women in the Global South, where health system capacity is more constrained and AI clinical tools are often imported without local validation, these failures translate directly into avoidable illness, delayed care, and preventable death. Establishing minimum sex-stratified data and validation requirements for clinical AI is a prerequisite for AI contributing to SDG 3 rather than undermining it. This session advances that requirement as a concrete governance standard.
SDG 5 — Gender Equality and the Empowerment of All Women and Girls
Gender equality is the foundational commitment running through every element of this session. AI systems that produce systematically worse outcomes for women in healthcare, criminal justice, and employment are not neutral tools — they are infrastructure that encodes and scales historical exclusion. Achieving SDG 5 in a world where high-risk decisions are increasingly automated requires that the data underpinning those decisions accurately represents women, that systems are validated against women's outcomes before deployment, and that women — including women in the Global South — are present as co-designers of the standards governing these systems, not only as subjects of their outputs. The framework statement this session produces is a direct contribution to the technical operationalisation of SDG 5.
SDG 8 — Decent Work and Economic Growth
The consequences of gender-biased AI extend beyond health and justice into economic life. Employment screening and hiring algorithms trained on historically gendered labour market data risk systematically disadvantaging women at the point of access to decent work. Misdiagnosis and delayed treatment — predictable outputs of healthcare AI trained on male-default data — generate economic costs borne disproportionately by women: lost income, reduced workforce participation, and care burdens that fall on women when health systems fail to identify and treat their conditions accurately. In Global South contexts, where women's economic participation is already constrained by structural barriers, these algorithmic failures compound existing disadvantage. Sex-stratified validation requirements for employment and healthcare AI are therefore an SDG 8 issue as much as an SDG 5 issue.
SDG 10 — Reduced Inequalities
The gender data gap in AI is a mechanism of inequality reproduction. When AI systems trained on historically biased data are deployed at scale — in clinical settings, courtrooms, welfare offices, and hiring processes — they do not merely reflect existing inequalities; they institutionalise them in algorithmic form and apply them at a speed and scale no human bureaucracy could match. The aggregate metrics used to evaluate these systems conceal their differential impact, allowing systematic failure for women, and particularly for women at the intersection of gender, race, and geography, to remain invisible in official performance records. Disaggregated reporting by sex and intersecting characteristics is the minimum technical requirement for making inequality visible — and visibility is the prerequisite for accountability. This session advances that requirement as a governance standard applicable across high-income and Global South contexts alike.
SDG 16 — Peace, Justice and Strong Institutions
Judicial risk assessment tools that systematically overpredict women's recidivism, and social care algorithms that allocate fewer resources to women than to men with identical needs, are failures of the institutions charged with delivering justice and protection. Strong institutions in the age of AI require that automated decision-making systems used in judicial and public administration contexts are validated against the outcomes of all the populations they govern — not optimised for aggregate performance metrics that hide subgroup failure. In Global South jurisdictions where legal challenge to algorithmic decisions is harder and institutional accountability mechanisms are weaker, the deployment of unvalidated justice AI poses a particularly acute threat to SDG 16. Pre-deployment sex-stratified validation requirements are a foundation of just and accountable AI governance, and this session advances them as such.
SDG 17 — Partnerships for the Goals
No single government, standards body, or civil society organisation can close the gender data gap in AI alone. The problem is structural, cross-domain, and global — and the solution requires the kind of sustained multilateral partnership that SDG 17 calls for. This session is designed to build exactly that: bringing together data standards bodies, AI developers, regulators, treaty body members, and civil society from both the Global North and Global South to produce a shared framework statement that can travel into multiple governance processes simultaneously. The co-location with the inaugural Global Dialogue on AI Governance makes the WSIS Forum 2026 a unique entry point for ensuring that the first generation of multilateral AI standards reflects a genuine global partnership — one in which the communities bearing the greatest costs of the gender data gap are present as architects of the solution.
Global Digital Compact Objectives
- Objective 1: Close all digital divides and accelerate progress across the Sustainable Development Goals
- Objective 2: Expand inclusion in and benefits from the digital economy for all
- Objective 3: Foster an inclusive, open, safe and secure digital space that respects, protects and promotes human rights
- Objective 4: Advance responsible, equitable and interoperable data governance approaches
- Objective 5: Enhance international governance of artificial intelligence for the benefit of humanity