Not in the data set, not in the world.

How data gaps can distort research and artificial intelligence - and why we should consider intersectionality.

Data is considered the new gold. In science, business and politics, it shapes our decisions, insights and visions of the future. But what happens when data is missing - when certain groups are not covered and gaps arise that go unnoticed in analyses? Data gaps are by no means just a quantitative problem. They have far-reaching social, political and ethical consequences. In research as well as in artificial intelligence (AI), these gaps can lead to entire realities of life being overlooked, discrimination being reinforced and decisions being distorted.

When data is missing

Data gaps do not occur by chance. They are often the result of historical exclusions, methodological decisions or economic prioritization. Medical research, for example, was carried out mainly on male test subjects for decades, with the result that symptoms in women (e.g. of heart attacks) were insufficiently understood for a long time. There are also blind spots in social research: population groups such as homeless people or people with cognitive impairments are not included in many studies. In practice, this lack of data can lead to distorted results in research and statistics, which in turn can encourage undesirable developments in AI systems and make evidence-based decisions more difficult.

Artificial intelligence

Data gaps are particularly problematic in the field of AI, as machines do not "know" what they do not know. Algorithms learn on the basis of training data, and if this data is distorted or incomplete, they make biased decisions.
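
To make this concrete, here is a minimal sketch with synthetic data and invented group labels (not any real system): a classifier trained on a sample in which one group is heavily underrepresented shows a much higher error rate for exactly that group.

```python
# Minimal sketch: synthetic data, assumed group labels. A model trained on
# a sample dominated by group A fits A's pattern and fails on group B.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_group(n, shift):
    # Two-feature toy data; the true decision rule differs per group.
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] + shift * X[:, 1] > 0).astype(int)
    return X, y

# Training set: 2000 examples from group A, only 50 from group B.
Xa, ya = make_group(2000, shift=0.2)
Xb, yb = make_group(50, shift=-1.0)   # group B follows a different pattern
model = LogisticRegression().fit(np.vstack([Xa, Xb]), np.concatenate([ya, yb]))

# Evaluation on equally large, fresh samples from both groups.
for name, shift in [("group A", 0.2), ("group B", -1.0)]:
    X_test, y_test = make_group(1000, shift)
    print(f"{name}: error rate {1 - model.score(X_test, y_test):.1%}")
```

The overall accuracy of such a model can look good, because the underrepresented group barely affects the average - which is precisely the problem.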

For example, facial recognition software has significantly higher error rates for people with darker skin tones, especially women. A study by Buolamwini & Gebru (2018) showed error rates of up to 34.7% for darker-skinned women, compared with less than 1% for lighter-skinned men. This is due to data sets that do not sufficiently represent these groups. The bias was therefore in the data, not in the code.
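
The method behind such findings can be sketched in a few lines: instead of reporting a single overall accuracy, the evaluation is disaggregated by subgroup. The column names and error probabilities below are hypothetical, chosen only to loosely mirror the kind of gap the study reported.

```python
# Minimal sketch of a disaggregated evaluation: error rates per
# intersectional subgroup instead of one overall score.
# All names and probabilities are invented for illustration.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 400
skin = rng.choice(["darker", "lighter"], n)
gender = rng.choice(["f", "m"], n)

# Assumed per-subgroup error probabilities (hypothetical).
p_err = {("darker", "f"): 0.35, ("darker", "m"): 0.12,
         ("lighter", "f"): 0.07, ("lighter", "m"): 0.01}
correct = [rng.random() > p_err[(s, g)] for s, g in zip(skin, gender)]

results = pd.DataFrame({"skin_tone": skin, "gender": gender, "correct": correct})

# One overall number hides the problem ...
print("overall error:", 1 - results["correct"].mean())
# ... the intersectional breakdown exposes it.
print(1 - results.groupby(["skin_tone", "gender"])["correct"].mean())
```

A single aggregate figure would look moderate here; only the breakdown reveals that one subgroup bears most of the errors.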

Similar biases can be seen in AI systems for granting loans, in speech recognition or in automated application processes. As a result, technology sometimes reproduces or reinforces structural inequality.

Intersectional data gaps: The double invisibility

Data gaps become even more problematic when it comes to intersectional perspectives. The term "intersectionality" describes the interactions between different social categories such as gender, ethnicity or disability. Those who only analyse one category often overlook the specific experiences of discrimination of people who are marginalized at several intersections, for example queer migrant women or elderly people with disabilities.

In research terms, this means the following:

  • Studies on discrimination often only capture one axis (e.g. gender), but not its interplay with other characteristics.
  • Data collected does not reveal multiple affiliations, as variables are often treated independently of each other (the sketch after this list shows how this can hide a gap).
  • In many surveys, questions on specific characteristics are completely missing. The groups concerned simply do not "exist" in the data sets.
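
The second point in particular is easy to demonstrate. In the sketch below, with invented survey data, an outcome looks perfectly balanced along each single axis, yet a clear gap appears as soon as the two characteristics are combined.

```python
# Minimal sketch with invented survey data: a metric that looks balanced
# along each single axis can hide a clear gap at the intersection.
import pandas as pd

df = pd.DataFrame({
    "gender":    ["f", "f", "m", "m"] * 2,
    "migration": ["yes", "no"] * 4,
    "approved":  [0, 1, 1, 0, 0, 1, 1, 0],  # e.g. hypothetical loan decisions
})

# One-axis analyses: both look balanced (50% each).
print(df.groupby("gender")["approved"].mean())
print(df.groupby("migration")["approved"].mean())

# Intersectional analysis: the combination reveals the pattern.
print(df.groupby(["gender", "migration"])["approved"].mean())
```

In this constructed example, the approval rate is 50% along every single axis, yet it varies between 0% and 100% across the intersectional subgroups - exactly the kind of pattern that one-axis analyses cannot see.
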
Why we should not ignore these gaps

Data gaps, and in particular intersectional data gaps, are more than just a methodological problem. They show whose reality is considered relevant and whose is not. This can lead to undesirable developments and policy measures that systematically disadvantage certain groups and fail to reflect the lived realities of those affected. At the same time, there is a risk of missed innovations, because diverse perspectives are not included.

How can intersectional data gaps be closed?

The good news is that data gaps are not laws of nature. With the right awareness and methodological care, they can be eliminated or at least significantly reduced.

  1. Diversification of data sets and open science: Data collections should be specifically designed to reflect multiple affiliations. This means not only recording individual characteristics, but also their interaction; a simple representation audit of this kind is sketched after this list. Wherever ethically justifiable, data sets should be made publicly accessible.
  2. Interdisciplinary research: Intersectional perspectives require cooperation between different disciplines. This can prevent important nuances from being lost.
  3. Involve those affected: Participatory research and design processes in AI help to uncover blind spots and make technology more equitable.
  4. Education and awareness-raising: Students, researchers and developers should be trained in intersectional approaches, not least as an ethical responsibility towards society.
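
As a simple illustration of the first point, the following sketch audits how well each combination of characteristics is represented in a data set before it is used. The helper function, column names and threshold are assumptions for illustration, not an established tool.

```python
# Minimal sketch (hypothetical helper): report the share of each subgroup
# defined by all combinations of `cols` and flag underrepresented cells.
import pandas as pd

def audit_intersections(df: pd.DataFrame, cols: list, min_share: float = 0.05):
    shares = df.groupby(cols).size() / len(df)
    for group, share in shares.items():
        flag = "  <- underrepresented" if share < min_share else ""
        print(f"{group}: {share:.1%}{flag}")

# Hypothetical usage with an assumed survey data set:
# audit_intersections(survey, ["gender", "disability", "age_band"])
```
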
Conclusion

Data is often the product of societal decisions. If we ignore data gaps, we not only risk biased research or "unjust" AI. We risk overlooking entire realities of life and missing out on innovation. Especially where discrimination intersects, more attention needs to be paid to intersectional data gaps. Because only those who are counted really count.

Author:

Prof. Dr. Petra Nieken

Institute for Corporate Management
Chair of Human Resource Management