Abstract
Artificial intelligence (AI) has become a revolutionary tool in breast cancer screening, particularly by enabling computer-assisted detection in mammography to help doctors confirm a diagnosis. While AI assistance has been shown to reduce radiologists' false positive rates by 37.3% in breast imaging (Shen et al., 2021), it is susceptible to multiple biases that can compromise patient care. In this article, we review these biases, their impact and effective ways to reduce them and improve reliability. This paper discusses: automation bias, which arises from radiologists' overreliance on AI-based diagnosis and treatment; algorithmic bias, which arises from poorly defined evaluation processes in AI model design; and data bias, which arises from training datasets that are not representative of diverse patient populations. We demonstrate, drawing on recent research, how these biases negatively impact clinical outcomes and patient confidence. We then propose expanding the diversity and size of datasets, applying reweighting and resampling strategies, utilising transparent and explainable AI models, augmenting visualisation tools, including a decision-referral system for indeterminate cases and providing targeted training for radiologists. Ultimately, rigorous oversight and proactive responses to these challenges will allow AI to realise its full potential in future cancer detection, while reducing bias and enabling diagnosis and treatment that truly benefits all patients.
Introduction
Artificial intelligence (AI) has been integrated into modern medicine in many forms. Big data analytics, among other approaches, is replacing traditional case-by-case diagnosis and treatment in current medical practice by aggregating similar cases, with the potential to improve diagnostic accuracy, streamline workflows and reduce processing time. The principal contribution of AI to healthcare today is supplementing tasks that demand extensive human experience and computation; for example, it may assist with rapid diagnosis and treatment in high-volume, straightforward cases. AI has also proven to be a helpful tool in mammography-based breast cancer screening, but it still has limitations. For AI to develop further across medical disciplines, it is crucial to recognise both its potential and its limitations, to research training models intensively to address these shortcomings, and to actively tackle patients' lack of trust in human-computer interaction. As the following sections show, the current application of AI in mammography interpretation is still hampered by bias.
1. Status Quo
Revolutionary scientific advances using computer-assisted detection and artificial intelligence models are helping doctors provide their patients with the best and most accurate care. Currently, scientists, engineers and physicians are working together to integrate artificial intelligence models into many aspects of medicine. While these models are genuinely transformative, recent patterns of bias and related issues have emerged where artificial intelligence is used. In particular, artificial intelligence models help radiologists detect cancers in screenings; however, physicians have noticed specific patterns of bias in these models, all of which affect patient care and results (Abdelwanis et al., 2024) and can potentially lead to devastating misdiagnoses. This paper examines the existence and impact of each type of bias in mammography for breast cancer, the importance of eliminating bias to ensure the accuracy and reliability of artificial intelligence in mammography, and how resolving bias can strengthen patients' perception of AI's potential in future cancer care.
1.1 Automation Bias
One type of bias that occurs in mammography screenings and breast cancer detection is automation bias. Automation bias occurs when doctors over-rely on CAD outputs; it stems more from human behaviour than from the device itself (Abdelwanis et al., 2024). Automation error has been shown to be common amongst physicians, especially radiologists, and automation bias is a significant concern in radiology, already leading to issues in diagnosis, treatment and decision-making (Abdelwanis et al., 2024). One such instance occurred when Alberdi et al. (2004) investigated automation bias in breast cancer readings. Sixty mammography results, 30 of them cancer positive, were shown to two groups of radiologists: one group of 20 radiologists was assisted by AI models and another group of 19 radiologists was unassisted. The findings illustrated that when the AI model presented inaccurate output, many radiologists in the assisted group succumbed to automation bias and agreed with the model's inaccurate reading (Alberdi et al., 2004).
A common question that many professionals have concerning automation bias is: how are trained, experienced radiologists overlooking key components of a case by complying with a model's inaccurate result?
Radiologists may succumb to automation bias as a result of increased workload. If a radiologist faces a heavy workload that seems daunting, their cognitive capacity becomes strained and they may over-rely on computer-assisted detection devices to aid their cases (Abdelwanis et al., 2024). A study by Goddard et al. connects perceived workload and errors, suggesting that pressure affects physicians' reliance on computer-assisted detection (Goddard, Roudsari & Wyatt, 2012). It is therefore extremely important for physicians to find a balance between workload and productivity.
Similarly, inexperienced radiologists can easily succumb to automation bias due to a lack of confidence: some radiologists notice inconsistencies in the AI model's result but are wary of overriding an algorithm that is supposedly "perfect". Additionally, some radiologists fear the professional repercussions they might face if they override an output that turns out to be correct (Abdelwanis et al., 2024). However, Dratsch et al. (2023) conducted a study showing that inexperienced, moderately experienced and very experienced radiologists are all prone to automation bias when using a computer-assisted detection approach (Dratsch et al., 2023).
Finally, radiologists may be subject to automation bias due to uncertainty. Although computer-assisted detection devices can help physicians immensely, the models typically give little to no insight into their decision-making processes. Radiologists are unable to trace the model's decision for further verification; instead, they place blind trust in the model's algorithm without any means of authentication (Abdelwanis et al., 2024). Not only does this pose serious problems for physician automation bias, it also points to larger problems with the future of AI-assisted healthcare: most models give no reasoning for their decisions, preventing professionals from easily identifying flaws, biases and errors.
1.2 Algorithmic Bias
All artificial intelligence models are built around a training algorithm, which poses a significant risk of algorithmic bias. This bias is often rooted in the fundamental algorithmic building blocks the model uses to reach decisions. It is therefore important for developers of computer-assisted detection devices to carry out extensive testing and trials to minimise any fundamental algorithmic bias. The effects of algorithmic bias impact patients greatly, as certain groups of people receive systemically unfair results and outcomes from an AI model (Pham et al., 2024). Algorithmic bias in these models is an umbrella term covering a wide range of root causes: the algorithms acquire fundamental biases from their training procedures, from their original training data and from subjective decisions made during development (Ferrara, 2023).
1.3 Data Bias
Data bias directly impacts the diagnostic performance of unified AI models across diverse patient populations. Data bias occurs when the datasets used to train and validate AI systems are not representative of the full patient population or are inadequately adapted to the diversity of patients. This can manifest as an unrepresentative sample of the patient population, which skews an artificial intelligence model's training and currently disadvantages entire groups of people. If an artificial intelligence model is trained well on one specific group of patients but does not have enough data for another, a whole group of people will see discrepancies in their results (Pham et al., 2024). This issue is systemic and often impacts minority groups the most.
It is important to feed artificial intelligence models diverse datasets covering many aspects of a patient's profile: geographic location, socioeconomic status, ethnicity, race and environment (Pham et al., 2024). AI training involves memorising large amounts of historical diagnosis and treatment data and then computationally analysing this data for mutations, trends and correlations (Coherent Solutions, 2025). When this data reflects the fixed characteristics of a specific population, it can embed biases into the AI's future diagnosis and treatment of patients with different characteristics.
Therefore, when AI models are used to detect breast cancer, the population with the largest amount of data representation – typically Caucasian, middle-aged and older women – will receive more effective and accurate diagnostic results, and AI-powered diagnosis and treatment will have a natural advantage in detecting and diagnosing diseases in this demographic. Data is significantly scarcer for groups with low representation in breast cancer treatment databases, such as young African American and Asian women, and even men (Miller et al., 2012). This data bias results in AI having higher accuracy rates for overrepresented groups but poorer performance for underrepresented groups, leading to systemic inequalities in diagnosis, treatment and care. Data bias also contributes to patient distrust of AI in mammography screening by reducing the accuracy of the technology in specific populations. Addressing data bias will be a crucial issue in the next phase of large-scale AI model training.
2. Effects of Bias
AI biases in breast cancer mammography readings can have devastating effects when AI systems are deployed in a clinical patient environment. As previously mentioned, there are several types of bias, and each can lead to distinct consequences and misdiagnoses; each therefore requires a specific mitigation strategy. Eliminating bias is vital to confirming the accuracy and reliability of AI in mammography readings. As the use of AI proliferates, a deep understanding of the existence and effect of each type of bias in breast cancer mammography readings is required for optimisation and implementation.
2.1 Effects of Automation Bias
Novel technology in the healthcare sector needs to be properly understood by the medical staff who will operate it before it is implemented and broadly accepted by the medical community. Training medical professionals in how new technology works not only informs the operators but also instils patient trust in their providers' capability. For tasks such as mammography reading, detection and diagnosis, the use of AI will inevitably grow in the coming years, which means that operators, in this case radiologists, will need the necessary training and proficiency in the technology before it can be widely adopted. When there is a gap in the way radiologists understand and utilise artificial intelligence, it can become a source of automation bias.
In the study conducted by Dratsch et al., inexperienced mammography readers were shown to be most susceptible to bias as they would continue to believe AI outputs even if they were inaccurate. The results showed a mean degree of bias of 4.0 ± 1.8 compared to 2.4 ± 1.5 for moderately experienced and 1.2 ± 0.8 for very experienced radiologists. The study concluded that automation bias significantly impacts diagnostic performance across all experience levels. Given the repetitive and highly standardised nature of mammography screening, automation bias may become a concern when an AI system is integrated into the workflow (Dratsch et al., 2023).
Automation bias in AI-assisted breast cancer mammography has been shown to cause undertrained radiologists to rely too heavily on the predictive model, leading to a loss of independent judgement and making it harder for them to intervene when the AI is wrong. This over-reliance can result in errors such as missed or false cancer diagnoses, directly affecting patient outcomes, and can reduce patient confidence in both the technology and the healthcare system as a whole. If patients feel that human expertise is being overshadowed by a machine, they may begin to question the fairness and safety of their care, which can reduce trust and willingness to accept screening and treatment.
2.2 Effects of Algorithmic Bias
When AI is used for medical imaging, such as breast cancer mammography, the specific way an algorithm is trained and evaluated can drastically change the result.
A 2024 review by Pham et al. explored the role that artificial intelligence plays in the diagnosis, prediction and treatment of head and neck cancers. Using databases from the National Library of Medicine, the review aims to address limitations associated with AI integration. It explains how artificial intelligence models are able to identify patterns in a patient's medical image, enabling them to assist in diagnosis. The difficulty in training these AI models comes from the fact that small differences in diagnostic images can have great implications for the detection of cancer: the models need to learn to recognise minute details and to distinguish information that is often distorted or altered by digital, pixelated representation. The review notes that this machine learning is not without bias. One limitation it describes is the lack of uniformity in the techniques used to train the algorithms. Additionally, subjective decisions in the model development process, such as feature selection and parameter tuning, introduce algorithmic bias in certain algorithms (Pham et al., 2024).
As shown in Pham et al.'s review, algorithmic bias is prominent in the diagnostic sector. Its effects on patient outcomes include reduced quality of care and an erosion of trust in the model. If an AI model has not been properly trained and is prone to bias, a patient is more likely to receive a delayed or incorrect diagnosis. Moreover, if patients learn that there is a chance of bias within the AI model, the trust they have placed in the algorithm erodes, and both patients and clinicians may lose confidence in AI if it repeatedly fails to deliver.
2.3 Effects of Data Bias
In breast cancer screening, data bias can arise when AI is trained on certain breast densities more than others. When the AI is not properly trained, it can perform less accurately, potentially missing cancers or producing false positives. Addressing data bias is critical, as it directly affects the equity of AI-assisted mammography, and failure to mitigate it can perpetuate disparities in breast cancer diagnosis and treatment.
Koo et al.'s recent study (2023) examined whether AI systems showed signs of data bias in their analysis of breast ultrasounds. The researchers used breast lesion information from nine breast imaging facilities across the US, creating an ethnically and racially diverse database, and compared actual pathology results, determined by human pathologists, with the output of an FDA-approved AI system for analysis and diagnosis of ultrasound screenings. The study sought to establish whether the AI's performance across various demographic groups and facilities mirrored the results of the pathological analysis. An unexpected outcome was that the AI assessments often differed from the pathology, specifically underperforming for patients in underrepresented populations, such as African Americans and Hispanics (Koo et al., 2023). This is an example of data bias and demonstrates an inherent limitation of AI systems in which bias is created by the limitations of the dataset used to train the model. While this study focused on ultrasounds, the findings are highly relevant to mammography, since mammography AI models are also trained on large datasets.
As shown in Koo et al.’s study, data bias leads to uneven accuracy across different patient populations. This means that some groups experience missed cancers or unnecessary recalls. This will in turn delay diagnoses and worsen health disparities. If the AI is consistently underperforming, then underrepresented groups will see a continuous decline in care quality, reinforcing existing inequities in healthcare outcomes.
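To make such disparities visible before deployment, model performance can be audited separately for each demographic group. The sketch below is a minimal, illustrative subgroup audit in Python; the column names ('group', 'truth', 'prediction') and the grouping variable are assumptions and would need to match the actual evaluation dataset.

```python
import pandas as pd

def subgroup_audit(results: pd.DataFrame) -> pd.DataFrame:
    """Compute per-subgroup sensitivity and specificity from a results table.

    Assumed columns: 'group' (demographic stratum), 'truth' (1 = cancer),
    'prediction' (1 = flagged by the AI). These names are illustrative.
    """
    def metrics(g: pd.DataFrame) -> pd.Series:
        tp = ((g.truth == 1) & (g.prediction == 1)).sum()
        fn = ((g.truth == 1) & (g.prediction == 0)).sum()
        tn = ((g.truth == 0) & (g.prediction == 0)).sum()
        fp = ((g.truth == 0) & (g.prediction == 1)).sum()
        return pd.Series({
            "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
            "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
            "n": len(g),   # subgroup size, to flag thinly represented strata
        })
    return results.groupby("group").apply(metrics)
```

Large gaps between rows of such a table (for example, markedly lower sensitivity for one racial or ethnic group) would be a concrete signal of the data bias described above.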
3. Proposed Solution
It is critically important to address AI biases in mammography screening. These biases can have serious consequences, including missed cancers (false negatives) and the cost and anxiety of unnecessary follow-up testing (false positives). As we have shown, AI bias in mammography screening can stem from data bias, measurement bias, algorithmic bias and cognitive bias (which includes the automation bias discussed above). We propose the following solutions to reduce the error rates caused by these biases.
We address data bias by improving the diversity of our datasets, including patients of different races and ethnic backgrounds in the training data. This requires obtaining mammography data from multiple institutions and screening programmes, both nationally and internationally, so that the dataset reflects the variability of real-world populations. In addition to images, demographic features such as age, race, ethnicity and family history will be included as structured metadata. These clinical features will be integrated with the imaging data through a multimodal framework: convolutional neural networks (CNNs), a class of deep learning models designed to recognise spatial hierarchies and fine-grained features in images, will process the mammography scans, while demographic information will be encoded as feature vectors and combined with the imaging embeddings at the fully connected layers of the model. CNNs are particularly well suited to mammography because they can detect small patterns, such as calcifications and subtle masses, that may indicate malignancy. McKinney et al. (2020) showed that an AI system trained and evaluated across UK and US sites achieved an absolute reduction in false negatives of 9.4% in the US and 2.7% in the UK, and a reduction in false positives of 5.7% in the US and 1.2% in the UK, demonstrating improved sensitivity and specificity thanks to multisite diversity. In the same way, by working with a larger and more diverse dataset, our model will be able to learn from differences in race, ethnicity and risk factors to reduce data bias.
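As an illustration of this multimodal design, the sketch below fuses a CNN image embedding with a demographic feature vector at the fully connected layers. It is a minimal PyTorch example under stated assumptions: the ResNet-50 backbone, the 32-dimensional metadata encoder and the number of demographic features are illustrative choices, not a prescribed architecture.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class MultimodalMammoNet(nn.Module):
    """Illustrative multimodal model: a CNN backbone encodes the mammogram,
    and demographic metadata is concatenated at the fully connected layers."""

    def __init__(self, num_demographic_features: int = 8):
        super().__init__()
        # Pre-trained ResNet-50 as the imaging backbone (an assumption; any CNN
        # producing a fixed-length embedding would work). Grayscale mammograms
        # are assumed to be replicated to three channels before input.
        backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        backbone.fc = nn.Identity()          # expose the 2048-d image embedding
        self.backbone = backbone

        # Small encoder for structured metadata (age, race, family history, ...).
        self.demo_encoder = nn.Sequential(
            nn.Linear(num_demographic_features, 32),
            nn.ReLU(),
        )

        # Fusion head: image embedding + metadata embedding -> malignancy logit.
        self.classifier = nn.Sequential(
            nn.Linear(2048 + 32, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, 1),
        )

    def forward(self, image: torch.Tensor, demographics: torch.Tensor) -> torch.Tensor:
        img_emb = self.backbone(image)                  # (batch, 2048)
        demo_emb = self.demo_encoder(demographics)      # (batch, 32)
        fused = torch.cat([img_emb, demo_emb], dim=1)   # late fusion at the FC layers
        return self.classifier(fused)                   # (batch, 1) logits
```

Late fusion of this kind keeps the imaging and metadata pathways separate until the final layers, so either component can be retrained or audited independently.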
We address measurement bias by obtaining data from multiple institutions, such as academic hospitals, community clinics and international sites. This not only addresses racial and ethnic representation, but also how the images are obtained and processed. Different mammography machines, settings and labelling standards can all influence the quality and interpretation of images. To reduce these differences, we propose using standardised preprocessing pipelines that normalise image intensity, resolution and contrast. In addition, we can explore domain adaptation techniques such as histogram matching to reduce systematic variation between data sources. Annotation bias will also be reduced by including multiple radiologists in the labelling process, assessing inter-rater reliability and building consensus-driven labels. In this way, we address how the data is measured and annotated, not just how it is distributed across groups.
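A minimal version of such a preprocessing pipeline might look like the following, assuming scikit-image is available; the reference image, target resolution and contrast settings are illustrative placeholders rather than validated choices.

```python
import numpy as np
from skimage import exposure, transform

def standardise_mammogram(image: np.ndarray,
                          reference: np.ndarray,
                          target_shape=(1024, 1024)) -> np.ndarray:
    """Normalise a mammogram from any site towards a common reference.

    `reference` is a mammogram (or average image) from the reference institution,
    assumed to already be rescaled to [0, 1]; shape and settings are illustrative.
    """
    # 1. Rescale pixel intensities to [0, 1] regardless of vendor bit depth.
    image = exposure.rescale_intensity(image.astype(np.float64), out_range=(0.0, 1.0))

    # 2. Resample to a common resolution so all sites share one pixel grid.
    image = transform.resize(image, target_shape, anti_aliasing=True)

    # 3. Histogram matching: align the intensity distribution with the
    #    reference site to reduce systematic scanner/protocol differences.
    image = exposure.match_histograms(image, reference)

    # 4. Contrast-limited adaptive histogram equalisation for standardised contrast.
    image = exposure.equalize_adapthist(image, clip_limit=0.02)
    return image
```

In practice, the same pipeline (and the same reference) would be applied to every site's images before training, so that remaining differences reflect anatomy rather than acquisition.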
We address algorithmic bias through bias-aware training, using reweighting or resampling techniques so the model does not overfit to the majority groups. Specific approaches include stratified oversampling of underrepresented cases, inverse propensity weighting and loss functions that explicitly penalise subgroup disparities. For our models, we will rely on CNN backbones pre-trained on large datasets and then fine-tune them on our mammography dataset. We will also evaluate ensemble learning as a way to stabilise performance and reduce subgroup variance. Our solution will include continuous monitoring, bias audits and model retraining as key strategies to reduce algorithmic bias. Lång et al. (2020) demonstrated that by re-reading mammograms where cancers were initially missed, a reweighting based on radiologist-driven labels led to improved sensitivity in high-risk groups. This evidence supports our proposed use of reweighting and subgroup-aware training methods.
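The sketch below illustrates two of these ideas in PyTorch: an inverse-frequency sampler that oversamples underrepresented strata, and a loss term that penalises disparities in per-subgroup loss. The subgroup encoding, penalty weight and use of a `reduction='none'` base loss are assumptions made for illustration only.

```python
import numpy as np
import torch
from torch.utils.data import WeightedRandomSampler

def subgroup_balanced_sampler(subgroup_labels: np.ndarray) -> WeightedRandomSampler:
    """Sampler whose per-example weight is the inverse of its subgroup frequency,
    so minority strata are drawn as often as majority ones during training.

    `subgroup_labels` is an integer code per training example (e.g. a
    race/ethnicity x breast-density stratum); the encoding is an assumption.
    """
    counts = np.bincount(subgroup_labels)
    weights = 1.0 / counts[subgroup_labels]            # inverse subgroup frequency
    return WeightedRandomSampler(
        weights=torch.as_tensor(weights, dtype=torch.double),
        num_samples=len(subgroup_labels),
        replacement=True,
    )

def subgroup_penalised_loss(logits, targets, subgroups, base_loss, penalty=0.1):
    """Standard loss plus a penalty on the spread of per-subgroup losses,
    discouraging the model from trading minority-group accuracy for the majority.
    `base_loss` is assumed to be instantiated with reduction='none'."""
    per_example = base_loss(logits, targets)
    group_means = [per_example[subgroups == g].mean()
                   for g in torch.unique(subgroups)]
    # Population variance across subgroups (assumes >= 1 subgroup in the batch).
    disparity = torch.stack(group_means).var(unbiased=False)
    return per_example.mean() + penalty * disparity
```

Either mechanism can be used alone; combining them, and tracking per-subgroup metrics during the proposed bias audits, makes it harder for majority-group performance to mask minority-group regressions.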
Finally, cognitive bias can affect all reading strategies: AI models, human readers and human readers who also use AI models. Our solution is to apply a decision-referral system in which humans handle all uncertain cases and the AI provides an explainable output rather than a bare answer. This encourages critical thinking by the radiologist and reduces overreliance on automation. Yala et al. (2019) demonstrated that when AI refers uncertain cases to the radiologist, anchoring bias is reduced. To reduce confirmation bias, Arun et al. (2021) showed that visual explanations such as heatmaps or saliency maps help the radiologist understand why the AI flagged or ignored a region, encouraging critical review of the flagged region rather than automatic acceptance of the AI's conclusion. In addition, we propose that the AI system provide calibrated risk scores (for example, 1–10) instead of binary outputs. The Mammography Screening with Artificial Intelligence (MASAI) trial in Sweden used several of the strategies we propose: under the MASAI protocol, radiologists always reviewed images with AI support rather than being replaced by it, leading to an overall reduction in automation bias, and the AI system provided a risk score of 1–10 that helped radiologists calibrate their level of vigilance rather than simply anchoring to an AI output, especially for ambiguous cases (Lång et al., 2023).
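The following sketch shows how a generic decision-referral rule with a calibrated 1–10 score might be wired up. It is not the MASAI protocol itself: the probability thresholds, score mapping and routing labels are hypothetical values that would have to be tuned on a validation set and agreed with the reading workflow.

```python
import math
from dataclasses import dataclass

@dataclass
class ReadResult:
    risk_score: int   # calibrated 1-10 score shown to the radiologist
    decision: str     # 'normal', 'recall', or 'refer_to_radiologist'

def decision_referral(ai_probability: float,
                      low_threshold: float = 0.02,
                      high_threshold: float = 0.90) -> ReadResult:
    """Map a model probability to a 1-10 score and route the case.

    Thresholds are illustrative placeholders; indeterminate cases are always
    sent to the radiologist rather than decided by the AI alone.
    """
    score = max(1, min(10, math.ceil(ai_probability * 10)))
    if ai_probability <= low_threshold:
        decision = "normal"                 # confidently normal: AI triage
    elif ai_probability >= high_threshold:
        decision = "recall"                 # confidently suspicious: flag for recall
    else:
        decision = "refer_to_radiologist"   # indeterminate: human reads with AI explanation
    return ReadResult(risk_score=score, decision=decision)
```

Pairing each referred case with a saliency map or heatmap, as proposed above, gives the radiologist both a graded score and a visual rationale to review critically.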
In summary, our solution to reducing these biases is a multi-step approach: first, work with an expanded dataset and continue to grow, learn and improve from it; second, use reweighting and resampling to improve the sensitivity and specificity of the results; third, use a decision-referral system to send uncertain studies to the radiologist; and fourth, return the results with a visual "heatmap" or saliency map so the radiologist can see why the AI flagged or ignored a region and critically review it. Our solution still requires training radiologists to understand the limits and biases of an AI approach so they can screen patients more efficiently and effectively. Together, these steps provide a path toward an AI system that is both accurate and fair, while remaining a trusted tool in the hands of clinicians.
Conclusion
Ultimately, while AI has the potential to revolutionise the early detection and treatment of breast cancer, the presence of bias in training data, algorithms or model design poses significant risks to the accuracy and fairness of diagnostic outcomes for many patients. Bias, whether stemming from systematic errors within the AI system itself (algorithmic bias) or from unrepresentative and inaccurate datasets used to train it (data bias), can lead to significant disparities in healthcare access, misdiagnoses and inadequate treatment plans, particularly for marginalised groups. These inequities undermine the potential benefits of AI and aggravate existing health disparities, widening the gap between demographic groups. These biases must therefore be addressed through improved data diversity, thorough validation methods and transparent, explainable AI models to ensure that AI technologies in breast cancer research contribute equitably to patient care. Only by confronting these challenges can AI live up to its promise of transforming breast cancer diagnosis and treatment in a way that benefits all patients, regardless of their background.
Bibliography
Abdelwanis, M., Alarafati, H.K., Tammam, M.M.S. & Simsekler, M.C.E. (2024). Exploring the risks of automation bias in healthcare artificial intelligence applications: A Bowtie analysis. Journal of Safety Science and Resilience, 5(4), pp. 460–469.
Alberdi, E., Povyakalo, A., Strigini, L. & Ayton, P. (2004). Effects of incorrect computer-aided detection (CAD) output on human decision-making in mammography. Academic Radiology, 11(8), pp. 909–918.
Arun, N., Gaw, N., Singh, P., Chang, K., Aggarwal, M., Chen, B., Hoebel, K., Gupta, S., Patel, J., Gidwani, M., Adebayo, J., Li, M.D. & Kalpathy-Cramer, J. (2021). Assessing the Trustworthiness of Saliency Maps for Localizing Abnormalities in Medical Imaging. Radiology: Artificial Intelligence, 3(6).
Coherent Solutions (2025). AI in Big Data: Use Cases, Applications, and Benefits. Coherent Solutions [online]. <https://www.coherentsolutions.com/insights/ai-in-big-data-use-cases-implications-and-benefits>
Dratsch, T., Chen, X., Mehrizi, M.R., Kloeckner, R., Mähringer-Kunz, A., Püsken, M., Baebler, B., Sauer, S., Maintz, D. & Dos Santos, D.P. (2023). Automation Bias in Mammography: The Impact of Artificial Intelligence BI-RADS Suggestions on Reader Performance. Radiology, 307(4).
Ferrara, E. (2023). Fairness and Bias in Artificial Intelligence: A Brief Survey of Sources, Impacts, and Mitigation Strategies. Sci, 6(1).
Goddard, K., Roudsari, A. & Wyatt, J.C. (2012). Automation bias: a systematic review of frequency, effect mediators, and mitigators. Journal of the American Medical Informatics Association, 19(1), pp. 121–127.
Koo, C., Yang, A., Welch, C., Jadav, V. & Posch, L. (2023). Validating racial and ethnic non-bias of artificial intelligence decision support for diagnostic breast ultrasound evaluation. Journal of Medical Imaging, 10(6).
Lång, K., Dustler, M., Dahlblom, V., Åkesson, A., Andersson, I. & Zackrisson, S. (2020). Identifying normal mammograms in a large screening population using artificial intelligence. European Radiology, 31, pp. 1687–1692.
Lång, K., Josefsson, V., Larsson, A.-M., Larsson, S., Högberg, C., Sartor, H., Hofvind, S., Andersson, I. & Rosso, A. (2023). Artificial intelligence-supported screen reading versus standard double reading in the Mammography Screening with Artificial Intelligence trial (MASAI): a clinical safety analysis of a randomised, controlled, non-inferiority, single-blinded, screening accuracy study. The Lancet Oncology, 24(8), pp. 936–944.
McKinney, S.M., Sieniek, M., Godbole, V., Godwin, J., Antropova, N., Ashrafian, H., Back, T., Chesus, M., Corrado, G.S., Darzi, A., Etemadi, M., Garcia-Vicente, F., Gilbert, F.J., Halling-Brown, M., Hassabis, D., Jansen, S., Karthikesalingam, A., Kelly, C.J., King, D. & Ledsam, J.R. (2020). International evaluation of an AI system for breast cancer screening. Nature, 577(7788), pp. 89–94.
Miller, J.W., King, J.B., Joseph, D.A. & Richardson, L.C. (2012). Breast cancer screening among adult women – Behavioral Risk Factor Surveillance System, United States, 2010. MMWR Supplements, 61(2), pp. 46–50.
Pham, T.D., Teh, M., Chatzopoulou, D., Holmes, S. & Coulthard, P. (2024). Artificial Intelligence in Head and Neck Cancer: Innovations, Applications, and Future Directions. Current Oncology, 31(9), pp. 5255–5290.
Shen, Y., Shamout, F.E., Oliver, J.R., Witowski, J., Kannan, K., Park, J., Wu, N., Huddleston, C., Wolfson, S., Millet, A., Ehrenpreis, R., Awal, D., Tyma, C., Samreen, N., Gao, Y., Chhor, C., Gandhi, S., Lee, C., Kumari-Subaiya, S., Leonard, C., Mohammed, R., Moczulski, C., Altabet, J., Babb, J., Lewin, A., Reig, B., Moy, L., Heacock, L. & Geras, K.J. (2021). Artificial intelligence system reduces false-positive findings in the interpretation of breast ultrasound exams. Nature Communications, 12(1).
Yala, A., Schuster, T., Miles, R., Barzilay, R. & Lehman, C. (2019). A Deep Learning Model to Triage Screening Mammograms: A Simulation Study. Radiology, 293(1), pp. 38–46.