Abstract

Breast cancer remains the most common cancer among women worldwide, and metastatic and interval breast cancers drive up mortality because they resist early detection. Developing accurate predictive and diagnostic measures requires an understanding of the mechanisms, risk factors and traits unique to breast cancer. Such measures are further supported by integrating artificial intelligence (AI) into the analysis of images from existing screening methods, such as tomosynthesis and mammography. Although AI shows promise for predicting interval cancer and metastasis, current systems lack a generalisable, accurate model, owing to the aggressive histopathological features of these clinical categories. This proposal analyses the connection between interval cancer and metastasis and highlights the limitations of current models in order to develop an AI rarity-based screening tier classification model. This model assesses genetic markers, structured patient data, imaging and unstructured clinical notes to determine how rare a patient’s case is relative to several global datasets, assigning a tier from A (lowest rarity) to E (highest rarity). The tiering system reflects both statistical rarity and clinical urgency, while incorporating safeguards to ensure fairness and interpretability. By implementing these strategies, interval and metastatic breast cancers may be detected earlier, reducing some of the threat they pose to patients.

1. Introduction

Breast cancer is the most common cancer worldwide, accounting for around 1 in 8 cancer diagnoses (Arnold et al., 2022). Around 85% of breast cancers arise in the epithelium of the breast ducts, with the remaining 15% originating in the lobular epithelium of the milk-producing glands (Katsura et al., 2022). Nearly 30% of women diagnosed with stage 0 or 1 breast cancer will go on to develop metastatic breast cancer, which usually occurs when carcinoma cells enter the circulatory or lymphatic systems (National Breast Cancer Foundation, 2025). The main blood supply to the breasts is the internal mammary artery, located beneath the breast tissue. The lymphatic vessels drain into lymph nodes, with the sentinel lymph node often being the first site of metastasis (Johns Hopkins Medicine, 2022). The genetic mutations underlying breast cancer arise from multiple risk factors, including genetics, lifestyle, reproductive stages and hormonal factors (Sun et al., 2017).

In recent decades, screening and diagnostic technology have advanced rapidly (Pulumati et al., 2023). However, detection of difficult subcategories of cancer, such as metastatic breast cancer and interval breast cancer, remains underdeveloped. Currently, metastatic breast cancer is often first suspected from clinical signs that the tumour has spread to distant areas of the body and is then confirmed with molecular and imaging diagnostics. In addition, approximately 20–25% of interval breast cancers go undetected (Nguyen et al., 2019). AI-assisted programs that identify subtle abnormalities in breast tissue could detect interval breast cancers and metastasis before significant progression, improving detection and, in turn, prognosis. This AI approach uses genetic characteristics, tumour profiles, biological drivers and other features to create a tier-based classification assessing a patient’s risk of metastatic and interval breast cancer.

This proposal aims to analyse the strengths and limitations of current methods for detecting and predicting interval and metastatic breast cancer, and to explore how AI could classify these subtypes and help radiologists distinguish them through a tier-based classification system.

1.1 Metastasis

1.1.1 What is Metastasis?

Metastasis is the process by which cancer cells from the primary tumour spread to other parts of the body via the lymphatic or circulatory system, leading to the formation of secondary tumours in other organs (Gerstberger et al., 2023). Metastasis has three main phases. The first is dissemination, in which tumour cells carrying mutations that promote growth and proliferation invade surrounding tissue and eventually intravasate into blood or lymphatic vessels. Dissemination ends with extravasation into distant organs, often through transendothelial migration: tumour cells attach to endothelial cells, degrade the vessel’s membranes to increase permeability and finally move between or through endothelial cells to enter a new microenvironment. The second phase is dormancy, during which cells either exit the cell cycle or undergo short bursts of proliferation that often lead to their detection by the immune system. However, if cells accumulate enough mutations and adapt to the new microenvironment, they can enter the third phase, colonisation, in which a new tumour forms and evades the immune system. Underlying every stage are each cell’s accumulated mutations and metabolic changes, which determine how a carcinoma cell invades, survives and acclimates to its new environment (Gerstberger et al., 2023).

Metastasis is responsible for >90% of total tumour-related deaths, and those with interval cancer are three times more likely to die than those with screen-detected cancers (Riggio, Varley & Welm, 2021; DePolo, 2022). Metastasis is therefore the leading cause of breast cancer mortality. According to the National Cancer Institute’s Dictionary of Cancer Terms, metastasis is the spread of cancer cells from their original site to another part of the body (National Cancer Institute, 2025). Interval cancer, typically a high-grade, aggressive tumour, is diagnosed between a screening examination with a normal result and the next routine screen (Schnipper, 2020). Several features contribute to the unpredictable and aggressive nature of interval cancers, including the masking effect, rapid growth rates and tumour characteristics such as the triple-negative phenotype, HER2 positivity, and the Ki-67 and TFF1 markers (Crosier et al., 2001; Musolino et al., 2018; Dewi et al., 2022; O’Reilly, Sendi & Kelly, 2021). Given the high mortality associated with metastasis and interval cancer, studying the characteristics and causes of metastasis should improve screening, diagnostic and predictive methods. Interval cancer’s strong correlation with both nodal and distant metastasis further worsens its implications and mortality rate (Nykänen et al., 2024).

1.1.2 The Genetic Basis of Cancer

Cancer arises through an accumulation of random mutations in a cell’s genome, leading to uncontrolled growth, division and proliferation and eventually the formation of a tumour. Many theories seek to explain carcinogenesis; the most widely accepted is the Somatic Mutation Theory, in which the accumulation of mutations in a non-reproductive (somatic) cell leads to malignancy (Sonnenschein & Soto, 2008). Several observations support this theory. Cancer is common in older people, implying that mutations accumulate over decades (Vogelstein & Kinzler, 1993). Similarly, patients exposed to radiation therapy develop cancer only after a long time lag. Because it is ionising, radiation therapy often causes a spike in mutations and genomic instability. After treatment ends, the mutation rate returns to its natural level; however, cancers typically appear only after a delay, as additional mutations accumulate and cells gradually acquire the ability to evade the immune response (Vogelstein & Kinzler, 1993).

COPY NUMBER ALTERATIONS

Copy number alterations (CNAs) are changes in the number of copies of a region of a cell’s genome; they arise randomly and can affect how a carcinoma cell adapts to, and resists, the immune response and drugs. The first type of alteration is amplification, in which the number of copies of a specific gene increases to promote growth; for example, amplification of the growth-promoting genes ERBB2 and FGFR1 is a recurrent event in breast cancer (Kadota et al., 2009). The second is the deletion of tumour suppressor genes (TSGs). Large-scale CNAs can also affect whole chromosomes or even the entire genome, causing significant genomic instability that leads to poor prognosis and recovery (Shahrouzi et al., 2024). Finally, focal CNAs target specific regions within the genome, such as growth-promoting genes (Shahrouzi et al., 2024). Altogether, these genetic changes make cancer cells more adaptable, enabling them to thrive in hostile microenvironments and grow rapidly.
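To make the distinction between gains of growth-promoting genes and losses of tumour suppressor genes concrete, the Python sketch below labels integer copy numbers against an assumed diploid baseline of two copies and flags the harmful combinations described above. The amplification cut-off of five copies, the gene sets and the example profile are hypothetical choices for illustration (RB1 and TP53 are the gene symbols for the RB and p53 genes discussed below). Real CNA calling works from sequencing or array data and separates focal from large-scale events, which this toy does not attempt.

```python
# Toy copy-number classifier. Thresholds, gene sets and data are hypothetical.
BASELINE_PLOIDY = 2   # assumed normal (diploid) copy number
AMP_THRESHOLD = 5     # hypothetical cut-off for high-level amplification

ONCOGENES = {"ERBB2", "FGFR1"}                          # growth-promoting genes from the text
TUMOUR_SUPPRESSORS = {"BRCA1", "BRCA2", "RB1", "TP53"}  # TSGs from the text


def classify_copy_number(copies: int) -> str:
    """Label an integer copy number relative to the diploid baseline."""
    if copies < 0:
        raise ValueError("copy number cannot be negative")
    if copies == 0:
        return "homozygous deletion"
    if copies < BASELINE_PLOIDY:
        return "loss"
    if copies >= AMP_THRESHOLD:
        return "amplification"
    if copies > BASELINE_PLOIDY:
        return "gain"
    return "neutral"


def flag_driver_events(profile: dict[str, int]) -> list[str]:
    """Flag gained/amplified oncogenes and lost/deleted tumour suppressors."""
    flagged = []
    for gene, copies in sorted(profile.items()):
        label = classify_copy_number(copies)
        if gene in ONCOGENES and label in {"gain", "amplification"}:
            flagged.append(f"{gene}: {label}")
        elif gene in TUMOUR_SUPPRESSORS and label in {"loss", "homozygous deletion"}:
            flagged.append(f"{gene}: {label}")
    return flagged


example_profile = {"ERBB2": 8, "FGFR1": 3, "TP53": 1, "BRCA1": 2}  # made-up tumour
print(flag_driver_events(example_profile))
# ['ERBB2: amplification', 'FGFR1: gain', 'TP53: loss']
```

Note that a gain of a tumour suppressor or a loss of an oncogene is deliberately not flagged, mirroring the text's point that it is amplification of growth genes and deletion of TSGs that favour malignancy.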

TUMOUR SUPPRESSOR GENES AND THEIR ROLE IN MALIGNANCY

Tumour suppressor genes are a first line of defence against tumorigenesis. They act as negative regulators: TSG proteins inhibit the growth and survival of cells with damaged DNA (Cooper et al., 2000). When these genes are inactivated, this restraint is lost, and growth signals, such as those from oncoproteins, drive relentless cell proliferation and division. Examples of important TSGs include BRCA1 and BRCA2, both involved in DNA repair, and the RB gene, whose protein holds the cell at the restriction point in G1, preventing entry into the DNA synthesis (S) phase until growth signals trigger its inactivation (Cooper et al., 2000). Inactivation of these genes therefore removes key barriers to cell cycle progression.

The most widely researched TSG, TP53, encodes the p53 protein, which controls cell cycle arrest and apoptosis. When DNA is damaged, p53 is activated to assess the extent of the damage. If the damage is deemed repairable, p53 induces cell cycle arrest, typically at the G1/S checkpoint, giving the cell time to repair it. If the damage is too severe, p53 triggers either apoptosis, a process of programmed cell death, or senescence, a permanent state in which cells stop dividing but do not die. Loss of this gene is central to many cancers, because apoptosis and senescence are among the body’s first defences against cancer; when apoptosis-inducing genes such as TP53 are inactivated, damaged cells can survive and keep dividing (MedlinePlus, 2020).

1.1.3 Unregulated Growth, Survival and Metastasis

THE WARBURG EFFECT AND METABOLISM

The Warburg effect, first identified by Otto Warburg in the 1920s, refers to a fundamental metabolic shift in tumour cells that facilitates their growth and survival (Liao et al., 2024). Tumour cells rely heavily on glycolysis, converting glucose to lactate even in oxygen-rich environments, despite glycolysis producing far less ATP per glucose molecule than oxidative phosphorylation. The lactate produced by this process does not simply accumulate; instead, it suppresses the immune response in surrounding tissue, further sustaining tumour growth (Hanahan & Weinberg, 2011). Lactate can also fuel angiogenesis, which in turn delivers a constant supply of nutrients to carcinoma cells, supporting proliferation (Liao et al., 2024). In essence, the Warburg effect is a powerful driver of malignancy, enabling tumours to thrive, evade the immune system and reshape the microenvironment to their advantage.

TELOMERE REPAIR

One of the hallmarks of cancer, replicative immortality, is sustained through continuous telomere maintenance (Hanahan & Weinberg, 2011). Each time a cell divides, a small amount of telomeric DNA is lost from the ends of its chromosomes. When telomeres become critically short, the cell enters senescence or undergoes apoptosis. To avoid this limit, about 80% of cancers reactivate telomerase, an enzyme that restores telomeric DNA and prevents telomeres from reaching critically short lengths (Hanahan & Weinberg, 2011). This reactivation allows cancer cells to keep dividing and passing on mutations indefinitely.
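The arithmetic behind this limit can be illustrated with a toy model in which telomeres shorten by a fixed amount per division until they reach a critical length, unless telomerase restores what is lost. All lengths and rates below are hypothetical round numbers in base pairs, not measured values; real telomere loss is neither constant nor identical across chromosomes.

```python
from typing import Optional


def divisions_until_crisis(initial_bp: int, loss_per_division_bp: int,
                           critical_bp: int, telomerase_restore_bp: int = 0,
                           max_divisions: int = 10_000) -> Optional[int]:
    """Return the division at which telomere length first falls to critical_bp
    or below (when senescence or apoptosis would be triggered), or None if that
    never happens within max_divisions. All values are hypothetical."""
    length = initial_bp
    for division in range(1, max_divisions + 1):
        length -= loss_per_division_bp     # DNA lost from chromosome ends each division
        length += telomerase_restore_bp    # telomerase re-extends the ends (0 if inactive)
        if length <= critical_bp:
            return division
    return None


# Normal cell: 10,000 bp telomeres, 100 bp lost per division, crisis at 5,000 bp.
print(divisions_until_crisis(10_000, 100, 5_000))                            # 50
# Cancer cell with telomerase fully replacing the loss: the limit is never reached.
print(divisions_until_crisis(10_000, 100, 5_000, telomerase_restore_bp=100))  # None
```

The `max_divisions` cap simply stands in for "indefinitely": once telomerase offsets the per-division loss, the critical length is never reached, which is the replicative immortality described above.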

ADHESION AND INTRAVASATION

The loss of the adhesion molecule E-cadherin in malignant carcinoma cells is a key characteristic of epithelial-mesenchymal transition (EMT) (Hanahan & Weinberg, 2011). E-cadherin holds epithelial cells together and maintains tissue organisation; its loss lets carcinoma cells detach from the primary tumour and intravasate into the circulatory and lymphatic systems. Eventually, these cells embed in distant organs and tissues, forming new tumours. Without the loss of this adhesion molecule, carcinoma cells could not escape and spread to other parts of the body, which makes it a key part of dissemination, the first phase of metastasis (Kiri & Ryba, 2024).

IMMUNE EVASION AND SUPPRESSION

The immune response shields the body from infection and disease, and it relies heavily on receptor-ligand binding, such as T-cell receptors binding antigens presented on other cells (Cooper et al., 2000). Rapid recognition and decisive action by the immune system are therefore vital in the early stages of cancer to prevent metastasis and further complications. Innate immune cells, such as natural killer cells and macrophages, initiate the response, recognising abnormal cells by comparing the molecular patterns of antigens on their surfaces (Cooper et al., 2000). However, as cancer cells are continually found and eliminated, immune pressure and natural selection favour tumour cells that T cells tolerate (Kim & Cho, 2022). Cancer cells evade and suppress the immune response by altering antigen presentation so that they transmit no recognisable signals, expressing ligands that exhaust T cells, and depleting the nutrients and amino acids that immune cells need (Kim & Cho, 2022). Additionally, lactic acid accumulating from glycolysis is toxic to T cells, further dampening the immune response. The result is widespread immunoediting, in which resistant clones survive and pass on their mutations until the entire population is immune-resistant (Kim & Cho, 2022; Kiri & Ryba, 2024).

1.1.4 The Effect of Age, Lifestyle and Genetic Factors on Breast Cancer Progression

Various factors contribute to an increased risk of breast cancer, such as ageing, reproductive factors, exogenous and endogenous hormones, lifestyle, oestrogen and certain genes.

AGEING

Breast cancer is most prevalent in women between the ages of 40 and 60. In 2016, approximately 99.3% and 71.2% of all breast cancer-associated deaths in the United States occurred in women over the age of 40 and over the age of 60, respectively (Siegel et al., 2017). As women age, their risk of developing breast cancer increases, which underscores the need for efficient and accurate screening methods to enable early detection and improve treatment outcomes.

REPRODUCTIVE FACTORS

Primary reproductive stages, such as menarche, menopause and parity, strongly influence breast cancer risk. A study carried out between 2003 and 2009 concluded that women whose breasts started developing before the age of 10 and who had their first menstrual period before the age of 12 had a 30% higher risk of breast cancer than women with neither risk factor (Goldberg et al., 2020). Early thelarche (the onset of breast development) extends the period in which breast cells are in a highly proliferative, undifferentiated state, increasing the likelihood of genetic mutations and thereby heightening the predisposition to breast cancer. Two further significant reproductive factors are age at first birth and parity. A meta-analysis of 8 population-based studies concluded that women who have never given birth have a 30% higher risk of breast cancer than those who have, and that every two additional births lower the risk by roughly 16% (Ewertz et al., 1990). During pregnancy, high levels of oestrogen and progesterone encourage the differentiation of epithelial tissue in the breast, reducing the number of cells prone to oncogenesis (Russo et al., 2005).

EXOGENOUS AND ENDOGENOUS HORMONES

Endogenous and exogenous sex hormones, such as oestrogen, are associated with breast cancer risk, in some cases positively and in others negatively (Chen, 2008). Exogenous oestrogen, found in hormone replacement therapy (HRT) and oral contraceptives, has been shown to increase breast cancer risk; it has been estimated that HRT use among women aged 50–64 in the UK over the preceding decade led to approximately 20,000 additional cases of breast cancer (Million Women Study, 2003).

LINKS BETWEEN OESTROGEN AND METASTASIS

The majority of breast cancers are oestrogen receptor-positive (ER+). Their growth relies heavily on oestrogen signalling: oestrogen binds to the oestrogen receptor (ER), and the activated ER acts as a transcription factor, a protein that binds to specific DNA sequences to regulate gene expression. Inside the nucleus, the activated ER binds to oestrogen response elements (EREs) and drives the expression of genes that allow cancer cells to proliferate, avoid apoptosis and metastasise (Saha Roy & Vadlamudi, 2012).

LIFESTYLE

Alcohol intake and smoking are both associated with higher breast cancer rates (Breast Cancer Now, 2024). A meta-analysis of 53 studies found that women who consumed 35–44 g of alcohol per day had a 32% higher relative risk of developing breast cancer than women who did not drink (McDonald et al., 2013). Two mechanisms contribute to this elevated risk: alcohol is broken down into acetaldehyde, which can damage DNA, and alcohol raises oestrogen concentrations, which are known to influence breast cancer. A meta-analysis of prospective cohort studies also shows a positive association between smoking and breast cancer. Cigarette smoke contains over 7,000 chemicals, including many toxic compounds and carcinogens that damage DNA and increase the risk of breast cancer (Poorolajal et al., 2021; Hamer & Warner, 2017).

GENETIC FACTORS

Mutations in the BRCA1 and BRCA2 genes are the inherited alterations most commonly linked to breast cancer (Johns Hopkins Medicine, 2022). These genes play a central role in DNA repair and are fundamental in determining whether someone is at risk of breast cancer; more than 60% of women who inherit a harmful change in BRCA1 or BRCA2 will develop breast cancer (National Cancer Institute, 2024).

Despite progress in addressing breast cancer cases and risks, specific subgroups, such as interval cancer, remain particularly aggressive and are often missed at screening because of their high proliferation and aggressive characteristics. Interval cancer and metastasis are two major contributors to breast cancer deaths. Understanding how metastasis interacts with the characteristics of interval cancer is therefore key to improving early detection and prediction strategies.

1.2 Interval Cancer’s Defining Features and Characteristics

According to a recent study, women face a higher risk of aggressive disease and death after an interval breast cancer diagnosis (Song et al., 2024). Interval cancer, typically a high-grade and aggressive tumour, is diagnosed in the interval between two scheduled screens: the mammogram at the earlier screening appears normal, but a cancer, often noticed as an unexpected lump, is found before the next routine screening (Schnipper, 2020).

Interval cancers are classified into four groups: true interval, occult, false-negative and minimal-sign cancers (Messinger et al., 2018). True interval cancers were not present at the time of screening or showed only benign characteristics at that screening. Occult cancers are present at screening but produce no visible abnormality on the mammogram. False-negative cancers are malignancies that were overlooked because of reader error (by radiologists or AI systems), subtle features or dense tissue (Houssami & Hunter, 2017). Minimal-sign cancers show faint abnormalities at the earlier screening, but because they develop slowly and their features are subtle, these abnormalities lack the characteristics needed to judge them as either malignant or certainly benign (Messinger et al., 2018).

1.2.1 Detection Limitations

THE MASKING EFFECT

In breast cancer screening, breast density refers to the proportion of radio-opaque parenchymal (fibroglandular) tissue, which appears light on a mammogram, relative to radio-lucent fatty tissue, which appears dark. A predominantly dark mammogram indicates mostly fatty tissue, against which white, cancerous areas stand out clearly, whereas lighter areas represent dense fibroglandular tissue (Nguyen et al., 2019). Because tumours and dense tissue both appear white, high breast density increases the risk of a cancer going undetected. Women with the densest breasts (>75% density) have a higher risk of interval cancer because tumours are harder to detect in dense tissue (Nguyen et al., 2019).

1.2.2 Biological Drivers

RAPID AND AGGRESSIVE GROWTH

Interval cancers contain fewer apoptotic cells, that is, cells undergoing programmed cell death. This reduction in cell death contributes to rapid growth rates (Gilliland, 2000; National Human Genome Research Institute, 2025). On mammography, tumours with a high proportion of proliferating cells are four times more likely to be missed than tumours with a low proportion. High proliferation, low apoptosis and other molecular features drive the elusive, subtle appearance of interval cancers (Gilliland, 2000). Fast-growing tumours appear ill-defined in early mammogram screenings, whereas slow-growing tumours develop visible features at early stages. As a result, fast-growing tumours outpace screening, appearing larger and more advanced at the next examination (Gilliland, 2000).

1.2.3 Key Tumour Profiles

TRIPLE-NEGATIVE BREAST CANCER

Interval breast cancers are disproportionately triple-negative breast cancers (TNBC) compared with screen-detected tumours: they lack expression of ER, progesterone receptor (PR) and human epidermal growth factor receptor 2 (HER2). ER and PR are hormone receptors that bind their respective hormones, oestrogen and progesterone, signalling cells to grow and divide (Cleveland Clinic, 2025; Kass, 2025). HER2 is a receptor protein on the surface of breast cells that regulates growth (American Cancer Society, 2022). Tumours lacking all three receptors tend to be more aggressive and invasive. Without hormone signalling, these tumours rely on alternative pathways, such as wingless-related integration site/beta-catenin (Wnt/β-catenin), a signalling route that allows cancer cells to regrow and spread, and nuclear factor kappa-light-chain-enhancer of activated B cells (NF-κB), a pathway for immune evasion (Dewi et al., 2022). These alternative cascades can be associated with mutations that drive uncontrolled growth (O’Reilly, Sendi & Kelly, 2021).
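The receptor logic above can be summarised as a small decision rule. The sketch below assumes each receptor has already been called positive or negative; in practice these calls come from scored immunohistochemistry (IHC) and in situ hybridisation (ISH) assays whose thresholds are not modelled here. Grouping all HER2-positive tumours together regardless of hormone receptor (HR) status is a simplification chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReceptorStatus:
    """Simplified receptor results; real assays use scored IHC/ISH thresholds."""
    er_positive: bool
    pr_positive: bool
    her2_positive: bool


def receptor_subtype(status: ReceptorStatus) -> str:
    """Map ER/PR/HER2 status to a coarse subtype label (simplified grouping)."""
    if status.her2_positive:
        # HER2-positive regardless of hormone receptor status in this grouping.
        return "HER2-positive"
    if status.er_positive or status.pr_positive:
        return "HR-positive/HER2-negative"
    # No ER, PR or HER2: the triple-negative phenotype.
    return "triple-negative"


print(receptor_subtype(ReceptorStatus(False, False, False)))  # triple-negative
print(receptor_subtype(ReceptorStatus(True, False, False)))   # HR-positive/HER2-negative
print(receptor_subtype(ReceptorStatus(True, True, True)))     # HER2-positive
```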

HER2 POSITIVE

HER2 overexpression is frequently observed in interval cancer. Patients with HER2-positive interval cancer experience worse prognosis and more aggressive tumour growth (Domingo et al., 2014). HER2 positivity contributes to the aggressive biology of these tumours, resulting in high proliferation rates and genomic instability (Musolino et al., 2018).

KI-67

Ki-67 is a marker of cell proliferation; the Ki-67 index measures the percentage of tumour cells that are growing and dividing. An earlier study found that interval cancers are characterised by high levels of Ki-67 expression (Crosier et al., 2001), suggesting that cell migration and cell division play important roles in their rapid progression. Interval cancers typically display higher Ki-67 levels than screen-detected cancers because of the prevalence of triple-negative and HER2-positive subtypes, which increase proliferation (Nykänen et al., 2024). High Ki-67 is therefore a strong factor distinguishing interval cancers from less aggressive screen-detected cancers, indicating faster growth and a greater likelihood of nodal or distant metastasis. Elevated Ki-67 thus reflects the aggressive biology and higher metastatic risk of interval cancers compared with screen-detected tumours (Nykänen et al., 2024).

Despite advancements in mammographic review, interval cancers remain difficult to predict. Although AI-supported mammogram screening detects significantly more cancers, interval cancers remain a blind spot for both AI and radiologists, as they are often invisible at screening because of their early, subtle morphology or dense surrounding tissue.

1.3 How Metastasis and Interval Cancer Connect

Interval cancer is associated with a higher risk of nodal and distant metastasis (Nykänen et al., 2024). Nodal metastasis, defined as the presence of cancer in the lymph nodes, was identified in 53.6% of missed interval cancers. Similarly, distant metastasis occurred in 24.5% of true interval cancers compared with 1% of screen-detected cancers (Nykänen et al., 2024). The histopathological features and aggressive characteristics of interval cancers, such as higher Ki-67 levels, HER2-positive tumours and more frequent TNBC, explain this strong correlation with metastasis.

1.3.1 Tumour Profiles in Interval Cancer Driving Metastasis 

HIGHER CASES OF TNBC INTERVAL CANCER

Because interval cancer is disproportionately triple-negative compared with screen-detected cancers, the absence of HER2, ER and PR contributes to the biological aggressiveness, detection difficulty and metastasis of interval cancer (O’Reilly, Sendi & Kelly, 2021). Without these three receptors, hormone therapies and HER2-targeted therapies cannot be used to control cancer growth, and tumour cells rely on unregulated, alternative pathways, such as NF-κB or Wnt/β-catenin, which can block immune attack and increase proliferation. These pathways create a pro-metastatic environment, increasing the likelihood that the tumour invades, grows and eventually metastasises; at least 25% of patients with TNBC develop metastasis (O’Reilly, Sendi & Kelly, 2021).

HIGHER KI-67 LEVELS

Ki-67 is a protein found in dividing cells. High Ki-67 levels indicate rapid cell division and a higher likelihood of spread (National Cancer Institute, n.d.), and Ki-67 expression correlates with the risk of recurrence and metastasis in breast cancer. In a study on the role of Ki-67 in HR+/HER2 breast cancer, researchers used receiver operating characteristic (ROC) analysis to identify 22.5% as the threshold separating high from low Ki-67 expression. Patients with ≥22.5% of tumour cells positive for Ki-67 were classed as having high Ki-67, and those with <22.5% as having low Ki-67. Over a 4-year follow-up, 63 patients developed metastasis: 21 with a low Ki-67 index and 42 with a high Ki-67 index. The hazard ratio for high versus low Ki-67, which compares the rate at which metastasis and relapse occur over time in the two groups, was 2.969, meaning that patients with high Ki-67 had roughly three times the risk of metastasis and relapse. This increased risk suggests that high Ki-67 levels generally contribute to worse prognosis, early recurrence and metastasis (Ma et al., 2024).
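The study's cut-off came from ROC analysis, but the criterion used to pick the point on the ROC curve is not stated here. One common criterion is Youden's index, J = sensitivity + specificity − 1. The sketch below assumes that criterion and applies it to a small, made-up dataset, then reuses the study's 22.5% cut-off to group patients. Because the toy data are invented, the threshold it finds (20%) says nothing about the study's value.

```python
def youden_threshold(values: list[float], outcomes: list[int]) -> tuple[float, float]:
    """Choose the cut-off c (from the observed values) that maximises
    Youden's J = sensitivity + specificity - 1, calling a value 'high' if >= c.
    outcomes: 1 = developed metastasis, 0 = did not."""
    if len(values) != len(outcomes):
        raise ValueError("values and outcomes must be the same length")
    positives = sum(outcomes)
    negatives = len(outcomes) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need patients with and without metastasis")
    best_c, best_j = values[0], float("-inf")
    for c in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, outcomes) if v >= c and y == 1)
        tn = sum(1 for v, y in zip(values, outcomes) if v < c and y == 0)
        j = tp / positives + tn / negatives - 1
        if j > best_j:            # ties keep the lower cut-off
            best_c, best_j = c, j
    return best_c, best_j


def ki67_group(index_percent: float, cutoff: float = 22.5) -> str:
    """Group a Ki-67 index using the study's 22.5% cut-off (>= is 'high')."""
    return "high" if index_percent >= cutoff else "low"


# Hypothetical Ki-67 indices (%) and metastasis outcomes for ten patients.
ki67 = [5, 10, 12, 18, 20, 25, 30, 35, 40, 60]
metastasis = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
cutoff, j = youden_threshold(ki67, metastasis)
print(cutoff)                               # 20
print(ki67_group(22.5), ki67_group(18.0))   # high low
```

Each candidate cut-off corresponds to one point on the ROC curve; Youden's index picks the point farthest above the diagonal, which is one reasonable way to balance missed metastases against false alarms.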

OVEREXPRESSED HER2-POSITIVE

In tumours that overexpress HER2, growth pathways are constantly activated. The receptor generates strong growth signals through the mitogen-activated protein kinase/extracellular signal-regulated kinase (MAPK/ERK), Wnt/β-catenin and NF-κB pathways. Together, these signals promote uncontrolled growth and an increased ability to metastasise (Cheng, 2024). This is particularly relevant because interval cancers are more often HER2-positive than screen-detected cancers (Nykänen et al., 2024). HER2 overexpression also drives several other aggressive characteristics. It induces EMT, making cells more mobile and allowing them to invade other tissues. HER2-positive tumours also upregulate matrix metalloproteinases (MMPs), enzymes that break down tissue barriers so that cancer cells can enter blood vessels. In addition, HER2 promotes angiogenesis by activating vascular endothelial growth factor (VEGF), which signals new vessels to grow; these vessels supply nutrients and oxygen to tumours and give cancer cells further access to the bloodstream. If tumour cells metastasise, HER2 overexpression activates anti-apoptotic signals that protect cancer cells from the surrounding stroma and immune cells (Cheng, 2024).

These characteristics define interval cancer’s distinctive histopathological profile: more aggressive, faster growth and early metastasis. The biological differences between interval and screen-detected cancers explain why interval cancers are more aggressive and grow faster. Even with AI screening, metastasis and interval cancer account for some of the highest mortality rates in breast cancer. Metastasis, the leading cause of death, and interval cancer, with its fast, unpredictable and aggressive biology, both outpace current countermeasures. Because interval cancer is already aggressive and can metastasise early, patients have worse survival rates. AI still struggles to predict long-term metastasis and interval cancer, so further research into closing these gaps in AI screening could reduce breast cancer mortality.

Interval breast cancer’s strong association with nodal and distant metastasis results from the aggressive tumour profiles found in interval cancer (Nykänen et al., 2024). High Ki-67 expression, an indicator of cell division, is associated with roughly three times the risk of metastasis compared with low Ki-67 (Ma et al., 2024). Similarly, overexpressed HER2 provides excess growth signals through the MAPK/ERK and Wnt/β-catenin pathways, immune evasion through anti-apoptotic signalling, and invasive behaviour through EMT, MMPs and angiogenesis (Cheng, 2024). Together, these signalling pathways and molecular mechanisms explain interval cancer’s higher mortality, rapid progression and greater risk of metastasis.

2. Prediction and Detection

2.1 Challenges in Prediction and Detection

Predicting interval breast cancer and metastasis remains challenging for current approaches. Compared with screen-detected cancers, interval cancers escape early detection, and metastasis is more likely to develop in interval cancers because of their high proliferation and aggressiveness and the limited approaches available to manage this biological process (Schirano et al., 2024; Nykänen et al., 2024). Current methods attempt to address these challenges through AI-assisted mammography, tomosynthesis and genetic predisposition approaches (Bahl, 2025; DePolo, 2024). However, even with these advances, interval cancer and metastasis remain leading causes of breast cancer death: metastasis is responsible for >90% of total tumour-related deaths, and those with interval cancer are three times more likely to die than those with screen-detected cancers (Riggio, Varley & Welm, 2021; DePolo, 2022). Despite advances in imaging and AI tools, current AI detection remains unreliable for predicting interval cancer and the risk of metastasis.

2.2 Current Interval Cancer Prediction Methods

2.2.1 Imaging Techniques

AI-ASSISTED DIGITAL BREAST TOMOSYNTHESIS WITH LUNIT INSIGHT DBT AI

Interval breast cancer has poorer outcomes because of its aggressive biology and rapid growth, but AI-assisted digital breast tomosynthesis (DBT), a form of 3D mammography, can visualise breast lesions and cancers obscured by dense tissue. Dr Manisha Bahl, a physician investigator in radiology, analysed 224 interval breast cancer cases using the Lunit Insight DBT AI algorithm, which identifies and classifies areas suspicious for breast cancer in tomosynthesis images. The program analyses Digital Imaging and Communications in Medicine (DICOM) images, which contain both pictures and structured information about the patient, to provide a quantitative estimate of the presence of malignant lesions (Blackford, 2024). The algorithm was trained on 30,000 positive and 30,000 negative 2D mammogram breast cancer cases and fine-tuned with 12,810 DBT examinations, including biopsy-confirmed cancers and radiologist-marked images.

In Bahl’s study, the program detected 32.6% (73/224) of the interval cancers that had previously gone undetected (Bahl, 2025); the average reading time per case dropped from 54 to 49 seconds, and diagnostic accuracy, measured by the area under the receiver operating characteristic curve (AUC), increased from 0.90 to 0.92. However, because only about a third of interval cancer cases were detected, the program’s usefulness for medical institutions and commercial deployment is limited (Bahl, 2025). Additionally, the AI lacks real-world clinical data and performs less well at detecting aggressive cancers (Park et al., 2024).
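The AUC statistic has a simple interpretation: it is the probability that a randomly chosen cancer case receives a higher AI score than a randomly chosen non-cancer case, with ties counted as half. The sketch below computes it directly from that definition (the Mann–Whitney formulation) for hypothetical scores and reproduces the 73/224 detection rate. It illustrates the metric only and is not Lunit's software.

```python
def auc_mann_whitney(scores_pos: list[float], scores_neg: list[float]) -> float:
    """AUC as the probability that a random cancer case scores higher than a
    random non-cancer case, counting ties as half (Mann-Whitney formulation)."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one cancer and one non-cancer score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical AI malignancy scores (not study data).
cancer_scores = [0.9, 0.8, 0.4]
normal_scores = [0.3, 0.4, 0.1]
print(round(auc_mann_whitney(cancer_scores, normal_scores), 3))  # 0.944

# Detection rate reported in the study: 73 of 224 interval cancers.
print(f"{73 / 224:.1%}")  # 32.6%
```

Comparing every pair is quadratic in the number of cases, which is fine for illustration; library implementations compute the same quantity more efficiently from ranks.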

UCLA TRANSPARA AI

Transpara, developed by ScreenPoint Medical and evaluated in a study at UCLA, is AI interpretation software for 2D mammograms and 3D DBT that provides a risk score for each case on a scale of 1-10. As with Lunit Insight DBT AI, DICOM images are input, and the AI analyses suspicious characteristics such as masses, architectural distortion, calcifications and fine details. For each case analysed, the system displays heatmaps of areas likely to be cancer alongside the risk score (ScreenPoint Medical, n.d.). In the UCLA study, the AI flagged 76% of mammograms that were initially read as normal but were linked to interval cancer, with detection rates of 90% for reading-error cases, 89% for subtle but actionable cancers, 69% for occult cancers and 50% for true interval cancers (Heady, 2025). Although the algorithm anticipated many interval cancers, certain types, such as occult and true interval cancers, remain more difficult to detect. For these and other aggressive interval cancers, AI-supported screening still struggles as a prediction method (Heady, 2025). 

2.2.2 Genetic and Molecular Approaches

ATM, BRCA1, BRCA2, CHEK2 AND PALB2

Interval cancer is thought to have genetic or molecular drivers. In a study of 9,752 Swedish women aged 40 to 76, the 1,229 women diagnosed with interval breast cancer were 50% more likely than those with screen-detected cancer to have a mutation in ATM, BRCA1, BRCA2, CHEK2 or PALB2 (DePolo, 2024). In addition, women with a family history of mutations in any of these five breast cancer genes were four times more likely to develop interval cancer. The study concludes that screen-detected and interval cancers have distinct genetic profiles, which may help identify those at high risk of developing aggressive breast cancer (DePolo, 2024). 

Currently, some clinics offer personalised screening for interval cancer risk based on the five main breast cancer genes, whose variants may help explain why certain women develop interval breast cancer. Ongoing research is addressing the complex molecular biology and genetics that contribute to interval cancer risk, with the goal of identifying women at high genetic risk more accurately (Shieh, Ziv & Kerlikowske, 2020).

2.3 Current Metastasis Prediction Methods

2.3.1 Metastasis Detection Methods

Before newer approaches or solutions can be proposed, it is essential to understand the current state of breast cancer screening in terms of detection, early detection and prediction. Metastatic breast cancer is difficult to detect and predict because it lacks distinctive symptoms and because current imaging techniques are limited. Mammograms may mark tumours as stable, discouraging further scrutiny of the abnormal cells for potential metastasis (American Medical Association, 2024). The same limitation contributes to the persistent inability to recognise the likelihood of interval cancer in patients. Current detection methods for metastasis are largely limited to clinical observation of cancer spreading to organs distant from the original tumour site. Patients with stage 2 or 3 breast cancer may undergo a series of blood tests, imaging studies and biopsies, called a “staging workup”, to find or rule out evidence of metastasis (National Breast Cancer Foundation, 2019). This monitoring is not limited to the period before treatment, as the rate of distant metastatic recurrence is 10-30% for early-stage disease, with TNBC and inflammatory breast cancer having the highest recurrence rates because of their aggressive biology (Rubio, 2025).

The American Society of Clinical Oncology (ASCO) has issued guidelines for postoperative follow-up and management of breast cancer, covering symptom review, physical examinations, mammography, serum tumour marker and antigen testing, and further imaging, such as CT and MRI, when providers find suspicious results. Mammography in particular has proven effective for diagnosing cancer early, before it can metastasise, thereby lowering mortality from metastatic breast cancer. 

Yet, ASCO guidelines and mammography cannot reliably detect early metastasis or predict cancer progression (Scully et al., 2012). Of patients with metastatic breast cancer, 85% were originally diagnosed with early-stage breast cancer, and only 15% had evidence of metastatic spread at their initial diagnosis (Canzoniero, 2022). This suggests that, for the 85% of patients whose cancer later metastasised, the spread may not have been foreseeable at the time of initial diagnosis, delaying potential early intervention or closer oversight. This inability to detect early metastasis indicates a critical deficit in current detection and prediction approaches.

2.3.2 Metastasis Early Detection and Prediction

MICROMETASTASIS

Novel approaches for the early detection and prediction of metastasis are a primary focus of ongoing research. A prime example is the study of micrometastasis, an early indicator of aggressive metastatic potential. A micrometastasis is a lymph node metastasis measuring greater than 0.2 mm but no greater than 2.0 mm, formed when cancer cells break off from the primary tumour and disseminate (Atkins & Kong, 2013). Micrometastases are an important prognostic factor in breast cancer and can indicate cancer recurrence or distant metastasis (Luo et al., 2023). Indeed, bone marrow micrometastases, which are associated with poor prognosis, can be present in breast cancer patients as early as initial diagnosis (Braun et al., 2005). A case-control study found that micrometastases were 10 times more frequent in patients who developed distant metastases within 15 years of axillary lymph node dissection, even though their nodes appeared negative on microscopic examination at the time of surgery. The study concluded that there was a significant association between the presence of micrometastases and the development of distant metastasis in patients with T1 (tumour ≤ 2 cm) breast cancer, and that earlier examination had failed to detect these cancerous nodes (Susnik, Frkovic-Grazio & Bracko, 2004). Such research suggests that if early micrometastases exist and signal aggressive cancer progression, detecting them in early-stage patients could significantly influence optimal long-term treatment (Mao et al., 2022). 
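The size thresholds in this definition can be written as a simple rule. The sketch below classifies a nodal tumour deposit by its largest dimension in millimetres. The "isolated tumour cells" (≤ 0.2 mm) and "macrometastasis" (> 2.0 mm) labels are the conventional categories on either side of the micrometastasis range and are added here for completeness. The cell-count criteria used in formal staging are deliberately ignored, so this is a simplified illustration rather than a staging tool.

```python
def classify_nodal_deposit(size_mm: float) -> str:
    """Classify a lymph node tumour deposit by its largest dimension (mm).

    Simplified, size-only rule: formal staging also uses cell counts,
    which are not modelled here.
    """
    if size_mm <= 0:
        raise ValueError("size must be positive")
    if size_mm <= 0.2:
        return "isolated tumour cells"
    if size_mm <= 2.0:          # > 0.2 mm and no greater than 2.0 mm
        return "micrometastasis"
    return "macrometastasis"


for size in (0.1, 0.2, 0.5, 2.0, 3.5):
    print(size, classify_nodal_deposit(size))
```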

CIRCULATING TUMOUR CELLS

Increasing evidence suggests that metastasis can begin early in patients with aggressive cancer, even before primary lesions are detected (Lawrence et al., 2023). An emerging method for predicting early metastasis more accurately is the examination of circulating tumour cells. The metastatic cascade begins when tumour cells shed from the primary tumour and enter the circulatory and lymphatic systems, where they may become disseminated tumour cells (DTCs) or circulating tumour cells (CTCs) (Trapp et al., 2022). DTCs may become dormant, so that a metastatic lesion forms and is detected only years after initial diagnosis and treatment. CTCs, by contrast, can be detected early in peripheral blood and may be the first indicators of metastasis, giving them potential as a tool for predicting and detecting aggressive cancer early (Lawrence et al., 2023). Indeed, viable CTCs are released into the blood even during the early stages of carcinogenesis (Crook et al., 2022). Researchers extract CTCs from patient blood samples and then isolate, enrich and genetically characterise the cells. Accumulating evidence suggests that CTC counts may indicate ongoing metastasis and are correlated with poor prognosis, making CTCs a critical biomarker of the metastatic process (Scully et al., 2012; Thery et al., 2019). However, only specific aggressive subpopulations of CTCs survive and go on to form metastatic tumours, and clusters of CTCs are more likely to form distant metastases than individual cells (Rodrigues & Vanharanta, 2019). A joint clinical trial across 250 German study centres showed that, among breast cancer patients who subsequently developed distant recurrences, those with CTCs present in blood samples taken before and after chemotherapy had an increased risk of bone metastasis and multi-metastatic disease (Trapp et al., 2022). 
By using CTCs as a prognostic biomarker, clinicians may be able to begin treatment at a stage when the cancer is still relatively responsive to therapy.

2.4 Gaps in Metastasis Prediction

Several approaches to the early detection and prediction of metastasis in breast cancer patients have been investigated, including micrometastasis and CTCs. Yet each runs into limitations, particularly in clinical practicality and precision. Because of their minute size, micrometastases cannot be detected clinically or with conventional medical imaging. Instead, invasive techniques are required, such as lymph node biopsy followed by microscopy, which are not always practical. These restrictions create opportunities for new technological approaches, such as AI, to provide further insight into micrometastasis and its implications for early-stage breast cancer patients. AI models can be trained on large, diverse sets of patient imaging data, which provide the basis for closer inspection of any given patient's images (Luo et al., 2023). For example, a team based in China developed an AI-assisted model that analyses imaging data for features indicative of occult peritoneal metastasis in patients with advanced gastric cancer, features that are not detectable by the naked eye (Dong et al., 2019). However, such AI models are limited to defining and detecting micrometastasis in only a few specific cancers, which restricts their scope. Additionally, larger and more diverse samples are required to further improve the accuracy of the technology (Mao et al., 2022).

Turning to CTCs, these are rare cells that are difficult to harvest and isolate from the vast number of healthy blood cells. CTCs are also heterogeneous, with groups of CTCs showing significant variation in surface biomarker expression, and they are difficult to capture with conventional laboratory methods without damaging the cells (Ju et al., 2022). Once again, these constraints have opened a window for research into more accurate technology, such as machine learning algorithms that detect CTCs from single-cell sequencing data (Pastuszak et al., 2024). Other AI prediction approaches are continually being investigated. For example, a team based in China developed a high-accuracy AI model that combined clinical blood markers and ultrasound data from breast cancer patients to predict and identify distant metastasis. When analysed by the AI and validated, the common tumour marker cancer antigen 15-3 (CA15-3), creatine kinase-myocardial band (CK-MB, an isoenzyme found in cardiac and skeletal muscle), lipoprotein(a) and the maximum lesion diameter on ultrasound were positively correlated with distant metastasis, while indirect bilirubin and magnesium ion levels were negatively correlated (Tan et al., 2024). AI holds much promise for filling the gaps in current methods for the early detection and prediction of metastasis, such as inefficient standalone technology and insufficient human insight. Yet current AI models are far from ideal, and increasingly precise and flexible models are needed to take advantage of the growing number of breast cancer early detection and prediction methods with potential clinical utility.

2.5 Current Problems of AI and Application in Interval Breast Cancer

Interval breast cancer is particularly difficult to detect during the early stages or at initial screenings, which complicates both diagnosis and prognosis. To develop a more effective method for this problem, it is crucial to understand why the screening mammograms currently used by radiologists and other professionals fail to identify interval cancers. A negative index screen can occur when AI finds no signs of malignancy, and the underlying reasons correspond to the main categories of interval cancer described in section 1.2: true interval, occult interval, false negatives and minimal sign (Lee et al., 2023).

Of the areas in which AI-assisted mammography screening could improve, the most crucial is false negatives, in which the AI model fails to identify malignant tissue because of training bias or image complexity (e.g., dense breast tissue, atypical presentation). One potential solution is an AI model whose sole purpose is to rank mammograms by the rarity of their features within a large dataset.

3. Artificial Intelligence in Breast Cancer

3.1 Basics of Large Language Models and Deep Learning

3.1.1 LARGE LANGUAGE MODELS

Large language models (LLMs) are AI models trained on immense libraries of data to understand and generate language with human-like coherence (IBM, 2023). A well-known example is GPT-3, an LLM trained on vast amounts of data to respond plausibly and relevantly to a wide range of prompts (IBM, 2023). 

3.1.2 DEEP LEARNING

Machine learning involves algorithms that identify and adapt to patterns within sets of data without being explicitly programmed to do so. A subset of this is deep learning, in which artificial neural networks with many layers process information and produce output through a network of interconnected nodes (Holdsworth & Scapicchio, 2024). During forward propagation, each layer applies its weights and biases to the output of the previous layer, building up a prediction. Back propagation then works in the opposite direction: the prediction error is passed backwards through the layers to calculate how each weight and bias should be adjusted to reduce that error. Repeating both steps makes the model's predictions progressively more accurate. Examples relevant to medical research include generative adversarial networks (GANs), in which a generator creates synthetic samples, such as mammograms, and a discriminator judges whether each sample is real or generated, with the two networks improving by competing against each other; and convolutional neural networks (CNNs), which slide learned filters across an image so that early layers detect simple local patterns, such as edges, and deeper layers combine these into increasingly complex features (Holdsworth & Scapicchio, 2024). Overall, AI is a powerful tool that can assist clinicians in diagnosis, imaging and treatment. 
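The two propagation steps can be made concrete with a minimal NumPy sketch that trains a network with one hidden layer on a toy XOR dataset. The network size, learning rate, number of steps and data are arbitrary assumptions. The example illustrates the mechanism only and has nothing to do with mammography.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: four samples with two features and binary labels (an XOR pattern)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer of 8 tanh units and a single sigmoid output unit
params = {
    "W1": rng.normal(0.0, 1.0, (2, 8)), "b1": np.zeros((1, 8)),
    "W2": rng.normal(0.0, 1.0, (8, 1)), "b2": np.zeros((1, 1)),
}


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_step(p, lr=0.5):
    # Forward propagation: each layer applies its weights and biases in turn
    h = np.tanh(X @ p["W1"] + p["b1"])
    out = sigmoid(h @ p["W2"] + p["b2"])
    loss = -np.mean(y * np.log(out) + (1 - y) * np.log(1 - out))  # cross-entropy

    # Back propagation: pass the error gradient backwards, layer by layer
    d_out = (out - y) / len(X)                  # gradient w.r.t. output pre-activation
    grads = {"W2": h.T @ d_out, "b2": d_out.sum(axis=0, keepdims=True)}
    d_h = (d_out @ p["W2"].T) * (1 - h ** 2)    # chain rule through tanh
    grads["W1"] = X.T @ d_h
    grads["b1"] = d_h.sum(axis=0, keepdims=True)

    # Gradient descent: nudge every weight and bias to reduce the error
    for name in p:
        p[name] = p[name] - lr * grads[name]
    return loss


losses = [train_step(params) for _ in range(2000)]
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The printed loss should fall over training, which shows back propagation steadily reducing the prediction error.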

3.2 AI Mammography Screening Frequency Tier Classification Model

3.2.1 Overview

The sole purpose of this model is to analyse large sets of mammography screenings and classify them into five tiers (A-E) according to how frequently particular features of each screening recur. These frequencies are measured against a generalised reference frequency that the model develops from its training data and that is tested rigorously to minimise bias. 

To further reduce bias, the model incorporates eleven additional feature vectors alongside the mammogram screening image itself, which provide additional information for training: current age; ethnicity and race; history of cancer; any known family history; BI-RADS breast density classification; time since last screening; screening modality history; device used for last screening (e.g., MRI); hormonal status; known genetic markers (e.g., BRCA1 and BRCA2); and additional notes, either found in the electronic health record (EHR) system or written by a radiologist.

To avoid extensive reliance on an NLP model, every feature vector listed above will be entered via dropdown menus or checkboxes, except for the mammography screening image and the “Additional Notes” feature vector. For the “Additional Notes” field, an existing pre-trained NLP model, such as Bidirectional Encoder Representations from Transformers (BERT), will be integrated into the system to process notes written by medical professionals and any additional EHR data.
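The dropdown-and-checkbox design can be sketched as a typed input record. This is a hypothetical Python schema, not part of the proposal. The class names, field names and `Modality` options are illustrative assumptions. The density categories follow standard BI-RADS terminology, and the marker set reuses the five genes discussed in section 2.2.2.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Density(Enum):
    """BI-RADS breast density categories (a-d)."""
    A = "almost entirely fatty"
    B = "scattered areas of fibroglandular density"
    C = "heterogeneously dense"
    D = "extremely dense"


class Modality(Enum):
    """Hypothetical dropdown options for screening modality and device."""
    MAMMOGRAPHY_2D = "2D mammography"
    DBT = "digital breast tomosynthesis"
    MRI = "MRI"
    ULTRASOUND = "ultrasound"


# Checkbox options for known genetic markers (the five genes in section 2.2.2)
GENETIC_MARKERS = {"ATM", "BRCA1", "BRCA2", "CHEK2", "PALB2"}


@dataclass
class ScreeningRecord:
    """Hypothetical input record: the image plus eleven additional feature vectors."""
    image_path: str                          # mammogram screening image (e.g. DICOM)
    age: int
    ethnicity: str                           # dropdown
    history_of_cancer: bool                  # checkbox
    family_history: bool                     # checkbox
    density: Density                         # dropdown
    months_since_last_screen: Optional[int]  # None for a first screening
    modality_history: list = field(default_factory=list)  # checkboxes of Modality
    last_screen_device: Optional[Modality] = None
    hormonal_status: str = "unknown"         # dropdown
    genetic_markers: set = field(default_factory=set)     # checkboxes
    additional_notes: str = ""               # free text, routed to the NLP encoder

    def __post_init__(self):
        # Reject values that a dropdown or checkbox could never produce
        unknown = self.genetic_markers - GENETIC_MARKERS
        if unknown:
            raise ValueError(f"unsupported genetic markers: {sorted(unknown)}")
        if not 0 < self.age < 120:
            raise ValueError("age out of range")


record = ScreeningRecord(
    image_path="case_0001.dcm", age=52, ethnicity="prefer not to say",
    history_of_cancer=False, family_history=True, density=Density.C,
    months_since_last_screen=24, modality_history=[Modality.MAMMOGRAPHY_2D],
    last_screen_device=Modality.MAMMOGRAPHY_2D, genetic_markers={"BRCA2"},
)
print(record.density.value)  # heterogeneously dense
```

Restricting the structured inputs to fixed options in this way leaves `additional_notes` as the only free-text field, which keeps the NLP component's role narrow.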

3.2.2 Obtaining a Dataset

Currently, no publicly available dataset contains all twelve feature vectors in a unified format. However, foundational imaging datasets, such as Mammo-Bench, a benchmark dataset for mammographic research, can serve as a starting point. These datasets may be augmented by linking to EHR or accessing institutional data banks to retrieve complementary clinical information.
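In practice, linking an imaging dataset to EHR extracts amounts to a join on a shared patient identifier. The minimal pandas sketch below assumes that both sources have already been de-identified and share a hypothetical `patient_id` column. Mammo-Bench's actual schema is not described here, and all records are invented. `indicator=True` marks images that have no matching EHR record, and `validate="one_to_one"` raises an error if either table contains duplicate identifiers.

```python
import pandas as pd

# Hypothetical imaging metadata (e.g. exported from a benchmark dataset)
images = pd.DataFrame({
    "patient_id": ["P001", "P002", "P003"],
    "image_path": ["P001_cc.dcm", "P002_cc.dcm", "P003_cc.dcm"],
})

# Hypothetical de-identified EHR extract from a partner institution
ehr = pd.DataFrame({
    "patient_id": ["P001", "P003"],
    "age": [58, 47],
    "family_history": [True, False],
})

# Left join keeps every image; the _merge column records whether EHR data was found
linked = images.merge(ehr, on="patient_id", how="left",
                      indicator=True, validate="one_to_one")
missing = linked.loc[linked["_merge"] == "left_only", "patient_id"].tolist()

print(linked.drop(columns="_merge"))
print("no EHR record for:", missing)
```

Images without a match, such as `P002` here, are the cases that would need enrichment from local registries or institutional partners.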

Local hospital registries and regional screening programmes may also provide valuable data, especially for enriching metadata associated with mammography images. While public medical records offer substantial breadth, they often lack granularity and completeness. In cases where more comprehensive data is required, collaborations with healthcare institutions may be pursued (Faridoon, 2024). These partnerships must navigate strict data governance protocols, including, but not limited to, Institutional Review Board (IRB) approvals, data use agreements, secure transfer protocols, firewall protections and encryption standards (Faridoon, 2024). 

Once the model has been trained and tested and its performance meets expectations, it can be deployed in real-world settings, with real patient data entered through a user interface designed to be easy for operators to use. After all the necessary data have been entered, the model displays the screening's overall frequency tier. 

3.2.3 Model Architecture and Training Workflow

Individual layered components, with their description and usage:

Input Layer (1): Reads three types of user input data: image screenings, structured vectors and unstructured vectors.
Image Encoder (2): A pre-trained CNN extracts and identifies important and unusual features from the mammograms.
Structured Data Encoder (3): Structured features are normalised and passed through a shallow feedforward network.
Unstructured Data (Text) Encoder (4): A BERT-based (or other NLP) model processes the radiologists' notes and extra EHR data.
Feature Fusion Layer (5): Concatenates data from all three modalities into one unified representation.
Bias Audit Layer (6): Checks for bias and attempts to reweight underrepresented combinations.
Tier Classification Head (7): Outputs a tier (A-E) based on the combined outputs of all preceding layers.

(1) The model receives three inputs: imaging, such as mammography screening images; structured vectors, such as age and BI-RADS breast density classification; and unstructured vectors, received via the “Additional Notes” feature vector, which contain radiologists' observations and/or EHR information.

(2) The Image Encoder focuses solely on extracting and identifying crucial patterns in the mammogram, such as calcium deposits (calcifications), atypical structures, and both normal and abnormal features of the image. A pre-trained CNN model, such as ResNet-50, can be used for this pattern recognition (Kundu, 2023). ResNet-50 is a deep learning model for image recognition introduced by researchers at Microsoft in 2015. It uses residual connections: shortcuts that add a block's input directly to its output, skipping over the intermediate layers, so that very deep networks can be trained without the error signal fading as it passes back through many layers (Wong, 2021). Alternatively, an open-source CNN model can be trained extensively on mammography screening images using deep learning techniques.
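The residual connection can be sketched in a few lines of NumPy. This is a simplified, hypothetical block that uses dense matrices rather than ResNet-50's convolutional bottleneck layers. It shows only the core idea, output = relu(F(x) + x), and demonstrates that a block whose weights are all zero passes its input straight through.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def residual_block(x, W1, W2):
    """Simplified residual block: output = relu(F(x) + x).

    F(x) is the pair of 'skipped' layers; the shortcut adds the input back
    unchanged, so the layers only need to learn a correction to x.
    """
    f = relu(x @ W1) @ W2
    return relu(f + x)


rng = np.random.default_rng(0)
x = rng.random((1, 16))            # non-negative toy feature vector
zeros = np.zeros((16, 16))

# With all weights at zero each block passes its input straight through,
# so stacking 50 blocks leaves the signal intact.
y = x
for _ in range(50):
    y = residual_block(y, zeros, zeros)
print(np.allclose(y, x))  # True
```

Without the `+ x` shortcut, the same 50 zero-weight blocks would output all zeros. The shortcut is what allows very deep networks such as ResNet-50 to start from a near-identity mapping.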

(3) Tabular (structured) features take one of two forms: numerical or categorical. Numerical features are normalised so that they share a similar scale. For example, time since the last screening may use z-score standardisation, which rescales a feature to have a mean of 0 and a standard deviation of 1 (GeeksforGeeks, 2024b). This prevents features measured in large units, such as long intervals between screenings, from dominating features with smaller values, avoiding an additional source of bias. For categorical features, the operator can choose an encoding method, such as one-hot encoding, according to the complexity of the categorical data (GeeksforGeeks, 2019). One-hot encoding turns each category into its own binary column, so that each row has a 1 in the column for its category and 0 elsewhere, making non-numeric data easier for the model to interpret (GeeksforGeeks, 2019). The encoded features are then passed through a shallow feedforward neural network, whose hidden layers enhance the model's pattern recognition capabilities.
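Both encodings can be sketched in plain NumPy. The function names and example values are hypothetical. Libraries such as scikit-learn provide equivalent, production-ready transformers (for example `StandardScaler` and `OneHotEncoder`).

```python
import numpy as np


def z_score(values, mean=None, std=None):
    """Standardise to mean 0 and standard deviation 1.

    Pass the training-set mean and std when transforming new cases, so every
    case is placed on the same scale; a constant feature maps to zeros.
    """
    v = np.asarray(values, dtype=float)
    mean = v.mean() if mean is None else mean
    std = v.std() if std is None else std
    return (v - mean) / std if std > 0 else np.zeros_like(v)


def one_hot(labels, categories):
    """One binary column per category; each row has a single 1."""
    index = {c: i for i, c in enumerate(categories)}
    out = np.zeros((len(labels), len(categories)), dtype=int)
    for row, label in enumerate(labels):
        out[row, index[label]] = 1
    return out


months_since_last_screen = [12, 24, 36, 48]
print(z_score(months_since_last_screen).round(2))
print(one_hot(["B", "D", "C"], categories=["A", "B", "C", "D"]))
```

In practice, the mean and standard deviation would be computed on the training data only and then reused for new patients, which is why `z_score` accepts them as optional arguments.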

(4) Additional notes and other raw unstructured data are handled by an NLP model, such as BERT, which processes notes written by medical professionals and any additional EHR data entered in the “Additional Notes” feature vector.

(5) A key step in this process is fusing the embeddings produced in the previous steps: the image embedding, the structured embedding and the unstructured embedding. An embedding is a list of numbers that represents an input in a form the model can work with (GeeksforGeeks, 2024a). The model joins these embeddings end to end into a single vector, a process called concatenation.
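Concatenation itself is a one-line operation. The sketch below uses random placeholder vectors rather than real model outputs. The 2,048-value image embedding matches the size of ResNet-50's pooled output, and 768 matches BERT-base's hidden size. The 32-value structured embedding is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder embeddings for one case (random numbers, not real model outputs)
image_emb = rng.normal(size=2048)       # size of ResNet-50's pooled feature vector
structured_emb = rng.normal(size=32)    # hypothetical size of the feedforward output
text_emb = rng.normal(size=768)         # BERT-base hidden size

# Concatenation: place the three vectors end to end in one fused vector
fused = np.concatenate([image_emb, structured_emb, text_emb])
print(fused.shape)  # (2848,)
```

Because concatenation preserves each modality's values in a fixed position, later layers can still learn how much to rely on each source.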

(6) The Bias Audit Layer is designed to detect, quantify and mitigate bias in the model's current predictions. It can be further developed to track false positives and false negatives, and hence sensitivity and specificity, across patient subgroups. It can also identify feature combinations that are underrepresented in the dataset and measure the overall error rate of the model's tier classifications.
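The sketch below shows two minimal audit functions. Inverse-frequency sample weighting is assumed here as the reweighting method, since the proposal does not name one. Per-subgroup sensitivity stands in for the false-negative tracking described above. The records, subgroup keys and `min_count` threshold are all hypothetical.

```python
from collections import Counter


def audit_combinations(records, keys, min_count=2):
    """Flag under-represented subgroup combinations and return inverse-frequency
    sample weights, so every combination carries equal total weight in training."""
    combos = [tuple(r[k] for k in keys) for r in records]
    counts = Counter(combos)
    n, k = len(combos), len(counts)
    weights = [n / (k * counts[c]) for c in combos]
    rare = sorted(c for c, m in counts.items() if m < min_count)
    return weights, rare


def subgroup_sensitivity(records, key):
    """Sensitivity (true-positive rate) of the model's flag within each subgroup;
    a large gap between subgroups is a warning sign of bias."""
    flagged, positives = Counter(), Counter()
    for r in records:
        if r["cancer"]:
            positives[r[key]] += 1
            flagged[r[key]] += int(r["flagged"])
    return {g: flagged[g] / positives[g] for g in positives}


# Hypothetical audit data: true outcome and whether the model flagged the case
records = [
    {"density": "C", "age_band": "50-59", "cancer": True, "flagged": True},
    {"density": "C", "age_band": "50-59", "cancer": True, "flagged": False},
    {"density": "D", "age_band": "40-49", "cancer": True, "flagged": False},
    {"density": "B", "age_band": "50-59", "cancer": False, "flagged": False},
    {"density": "C", "age_band": "50-59", "cancer": False, "flagged": False},
]
weights, rare = audit_combinations(records, ["density", "age_band"])
print(rare)                                      # [('B', '50-59'), ('D', '40-49')]
print(subgroup_sensitivity(records, "density"))  # {'C': 0.5, 'D': 0.0}
```

Here the model misses the only cancer in the extremely dense (D) subgroup. This is the kind of gap the audit layer is meant to surface before tiers are assigned.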

(7) The model outputs a tier classification ranging from A to E, where Tier A represents the most commonly observed and well-characterised cases, while Tier E flags rare or atypical profiles that may warrant additional clinical review or follow-up screening. Tier assignment is governed by a composite rarity score that integrates multiple factors: feature frequency analysis, which evaluates how often individual attributes appear across the dataset; joint rarity modelling, which assesses the prevalence of specific feature combinations; and modality-weighted scoring, which adjusts the influence of each data source according to input quality. For instance, if a mammogram image is blurry or incomplete, the model relies more heavily on the structured and unstructured feature vectors to maintain consistency and avoid bias. This approach ensures that the model accounts for both statistical rarity and data reliability when determining the appropriate tier, helping it mitigate bias and better support patient care.
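The three scoring factors can be sketched as follows. This is one possible formulation, assumed for illustration rather than specified by the proposal. Rarity is measured as surprisal (the negative log of an add-one smoothed frequency). Feature-frequency and joint rarity are averaged for the structured inputs, and modality weighting is a quality-weighted mean. The reference records, the image and text rarity values, and the quality scores are all made up.

```python
import math
from collections import Counter


def surprisal(count, total, n_categories):
    """-log2 of an add-one smoothed frequency: rarer values score higher,
    and values never seen in the reference data stay finite."""
    return -math.log2((count + 1) / (total + n_categories))


def structured_rarity(case, reference, keys):
    """Average of feature-frequency rarity and joint (combination) rarity."""
    total = len(reference)
    # Feature frequency analysis: how often each attribute value occurs alone
    per_feature = []
    for k in keys:
        counts = Counter(r[k] for r in reference)
        per_feature.append(surprisal(counts[case[k]], total, len(counts) + 1))
    # Joint rarity modelling: how often this exact combination occurs
    combos = Counter(tuple(r[k] for k in keys) for r in reference)
    joint = surprisal(combos[tuple(case[k] for k in keys)], total, len(combos) + 1)
    return (sum(per_feature) / len(per_feature) + joint) / 2


def composite_score(rarity, quality):
    """Modality-weighted scoring: weight each modality's rarity by a 0-1 input
    quality score (at least one must be > 0), so a blurry image shifts
    reliance onto the other inputs."""
    total_q = sum(quality[m] for m in rarity)
    return sum(rarity[m] * quality[m] for m in rarity) / total_q


reference = (
    [{"density": "B", "family_history": False}] * 6
    + [{"density": "C", "family_history": False}] * 3
    + [{"density": "D", "family_history": True}]
)
keys = ["density", "family_history"]
common = structured_rarity({"density": "B", "family_history": False}, reference, keys)
rare = structured_rarity({"density": "D", "family_history": True}, reference, keys)
print(round(common, 2), round(rare, 2))  # 0.84 2.78

# Image and text rarity would come from the encoders; these values are invented
score = composite_score(
    {"image": 3.0, "structured": rare, "text": 1.0},
    quality={"image": 0.2, "structured": 1.0, "text": 0.8},  # blurry image
)
```

Add-one smoothing keeps a never-seen combination from producing an infinite score. Lowering an image's quality weight shifts the composite towards the structured and text rarity, as the paragraph describes.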

3.2.4 Tier Assignment Logic

The tier assignment logic is the final decision-making stage of the model, responsible for categorising patients into one of five tiers (A-E) based on a comprehensive analysis of imaging, structured and unstructured data. This logic is designed to reflect both statistical rarity and clinical urgency, while incorporating safeguards to ensure fairness and interpretability.

1. Tier A: Common, well-understood cases with high confidence. 

2. Tier B-D: Increasing levels of complexity, uncertainty or atypical combinations.

3. Tier E: Rare or ambiguous cases requiring follow-up or expert review.
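One hypothetical way to convert the composite rarity score into the five tiers is to place cut-points at percentiles of the scores observed in a reference population and to escalate low-confidence predictions straight to Tier E. The percentiles (50th, 75th, 90th and 97th), the confidence threshold and the toy reference scores below are illustrative assumptions, not values from the proposal.

```python
from bisect import bisect_right
from statistics import quantiles

TIERS = "ABCDE"


def fit_cutpoints(reference_scores, percentiles=(50, 75, 90, 97)):
    """Cut-points at chosen percentiles of reference composite rarity scores.

    The percentile choices are hypothetical: they make Tier A the most common
    tier and Tier E the rarest.
    """
    pct = quantiles(reference_scores, n=100)   # 99 percentile cut-points
    return [pct[p - 1] for p in percentiles]


def assign_tier(score, cutpoints, confidence=1.0, min_confidence=0.6):
    """Map a composite rarity score to Tier A-E.

    Low-confidence predictions are escalated to Tier E for expert review,
    matching the 'rare or ambiguous cases' definition of Tier E.
    """
    if confidence < min_confidence:
        return "E"
    return TIERS[bisect_right(cutpoints, score)]


cutpoints = fit_cutpoints(list(range(1, 101)))   # toy reference scores 1..100
print([assign_tier(s, cutpoints) for s in (10, 60, 80, 95, 99)])
# ['A', 'B', 'C', 'D', 'E']
```

Percentile-based cut-points tie each tier to how unusual a case is relative to the reference population, which keeps the tiers consistent with the proposal's frequency-based definition of rarity.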

4. Conclusion

Breast cancer continues to pose a major global health challenge, with metastasis and interval cancers driving the majority of mortality because of their aggressive nature and the difficulty of detecting them early. Current screening approaches, while effective for many patients, often fail to identify subtle or atypical presentations promptly. Understanding the background of metastasis and interval cancer is key to developing efficient and accurate ways of combatting these subcategories of breast cancer. Metastasis, which causes over 90% of cancer-related deaths, arises from genetic mutations and metabolic adaptations that enable uncontrolled growth, immune evasion and the dissemination of cancer cells. Interval cancers, in turn, show aggressive histopathological characteristics, such as high Ki-67 levels, HER2 overexpression and a triple-negative (TNBC) phenotype. 

Breast cancer risk is influenced by various factors, such as ageing, reproductive history, hormonal exposure, lifestyle choices and genetics. Major contributors include older age, early menarche, nulliparity, hormone replacement therapy, alcohol consumption, smoking and BRCA1/BRCA2 mutations, all of which increase susceptibility to tumour development and metastasis. While AI-assisted imaging, genetic profiling and molecular methods, such as detecting micrometastases and CTCs, show promise, current tools remain limited in accuracy and efficiency. Integrating AI with large sets of genetic and molecular data could substantially improve the accuracy of these techniques and enable earlier detection of metastatic and interval cancers. The foundational AI technologies discussed in this proposal are large language models, which are trained on vast datasets to generate human-like, contextually relevant responses, and deep learning, a subset of machine learning that uses neural networks to refine predictions, including architectures such as GANs and CNNs that enable advanced pattern recognition and image analysis. The AI-driven mammography screening frequency tier classification model outlined in this proposal offers a potentially transformative approach to addressing these gaps. It identifies cases of higher rarity by classifying them from Tier A, a common, well-understood case with high confidence, to Tier E, a rare or ambiguous case requiring follow-up.

Future steps include collaborating with hospitals to obtain a more complete and diverse dataset that incorporates all the feature vectors. If successfully implemented, the model could reshape the current screening and diagnostic process, reducing the number of missed cases and leading to more favourable prognoses for patients. 

Bibliography

American Cancer Society (2025) Breast Cancer HER2 Status, American Cancer Society [online]. <https://www.cancer.org/cancer/types/breast-cancer/understanding-a-breast-cancer-diagnosis/breast-cancer-her2-status.html>

American Cancer Society (2025) Key Statistics for Breast Cancer, American Cancer Society [online]. <https://www.cancer.org/cancer/types/breast-cancer/about/how-common-is-breast-cancer.html>

American Medical Association (2023) What Every Doctor Should Know About Metastatic Breast Cancer, Including Why It’s Missed, AMA Ed Hub [online]. <https://edhub.ama-assn.org/pages/what-every-doctor-should-know-about-metastatic-breast-cancer>

Arnold, M., Morgan, E., Rumgay, H., Mafra, A., Singh, D., Laversanne, M., Vignat, J., Gralow, J.R., Cardoso, F., Siesling, S. & Soerjomataram, I. (2022) Current and future burden of breast cancer: Global statistics for 2020 and 2040, The Breast, 66, pp. 15–23. 

Atkins, K.A. & Kong, C. (2012) Practical Breast Pathology: A Diagnostic Approach E-Book. Elsevier Health Sciences.

Bahl, M. (2025) AI Catches One-Third of Interval Breast Cancers Missed at Screening, Radiological Society of North America [online]. <https://www.rsna.org/news/2025/july/ai-catches-one-third-of-interval-breast-cancers>

Banys, M., Gruber, I., Krawczyk, N., Becker, S., Kurth, R., Wallwiener, D., Jakubowska, J., Hoffmann, J., Rothmund, R., Staebler, A. & Fehm, T. (2012) Hematogenous and lymphatic tumor cell dissemination may be detected in patients diagnosed with ductal carcinoma in situ of the breast, Breast Cancer Research and Treatment, 131(3), pp. 801–808.

Braun, S., Vogl, F.D., Naume, B., Janni, W., Osborne, M.P., Coombes, R.C., Schlimok, G., Diel, I.J., Gerber, B., Gebauer, G., Pierga, J.-Y., Marth, C., Oruzio, D., Wiedswang, G., Solomayer, E.-F., Kundt, G., Strobl, B., Fehm, T., Wong, G.Y.C. & Bliss, J. (2005) A Pooled Analysis of Bone Marrow Micrometastasis in Breast Cancer, New England Journal of Medicine, 353(8), pp. 793–802.

Breast Cancer Now (2024) Alcohol and Breast Cancer Risk, Breast Cancer Now [online]. <https://breastcancernow.org/about-breast-cancer/awareness/breast-cancer-risk-factors-and-causes/alcohol-and-breast-cancer-risk>

Cheng, X. (2024) A Comprehensive Review of HER2 in Cancer Biology and Therapeutics, Genes, 15(7), 903.

Chen, W.Y. (2008) Exogenous and endogenous hormones and breast cancer, Best practice & research. Clinical endocrinology & metabolism, 22(4), pp. 573–585. 

ScreenPoint Medical (n.d.) The Transpara Breast AI Suite [online]. <https://screenpoint-medical.com/choosing-transpara>

Cleveland Clinic (2025) Triple-Negative Breast Cancer, Cleveland Clinic [online]. <https://my.clevelandclinic.org/health/diseases/21756-triple-negative-breast-cancer-tnbc>

Cooper, G.M. (2000) The Cell: A Molecular Approach. Sinauer Associates. 

Crook, T., Leonard, R., Mokbel, K., Thompson, A., Michell, M., Page, R., Vaid, A., Mehrotra, R., Ranade, A., Limaye, S., Patil, D., Akolkar, D., Datta, V., Fulmali, P., Apurwa, S., Schuster, S., Srinivasan, A. & Datar, R. (2022) Accurate Screening for Early-Stage Breast Cancer by Detection and Profiling of Circulating Tumor Cells, Cancers, 14(14), 3341.

Crosier, M., Scott, D., Wilson, R.S., Griffiths, C., May & Westley, B.R. (2001) High Expression of the Trefoil Protein TFF1 in Interval Breast Cancers, The American Journal of Pathology, 159(1), pp. 215–221.

DePolo, J. (2022) Interval Breast Cancers Found Between Regular Screening Mammograms Seem More Aggressive, breastcancer.org [online]. <https://www.breastcancer.org/research-news/interval-breast-cancers-seem-more-aggressive>

DePolo, J. (2024) Genetic Mutations More Likely in Breast Cancer Found Between Mammograms, breastcancer.org [online]. <https://www.breastcancer.org/research-news/interval-cancers-more-genetic-mutations>

Dewi, C., Fristiohady, A., Amalia, R., Khairul Ikram, N.K., Ibrahim, S. & Muchtaridi, M. (2022) Signalling Pathways and Natural Compounds in Triple-Negative Breast Cancer Cell Line, Molecules, 27(12), 3661. 

Domingo, L., Salas, D., Zubizarreta, R., Baré, M., Sarriugarte, G., Barata, T., Ibáñez, J., Blanch, J., Puig-Vives, M., Fernández, A.B., Castells, X. & Sala, M. (2014) Tumor phenotype and breast density in distinct categories of interval cancer: results of population-based mammography screening in Spain, Breast Cancer Research, 16(1). 

Dong, D., Tang, L.G., Li, Z.B., Fang, M., Gao, J.R., Shan, X., Ying, X.N., Sun, Y., Fu, J., Wang, X.F., Li, L., Li, Z.B., Zhang, D., Zhang, Y., Li, Z., Shan, F., Bu, Z., Tian, J. & Ji, J. (2019) Development and validation of an individualized nomogram to identify occult peritoneal metastasis in patients with advanced gastric cancer, Annals of Oncology, 30(3), pp. 431–438.

Key, T.J., Appleby, P.N., Reeves, G.K., Travis, R.C., Alberg, A.J., Barricarte, A., Berrino, F., Krogh, V., Sieri, S., Brinton, L.A., Dorgan, J.F., Dossus, L., Dowsett, M., Eliassen, A.H., Fortner, R.T., Hankinson, S.E., Helzlsouer, K.J., Hoffman-Bolton, J., Comstock, G.W., … Vineis, P. (2013) Sex hormones and risk of breast cancer in premenopausal women: a collaborative reanalysis of individual participant data from seven prospective studies, The Lancet Oncology, 14(10), pp. 1009–1019. 

Ewertz, M., Duffy, S.W., Adami, H.O., Kvåle, G., Lund, E., Meirik, O., Mellemgaard, A., Soini, I. & Tulinius, H. (1990) Age at first birth, parity and risk of breast cancer: a meta-analysis of 8 studies from the Nordic countries, International Journal of Cancer, 46(4), pp. 597–603. 

Faridoon, A. (2024) Healthcare Data Governance, Privacy, and Security – A Conceptual Framework, arXiv, 2403.17648. <https://arxiv.org/html/2403.17648v1>

GeeksforGeeks (2019) One Hot Encoding in Machine Learning, GeeksforGeeks [online]. <https://www.geeksforgeeks.org/machine-learning/ml-one-hot-encoding/>

GeeksforGeeks (2024a) What are embeddings in machine learning?, GeeksforGeeks [online]. <https://www.geeksforgeeks.org/machine-learning/what-are-embeddings-in-machine-learning-2/>

GeeksforGeeks (2024b) Z-Score Normalization: Definition and Examples, GeeksforGeeks [online]. <https://www.geeksforgeeks.org/data-analysis/z-score-normalization-definition-and-examples/>

Gerstberger, S., Jiang, Q. & Ganesh, K. (2023) Metastasis, Cell, 186(8), pp. 1564–1579. 

Gilliland, F.D. (2000) Biologic Characteristics of Interval and Screen-Detected Breast Cancers, Journal of the National Cancer Institute, 92(9), pp. 743–749. 

Goldberg, M., D’Aloisio, A.A., O’Brien, K.M., Zhao, S. & Sandler, D.P. (2020) Pubertal timing and breast cancer risk in the Sister Study cohort, Breast Cancer Research, 22(1). 

Hamer, J. & Warner, E. (2017) Lifestyle modifications for patients with breast cancer to improve prognosis and optimize overall health, Canadian Medical Association Journal, 189(7), pp. 268–274. 

Hanahan, D. & Weinberg, R.A. (2011) Hallmarks of Cancer: The Next Generation, Cell, 144(5), pp. 646–674.

Heady, D. (2025) AI could help improve early detection of interval breast cancers, UCLA Health [online]. <https://www.uclahealth.org/news/release/ai-could-help-improve-early-detection-interval-breast>

Holdsworth, J. & Scapicchio, M. (2024) What is deep learning?, IBM [online]. <https://www.ibm.com/think/topics/deep-learning>

Houssami, N. & Hunter, K. (2017) The Epidemiology, Radiology and Biological Characteristics of Interval Breast Cancers in Population Mammography Screening, npj Breast Cancer, 3(1).

IBM (2023) What are large language models (LLMs)?, IBM [online]. <https://www.ibm.com/think/topics/large-language-models>

Johns Hopkins Medicine (2022) Overview of the Breast – Breast Pathology, Johns Hopkins Medicine Pathology [online]. <https://pathology.jhu.edu/breast/overview>

Ju, S., Chen, C., Zhang, J., Xu, L., Zhang, X., Li, Z., Chen, Y., Zhou, J., Ji, F. & Wang, L. (2022) Detection of circulating tumor cells: opportunities and challenges, Biomarker Research, 10(1). 

Kadota, M., Sato, M., Duncan, B., Ooshima, A., Yang, H.H., Diaz-Meyer, N., Gere, S., Kageyama, S.-I., Fukuoka, J., Nagata, T., Tsukada, K., Dunn, B.K., Wakefield, L.M. & Lee, M.P. (2009) Identification of Novel Gene Amplifications in Breast Cancer and Coexistence of Gene Amplification with an Activating Mutation of PIK3CA, Cancer Research, 69(18), pp. 7357–7365.

Kass, E. (2025) Hormone Therapy for Breast Cancer, Breast Cancer Research Foundation [online]. <https://www.bcrf.org/about-breast-cancer/hormone-therapy-breast-cancer/>

Katsura, C., Ogunmwonyi, I., Kankam, H.K. & Saha, S. (2022) Breast Cancer: Presentation, Investigation and Management, British Journal of Hospital Medicine, 83(2), pp. 1–7.

Kim, S.K. & Cho, S.W. (2022) The Evasion Mechanisms of Cancer Immunity and Drug Intervention in the Tumor Microenvironment, Frontiers in Pharmacology, 13, 868695.

Kiri, S. & Ryba, T. (2024) Cancer, metastasis, and the epigenome, Molecular Cancer, 23, 154.

Kundu, N. (2023) Exploring ResNet50: An In-Depth Look at the Model Architecture and Code Implementation, Medium [online]. <https://medium.com/@nitishkundu1993/exploring-resnet50-an-in-depth-look-at-the-model-architecture-and-code-implementation-d8d8fa67e46f>

Lawrence, R., Watters, M., Davies, C.R., Pantel, K. & Lu, Y.-J. (2023) Circulating Tumour Cells for Early Detection of Clinically Relevant Cancer, Nature Reviews Clinical Oncology, 20(7), pp. 487–500.

Lee, J.M., Jantarang, P. & Rengabashyam, B. (2023) Lessons Learnt from Imaging Review of Interval Breast Cancers in a Single Centre in the UK National Breast Screening Program, Indian Journal of Radiology & Imaging, 34(03), pp. 522–532.

Liao, M., Yao, D., Wu, L., Luo, C., Wang, Z., Zhang, J. & Liu, B. (2024) Targeting the Warburg effect: A revisited perspective from molecular mechanisms to traditional and innovative therapeutic strategies in cancer, Acta Pharmaceutica Sinica B, 14(3), pp. 953–1008.

Blackford Analysis (2024) Lunit INSIGHT DBT, Blackford Analysis [online]. <https://blackfordanalysis.com/ai-portfolio-lunit-insight-dbt>

Luo, S., Fu, W., Lin, J., Zhang, J. & Song, C. (2023) Prognosis and local treatment strategies of breast cancer patients with different numbers of micrometastatic lymph nodes, World Journal of Surgical Oncology, 21(1). 

Ma, Q., Liu, Y.-B., She, T. & Liu, X.-L. (2024) The Role of Ki-67 in HR+/HER2- Breast Cancer: A Real-World Study of 956 Patients, Breast Cancer: Targets and Therapies, 16, pp. 117–126.

Mao, X., Mei, R., Yu, S., Shou, L., Zhang, W., Li, K., Qiu, Z., Xie, T. & Sui, X. (2022) Emerging Technologies for the Detection of Cancer Micrometastasis, Technology in Cancer Research & Treatment, 21, pp. 153303382211003-153303382211003.

McDonald, J.A., Goyal, A. & Terry, M.B. (2013) Alcohol Intake and Breast Cancer Risk: Weighing the Overall Evidence, Current Breast Cancer Reports, 5(3).

Medline Plus (2020) TP53 gene, Medline Plus [online]. <https://medlineplus.gov/genetics/gene/tp53/>

Messinger, J., Crawford, S., Roland, L. & Mizuguchi, S. (2018) Review of Subtypes of Interval Breast Cancers With Discussion of Radiographic Findings, Current Problems in Diagnostic Radiology, 48(6), pp. 592–598.

Murphy, M. (2021) What are foundation models?, IBM [online]. <https://research.ibm.com/blog/what-are-foundation-models>

Musolino, A., Falcini, F., Sikokis, A., Boggiani, D., Rimanti, A., Pellegrino, B., Silini, E.M., Campanini, N., Barbieri, E., Zamagni, C., Degli Esposti, R., Cortesi, L., Bisagni, G., Cavanna, L., Frassoldati, A., Sgargi, P. & Michiara, M. (2018) Prognostic impact of interval breast cancer detection in women with pT1a N0M0 breast cancer with HER2-positive status: Results from a multicentre population-based cancer registry study, European Journal of Cancer, 88, pp. 10–20.

National Breast Cancer Foundation (2025) Metastatic Breast Cancer, National Breast Cancer Foundation [online]. <https://www.nationalbreastcancer.org/metastatic-breast-cancer/>

National Cancer Institute (2024) BRCA mutations: Cancer Risk and Genetic Testing, National Cancer Institute [online]. <https://www.cancer.gov/about-cancer/causes-prevention/genetics/brca-fact-sheet>

National Cancer Institute (2015) Risk Factors: Hormones, National Cancer Institute [online]. <https://www.cancer.gov/about-cancer/causes-prevention/risk/hormones>

National Cancer Institute (n.d.) Estrogen receptor negative, National Cancer Institute: Dictionary of Cancer Terms [online]. <https://www.cancer.gov/publications/dictionaries/cancer-terms/def/estrogen-receptor-negative>

National Cancer Institute (n.d.) Ki-67 proliferation index, National Cancer Institute: Dictionary of Cancer Terms [online]. <https://www.cancer.gov/publications/dictionaries/cancer-terms/def/ki-67-proliferation-index>

National Human Genome Research Institute (2025) Apoptosis, National Human Genome Research Institute [online]. <https://www.genome.gov/genetics-glossary/apoptosis>

Nguyen, T.L., Li, S., Dite, G.S., Aung, Y.K., Evans, C.F., Trinh, H.N., Baglietto, L., Stone, J., Song, Y., Sung, J., English, D.R., Jenkins, M.A., Dugué, P., Milne, R.L., Southey, M.C., Giles, G.G., Pike, M.C. & Hopper, J.L. (2019) Interval breast cancer risk associations with breast density, family history and breast tissue aging, International Journal of Cancer, 147(2), pp. 375–382. 

Nykänen, A., Sudah, M., Masarwah, A., Vanninen, R. & Okuma, H. (2024) Radiological Features of Screening-Detected and Interval Breast Cancers and Subsequent Survival in Eastern Finnish Women, Scientific Reports, 14(1).

Park, E.K., Kwak, S., Lee, W., Choi, J.S., Kooi, T. & Kim, E.-K. (2024) Impact of AI for Digital Breast Tomosynthesis on Breast Cancer Detection and Interpretation Time, Radiology: Artificial Intelligence, 6(3).

Pastuszak, K., Sieczczyński, M., Dzięgielewska, M., Wolniak, R., Drewnowska, A., Korpal, M., Zembrzuska, L., Supernat, A. & Żaczek, A.J. (2024) Detection of circulating tumor cells by means of machine learning using Smart-Seq2 sequencing, Scientific Reports, 14(1).

Poorolajal, J., Heidarimoghis, F., Karami, M., Cheraghi, Z., Gohari-Ensaf, F., Shahbazi, F., Zareie, B., Ameri, P. & Sahraee, F. (2021) Factors for the Primary Prevention of Breast Cancer: A Meta-Analysis of Prospective Cohort Studies, Journal of Research in Health Sciences, 21(3), 00520.

Pulumati, A., Pulumati, A., Dwarakanath, B.S., Verma, A. & Papineni, R.V.L. (2023) Technological advancements in cancer diagnostics: Improvements and limitations, Cancer Reports, 6(2), 1764.

Riggio, A.I., Varley, K.E. & Welm, A.L. (2021) The lingering mysteries of metastatic recurrence in breast cancer, British Journal of Cancer, 124(1), pp. 13–26. 

Rodrigues, P. & Vanharanta, S. (2019) Circulating Tumour Cells: Come Together, Right Now, Over Metastasis, Cancer Discovery, 9(1), pp. 22–24.

Rubio, M. (2025) What to Know About Breast Cancer Recurrence, Breast Cancer Research Foundation [online]. <https://www.bcrf.org/about-breast-cancer/breast-cancer-recurrence/>

Russo, J., Moral, R., Balogh, G.A., Mailo, D. & Russo, I.H. (2005) The protective role of pregnancy in breast cancer, Breast Cancer Research, 7(3), pp. 131–142.

Saha Roy, S. & Vadlamudi, R.K. (2012) Role of estrogen receptor signaling in breast cancer metastasis, International Journal of Breast Cancer, 654698. 

Schirano, A.M., Dell’Aquila, L., Melucci, G. & Galeotti, R. (2024) Comparison of radiological and histopathological features between interval and screen-detected breast cancers: a retrospective case–control study, Journal of Medical Imaging and Interventional Radiology, 11(1). 

Schnipper, H.H. (2020) Interval Breast Cancers Are More Dangerous, Beth Israel Lahey Health [online]. <https://www.bidmc.org/about-bidmc/blogs/living-with-cancer/2020/10/interval-breast-cancers-are-more-dangerous>

Scully, O.J., Bay, B.-H., Yip, G. & Yu, Y. (2012) Breast Cancer Metastasis, Cancer Genomics & Proteomics, 9(5), pp. 311–320.

Shahrouzi, P., Forouz, F., Mathelier, A., Kristensen, V.N. & Duijf, P.H.G. (2024) Copy Number Alterations: a Catastrophic Orchestration of the Breast Cancer Genome, Trends in Molecular Medicine, 30(8).

Shieh, Y., Ziv, E. & Kerlikowske, K. (2020) Interval breast cancers, Nature Reviews Clinical Oncology, 17(3), pp. 138–139.

Siegel, R.L., Miller, K.D. & Jemal, A. (2017) Cancer statistics, 2017, CA: A Cancer Journal for Clinicians, 67(1), pp. 7–30. 

Song, H., Tran, T.X.M., Kim, S. & Park, B. (2024) Risk Factors and Mortality Among Women With Interval Breast Cancer vs Screen-Detected Breast Cancer, JAMA Network Open, 7(5), e2411927. 

Sonnenschein, C. & Soto, A.M. (2008) Theories of carcinogenesis: An emerging perspective, Seminars in Cancer Biology, 18(5), pp. 372–377. 

Sun, Y.S., Zhao, Z., Yang, Z.N., Xu, F., Lu, H.J., Zhu, Z.Y., Shi, W., Jiang, J., Yao, P.P. & Zhu, H.P. (2017) Risk Factors and Preventions of Breast Cancer, International Journal of Biological Sciences, 13(11), pp. 1387–1397.

Susnik, B., Frkovic-Grazio, S. & Bracko, M. (2004) Occult micrometastases in axillary lymph nodes predict subsequent distant metastases in stage I breast cancer: a case-control study with 15-year follow-up, Annals of Surgical Oncology, 11(6), pp. 568–572.

Tan, Y., Zhang, W., Huang, Z., Tan, Q., Zhang, Y., Wei, C. & Feng, Z. (2024) AI models predicting breast cancer distant metastasis using LightGBM with clinical blood markers and ultrasound maximum diameter, Scientific Reports, 14(1). 

Thery, L., Meddis, A., Cabel, L., Proudhon, C., Latouche, A., Pierga, J.-Y. & Bidard, F.-C. (2019) Circulating Tumor Cells in Early Breast Cancer, JNCI Cancer Spectrum, 3(2), pkz026.

Trapp, E.K., Fasching, P.A., Fehm, T., Schneeweiss, A., Mueller, V., Harbeck, N., Lorenz, R., Schumacher, C., Heinrich, G., Schochter, F., de Gregorio, A., Tzschaschel, M., Rack, B., Janni, W. & Friedl, T.W.P. (2022) Does the Presence of Circulating Tumor Cells in High-Risk Early Breast Cancer Patients Predict the Site of First Metastasis—Results from the Adjuvant SUCCESS A Trial, Cancers, 14(16), 3949.

Travis, R.C. & Key, T.J. (2003) Oestrogen exposure and breast cancer risk, Breast Cancer Research, 5(5), pp. 239–247.

Vanliere Canzoniero, J. (2022) Metastatic Breast Cancer, Johns Hopkins Medicine [online]. <https://www.hopkinsmedicine.org/health/conditions-and-diseases/breast-cancer/metastatic-breast-cancer>

Vogelstein, B. & Kinzler, K.W. (1993) The multistep nature of cancer, Trends in Genetics, 9(4), pp. 138–141. 

Wong, W. (2021) What is Residual Connection?, Towards Data Science [online]. <https://towardsdatascience.com/what-is-residual-connection-efb07cab0d55/>