Abstract
Glaucoma is a leading cause of blindness characterised by damage to the optic nerve, which transmits signals from the light receptors in the retina to the brain. It is very common, affecting 4.22 million people in the United States in 2022. Approximately half of those affected are unaware they have the condition. Artificial intelligence (AI) screening tools may augment the diagnosis and treatment of glaucoma through early detection and expanded access to screening. This paper provides a comprehensive review and evaluation of the literature on the diagnostic accuracy, cost-effectiveness and implementation challenges of AI-enabled glaucoma screening. Several studies have validated AI-enabled glaucoma screening, demonstrating high sensitivity and specificity for the detection of glaucoma; however, accuracy is reduced in real-world clinical settings. From a health economic perspective, AI can enhance glaucoma screening through early detection and by easing demand on specialists, lowering costs; targeted approaches and optimised screening intervals can make healthcare costs more manageable. Further evidence is needed to validate the real-world effectiveness of AI-enabled glaucoma screening, including its implementation processes and cost-effectiveness.
Background
Glaucoma, a group of eye diseases characterised by optic nerve degeneration, is a leading cause of irreversible blindness worldwide (1,2). Its most common cause is intraocular hypertension (increased pressure within the eye), which occurs when resistance in the trabecular meshwork (the drainage tissue where the iris and cornea meet) causes fluid to build up in the eye (3). The source of this resistance varies with the type of glaucoma.
Although artificial intelligence (AI) continues to be developed as a screening tool for glaucoma, no AI glaucoma screening tools have yet been approved by the FDA or other regulatory bodies. As glaucoma is often asymptomatic until vision loss begins, autonomous screening tools for glaucoma detection could be greatly beneficial. One of the main obstacles, however, is that no clearly established diagnostic criteria exist, which are crucial for consistent and accurate diagnosis (4). Screening tools for both diabetic retinopathy (DR) and age-related macular degeneration (AMD) have progressed further than those for glaucoma, with several DR screening applications currently available – IDx-DR (LumineticsCore), EyeArt by Eyenuk and AEYE Diagnostic Screening (AEYE-DS) – and one currently available AMD screening device, EyeArt by Eyenuk. The EyeArt AI system by Eyenuk has been approved for both DR and AMD screening and is currently being tested for glaucoma screening (5).
Screening tools for glaucoma are evaluated with three main diagnostic accuracy metrics: sensitivity, specificity and area under the curve (AUC). Sensitivity measures a model’s ability to identify true positives, i.e. correctly diagnosing an eye with glaucoma (6). Specificity measures the ability to identify true negatives, i.e. correctly identifying a healthy eye (6). Finally, AUC gives an overall measure of how well a test or model can distinguish eyes with glaucoma from eyes without glaucoma.
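As an illustration, all three metrics can be computed directly from a model’s outputs. The labels and scores below are hypothetical, chosen only to show the arithmetic; AUC is computed via its rank interpretation, i.e. the probability that a randomly chosen glaucomatous eye receives a higher score than a randomly chosen healthy eye.

```python
# Illustrative computation of sensitivity, specificity and AUC for a
# hypothetical glaucoma screening model (labels/scores are made up).

def sensitivity_specificity(labels, predictions):
    """labels/predictions: 1 = glaucoma, 0 = healthy."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(labels, scores):
    """AUC as P(score of glaucomatous eye > score of healthy eye);
    ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0, 0]                    # ground truth
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1]      # model outputs
preds = [1 if s >= 0.5 else 0 for s in scores]    # threshold at 0.5

sens, spec = sensitivity_specificity(labels, preds)
```

Note that sensitivity and specificity depend on the chosen threshold, while AUC summarises performance across all thresholds; this is why studies often report all three.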
AI screening tools for glaucoma mainly use two image types: colour fundus imaging and optical coherence tomography (OCT). Colour fundus imaging photographs the back of the eye, allowing eye specialists to see several parts of the fundus (the back wall of the eye) that are vital to vision: the optic nerve (vital in diagnosing glaucoma), the retina, the macula and the choroid (7). Fundus imaging is often a routine part of eye exams but can also be used for disease diagnosis. OCT imaging provides a cross-sectional view of the eye, similar to how a CT scan operates. Comparing the two, colour fundus imaging is considered more cost-effective and portable, making it more accessible and suitable for community settings, whereas OCT remains the predominant imaging method in clinical care (8).
Caution is needed when deploying AI in clinical settings because it often does not provide an explanation for its output, which can result in errors being made without clinical justification. This has medicolegal implications: a doctor using AI as a screening tool may be unable to explain the rationale for a medical decision. That rationale is important for understanding why an error was made, who was responsible and how to prevent it in the future.
AI can be an effective tool in clinical settings because of its time efficiency. However, healthcare settings need to adapt their workflows before AI-enabled glaucoma screening can become routine. The diagnostic accuracy of AI in glaucoma is highly important and needs further improvement, as does its cost-effectiveness. In this paper, we discuss these elements while considering the challenges to widespread adoption, such as diagnostic accuracy, explainability and cost-effectiveness, in both high- and low-resource settings.
Diagnostic Accuracy
Mismatch between validation studies using highly curated images and real-world studies using lower quality images
AI shows strong potential for glaucoma detection using retinal fundus photographs, often achieving expert-level accuracy in validation studies. Yet, these results are not fully replicated in real-world settings. Curated datasets typically involve high-quality images captured in controlled research environments, while practical screening must contend with variable image quality, heterogeneous populations and resource-limited contexts.
The first mismatch lies in image quality and dataset curation. A deep learning model trained on more than 241,000 high-quality fundus images from 68,013 subjects achieved near-perfect internal performance (AUC 0.996, sensitivity 96.2%, specificity 97.7%) (9), but accuracy dropped when it was tested on less curated data: AUC declined to 0.964 in the Handan Eye Study, 0.923 in a US multi-ethnic cohort and 0.823 on web-sourced images, where specificity fell to just 70.4%. This pattern was echoed in the RETFound foundation model, trained on 1.6 million retinal images from diverse sources, which still demonstrated a drop in generalisability when tested on external datasets, with performance consistently lower than internal benchmarks (10). On 9,700 highly variable fundus images from the multi-ethnic DiGS/ADAGES cohort, RETFound achieved a respectable AUC of 0.86, but this was substantially lower than Liu’s benchmark (9), underscoring how curated datasets can overstate clinical performance. The issue extended into community screening using smartphone-based fundus cameras (11): the model maintained high sensitivity (93.7%) but at the cost of lower specificity (85.6%), again illustrating that real-world image variability introduces diagnostic trade-offs.
Another mismatch across studies concerns disease severity as a key determinant of accuracy. RETFound achieved an AUC of 0.95 for advanced glaucoma but only 0.85 in early-stage cases (10), while a study of a smartphone-based fundus camera system reported sensitivity rising from 86.9% in early glaucoma to 96% in advanced disease (11). Although Liu et al. did not stratify performance by severity, the drop in specificity on noisier datasets implies that early, subtler cases may be particularly vulnerable to misclassification in real-world conditions (9). AI systems therefore perform better once structural changes are pronounced and disease is established, while early detection – the stage at which intervention is most valuable – remains constrained.
A further mismatch lies in the demographic and clinical representativeness of training and validation datasets, in terms of both population diversity and deployment contexts. Many validation datasets are disproportionately composed of patients from single institutions or narrow demographics. Liu et al. studied largely Chinese populations, limiting the global generalisability of their model: when tested on an external Singaporean dataset, sensitivity dropped from 95.6% in internal validation to 85.3%. Although RETFound was pretrained on a broad, multimodal dataset, the model underperformed on external validations, revealing persistent challenges with domain shift (10). The multi-ethnic DiGS/ADAGES cohort reported an AUC of 0.86, lower than the internal benchmark of 0.90, demonstrating a mismatch between training on curated datasets and testing in a more diverse external population. Despite its scale and pretraining, the model required recalibration when applied to external datasets, reflecting challenges in adapting to community-level variability. Similar findings were reported when deployment was extended to low-resource community settings using portable fundus cameras, which demonstrated feasibility and offline use; however, specificity declined from 94.1% in hospital environments to 85.6% in community screening, highlighting how population heterogeneity and environmental challenges introduce variability.
Thus, curated high-quality images yield inflated accuracy estimates, especially for early-stage or subtle glaucoma. Foundation models pretrained on massive datasets require adjustment for external and heterogeneous populations. While smartphone-based screening increases reach, it also amplifies specificity challenges. To bridge this gap, AI systems should be evaluated on diverse, real-world datasets to ensure reliable integration into clinical workflows.
The risk of missing “incidental findings” e.g. ocular melanoma when screening only for glaucoma using AI
AI-based glaucoma screening systems, while achieving strong diagnostic performance for glaucoma itself, carry a significant risk of missing incidental but clinically important ocular pathologies such as ocular melanoma, diabetic retinopathy or age-related macular degeneration. This limitation consistently emerges from design choices in both imaging and algorithm development. For example, an offline AI system for referable glaucoma screening using a smartphone-based fundus camera achieved above 93% sensitivity and above 85% specificity for glaucoma detection using a single 42° disc-centred fundus image per eye, with an additional disc-localisation step to crop the optic nerve head region (11). While this narrow focus improves glaucoma classification, it inherently excludes the macula and retinal periphery, where ocular melanomas, AMD lesions and diabetic retinopathy microaneurysms are often present. The authors acknowledged that the reference standard was not representative of a comprehensive ophthalmic evaluation and that the study did not assess the system’s ability to detect other ocular diseases.
A similar pattern was found by Chuter et al. (10), who reported an AUROC of 0.94 for glaucoma detection. Yet their reliance on 30-45° disc-centred images meant the system was not designed to detect other pathologies from fundus photographs. Images were required to show the optic nerve head and peripapillary region, but there was no requirement to capture the macula or peripheral retina. Any pathology occurring outside the disc-centred region, such as ocular melanoma, which often arises in the posterior pole or mid-periphery, was outside the model’s scope. The single-disease, single-ROI framework essentially blinds the algorithm to non-glaucomatous abnormalities: in practice, lesions outside the optic nerve head would go unrecognised because they were neither present in the training labels nor captured within the imaging field. In the deep learning system of Liu et al. (9), glaucomatous optic neuropathy (GON) was defined by a reference standard of stereophotograph grading by glaucoma specialists, so the AI was explicitly optimised for GON. Other pathologies, such as non-glaucomatous optic neuropathies or retinal tumours, may therefore not be identified if screening is based solely on this AI system.
Comparison between glaucoma and diabetic retinopathy screening with AI sensitivity, specificity and AUC
AI has emerged as a promising tool in ophthalmology more broadly, with most progress to date focused on diabetic retinopathy and glaucoma. Both diseases are major causes of irreversible blindness worldwide, but they differ in how clearly they present in retinal fundus photographs and how well AI systems generalise across populations. Several large studies have provided benchmark results, reporting performance through factors like sensitivity, specificity and AUC. Taken together, these studies show that AI screening for DR is more developed and consistent, while glaucoma detection still faces greater challenges in real-world use. Indeed, the progression in evidence generation for DR provides a template for future work in AI-enabled glaucoma screening.
With regard to diabetic retinopathy, recent research shows that AI systems consistently perform at a level suitable for clinical use. A multi-centre study reported a sensitivity of 95.3% and a specificity of 92.0% for detecting referable DR, confirming that AI can operate with expert-level accuracy in real-world settings (22). A 2024 systematic review of 34 studies further supported this, finding a pooled sensitivity of 94% and specificity of 89% across diverse populations and clinical environments (23). These findings are important because they demonstrate that AI systems for DR are not only effective in controlled research datasets but also maintain reliability when deployed in everyday screening programmes. This consistency, paired with the fact that DR produces visible image features such as haemorrhages, exudates and microaneurysms, explains why DR screening AI is already being implemented in some health systems.
By contrast, glaucoma AI systems have achieved excellent results in controlled environments but have shown more variability in external validation and real-world settings. Liu et al. developed a deep learning model trained on over 270,000 retinal images to detect glaucomatous optic neuropathy. On internal validation, the model achieved near-perfect results with an AUC of 0.996, sensitivity of 96.2% and specificity of 97.7% (9). However, when tested on external datasets, performance decreased: in the Handan Eye Study, the AUC dropped to 0.964; in a US multi-ethnic cohort, the AUC was 0.923 with sensitivity of 87.7% and specificity of 80.8%; and with web-sourced images, the AUC fell to 0.823 and specificity to 70.4% (9). These results show that glaucoma detection is more sensitive to changes in image quality and population diversity than DR detection. Unlike DR, glaucoma involves subtler structural changes in the optic nerve, which can be harder for AI systems to recognise consistently across different settings.
When comparing the two, AI-enabled DR screening appears more mature and robust, maintaining high sensitivity and specificity across international datasets and real-world applications (22,23). AI screening systems for glaucoma, while highly promising in curated datasets, face greater challenges in generalisability and require further refinement to reliably detect early disease in diverse populations. Overall, DR screening with AI is already close to routine clinical use, while glaucoma detection remains an important but ongoing area of development. The comparison underscores how differences in disease presentation and dataset characteristics shape AI performance, with DR models benefitting from clearer imaging features and glaucoma models requiring more sophisticated approaches to handle variability.
Summary of the diagnostic accuracy results from published studies for glaucoma
Published studies of AI systems for glaucoma report very high accuracy in detecting the disease across various imaging methods. Diagnostic performance is typically evaluated with AUC, sensitivity and specificity; F1-scores are sometimes reported for severity staging. These metrics reveal how well AI algorithms distinguish between glaucomatous and healthy eyes.
Fundus Photographs
Deep learning models applied to large datasets of retinal fundus photographs show excellent performance. AUC values typically range from 0.98 to 0.99, with sensitivity around 95% and specificity near 92% (14). This means that AI systems trained on these images can correctly identify a large number of glaucoma cases while minimising false positives. However, in patients with high myopia or unusual optic disc shapes, performance can drop. AUC values may fall to about 0.89, with sensitivity and specificity around 81-83% (15). This decline is expected because highly myopic eyes often have anatomical variations that complicate glaucoma detection.
Optical Coherence Tomography (OCT)
OCT-based deep learning systems generally achieve very high accuracy across diverse patient groups. Some advanced feature-agnostic methods on OCT volumes have reached AUC values near 0.94, significantly surpassing classical rule-based approaches (16). In challenging cases like high myopia, OCT models remain highly effective. For example, AI trained on macular vertical OCT scans can reach AUCs close to 0.97 (17). This shows that structural imaging of the macula can sometimes provide more reliable diagnostic information than circumpapillary scans. These results indicate that OCT imaging, paired with AI, has become one of the most reliable methods for glaucoma diagnosis (14).
Visual Fields
Although less explored than fundus photographs and OCT, AI used in visual field testing shows promise. Some models achieve nearly perfect performance, with AUC values around 0.99 for detecting glaucomatous visual field loss. Moreover, models designed to stage the severity of glaucoma have produced F1-scores between 86-89%, indicating solid consistency in assigning disease stages. The main challenge for visual field-based AI is the inherent variability and noise in functional testing, which complicates the training of reliable models compared to structural imaging.
Meta-Analyses and Overall Trends
Reviews and meta-analyses consistently show that AI models are very effective in glaucoma detection across various imaging methods. OCT-based approaches appear to be the most reliable, particularly in diverse patient groups, and they generalise well across both internal and external datasets. Fundus photograph-based models also perform strongly, but their accuracy can be affected by image quality and patient-specific anatomical variations. Visual field-based models remain promising but need larger, cleaner datasets to achieve the same level of consistency as imaging-based methods.
Interpretation
Overall, AI has shown excellent diagnostic performance in glaucoma detection. The most reliable results come from OCT-based models, which consistently achieve AUC values above 0.94 and sensitivity and specificity in the mid to high 90s. Fundus-based models also perform well, especially with well-curated datasets, though they may have some difficulty with atypical anatomy. Visual field-based models, while showing high potential, are more variable due to the subjective nature of the test. Together, these findings suggest that AI could play an important role in glaucoma diagnosis, either as a primary screening tool or as a supportive aid to clinicians. It is also important to note that optical coherence tomography and visual field testing equipment are significantly more expensive compared to simple fundus imaging, and therefore health economic considerations are crucial when planning for the widespread use of AI in glaucoma screening programmes.
Cost-Effectiveness
Which countries/healthcare settings is it cost-effective for and why?
Many high-income countries with strong infrastructure and healthcare resources – including the workforce costs of manual screening – have found AI-enabled glaucoma screening to be financially beneficial. The most recent and rigorous studies were conducted in Australia and the Netherlands, where cost-effectiveness is demonstrated through national-level health economic models that assess quality-adjusted life years (QALYs), costs and benefit-cost ratios.
Using Australia as an example, researchers used a national Markov model to simulate glaucoma progression in more than 12 million people aged 50 years or older over 40 years. They used the status quo of manual screening as a baseline and compared it with three scenarios in which AI is used. Without AI, only 23% of the population over 50 had their eyes screened for glaucoma; simulating this baseline yielded around 177 million accumulated QALYs at a total care cost of US$40.6 billion. Scenario A, in which manual screening was replaced with AI screening, provided an additional 7,454 QALYs with a benefit-cost ratio of 18.5, making it far more effective per dollar. Scenarios B and C were similar to A but with higher uptake (100% participation in the case of C); there, the benefit-cost ratio ranged between 2.3 and 2.5 (24). These results suggest that AI screening is cost-effective primarily because earlier detection of glaucoma in a large, ageing population prevents blindness, which is hard to manage both clinically and socially. Australia's universal healthcare system (Medicare) makes AI screening financially attractive as well as clinically beneficial, because savings from avoided treatment costs and avoided workforce losses accrue directly to the public system (25).
A similar conclusion can be drawn from a study by Dutch researchers, who used a Markov model to compare two scenarios: AI-assisted screening every five years for Dutch residents aged 50-75, and opportunistic detection, the current approach. The analysis reported an incremental cost-effectiveness ratio (ICER) of less than 20,000 euros per QALY gained, significantly below the Dutch willingness-to-pay threshold, which ranges from approximately 20,000 euros up to 50,000 euros per QALY (26). Cost-effectiveness in this setting is explained by the significant QALY gains achieved when glaucoma is detected early.
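The ICER reported in such studies is simply the incremental cost divided by the incremental health gain, compared against a willingness-to-pay threshold. A minimal sketch follows; the per-person costs and QALYs are hypothetical values chosen to illustrate the arithmetic, not figures from the cited analysis.

```python
# Illustrative ICER calculation: extra cost per extra QALY of the new
# strategy (e.g. AI-assisted screening) versus the old (opportunistic care).

def icer(cost_new, cost_old, qaly_new, qaly_old):
    """Incremental cost-effectiveness ratio = delta cost / delta QALYs."""
    return (cost_new - cost_old) / (qaly_new - qaly_old)

# Hypothetical per-person lifetime costs (EUR) and QALYs.
ratio = icer(cost_new=3_400, cost_old=2_600, qaly_new=14.05, qaly_old=14.00)

THRESHOLD = 20_000  # lower bound of the Dutch willingness-to-pay range (26)
cost_effective = ratio < THRESHOLD
```

With these hypothetical inputs the ICER is 800 euros / 0.05 QALYs = 16,000 euros per QALY, below the threshold, so the strategy would be deemed cost-effective.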
To conclude, AI glaucoma screening is cost-effective in both of these high-income countries due to the combination of an ageing population, strong healthcare structures and the economic burden of blindness, making early AI screening effective from an economic point of view.
Which countries/healthcare settings is it not cost-effective for and why?
Glaucoma screening, despite its clinical promise, does not always translate into economic value. In some cases, even AI-assisted programmes generate higher costs than medical benefits, making them financially unattractive. Evidence from both high- and low-income settings shows that cost-effectiveness depends on disease progression rates and prevalence.
Research on AI screening for glaucoma was conducted in the outskirts of Changjiang county, a low-income setting with a lower-resourced healthcare system. Researchers used a Markov model to evaluate AI-assisted community screening for primary angle-closure glaucoma (PACG) in residents aged 65 and older. The programme showed clinical benefits, with 16.7% fewer PACG cases and 33% fewer cases of blindness. However, the economic results were quite different. Healthcare costs in the first year exceeded $107,000, compared with just $15,000 without screening, and even though the cost of opportunistic detection rose year by year, the screening programme remained significantly more expensive throughout the 15-year simulation. In total, the incremental cost of implementing AI screening amounted to roughly $430,900, or $1,464 per affected person. There are two reasons for these suboptimal numbers: first, the high cost of the AI tools in the first five years; second, the slow progression of glaucoma, which limited the benefits of early detection. In low-income regions such as rural Changjiang, the economic burden of AI-assisted glaucoma screening outweighs the medical benefits, and researchers therefore concluded that it is not cost-effective (27).
A similar conclusion was reached in the United Kingdom, where an economic evaluation of organised screening for open-angle glaucoma raised concerns about its financial feasibility. At a population prevalence below 1%, screening was not cost-effective regardless of the resources devoted to it: the incremental cost-effectiveness ratio reached £30,000 per QALY, above the current willingness-to-pay threshold. The problem here is that a large number of people must be screened in a population with a very low proportion of true cases, making the cost per case detected extremely high. The modest clinical benefits are therefore overshadowed by the high costs of running the screening programme. Even within a well-run, developed healthcare system, AI screening for glaucoma will fail to be cost-effective if true cases are relatively rare (28).
Taken together, the studies from China and the UK suggest that AI-assisted glaucoma screening may not be economically viable at large scale. In China, the problem was the high upfront cost of the intervention; in the UK, it was the high cost per QALY resulting from the low number of cases detected. For AI-enabled glaucoma screening to be cost-effective, a country must therefore have both the financial resources and the disease prevalence necessary to justify widespread screening.
Opportunities to improve cost-effectiveness
The rising prevalence of glaucoma has increased the need for more efficient and accessible screening through AI. Although AI has become a promising tool in healthcare – aiding glaucoma screening, enabling early detection and lowering the number of people who progress to advanced glaucoma – AI glaucoma screening remains costly: in rural China, it cost an additional $1,464 per affected person (27). While AI screening can help prevent blindness and the worsening of glaucoma, its cost must be considered before it is implemented.
A study conducted in the Netherlands examined the cost benefits of AI-assisted population-based screening for primary open-angle glaucoma. It found that previous nationwide screening efforts were not cost-effective, but that using AI to analyse colour fundus photographs significantly lowered screening costs (26). As a relatively low-cost and widely accessible imaging method, fundus photography paired with automated glaucoma detection reduces the need for experts to review every image while maintaining diagnostic accuracy (29). Screening can therefore be made more affordable through colour fundus photography and reduced reliance on specialists.
AI screening can also be improved by optimising screening intervals and focusing on individuals at high risk of glaucoma. Studies of the Dutch population suggest that extending screening intervals to every seven years improves the ICER by reducing overall screening costs (26); although the savings are modest at first, they accumulate over time. An analysis from the University of Aberdeen supports a targeted approach that prioritises high-risk or high-prevalence groups – such as those with a relevant family history or an ethnic background in which glaucoma is more common – to maximise clinical impact and economic efficiency (30). This strategy reduces unnecessary costs, improves early detection rates and preserves healthcare resources. By reducing the number of screenings and targeting populations at higher risk of vision loss, the cost-effectiveness of AI-assisted glaucoma screening can be improved, ensuring that limited healthcare budgets are directed where they are most needed.
Modelling methods used in cost-effectiveness studies
Studies on the cost-effectiveness of AI in glaucoma screening often rely on health economic modelling to estimate long-term outcomes and economic value. Two methods used in cost-effectiveness research are Markov models and decision trees.
In the Netherlands population-wide screening study, researchers used a Markov model to represent vision loss progression, screening outcomes and the costs related to primary open-angle glaucoma over a patient’s lifetime. Markov models capture long-term glaucoma progression across health states and transitions, and showed that AI-assisted screening can be cost-effective when high specificity and adherence rates are achieved (26). High specificity minimises false positives, while adherence captures how consistently patients follow through with recommended next steps, such as confirmatory exams or starting treatment, increasing early intervention and decreasing unnecessary healthcare costs. The Markov approach also allows detailed modelling of treatment effects and quality-adjusted life years, capturing the progression of conditions involving vision loss over time (31).
A similar study in rural China examined AI-assisted glaucoma screening among elderly individuals living in remote areas using a Markov prediction model. Researchers created a 15-year simulation to track how the disease progresses and what the costs would be. The model considered factors such as how the disease develops over time, how well people follow up with treatment and the accuracy of AI in detecting glaucoma from eye images; the AI system’s accuracy was based on other studies using patient data from rural China to ensure the results were realistic for the population. The researchers compared healthcare costs over periods of 5, 10 and 15 years for AI screening versus no screening. Although the model showed a reduction in disease progression and blindness, the incremental costs of screening, from initial investment and medical care, limited its cost-effectiveness (27). These Markov model results indicate that AI screening for glaucoma is not cost-effective in the short term but has the potential to become cost-effective in the long term.
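The Markov approach used in both studies can be sketched as a small cohort simulation: a population distribution moves between health states each yearly cycle, accumulating discounted costs and QALYs. The states, transition probabilities, annual costs and utility weights below are illustrative assumptions, not parameters from the cited models.

```python
# Minimal Markov cohort model of glaucoma progression (illustrative only).

STATES = ["healthy", "early", "advanced", "blind"]

# Annual transition probabilities (each row sums to 1); "blind" is absorbing.
P = {
    "healthy":  {"healthy": 0.990, "early": 0.010, "advanced": 0.000, "blind": 0.000},
    "early":    {"healthy": 0.000, "early": 0.930, "advanced": 0.070, "blind": 0.000},
    "advanced": {"healthy": 0.000, "early": 0.000, "advanced": 0.920, "blind": 0.080},
    "blind":    {"healthy": 0.000, "early": 0.000, "advanced": 0.000, "blind": 1.000},
}

COST = {"healthy": 0, "early": 300, "advanced": 900, "blind": 2500}          # per year
UTILITY = {"healthy": 1.00, "early": 0.90, "advanced": 0.75, "blind": 0.50}  # QALY weights

def simulate(years, discount=0.03):
    """Track a cohort (as proportions summing to 1) over yearly cycles,
    accumulating discounted costs and QALYs."""
    dist = {s: 0.0 for s in STATES}
    dist["healthy"] = 1.0
    total_cost = total_qaly = 0.0
    for t in range(years):
        d = 1 / (1 + discount) ** t  # discount factor for this cycle
        total_cost += d * sum(dist[s] * COST[s] for s in STATES)
        total_qaly += d * sum(dist[s] * UTILITY[s] for s in STATES)
        # Advance the cohort one cycle through the transition matrix.
        dist = {s2: sum(dist[s1] * P[s1][s2] for s1 in STATES) for s2 in STATES}
    return total_cost, total_qaly

cost, qalys = simulate(years=15)
```

Running the model twice, once with and once without a screening intervention (which would alter the transition probabilities and add screening costs), and comparing the two cost/QALY totals is exactly how the cited studies derive their ICERs.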
Unlike the Markov model, decision trees are mainly used for modelling short-term outcomes. A decision tree is a diagram in which each branch represents a possible event, such as a positive or negative result from a glaucoma screening. Each outcome in the tree is assigned a probability, a cost and an expected health effect. For example, after an AI screening, a patient may be referred for more testing, start treatment or be told no further action is needed; these possible outcomes are shown in the tree, allowing researchers to calculate the average cost and health outcome for each path. According to Ryder et al., decision trees are helpful in choosing the most cost-effective option among different strategies, especially when all outcomes occur within a short time period (32). In glaucoma screening, decision trees model what happens immediately after a person is screened by an AI system and whether they are correctly diagnosed, falsely flagged or missed, allowing researchers to weigh the costs of unnecessary referrals against the benefits of early detection. However, decision trees do not show how disease progresses over time, so they are limited for chronic diseases involving repeated treatments or changing health states (31). Despite this, decision trees still play an important role in AI screening research and, when combined with Markov models, help to reveal the short-term costs and benefits of using AI to detect glaucoma in different populations.
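A decision tree for a single screening round reduces to an expected-value calculation over its branches, weighting each outcome by its probability. In this sketch, the prevalence, test accuracy and branch costs are hypothetical assumptions, not figures from the cited studies.

```python
# Illustrative decision-tree expected cost for one AI screening round.

PREVALENCE = 0.03    # assumed proportion of screened people with glaucoma
SENSITIVITY = 0.94   # assumed P(positive result | glaucoma)
SPECIFICITY = 0.88   # assumed P(negative result | healthy)

COST_SCREEN = 10     # AI screening cost per person
COST_REFERRAL = 150  # confirmatory specialist exam after a positive result
COST_MISSED = 1200   # assumed downstream cost of a false negative

def expected_cost_per_person():
    """Weight each branch of the tree by its probability and cost."""
    tp = PREVALENCE * SENSITIVITY               # true positives -> referred
    fn = PREVALENCE * (1 - SENSITIVITY)         # false negatives -> missed
    fp = (1 - PREVALENCE) * (1 - SPECIFICITY)   # false positives -> referred
    # True negatives incur only the screening cost itself.
    return (COST_SCREEN
            + (tp + fp) * COST_REFERRAL
            + fn * COST_MISSED)

cost = expected_cost_per_person()
```

Comparing this expected cost across strategies (e.g. AI screening versus no screening, or different referral thresholds) is how a decision tree identifies the cheapest path to a given short-term health outcome.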
To conclude, both Markov models and decision trees are methods used to determine whether using AI to screen for glaucoma is worth the cost. Considering the short-term results of decision trees alongside the long-term results of Markov models, which capture how the disease and its costs change over a lifetime, suggests that while AI screening may not save money right away, it can become cost-effective in the future.
Conclusion
In conclusion, AI can be highly effective and efficient in the screening of glaucoma, although more evidence is required to determine its real-world effectiveness and, in turn, its cost-effectiveness. Before AI screening tools can be fully deployed into real-world clinical care, both real-world effectiveness and cost-effectiveness will have to be demonstrated, and the issues of bias and adequate diagnostic accuracy metrics addressed. Avoiding false negative results is critical, and regulators and policymakers must consider the medicolegal implications of missed diagnoses. Finally, although this review was specific to AI-enabled glaucoma screening, the principles of diagnostic accuracy, cost-effectiveness and implementation challenges, including medicolegal responsibility, are relevant to other medical AI applications.
Bibliography
- Young, L.M. (2025). Glaucoma Facts and Stats. Glaucoma Research Foundation. <https://glaucoma.org/articles/glaucoma-facts-and-stats>
- Chaurasia, A.K., Greatbatch, C.J. & Hewitt, A.W. (2022). Diagnostic Accuracy of Artificial Intelligence in Glaucoma Screening and Clinical Practice. Journal of Glaucoma, 31(5).
- Cleveland Clinic (2022). Glaucoma: Causes, Symptoms, Types, Treatment & Prevention. [Internet] Cleveland Clinic. <https://my.clevelandclinic.org/health/diseases/4212-glaucoma>
- Huang, J.J., Channa, R., Wolf, R.M., Dong, Y., Liang, M. & Wang, J. et al. (2024). Autonomous artificial intelligence for diabetic eye disease increases access and health equity in underserved populations. [Internet] Npj Digital Medicine, 7(1):1–6. <https://www.nature.com/articles/s41746-024-01197-3>
- EyeArt (2020). Eyenuk, Inc. ~ Artificial Intelligence Eye Screening. [Internet] EyeArt. <https://www.eyenuk.com/us-en/products/eyeart/>
- Deepchecks (2024). What is Sensitivity and Specificity of Machine Learning? [Internet] Deepchecks. <https://www.deepchecks.com/glossary/sensitivity-and-specificity-of-machine-learning/>
- Cleveland Clinic (2024). Fundus Photography Lets an Eye Specialist Take Photos of the Back of Your Eye. It's a Common Part of Eye Exams and Can Help Diagnose Many Eye Issues. [Internet] Cleveland Clinic. <https://my.clevelandclinic.org/health/diagnostics/fundus-photography>
- Yousefi, S. (2023). Clinical Applications of Artificial Intelligence in Glaucoma. Journal of Ophthalmic and Vision Research, 18(1):97-112.
- Chaurasia, A., Greatbatch, C. & Hewitt, A. (2022). Diagnostic Accuracy of Artificial Intelligence in Glaucoma Screening and Clinical Practice. [Internet] Google Drive. <https://drive.google.com/drive/folders/1RbWvP8Ho087kbj1xztjBv4igKYSHN2YF>
- Cleveland Clinic (2022). Glaucoma: Causes, Symptoms, Types, Treatment & Prevention. [Internet] Cleveland Clinic. <https://my.clevelandclinic.org/health/diseases/4212-glaucoma>
- Cleveland Clinic (2024). Fundus Photography Lets an Eye Specialist Take Photos of the Back of Your eye. It’s a Common Part of Eye Exams and Can Help Diagnose Many Eye issues. [Internet] Cleveland Clinic.<https://my.clevelandclinic.org/health/diagnostics/fundus-photography>
- EyeArt (2020). Eyenuk, Inc. ~ Artificial Intelligence Eye Screening. [Internet] EyeArt. <https://www.eyenuk.com/us-en/products/eyeart/>
- Huang, J.J., Channa, R., Wolf, R.M., Dong, Y., Liang, M. & Wang, J. et al. (2024). Autonomous artificial intelligence for diabetic eye disease increases access and health equity in underserved populations. [Internet] Npj Digital Medicine, 7(1):1–6. <https://www.nature.com/articles/s41746-024-01197-3>
- Hung, K.H. et al. (2022) Application of a deep learning system in glaucoma screening using glaucoma fundus photographs. BMC Ophthalmol, 22:2730. <https://doi.org/10.1186/s12886-022-02730-2>
- Thompson, A.C. et al. (2024). Deep learning and optical coherence tomography in glaucoma detection: a decade of advances. Ophthalmol Sci. <https://pmc.ncbi.nlm.nih.gov/articles/PMC11182271>
- Deepchecks (2024). What is Sensitivity and Specificity of Machine Learning? [Internet] Deepchecks. <https://www.deepchecks.com/glossary/sensitivity-and-specificity-of-machine-learning/>
- Maetschke, S. et al. (2019). A feature agnostic approach for glaucoma detection in OCT volumes. Cornell University. <https://arxiv.org/abs/1807.04855>
- Kim, J.A. et al. (2023). Development of a deep learning system to detect glaucoma using macular vertical OCT scans of myopic eyes. Scientific Reports, 8040. <https://doi.org/10.1038/s41598-023-34794-5>
- Liu, H., Li, L., Wormstone, I.M., Qiao, C., Zhang, C., Liu, P., Wang, Y. & Li, W. et al. (2019). Development and validation of a deep learning system to detect glaucomatous optic neuropathy using fundus photographs. JAMA Ophthalmol, 137(12):1353-60. <https://doi.org/10.1001/jamaophthalmol.2019.3501>
- Chuter, B., Huynh, J., Hallaj, S., Walker, E., Liebmann, J. M., Fazio, M. A., Girkin, C. A., Weinreb, R. N., Christopher, M. & Zangwill, L. M. (2024). Evaluating a foundation artificial intelligence model for glaucoma detection using color FundUS photographs. Ophthalmology Science, 5(1), 100623. <https://doi.org/10.1016/j.xops.2024.100623>
- Rao, D. P., Shroff, S., Savoy, F. M., S, S., Hsu, C., Negiloni, K., Pradhan, Z. S., P, J., V., Sivaraman, A. & Rao, H. L. (2023). Evaluation of an offline, artificial intelligence system for referable glaucoma screening using a smartphone-based fundus camera: a prospective study. Eye, 38(6), 1104–1111. <https://doi.org/10.1038/s41433-023-02826-z>
- Uy, H., Fielding, C., Hohlfeld, A., Ochodo, E., Opare, A., Mukonda, E., Minnies, D. & Engel, M.E. (2023). Diagnostic test accuracy of artificial intelligence in screening for referable diabetic retinopathy in real-world settings: A systematic review and meta-analysis. PLOS Glob Public Health, 3(9):e0002160. <https://doi.org/10.1371/journal.pgph.0002160>
- Khan, S., Sabanayagam, C. & Ting, D.S.W. (2024). Diagnostic accuracy of artificial intelligence for detection and grading of diabetic retinopathy in real-world settings: systematic review and meta-analysis. Lancet Digit Health, 6(3):e210–22. <https://doi.org/10.1016/S2589-7500(24)00012-3>
- Jan, C., Hu, W., Vingrys, A.J., van Wijngaarden, P., Stafford, R.S., He, M. & Zhang, L. (2024). Cost-Effectiveness and Cost-Utility of Artificial Intelligence-Assisted Population Screening for Glaucoma in Australia: A Decision-Analytic Markov Model Approach. Investigative Ophthalmology & Visual Science, 65(7):613. <https://iovs.arvojournals.org/article.aspx?articleid=2796046>
- Australian Government (2025). Medicare—help paying for medicines and health care. Canberra: Australian Government. <https://my.gov.au/en/services/health-and-disability/seeking-medical-help/help-paying-for-medicines-and-health-care/medicare>
- Boverhof, B.-J., Ramos, I.C., Vermeer, K.A. et al. (2025). The cost-effectiveness of an AI-based population-wide screening program for primary open-angle glaucoma in the Netherlands. Value in Health, 28(9):1317-1326. <https://www.valueinhealthjournal.com/article/S1098-3015(25)02410-6/fulltext>
- Xiao, X. et al. (2021). Health care cost and benefits of artificial intelligence-assisted population-based glaucoma screening for the elderly in remote areas of China: A cost-offset analysis. BMC Public Health. <https://bmcpublichealth.biomedcentral.com/articles/10.1186/s12889-021-11097-w>
- Hernández, R.A., Burr, J.M. & Vale, L.D. (2008). Economic evaluation of screening for open-angle glaucoma. International Journal of Technology Assessment in Health Care, 24(02):203-11. <https://www.cambridge.org/core/journals/international-journal-of-technology-assessment-in-health-care/article/abs/economic-evaluation-of-screening-for-openangle-glaucoma/56E452E8E05C4C0FF43E3F7AB4952B98>
- Lemij, H.G. et al. (2023). Characteristics of a large, labeled data set for the training of artificial intelligence for glaucoma screening with fundus photographs. Ophthalmology Science, 3(3). <https://www.sciencedirect.com/science/article/pii/S2666914523000325>
- Burr, J.M. et al. (2007). The clinical effectiveness and cost-effectiveness of screening for Open angle glaucoma : A systematic review and Economic Evaluation. Aberdeen University Research Archive. <https://aura.abdn.ac.uk/handle/2164/174>
- Drabo, E.F. & Padula, W.V. (2023). Introduction to Markov Modeling. Oxford Academic.
- Ryder, H.F. et al. (2009). Decision analysis and cost-effectiveness analysis. PubMed Central. <https://pmc.ncbi.nlm.nih.gov/articles/PMC3746772/>