Supervised by: Chi Him Kendrick Yiu BSc (Hons), MRes. Kendrick is studying for a DPhil in Medical Sciences at Wolfson College, University of Oxford, specialising in cardiovascular research. He graduated with a Bachelor’s Degree in Biomedical Sciences in Hong Kong before pursuing a Master’s Degree in Clinical Research at Imperial College London. His interests lie in the mechanisms of heart diseases and their potential therapies.


Coronary heart disease (CHD) is highly prevalent worldwide and is a major contributor to cardiovascular disease, which caused an estimated 17.9 million deaths in 2019, or 32% of all global deaths (World Health Organization, 2021). CHD is a serious condition caused by the buildup of fatty deposits within the coronary arteries, which narrows them. As a consequence of this buildup, blood flow to the heart is reduced, putting the heart muscle at risk of not receiving enough oxygen to function properly (NHS, 2020). Patients diagnosed with CHD are at a significantly higher risk of heart attack or stroke, which can be fatal. Many cases of CHD are associated with pre-existing medical conditions or habits, such as a diet high in trans and saturated fats, low levels of exercise, diabetes, and some genetic conditions. These are considered risk factors for CHD: conditions that raise a person’s chance of being diagnosed with CHD at some point in their life (Huang, 2022). To be considered a risk factor, a condition must have a known correlation with CHD diagnoses (NHS, 2020). Given the severity of CHD, it is important to detect both the risk and the presence of CHD at an early stage, when treatment and prevention are more feasible.

Machine learning (ML) programs, particularly risk prediction models, have shown promise in diagnosing CHD at an early stage. This review looks at three risk factors for CHD (diet/exercise, genetic information, and type 2 diabetes), their impact on patients’ cardiovascular health, and the existing literature on ML risk prediction programs for each risk factor.



According to the World Health Organization (WHO) and several other global healthcare bodies, cardiovascular disease (CVD) has remained the leading cause of death since 2017. An estimated 17.9 million people died from CVD in 2019, representing 32% of all global deaths. Of these deaths, 85% were due to heart attack and stroke (World Health Organization, 2021).


CHD emerges from the buildup of fatty deposits within the coronary arteries, a process referred to as atherosclerosis. As these deposits accumulate, the arteries narrow. Blood flow to the heart is compromised as a result, which limits the amount of oxygen supplied to the heart muscle and puts patients at greater risk of a heart attack (NHS, 2020).

Fundamentally, the buildup of atheroma (plaque) within the arteries is what causes the walls of the coronary arteries to narrow. There are many risk factors that contribute to the buildup of this plaque, such as a high saturated fat intake, low levels of exercise and activity, genetics, and diabetes (Huang, 2022).

When determining whether a patient has CHD, the patient is first assessed for overall risk of CVD. The initial tests include a review of family medical history, a blood pressure check, and a blood test to assess factors such as cholesterol levels, lipoprotein(a) [Lp(a)], fasting glucose, and hemoglobin A1c (HbA1c). To confirm a diagnosis of CHD, electrocardiograms, X-rays, echocardiograms, and exercise stress tests can then be used (NHS, 2020).



Diagnosing CHD in its early stages allows for a higher likelihood of successful treatment, and machine learning (ML) risk prediction programs show promise in making early diagnoses more achievable (Kee et al., 2023).

However, the prediction and early diagnosis of CHD is a difficult and onerous task: it is time-consuming, and false negative results carry a high risk for the patient. Organizations have therefore invested large amounts of time and funding in finding quicker and more accurate ways of detecting CHD, which has led to growing interest in artificial intelligence (AI) and machine learning (ML) (Jiang, 2019).

Artificial intelligence is the capability of a computer system to mimic human cognitive functions, such as learning and problem-solving. Through AI, a computer system uses math and logic to simulate the reasoning that people use to learn from new information and make decisions.

Machine learning is an application of AI. It’s the process of using mathematical models of data to help a computer learn without direct instruction. This enables a computer system to continue learning and improving on its own, based on experience.

There has been a dramatic increase in the development of information technology over the past decade, particularly in the fields of AI and machine learning (ML). These developments are having a significant impact on daily life, ranging from entertainment to personal finances (West and Allen, 2018). ML is now greatly influencing medicine, and ML models have been built to support the diagnosis and treatment of CHD. Some ML prediction models have been reported to be as reliable as cardiovascular specialists at diagnosing heart conditions, making decisions, predicting risks, and performing other medical tasks, with the added advantage of requiring less time for all these processes (Lazda, 2022).

This article examines how ML models can use data from biobanks or other sources to calculate the risk of CHD.


Cardiovascular diseases (CVDs), particularly CHD, have gathered attention due to their status as a leading cause of mortality worldwide (Ahmadi and Lanphear, 2022). Various ML techniques have been used to detect and predict CHD on the basis of various factors (Wang et al., 2022), and this section covers several notable strategies and studies that have used these techniques to improve the understanding of CHD. Supervised learning, where algorithms learn from labeled data to discern patterns, make predictions, and guide clinical decision making, is the basic framework for these programs. Some examples include:

1 – Logistic Regression (LR): Estimates the likelihood of CHD based on factors such as age, sex, smoking habits, cholesterol, and blood pressure. It employs supervised learning, utilizing labeled data to create a model that estimates probability.

2 – Support Vector Machines (SVM): Also a supervised learning method, SVM seeks a hyperplane that separates CHD-positive individuals from CHD-negative ones with the widest possible margin. By identifying this decision boundary, SVM helps create more precise predictions.

3 – Random Forest (RF): Leverages ensemble learning within a supervised framework. By combining the votes of many decision trees, each trained on a different sample of the data, RF produces a model that is more robust than any single tree.

4 – XGBoost: Focuses on boosting weak predictive models using a combination of supervised learning and gradient boosting. It does this by iteratively refining weak models such as decision trees in order to form a more robust and accurate model.
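As a minimal sketch of the first technique in this list, logistic regression can be written in a few lines of plain Python. The patient records, the choice of features, the learning rate, and the function names below are all made up for illustration and are not taken from any of the cited studies.

```python
import math

def standardize(rows):
    """Rescale each feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_risk(weights, bias, x):
    """Estimated probability of CHD for one (standardized) patient record."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train(rows, labels, learning_rate=0.5, epochs=1000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    weights, bias, n = [0.0] * len(rows[0]), 0.0, len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(rows, labels):
            error = predict_risk(weights, bias, x) - y
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias

# Hypothetical records: [age in decades, smoker (0/1), total cholesterol in mmol/L]
rows = [[4.0, 0, 4.5], [4.5, 0, 5.0], [5.0, 0, 4.8], [5.5, 1, 5.2],
        [6.0, 1, 6.5], [6.5, 1, 6.8], [7.0, 0, 6.9], [7.2, 1, 7.1]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = diagnosed with CHD (made-up labels)

scaled = standardize(rows)
weights, bias = train(scaled, labels)
low_risk = predict_risk(weights, bias, scaled[0])
high_risk = predict_risk(weights, bias, scaled[-1])
print(f"youngest non-smoker: {low_risk:.2f}, oldest smoker: {high_risk:.2f}")
```

The labeled examples are what make this supervised learning: the model only ever adjusts its weights to reduce the gap between its predicted probabilities and the known outcomes.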

Furthermore, interpretability techniques such as SHAP (SHapley Additive exPlanations) are often used alongside ML. SHAP attributes a share of each prediction to each input feature, which gives doctors insight into which specific factors influence the model’s decision (Chen et al., 2022). The combination of ML and SHAP has been reported to provide accurate risk assessment and stratification, specifically of the 3-year all-cause mortality risk in patients with CHD-related heart failure (Wang et al., 2021). This information helps medical professionals make more informed decisions and provide personalized care, diagnosis, and treatment plans. ML methods are also being rapidly adapted to model increasingly intricate patient patterns, which extends their range of application (Gonsalves et al., 2019). Their effectiveness rests on supervised learning, which is what allows these algorithms to learn patterns and build such models.
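The idea behind SHAP can be illustrated with exact Shapley values on a toy model. This is only a sketch: the risk score, the feature names, and the reference patient below are hypothetical, and practical SHAP tools rely on approximations rather than enumerating every feature ordering as done here.

```python
from itertools import permutations

def toy_risk_score(patient):
    """Hypothetical risk score with one interaction term (not a clinical model)."""
    return (0.02 * patient["age"] + 0.3 * patient["smoker"]
            + 0.1 * patient["ldl"] + 0.05 * patient["smoker"] * patient["ldl"])

def shapley_values(model, instance, baseline):
    """Average each feature's marginal contribution over all feature orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)        # start from the reference patient
        previous = model(current)
        for name in order:
            current[name] = instance[name]   # switch one feature to the real value
            value = model(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}

patient = {"age": 65, "smoker": 1, "ldl": 4.5}    # made-up patient
reference = {"age": 50, "smoker": 0, "ldl": 3.0}  # made-up "typical" patient

contributions = shapley_values(toy_risk_score, patient, reference)
for name, value in contributions.items():
    print(f"{name}: {value:+.3f}")
```

The contributions always add up to the difference between the patient’s score and the reference score, which is what lets a clinician read them as a breakdown of a single prediction.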

ML is also capable of building new quantitative markers, such as “ISCAD”, which integrates factors such as medical history and biomarkers and converts them into a single quantitative marker (Forrest et al., 2023). This score relies predominantly on supervised learning principles. The model uses labeled data to identify patterns, relationships, and correlations between these factors and the presence or severity of CHD. By analyzing these patterns, it learns to associate specific combinations of variables with different levels of risk. Once trained, the model can predict CHD for new, unseen cases based on the patterns it has learnt from the labeled data.


Though machine learning offers significant advantages for risk prediction, such as savings in cost and time, it still faces limitations and challenges at this stage of the technology.

The ML techniques described above are all supervised, which means they depend on data that humans have labeled and checked. Human oversight is therefore essential for accurate results; often, the more supervision an ML program receives, the more accurate it is (Reel et al., 2021). It is also possible for ML to work in an unsupervised way, exploring the data and detecting which labels best explain it, but the results are often skewed and may not match what was expected (Libbrecht and Noble, 2015).

A further limitation relates to data. ML generally performs best when it is trained on a large dataset. However, when a model is too complex for the data available, for example when there are many more input features than patients, overfitting can occur (Vilne et al., 2019).

Overfitting is when the ML model performs very well on training data but has poor accuracy on test data, which leads to unreliable predictions. Cross-validation is the standard way to detect overfitting: it repeatedly holds out part of the data and measures how well the model generalizes to samples it was not trained on. However, cross-validation also needs regular monitoring, which leads back to the need for human supervision.
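A small sketch can show how cross-validation exposes overfitting. The data below are synthetic, with two deliberately mislabeled points, and a 1-nearest-neighbour classifier is used only because it memorizes its training data perfectly; none of this comes from the cited studies.

```python
def nearest_neighbour_predict(train, x):
    """1-nearest-neighbour: copy the label of the closest training point."""
    closest = min(train, key=lambda pair: abs(pair[0] - x))
    return closest[1]

def accuracy(train, test):
    """Fraction of test points whose label is predicted correctly."""
    hits = sum(nearest_neighbour_predict(train, x) == y for x, y in test)
    return hits / len(test)

def k_fold_accuracy(data, k=5):
    """Hold out every k-th point in turn and average the held-out accuracy."""
    scores = []
    for fold in range(k):
        test = data[fold::k]
        train = [pair for i, pair in enumerate(data) if i % k != fold]
        scores.append(accuracy(train, test))
    return sum(scores) / k

# Synthetic data: one risk marker x, label 1 when x >= 10, plus two noisy labels.
data = [(x, 1 if x >= 10 else 0) for x in range(20)]
data[3] = (3, 1)
data[15] = (15, 0)

train_accuracy = accuracy(data, data)   # the model is scored on data it memorized
cv_accuracy = k_fold_accuracy(data)     # the model is scored on held-out data
print(f"training accuracy: {train_accuracy:.2f}, cross-validated: {cv_accuracy:.2f}")
```

The gap between the two numbers is the warning sign: the model reproduces the noise in its training set, so its score on data it has already seen overstates how well it will do on new patients.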

Despite this, there is no denying the benefits that properly implemented AI could bring to healthcare, through early diagnoses and thus higher chances of treatment success (Kee et al., 2023). Thus, it is of significant importance to assess the current literature of AI in risk prediction for CHD, focusing specifically on three different factors: diet/exercise, genetic predisposition, and diabetes.



CHD is intricately linked to lifestyle factors, such as diet and exercise. Dietary habits which include high saturated fat consumption and low intake of dietary fiber can contribute to inflammation and metabolic dysfunction. This increases the risk of cardiovascular issues, and low physical activity can also enhance these effects through multiple mechanisms (Szczepańska et al., 2022). 

The primary cause of CHD is atherosclerosis, a process in which fatty deposits accumulate within arteries and narrow them. High intake of saturated fats raises cholesterol levels in the bloodstream; this cholesterol infiltrates the endothelium and becomes oxidized, triggering an inflammatory response (Linton et al., 2019). Immune cells are attracted to the site of inflammation, where they engulf the oxidized cholesterol and turn into foam cells, which accumulate within the walls of the artery and form fatty streaks. This makes blood vessels less functional and can cause blood clots, leading to poor blood supply and eventually CHD.


Often cited as a cause of many diseases, weight gain is also a major contributor to CHD. This review has already discussed how a poor diet can cause CHD, but this can be offset by regular exercise. Those who combine an unhealthy diet with little exercise accumulate harmful amounts of trans and saturated fats and other lipids. These lipids include cholesterol, but weight gain affects the body through more than lipids alone.

Due to the extra mass, the heart has to pump additional oxygen and nutrients to the extra adipose tissue. The distribution of fat throughout the body matters, because a high proportion of fat concentrated in one area increases the risk of cardiovascular disease. An example of this is visceral fat (VF), fat stored within the abdomen around organs such as the liver and intestines. Excess VF favors increased production of the pro-inflammatory cytokines tumor necrosis factor alpha (TNF-α) and interleukin 6 (IL-6). These cytokines promote atherosclerosis, for example through increased secretion of fatty acids, which gradually leads to blockage of blood vessels and remodeling of the cardiovascular system (Carbone, 2019). The increased body tissue also raises the amount of nutrients and oxygen that must be transported, which requires an overall increase in blood volume. The heart has to compensate by pumping harder, which can strain it over time and result in poor circulation of blood through the vessels.

Building on the idea of unequal body fat distribution, sarcopenia is an increasingly common condition in the demographics at greatest risk of CHD. Sarcopenia is the loss of lean (muscle) mass and function. When accompanied by poor diet and little exercise, fat mass increases while lean mass is already declining, a disproportionate and dangerous combination known as sarcopenic obesity (Carbone, 2019). The condition is more common at older ages, and reported prevalence estimates range from 1% to 85% across males and females of all ages, which suggests it may be more widespread than commonly believed (Purcell, 2021). Sarcopenic obesity tends to carry a worse prognosis in cardiovascular diseases generally, and in CHD especially. This may be due to muscle fibre denervation and reduced anabolic hormone production. These age-related changes in body composition and hormone production can give people with sarcopenic obesity higher rates of insulin resistance (Therakomen et al., 2020).

Insulin resistance can also be caused by cytokines and adipokines. It matters because it prevents cells from easily absorbing glucose from the blood. This raises blood sugar levels, which can lead to diabetes, and it also places a burden on the pancreas. The pancreas secretes more insulin to compensate for the impaired glucose uptake, but too much insulin can lead to inflammation, which further damages vessels and surrounding tissue (Wilcox, 2005). Insulin also plays a key role in the production of nitric oxide (NO), a molecule that helps blood vessels relax (Roberts Jr, 1993). When this signalling is impaired, arteries under hypertension and stress may constrict, which reduces blood flow and damages vessel walls. This promotes atherosclerosis and can eventually lead to CHD (Reavan, 2012).

Despite these risks, exercise can play a crucial role in overcoming them. A mix of resistance training (to avoid loss of lean mass) and cardiovascular exercise (to reduce visceral fat) is often advised (Law et al., 2016). Resistance training is an effective way to improve insulin sensitivity, as it increases muscle mass and reduces body fat, which helps increase glucose uptake by the muscles (Kehsel, 2015). Burning off excess fat lowers levels of pro-inflammatory cytokines, such as interferon-gamma and interleukin 1 beta (IL-1β), which further improves body composition (Amin et al., 2020). Weight loss also relieves strain on the blood vessels. Regular cardiovascular exercise such as swimming, cycling, and jogging is crucial to prevent further buildup of cholesterol and fatty deposits in blood vessels (Tian and Meng, 2019).


Beyond prevention, machine learning risk prediction models offer ways to identify risk factors accurately and efficiently. One simple and accurate method is decision tree learning, which is well suited to categorical data and to estimating class probabilities from a data set (Song and Lu, 2015). A decision tree repeatedly splits the data according to the value of one variable at a time, and the sequence of splits leading to each leaf forms a rule for making predictions. This requires labeled training data, as with a regression tree. A single decision tree is the building block of the random forest technique mentioned earlier; it is less robust but often preferred for its simplicity. In a study evaluating the accuracy of this method, participants completed surveys and were tested on parameters suspected to be related to metabolic syndrome, a cluster of disorders that increase the risk of stroke and heart disease. The participants were examined with FibroScan, and the resulting data were given to several different tree algorithms, which split the data until they reached final predictions. Splitting criteria based on chi-square statistics and entropy were used to choose each split, and across the different trees the algorithms sorted through over 40 potential predictors. The tree learning algorithms identified metabolic syndrome with high accuracy, and serum GOT, obesity, and HbA1c were found to be important predictive variables. This is plausible because these variables are closely tied to metabolic syndrome; HbA1c, for instance, reflects long-term blood glucose levels. The other variables also achieved relatively high accuracy, and given the speed of the method, the authors concluded that decision tree learning is well suited to medical data (Yu et al., 2020). The medical community generally views decision tree learning positively, and numerous studies have used it. However, lifestyle factors do not fully explain metabolic risk, because some of it is determined by a person’s genes.
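The entropy-based splitting that decision trees rely on can be sketched as follows. The HbA1c values and labels are invented for illustration and the helper names are hypothetical; the study’s own algorithms and data are not reproduced here.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Reduction in entropy from splitting the data at `value <= threshold`."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

def best_threshold(values, labels):
    """Try every observed value (except the largest) as a split point."""
    candidates = sorted(set(values))[:-1]
    return max(candidates, key=lambda t: information_gain(values, labels, t))

# Made-up HbA1c readings (%) and metabolic syndrome labels (1 = present).
hba1c = [5.1, 5.4, 5.6, 5.9, 6.3, 6.6, 7.0, 7.4]
has_syndrome = [0, 0, 0, 0, 1, 1, 1, 1]

threshold = best_threshold(hba1c, has_syndrome)
gain = information_gain(hba1c, has_syndrome, threshold)
print(f"best split: HbA1c <= {threshold}, information gain = {gain:.2f} bits")
```

A full tree repeats this search on each resulting subset, across every available variable, until the subsets are pure enough or a stopping rule is reached.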

In 2022, a sophisticated AI framework for dynamic treatment recommendation in CHD was proposed (Guo et al., 2022). This approach integrated supervised learning and reinforcement learning within a long short-term memory (LSTM) network. Supervised learning allowed the model to imitate treatment decisions made by doctors, while reinforcement learning optimized patient outcomes by learning from interaction with patient data. The model received inputs such as patient diagnosis, health status, and other clinical features, and generated patient-specific prescription recommendations. It was trained and evaluated on an existing intensive care unit (ICU) database of over 13,000 patients diagnosed with CHD. The evaluation assessed the reduction in in-hospital mortality and compared the model’s treatment recommendations with those of doctors using the Jaccard similarity index. The interpretability of the model was illustrated by using a random forest to analyze feature importance for both the clinicians’ policy and the AI policy. Overall, this strategy seeks to provide dynamic, AI-driven treatment recommendations that align with clinicians’ expertise and improve patient outcomes.
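The Jaccard index used for this comparison measures the overlap between two sets as the size of their intersection divided by the size of their union. A minimal sketch, using made-up prescriptions rather than anything from the study:

```python
def jaccard_similarity(first, second):
    """Size of the intersection divided by the size of the union of two sets."""
    first, second = set(first), set(second)
    if not first and not second:
        return 1.0  # convention chosen here: two empty prescriptions agree fully
    return len(first & second) / len(first | second)

# Hypothetical prescriptions for one patient.
clinician = {"aspirin", "statin", "beta_blocker"}
model = {"aspirin", "statin", "ace_inhibitor"}

similarity = jaccard_similarity(clinician, model)
print(f"Jaccard similarity: {similarity:.2f}")  # 2 shared drugs out of 4 distinct ones
```

A value of 1 means the model recommended exactly what the clinician prescribed, and a value of 0 means the two lists share nothing.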


Whilst it is commonly believed that CHD develops over time through lifestyle, certain genetic variants greatly alter the likelihood of developing the disease regardless of environmental conditions. Some mutations in a person’s DNA can directly harm the heart. In other cases, it is not yet understood why some people inherit traits that restrict blood flow to the heart or make cholesterol buildup easier.

The first genetic predisposition concerns low-density lipoprotein (LDL), colloquially known as “bad cholesterol”. LDL is dangerous because “Low-density lipoprotein (LDL) cholesterol and blood pressure are well-established causal risk factors for coronary heart disease (CHD)” (Ueda et al., 2018). Whilst LDL can build up over time due to certain lifestyle habits, research indicates that the rate of cholesterol buildup is heavily determined by genetics (Rizk et al., 2015). This can be attributed to variation in several genes that control the metabolism of LDL, such as PCSK9, APOB, LDLR (the main gene controlling the effects of LDL), and APOE. Most of the proteins they encode are found in the digestive system, where they regulate LDL receptors, including their degradation, in the liver, stomach, and intestines (Hajar, 2019). Mutations in these genes can cause familial hypercholesterolemia. This condition is passed down through generations and is a major contributor to CHD.

More broadly, single nucleotide polymorphisms (SNPs) are single-base variations in a person’s genome, and many of those linked to CHD affect how the body handles lipids. The aforementioned familial hypercholesterolemia can result from such variants. However, SNPs are far more diverse and can cause other conditions that make patients more likely to develop CHD. Whilst SNPs can occur anywhere and affect different parts of the body, they have a significantly greater role in the prenatal causes of CHD than in those of most other cardiovascular diseases (Sitinjak et al., 2023). They can affect the expression or function of genes involved in inflammation, lipid metabolism, and atheroma formation, processes that are typical of CHD. An example can be found in SNPs in genes such as MTHFR and ACE, which are linked to folate metabolism and the maintenance of serum levels. A polymorphism in these genes can reduce the body’s natural ability to lower homocysteine levels. This in turn impairs the processing of important vitamins and the clearing of the bloodstream, which plays a big role in CHD risk analysis (Masud and Baqai, 2017).


Whilst many SNPs can trigger inflammatory and immune responses, and not all harmful SNPs have been discovered, they do offer one benefit. To better identify patients at higher risk of certain diseases, SNPs can be tracked through family history and in patients themselves, which lets medical professionals assess which variants a patient carries. This is where AI comes into play: AI can take a patient’s data and determine whether it matches patterns that indicate risk of cardiovascular disease. In one analysis, a neural network examining the proteins affected by SNPs showed that genetics commonly plays a role in cardiovascular disease (Quazi, 2022). This shows how SNPs, whilst being a major genetic cause of CHD, can also support early risk assessment, giving patients more time and improving survival rates.

Certain syndromes also increase the chances of developing coronary heart disease. One in particular, Marfan syndrome (MFS), is caused by a mutation, which can be considered similar to an SNP, in the fibrillin-1 gene (FBN1), which encodes a protein needed to build connective tissue. Without it, the body has a harder time forming connective tissue (Robinson, 2006). The mutated fibrillin-1 is unable to bind transforming growth factor beta (TGF-β), which results in a detrimental increase in TGF-β levels in the tissue (Benke, 2013). This is problematic because TGF-β1 is a potent pro-fibrotic cytokine. In other words, when TGF-β is not bound to fibrillin, the tissue produces excessive amounts of extracellular matrix components such as collagen (Shah, 1999). The result is scar tissue, which when widespread across an organ is called fibrosis and is highly damaging. The weakened connective tissue leads to joint hypermobility, since the tissue cannot provide enough support, and more commonly to elongated limbs and bones. This happens because the tissue cannot properly support and regulate the epiphyseal (growth) plates, which allows the bones to grow extra long (Hunziker, 2018). The abnormal body shape makes it harder for the heart to pump blood through the body. In addition, the weakened vessel walls cannot always withstand the pressure generated by the heart, so arteries are at higher risk of rupturing (Stuart, 2007). The poor connective tissue also makes the arteries more prone to constriction and narrowing. Together, these factors make Marfan syndrome a risk factor for cardiovascular disease, and for coronary heart disease in particular (Lazea, 2021).


Diabetes mellitus (DM) is a common disease resulting from a disorder in insulin secretion from the pancreatic beta cells (Fan et al., 2020). As of 2019, 463 million people had been diagnosed with DM, and the number of diagnoses has since risen significantly (Diabetes Atlas, 2019). DM is currently listed as one of the top ten causes of death, and its prevalence is expected to rise to 700 million cases by 2045 according to the International Diabetes Federation (Diabetes Atlas, 2019). DM can be classified into two main groups, type 1 and type 2 (T1DM and T2DM), of which T2DM is the more prevalent.

T2DM typically comes with other health complications: patients diagnosed with T2DM are more susceptible to cardiovascular, microvascular, and, in worse cases, cognitive diseases (Law et al., 2016). In particular, patients with T2DM are more susceptible to heart disease than patients without it. Cardiovascular disease (CVD), including heart failure (HF) and coronary heart disease (CHD), is known to be one of the most common and severe complications linked to diabetes (American Diabetes Association, 2020).

Along with genetics, high saturated fat intake, low exercise levels, smoking, and Down syndrome, diabetes mellitus is considered one of the more common risk factors for CHD (Gordon et al., 1982). Diabetes often coincides with high blood pressure, high LDL cholesterol, and high triglycerides, all of which contribute to the buildup of fatty deposits in the arteries and the hardening of arterial walls (CDC, 2020).


Similarly to risk prediction with genetic input, AI programs that look at CHD risk prediction in patients with T2DM can be useful for their ability to catch CHD diagnoses early on, allowing for higher chances of successful treatment (Kee et al., 2023).  

A variety of ML models have been created for CHD risk prediction. For example, Dr. Rui Fan, an expert on machine learning techniques and risk prediction, and her colleagues created a Random Forest (RF) program with various risk factor inputs (age, LDL cholesterol level, the course of diabetes [time since diagnosis in years], heart rate in BPM, total cholesterol, diastolic pressure, course of hypertension, and blood platelet count) (Fan et al., 2020). In contrast, a 2003 article by Dr. Aaron Folsom, a cardiovascular expert, and colleagues used a regression model with risk factor inputs for age, race, total cholesterol level, HDL cholesterol level, smoking status, use of antihypertensives, and systolic blood pressure (Folsom et al., 2003).

The contrast between these two studies is just one example of the difference in methods between various studies seeking to create a CHD risk prediction model for patients with T2DM; these studies utilize different machine learning techniques and have different risk factor inputs. 

This comparison provides insight into how ML-based CHD risk estimation has developed over time.

Despite these differences, many studies describing ML programs for risk prediction share some similarities. One is the use of the AUC (area under the ROC curve) to quantify performance. This consistency makes it easier to compare the accuracy of different AI programs. For example, Table 1 (taken from a systematic review by Kee et al.) shows that neural networks appear significantly more accurate than other machine learning techniques for predicting cardiovascular complications in risk prediction models focused on diabetes (Kee et al., 2023).

Table 1. Chart comparing the accuracy, sensitivity, specificity, precision, and area under the curve in risk prediction models focused on diabetes (Kee et al., 2023)
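The AUC has a simple interpretation that can be computed directly: it is the probability that a randomly chosen patient with the condition receives a higher risk score than a randomly chosen patient without it. The scores and labels in this sketch are invented for illustration.

```python
def auc(scores, labels):
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Made-up model outputs: 3 patients with CHD (label 1) and 5 without (label 0).
risk_scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.3, 0.2, 0.1]
has_chd = [1, 1, 1, 0, 0, 0, 0, 0]

example_auc = auc(risk_scores, has_chd)
print(f"AUC = {example_auc:.3f}")  # 13 of the 15 pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why it is a convenient single number for comparing models across studies.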

However, relying on AUC to quantify the accuracy of ML programs may not give the most reliable picture. The populations in studies like these tend to have unequal group sizes; in this case, the group with diabetes and CHD is usually much smaller than the group with diabetes but without CHD. The model may overrepresent or trend towards the larger group, which skews the AUC calculations. Thus, AUC may not be the best way to quantify the accuracy of these AI programs (Kee et al., 2023; Wang et al., 2022). To remedy this, Kee et al. proposed using the Synthetic Minority Oversampling Technique (SMOTE) to reduce the imbalance between the sizes of the two groups (Kee et al., 2023). Another possible solution, proposed by Wang et al., is to publish the code and framework for each AI model that is produced, so that the accuracy of the programs can be validated externally and improved for future use.
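The core idea of SMOTE is to create new minority-group samples by interpolating between an existing minority sample and one of its nearest minority neighbours. The following is a simplified sketch of that idea under stated assumptions, not a reference implementation: the patient values, the function name, and the parameter defaults are hypothetical.

```python
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points between minority samples and their neighbours."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k closest other minority samples, by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the line from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Made-up minority group (diabetes with CHD): (HbA1c in %, systolic pressure in mmHg)
minority = [(7.1, 150.0), (7.8, 160.0), (8.4, 155.0), (9.0, 170.0)]

synthetic = smote_sketch(minority, n_new=4)
for point in synthetic:
    print(tuple(round(value, 2) for value in point))
```

Because each synthetic point lies on a line segment between two real minority samples, the technique enlarges the smaller group without simply duplicating records. Oversampling of this kind is normally applied to the training data only, so that evaluation still reflects the real group sizes.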



This review aimed to look at the literature that used machine learning programs to predict risk of CHD based on specific risk factors. We chose to study diet/exercise, genetic information, and T2DM due to their high correlation with CHD; in other words, these risk factors have been found to be particularly predictive of CHD (Huang, 2022; CDC, 2020). Because CHD has a particularly high mortality rate, it requires the most reliable forms of detection and treatment available, and AI and ML are promising candidates. Various types of machine learning have been shown to be capable of detecting risk in patients, including logistic regression, support vector machines, random forest, and XGBoost. By detecting patterns within the data it is given, ML can provide more consistent and reliable assessments of patients (Wang et al., 2021). However, while AI shows promise for practical use in the medical field, a few things must be addressed first. Because the data sets used as input for these AI programs are often unbalanced, researchers should either balance the data sets to prevent bias and skewed results or use a different accuracy indicator than AUC. Furthermore, publishing the code for AI risk predictors would allow for external validation and further improvement, which may make the programs suitable for practical use. As of now, however, most of these AI programs are likely not accurate or consistent enough for practical use in the medical field (Wang et al., 2022). Future studies should also include a wider diversity of people; as a result of the focus on anglo-centric studies, other populations and studies in other languages with useful insight have been excluded (Kee et al., 2023).


Ahmadi, M. and Lanphear, B. (2022). The impact of clinical and population strategies on coronary heart disease mortality: an assessment of Rose’s big idea. BMC Public Health, 22(1). doi:

Amin, M.N., Siddiqui, S.A., Ibrahim, M., Hakim, M.L., Ahammed, Md.S., Kabir, A. and Sultana, F. (2020). Inflammatory cytokines in the pathogenesis of cardiovascular disease and cancer. SAGE Open Medicine, 8(33194199), p.205031212096575. doi:

Microsoft Azure (n.d.). Artificial Intelligence vs. Machine Learning. [online] Available at:

Benke, K., Agg, B., Silvester, B., Tarr, F., Nagy, Z., Polos, M., Daroczi, L., Merkely, B. and Szabolcs, Z. (2013). The Role of Transforming Growth Factor-Beta in Marfan Syndrome. [online] Cardiology journal. Available at:

Carbone, S., Canada, J.M., Billingsley, H.E., Siddiqui, M.S., Elagizi, A. and Lavie, C.J. (2019). Obesity paradox in cardiovascular disease: where do we stand? Vascular Health and Risk Management, [online] Volume 15(15), pp.89–100. doi:

CDC (2020). Diabetes and Your Heart. [online] Centers for Disease Control and Prevention. Available at:

Chen, B., Ruan, L., Yang, L., Zhang, Y., Lu, Y., Sang, Y., Jin, X., Bai, Y., Zhang, C. and Li, T. (2022). Machine learning improves risk stratification of coronary heart disease and stroke. Annals of Translational Medicine, 10(21), pp.1156–1156. doi:

Diabetes Atlas (2019). IDF Diabetes Atlas. [online] Diabetes Atlas. Available at: [Accessed 26 Aug. 2023].

Fan, R., Zhang, N., Yang, L., Ke, J., Zhao, D. and Cui, Q. (2020). AI-based prediction for the risk of coronary heart disease among patients with type 2 diabetes mellitus. Scientific Reports. [online] doi:

Folsom, A.R., Chambless, L.E., Duncan, B.B., Gilbert, A.C. and Pankow, J.S. (2003). Prediction of Coronary Heart Disease in Middle-Aged Adults With Diabetes. Diabetes Care, 26(10), pp.2777–2784. doi:

Forrest, I.S., Petrazzini, B.O., Duffy, Á., Park, J.K., Marquez-Luna, C., Jordan, D.M., Rocheleau, G., Cho, J.H., Rosenson, R.S., Narula, J., Nadkarni, G.N. and Do, R. (2023). Machine learning-based marker for coronary artery disease: derivation and validation in two longitudinal cohorts. Lancet (London, England), [online] 401(10372), pp.215–225. doi:

Gonsalves, A.H., Thabtah, F., Mohammad, R.M.A. and Singh, G. (2019). Prediction of Coronary Heart Disease using Machine Learning. Proceedings of the 2019 3rd International Conference on Deep Learning Technologies – ICDLT 2019. doi:

Guo, H., Li, J., Liu, H. and He, J. (2022). Learning dynamic treatment strategies for coronary heart diseases by artificial intelligence: real-world data-driven study. BMC Medical Informatics and Decision Making, 22(1). doi:

Hajar, R. (2019). PCSK 9 inhibitors: A short history and a new era of lipid-lowering therapy. Heart Views, 20(2), p.74. doi:

Haq, I.U., Chhatwal, K., Sanaka, K. and Xu, B. (2022). Artificial Intelligence in Cardiovascular Medicine: Current Insights and Future Prospects. Vascular Health and Risk Management, [online] 18, pp.517–528. doi:

Hossain, M.E., Uddin, S. and Khan, A. (2021). Network analytics and machine learning for predictive risk modeling of cardiovascular disease in patients with type 2 diabetes. Expert Systems with Applications, 164, p.113918. doi:

Hunziker, E.B. (2018). Elongation of the Long Bones in Humans by the Growth Plates. Nestle Nutrition Institute Workshop Series, [online] 89(29991028), pp.13–23. doi:

Jiang, B., Guo, N., Ge, Y., Zhang, L., Oudkerk, M. and Xie, X. (2020). Development and application of artificial intelligence in cardiac imaging. The British Journal of Radiology, 93(1113), p.20190812. doi: 10.1259/bjr.20190812. PMID: 32017605; PMCID: PMC7465846.

Karatzia, L., Aung, N. and Aksentijevic, D. (2022). Artificial intelligence in cardiology: Hope for the future and power for the present. Frontiers in Cardiovascular Medicine, 9, p.945726. doi: 10.3389/fcvm.2022.945726. PMID: 36312266; PMCID: PMC9608631.

Keshel, T.E. (2015). Exercise Training and Insulin Resistance: A Current Review. Journal of Obesity & Weight Loss Therapy, s5(26523243). doi:

Law, T.D., Clark, L.A. and Clark, B.C. (2016). Resistance Exercise to Prevent and Manage Sarcopenia and Dynapenia. Annual Review of Gerontology and Geriatrics, 36(1), pp.205–228. doi:

Lazea, C., Bucerzan, S., Crisan, M., Al-Khzouz, C., Miclea, D., Şufană, C., Cismaru, G. and Grigorescu-Sido, P. (2021). Cardiovascular manifestations in Marfan syndrome. Medicine and Pharmacy Reports, [online] 94(Suppl No 1). doi:

Libbrecht, M.W. and Noble, W.S. (2015). Machine Learning Applications in Genetics and Genomics. Nature Reviews Genetics, 16(6), pp.321–332.

Li, T.Y., Rana, J.S., Manson, J.E., Willett, W.C., Stampfer, M.J., Colditz, G.A., Rexrode, K.M. and Hu, F.B. (2006). Obesity as Compared With Physical Activity in Predicting Risk of Coronary Heart Disease in Women. Circulation, 113(4), pp.499–506. doi:

Linton, M.F., Yancey, P.G., Davies, S.S., W. Gray Jerome, Linton, E.F., Song, W.L., Doran, A.C. and Vickers, K.C. (2019b). The Role of Lipids and Lipoproteins in Atherosclerosis. [online] Available at: 

Masud, R. and Baqai, H.Z. (2017). The communal relation of MTHFR, MTR, ACE gene polymorphisms and hyperhomocysteinemia as conceivable risk of coronary artery disease. Applied Physiology, Nutrition, and Metabolism = Physiologie Appliquee, Nutrition Et Metabolisme, [online] 42(10), pp.1009–1014. doi:

NHS (2019). Coronary heart disease. [online] NHS. Available at:

Kee, O., Harun, H., Mustafa, N., Murad, A., Chin, S.-F., Jaafar, R. and Abdullah, N. (2023). Cardiovascular complications in a diabetes prediction model using machine learning: a systematic review. Cardiovascular Diabetology, 22(1). doi:

Purcell, S.A., Mackenzie, M., Barbosa-Silva, T.G., Dionne, I.J., Ghosh, S., Siervo, M., Ye, M. and Prado, C.M. (2021). Prevalence of Sarcopenic Obesity Using Different Definitions and the Relationship With Strength and Physical Performance in the Canadian Longitudinal Study of Aging. Frontiers in Physiology, 11. doi:

Quazi, S. (2022). Artificial intelligence and machine learning in precision and genomic medicine. Medical Oncology, 39(8). doi:

Reaven, G. (2012). Insulin Resistance and Coronary Heart Disease in Nondiabetic Individuals. Arteriosclerosis, Thrombosis, and Vascular Biology, [online] 32(8), pp.1754–1759. doi:

Reel, P.S., Reel, S., Pearson, E., Trucco, E. and Jefferson, E. (2021). Using machine learning approaches for multi-omics data analysis: A review. Biotechnology Advances, 49, p.107739. doi: 

Rizk, N.M., El-Menyar, A., Egue, H., Souleman Wais, I., Mohamed Baluli, H., Alali, K., Farag, F., Younes, N. and Al Suwaidi, J. (2015). The Association between Serum LDL Cholesterol and Genetic Variation in Chromosomal Locus 1p13.3 among Coronary Artery Disease Patients. BioMed Research International, [online] 2015, p.678924. doi:

Roberts, J.D., Lang, P., Bigatello, L.M., Vlahakes, G.J. and Zapol, W.M. (1993). Inhaled nitric oxide in congenital heart disease. Circulation, 87(2), pp.447–453. doi:

Robinson, P.N., Arteaga-Solis, E., Baldock, C., Collod-Beroud, G., Booms, P., De Paepe, A., Dietz, H.C., Guo, G., Handford, P.A., Judge, D.P., Kielty, C.M., Loeys, B., Milewicz, D.M., Ney, A., Ramirez, F., Reinhardt, D.P., Tiedemann, K., Whiteman, P. and Godfrey, M. (2006). The molecular genetics of Marfan syndrome and related disorders. Journal of Medical Genetics, [online] 43(10), pp.769–787. doi:

Shah, M., Revis, D., Herrick, S., Baillie, R., Thorgeirson, S., Ferguson, M. and Roberts, A. (1999). Role of Elevated Plasma Transforming Growth Factor-β1 Levels in Wound Healing. The American Journal of Pathology, [online] 154(4), pp.1115–1124. doi:

Sitinjak, B.D.P., Murdaya, N., Rachman, T.A., Zakiyah, N. and Barliana, M.I. (2023). The Potential of Single Nucleotide Polymorphisms (SNPs) as Biomarkers and Their Association with the Increased Risk of Coronary Heart Disease: A Systematic Review. Vascular Health and Risk Management, [online] 19, pp.289–301. doi:

Song, Y.-Y. and Lu, Y. (2015). Decision tree methods: applications for classification and prediction. Shanghai archives of psychiatry, [online] 27(2), pp.130–5. doi:

Stuart, A.G. and Williams, A. (2007). Marfan’s syndrome and the heart. Archives of Disease in Childhood, [online] 92(4), pp.351–356. doi:

Szczepańska, E., Białek-Dratwa, A., Janota, B. and Kowalski, O. (2022). Dietary Therapy in Prevention of Cardiovascular Disease (CVD)—Tradition or Modernity? A Review of the Latest Approaches to Nutrition in CVD. Nutrients, [online] 14(13), p.2649. doi:

Therakomen, V., Petchlorlian, A. and Lakananurak, N. (2020). Prevalence and risk factors of primary sarcopenia in community-dwelling outpatient elderly: a cross-sectional study. Scientific Reports, [online] 10(1), p.19551. doi:

Tian, D. and Meng, J. (2019). Exercise for Prevention and Relief of Cardiovascular Disease: Prognoses, Mechanisms, and Approaches. Oxidative Medicine and Cellular Longevity, [online] 2019(3756750), pp.1–11. doi:

Ueda, P., Gulayin, P. and Danaei, G. (2018). Long-term moderately elevated LDL-cholesterol and blood pressure and risk of coronary heart disease. PLOS ONE, [online] 13(7), p.e0200017. doi:

Vilne, B., Ķibilds, J., Siksna, I., Lazda, I., Valciņa, O. and Krūmiņa, A. (2022). Could Artificial Intelligence/Machine Learning and Inclusion of Diet-Gut Microbiome Interactions Improve Disease Risk Prediction? Case Study: Coronary Artery Disease. Frontiers in Microbiology, 13, p.627892. doi: 10.3389/fmicb.2022.627892. PMID: 35479632; PMCID: PMC9036178.

Wang, K., Tian, J., Zheng, C., Yang, H., Ren, J., Liu, Y., Han, Q. and Zhang, Y. (2021). Interpretable prediction of 3-year all-cause mortality in patients with heart failure caused by coronary heart disease based on machine learning and SHAP. Computers in Biology and Medicine, 137, p.104813. doi: 

Wang, M., Francis, F., Kunz, H., Zhang, X., Wan, C., Liu, Y., Taylor, P., Wild, S.H. and Wu, H. (2022). Artificial intelligence models for predicting cardiovascular diseases in people with type 2 diabetes: A systematic review. Intelligence-Based Medicine, [online] 6, p.100072. doi:

West, D. and Allen, J. (2018). How Artificial Intelligence Is Transforming the World. [online] Brookings, 24 Apr. Available at:

WHO (2021). Cardiovascular Diseases. [online] World Health Organization, 11 June. Available at:

Wilcox, G. (2005). Insulin and Insulin Resistance. The Clinical biochemist. Reviews, [online] 26(2), pp.19–39. Available at:

Yu, C.-S., Lin, Y.-J., Lin, C.-H., Wang, S.-T., Lin, S.-Y., Lin, S.H., Wu, J.L. and Chang, S.-S. (2020). Predicting Metabolic Syndrome With Machine Learning Models Using a Decision Tree Algorithm: Retrospective Cohort Study. JMIR medical informatics, [online] 8(3), p.e17110. doi:

Zilliox, L.A., Chadrasekaran, K., Kwan, J.Y. and Russell, J.W. (2016). Diabetes and Cognitive Impairment. Current Diabetes Reports, [online] 16(9). doi:

Huang, Y., Ren, Y., Yang, H., Ding, Y., Liu, Y., Yang, Y., Mao, A., Yang, T., Wang, Y., Xiao, F., He, Q. and Zhang, Y. (2022). Using a machine learning-based risk prediction model to analyze the coronary artery calcification score and predict coronary heart disease and risk assessment. Computers in Biology and Medicine, [online] 151, p.106297. doi:

American Diabetes Association (2019). 10. Cardiovascular Disease and Risk Management: Standards of Medical Care in Diabetes—2020. Diabetes Care, [online] 43(Supplement 1), pp.S111–S134. doi: