Artificial intelligence (AI) is revolutionizing medicine. A few years ago, models that analyzed health data were limited to predicting the probability of a specific disease based on individual variables. Today, algorithms inspired by the Transformers that bring chatbots to life are being adapted to learn the natural course of many diseases at once. This breakthrough opens the door to a new approach to prevention: anticipating the onset of more than a thousand diagnoses based on each person's medical history, lifestyle, and social context. In this article, we explore the workings of the Delphi-2M model, its results and limitations, as well as its implications for personalized nutrition and public health.
What is a generative transformer and how does it apply to healthcare?
Transformers are neural networks that use an attention architecture to analyze data sequences and predict the next element in the sequence. Their best-known application is language models that generate coherent text. This ability to recognize temporal patterns is also useful for understanding how health evolves throughout life. Disease progression is influenced by a mosaic of genetic, environmental, and lifestyle factors, and manifests as diagnoses that accumulate over time. A transformer can represent this evolution as a sequence of events coded with the International Classification of Diseases (ICD-10), using age-based time encoding and extending the model to predict both the type of the next event and the interval until it occurs.
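As a rough illustration of this representation, the sketch below turns a small, made-up medical history into an age-ordered sequence and attaches a continuous encoding of age to each event. The example history, the function names, and the sinusoidal form of the encoding are assumptions chosen for illustration; they are not taken from the study.

```python
import math
from typing import List, Tuple

# Hypothetical example history: (age in years, ICD-10 three-character code).
history: List[Tuple[float, str]] = [
    (52.3, "I10"),  # essential hypertension
    (41.0, "J45"),  # asthma
    (58.9, "E11"),  # type 2 diabetes
]


def to_sequence(events: List[Tuple[float, str]]) -> List[Tuple[float, str]]:
    """Sort events by age so they appear in the order they happened."""
    return sorted(events, key=lambda event: event[0])


def age_encoding(age_years: float, dim: int = 8, max_period: float = 100.0) -> List[float]:
    """Sinusoidal encoding of a continuous age.

    This is an assumed, simplified form: each pair of values is the sine
    and cosine of the age at a different frequency.
    """
    half = dim // 2
    encoding = []
    for i in range(half):
        frequency = 1.0 / (max_period ** (i / half))
        encoding.append(math.sin(age_years * frequency))
        encoding.append(math.cos(age_years * frequency))
    return encoding


sequence = to_sequence(history)
# Each event becomes a (token, age vector) pair that a model could consume.
encoded = [(token, age_encoding(age)) for age, token in sequence]

for age, token in sequence:
    print(f"{age:5.1f} years: {token}")
```

Because the encoding depends on the age itself rather than on the position in the list, two events that are ten years apart stay ten years apart no matter how many other diagnoses fall between them.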
Delphi-2M is an adapted version of GPT that integrates these improvements. It replaces the classic positional encoding with a continuous age-based encoding, predicts the time to the next diagnosis using an exponential waiting-time model, and adds attention masks that prevent the algorithm from being confused when multiple events are recorded at the same age. In addition, it incorporates tokens representing sex, body mass index (BMI), and tobacco and alcohol consumption, which are used as input but not predicted.
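An exponential waiting-time model can be sketched in a few lines. The example below assumes that the model outputs one score (logit) per possible event and that the exponential of each score is read as a yearly rate; both the function names and the numbers are hypothetical. Each event draws its own exponential waiting time, and the event with the shortest wait is the one that happens next.

```python
import math
import random
from typing import Dict, Tuple


def sample_next_event(logits: Dict[str, float], rng: random.Random) -> Tuple[str, float]:
    """Sample the next event and the time until it occurs.

    Assumption: rate = exp(logit), in events per year. Every token draws
    an exponential waiting time and the earliest one wins.
    """
    rates = {token: math.exp(logit) for token, logit in logits.items()}
    waits = {token: rng.expovariate(rate) for token, rate in rates.items()}
    token = min(waits, key=waits.get)
    return token, waits[token]


def expected_wait(logits: Dict[str, float]) -> float:
    """Mean time to the next event of any type: 1 / (sum of all rates)."""
    return 1.0 / sum(math.exp(logit) for logit in logits.values())


# Hypothetical rates: 0.05 per year for hypertension, 0.02 for type 2 diabetes.
example_logits = {"I10": math.log(0.05), "E11": math.log(0.02)}
rng = random.Random(42)
token, wait = sample_next_event(example_logits, rng)
print(f"next event: {token} in {wait:.1f} years")
print(f"expected wait for any event: {expected_wait(example_logits):.1f} years")
```

A convenient property of this setup is that a single set of rates answers both questions at once: which diagnosis comes next and how long until it arrives.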
Delphi-2M: massive cohort training and performance
To train Delphi-2M, the researchers used the health records of 402,799 UK Biobank participants and validated the model on 1.93 million people from Danish health records. The UK cohort was split into 80% for training and 20% for hyperparameter validation. The model works with a vocabulary of 1,258 tokens including ICD-10 top-level diagnoses, sex, BMI categories, smoking and alcohol use, as well as “disease-free” events that help calibrate the model to diagnosis-free time periods. The final model had approximately 2.2 million parameters, 12 layers, and 12 attention heads, a size that was optimal according to the scaling laws observed in the study.
The results demonstrate that the model accurately predicts the next diagnosis for a wide range of diseases. In internal validation data, the average area under the receiver operating characteristic curve (AUC) was around 0.76, and 97% of diagnoses had an AUC greater than 0.5. The model showed differences by ICD-10 chapter: some diseases, such as asthma or osteoarthritis, have a narrow prediction margin, while others, such as sepsis or death, show wide interindividual variability. For horizons of up to ten years, the AUC remains close to 0.70, suggesting that the model is also useful for long-term forecasts.
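To make the AUC figures concrete: the AUC is the probability that a randomly chosen person who went on to receive a diagnosis was given a higher predicted risk than a randomly chosen person who did not. A value of 0.5 corresponds to chance and 1.0 to perfect ranking. The minimal sketch below computes it by comparing every such pair; the scores and labels are invented for illustration.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve, computed by pairwise comparison.

    labels: 1 if the person developed the diagnosis, 0 otherwise.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented predicted risks for four people, two of whom were later diagnosed.
risk_scores = [0.9, 0.8, 0.3, 0.1]
diagnosed = [1, 0, 1, 0]
print(auc(risk_scores, diagnosed))  # 3 of 4 pairs ranked correctly: 0.75
```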
Delphi-2M’s performance is comparable or superior to that of many existing clinical prediction algorithms. The study found that the model matches or outperforms Framingham equations and other cardiovascular risk scores, approximates dementia models, and outperforms an algorithm using 67 biomarkers to predict various diagnoses. Accuracy decreased slightly when applied to Danish data (mean AUC 0.67), but predictions between the two cohorts were highly correlated, indicating that many of the learned patterns reflect the true evolution of multimorbidity.
Generation of health trajectories and estimation of disease burdens
A unique advantage of Delphi-2M is its generative nature. Unlike conventional models, it can simulate future health trajectories based on current medical history. This allows for the estimation of rates of multiple diseases and the cumulative burden of disease over periods of up to 20 years. In the study, trajectories were generated for over 63,000 participants based on their medical history up to age 60; the predicted disease incidences between ages 70 and 75 were remarkably consistent with those observed in the real-world population.
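The idea of estimating disease burden from simulated trajectories can be illustrated with a toy simulation. The sketch below uses fixed, made-up yearly rates instead of a trained model, which would recompute the rates from the full history after every sampled event. It generates many trajectories from age 60 to age 80 and counts the share in which a given diagnosis appears; the rates, names, and ages are all assumptions.

```python
import math
import random
from typing import Dict, List, Tuple

# Hypothetical constant yearly rates (not values from the study).
TOY_RATES: Dict[str, float] = {"E11": 0.02, "I10": 0.05, "death": 0.01}


def simulate(start_age: float, end_age: float, rng: random.Random,
             rates: Dict[str, float] = TOY_RATES) -> List[Tuple[float, str]]:
    """Generate one trajectory of (age, token) events between two ages."""
    tokens = list(rates)
    weights = [rates[t] for t in tokens]
    total_rate = sum(weights)
    age = start_age
    events: List[Tuple[float, str]] = []
    while True:
        # Time to the next event of any type is exponential with the total rate.
        age += rng.expovariate(total_rate)
        if age >= end_age:
            return events
        # The type of the event is chosen in proportion to its rate.
        token = rng.choices(tokens, weights=weights)[0]
        events.append((age, token))
        if token == "death":
            return events


def cumulative_incidence(token: str, n: int, start_age: float,
                         end_age: float, seed: int = 0) -> float:
    """Share of n simulated trajectories that contain the token at least once."""
    rng = random.Random(seed)
    hits = sum(
        any(t == token for _, t in simulate(start_age, end_age, rng))
        for _ in range(n)
    )
    return hits / n


estimate = cumulative_incidence("E11", n=20000, start_age=60, end_age=80)
# Closed-form value for this toy setting, with death as a competing event.
exact = 0.02 / 0.03 * (1 - math.exp(-0.03 * 20))
print(f"simulated: {estimate:.3f}  exact: {exact:.3f}")
```

In this toy case the answer can also be computed by hand, which is useful as a sanity check. With a real model whose rates change after every event there is no closed form, and sampling trajectories is the practical way to obtain such estimates.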
The ability to sample synthetic data has important implications for research. It enables the creation of datasets that preserve statistical co-occurrences of diseases without exposing personal records, facilitating the development of other AI models with a lower risk of privacy violations. From a public health perspective, these trajectories could help anticipate demand for healthcare services, optimize prevention campaigns, and adapt screening programs. For personalized nutrition, estimating the future likelihood of metabolic or cardiovascular disorders could guide early interventions in diet and lifestyle.
Biases and limitations: why we should be cautious
Delphi-2M is not without limitations. The model learns from available data and reproduces its biases. The UK Biobank primarily includes participants aged 40–70; younger individuals and those who died before recruitment are underrepresented, creating an “immortal time” bias and underestimating early mortality. The UK cohort also includes a higher proportion of individuals with higher levels of education and income, leading to a “healthy volunteer bias” and a lower prevalence of certain diagnoses.
The source of the data influences the predictions. Primary care records contain mostly common diseases, while hospital data capture serious conditions such as myocardial infarction or sepsis. When a diagnosis only appears in one setting (e.g., a hospital), the model can infer that other hospital diagnoses increase its likelihood, even when this relationship is an artifact of the registration system. This effect can increase the sepsis rate eightfold among people who have had any other hospital-related event.
Furthermore, the model predicts different rates depending on ancestry and the deprivation index, and does not show clear trends across lifestyle or year of birth. These observations highlight the importance of having representative data and assessing the fairness of the algorithms. Delphi-2M also does not yet consider genomic, metabolomic, or wearable device information, which could improve the individualization of predictions.
Applications in personalized nutrition and lifestyle
The ability to predict the progression of multiple pathologies offers new opportunities for precision nutrition. Knowing the likelihood of developing type 2 diabetes or cardiovascular disease in the next two decades allows for the design of personalized nutrition and physical activity plans. Tools such as Oorenji and our platform Mefood Omics already use algorithms to recommend balanced menus based on medical history, microbiome, and eating habits. Integrating generative models such as Delphi-2M with caloo.app or Alimentomics would allow for anticipating nutritional needs and supplementing the diet with specific micronutrients. Furthermore, trajectory simulation could help identify optimal windows for intervention before symptoms appear.
It's important to emphasize that no prediction replaces medical advice or justifies hasty health decisions. AI can provide guidance, but only a professional can interpret risks in the context of each individual. Furthermore, factors such as stress, sleep, and socioeconomic environment influence health and must be addressed holistically. Platforms such as recipes.oorenji.com offer healthy and educational recipes that can accompany any preventive plan.
Conclusion and call to action
Delphi-2M demonstrates that generative transformers can learn the natural history of diseases and predict risks at scale. The model outperforms many traditional algorithms and allows for the simulation of health trajectories, but it also highlights inherent biases in the data and the need for rigorous ethical evaluation. For nutrition professionals, AI opens up a promising field: uniting risk prediction with personalized dietary interventions.
If you'd like to learn more about how new technologies can help you improve your health, visit our resources at Mefood Omics and explore the solutions from Oorenji. Remember that adopting healthy habits, maintaining a balanced diet, and exercising remain fundamental pillars of disease prevention. AI is a powerful ally, but your commitment is irreplaceable.
References
- Shmatko A. et al. Learning the natural history of human disease with generative transformers – Nature
- Conroy G. Which diseases will you have in 20 years? This AI accurately predicts your risks – Nature News
- Gregory A. New AI tool can predict a person's risk of more than 1,000 diseases – The Guardian
