Regarding output quality, all 63 outputs considered in this study were rated between 3 and 4 for clarity, coherence, and practicality (Good to Excellent), with none rated as Poor or Fair.
This study showed that the AI-generated outputs for the dietary management of T2DM and the MetS were often incomplete or discordant with the NCM recommendations, revealing major gaps with significant implications for patient care and highlighting the need to further improve the accuracy of these outputs. Similar observations have been reported by previous studies in non-nutrition medical fields. For instance, Chen et al. (2023) evaluated the concordance of a large language model chatbot's therapeutic recommendations for breast, prostate, and lung cancer with the National Comprehensive Cancer Network (NCCN) guidelines. The authors found that the chatbot's recommendations were at least partially misaligned with the NCCN guidelines and that the output included "hallucinated" responses [19]. Similarly, in a recent study in which endocrinologists asked ChatGPT questions on the assessment and treatment of obesity in T2DM, ChatGPT's output was compatible with the guidelines for obesity assessment but less compatible in the sections on nutritional, medical, and surgical therapeutic strategies [12]. In the nutrition field, Qarajeh et al. (2023) investigated the efficacy of four AI models in correctly identifying the potassium and phosphorus content of foods for patients with chronic kidney disease, comparing the AI-derived results with the recommendations of the Mayo Clinic Renal Diet Handbook [23]. Although promising, the results revealed variations among the models' outputs and a wide range of accuracy, emphasizing the need for human oversight when using AI for nutrition education and recommendations [23]. Furthermore, Aiumtrakul et al. (2024) examined the reliability of chatbots in classifying foods according to their oxalate content [24]. The study findings highlighted significant variations in the accuracy of AI models in classifying dietary oxalate content and emphasized the need for further improvements in chatbot algorithms for therapeutic accuracy [24].
Regarding dietary interventions, the gaps observed in our study were similar across the "dietary management" and "NCP" domains. For example, weight loss recommendations and guidance on achieving an energy deficit were generally missing or incomplete in the AI-generated outputs. More specifically, the AI-derived dietary management outputs for obesity, MetS, and hypertriglyceridemia did not address energy intake modification, and for low HDL levels, an important gap concerned weight loss recommendations. This is concerning given the well-established pathophysiologic links between excess weight, insulin resistance, MetS, and T2DM [43, 44]. Indeed, weight loss is one of the most impactful therapeutic interventions for cardiometabolic conditions. The 2023 Standards of Care in Diabetes indicate that relatively modest weight loss (3-7% of baseline body weight) significantly improves glycemic control and cardiometabolic status, while larger losses (exceeding 10% of body weight) may lead to disease remission [8]. Effective weight loss strategies involve establishing an energy deficit of 500-1000 kcal/day through hypocaloric diets and increased physical activity [8, 45]. Notably, the AI-derived outputs lacked appropriate physical activity recommendations for all the cardiometabolic conditions studied in this paper except T2DM. Hence, ChatGPT can generate 'contextually relevant responses' owing to its extensive training on large datasets. Even so, it still struggles to make interdisciplinary connections and to provide the holistic lifestyle interventions that experienced dietitians often offer as part of patient counseling.
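To make the thresholds above concrete, the following sketch converts the cited percentages (3-7% and 10% of baseline body weight) and the 500-1000 kcal/day deficit into absolute targets for a single patient. It is illustrative only. The function names are hypothetical. The maintenance energy requirement is assumed to be supplied by the user, because estimating it is outside the scope of this sketch.

```python
"""Hypothetical helpers that translate the weight-loss figures cited in the
text into absolute targets. Illustrative only; not part of the study's methods."""


def weight_loss_targets(baseline_kg: float) -> dict[str, float]:
    """Return weight-loss amounts (kg) at the cited percentage thresholds."""
    return {
        "modest_low_kg": round(baseline_kg * 0.03, 1),   # 3% of baseline weight
        "modest_high_kg": round(baseline_kg * 0.07, 1),  # 7% of baseline weight
        "large_kg": round(baseline_kg * 0.10, 1),        # 10%; "larger losses" exceed this
    }


def intake_range_for_deficit(
    maintenance_kcal: float, min_deficit: float = 500, max_deficit: float = 1000
) -> tuple[float, float]:
    """Daily intake (kcal) giving a 500-1000 kcal/day deficit.

    Assumption: maintenance_kcal is already known (it is an input, not estimated
    here), and the whole deficit comes from reduced intake. In practice, physical
    activity can supply part of it.
    """
    return (maintenance_kcal - max_deficit, maintenance_kcal - min_deficit)


if __name__ == "__main__":
    # Hypothetical patient: 90 kg, assumed maintenance of 2200 kcal/day.
    print(weight_loss_targets(90))
    print(intake_range_for_deficit(2200))
```

Consider a hypothetical 90 kg patient. The modest target corresponds to roughly 2.7-6.3 kg, and a loss exceeding 10% would mean more than 9 kg. With an assumed maintenance of 2200 kcal/day, the cited deficit translates into an intake of about 1200-1700 kcal/day.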
Dietary intervention information generated by ChatGPT was also often incomplete regarding specific nutrients of concern. Examples include sodium intake in HTN, the amount, distribution, and quality of CHO in T2DM, and the intake of SFA and added sugar in obesity [7, 8]. Notably, information on CHO counting was not included in the AI-derived dietary management output, although CHO counting is a cornerstone of therapeutic diets for T2DM [46]. In fact, CHO counting, even in its most basic form, has been consistently linked to better glycemic control among patients with T2DM and hence to better health outcomes [47, 48]. Furthermore, recommendations to increase the intake of other nutrients known to improve cardiometabolic health, such as dietary fiber, were also incomplete or missing [7, 8]. For example, for all the considered conditions, the AI-derived dietary intervention output within the NCP did not address the need to increase total and soluble/viscous fiber intake or to consume whole-grain products. Promoting dietary fiber intake is a crucial component of managing cardiometabolic abnormalities. Through its colonic and hormonal effects, adequate dietary fiber intake can increase insulin sensitivity, enhance fat oxidation, and decrease cardiometabolic risk [49]. Similarly, recommendations on omega-3 fatty acids from fatty fish were missing or incomplete for most of the studied cardiometabolic conditions, despite their inclusion in the NCM and in clinical recommendations issued by scientific bodies [7, 8]. The AI-derived output also lacked recommendations addressing dietary intake as a whole, in the form of dietary patterns. This contrasts with the NCM recommendations as well as recent recommendations published by the ADA [8] and the AHA [50]. For instance, the ADA referred to specific dietary patterns, such as the Mediterranean and DASH diets, that T2DM patients can adopt to foster weight loss and improve their cardiometabolic profile [8]. A scientific statement from the AHA also showed that several dietary patterns strongly align with the 2021 AHA Dietary Guidance [50]. Interventions based on dietary patterns are gaining popularity because their guidance is often clearer and easier for patients to follow than recommendations based on nutrients or individual foods [51].
In the present study, AI-generated outputs for nutritional assessment within the NCP were incomplete for most of the considered conditions. More specifically, the AI-derived output lacked key elements of dietary assessment. These included the intakes of energy, protein, and specific types of fat (for high TG), alcohol intake (for the MetS), and dietary supplement use (for low HDL). Anthropometric indicators and cutoffs were also missing from the NCP outputs for T2DM, MetS, and hypertriglyceridemia, and from the monitoring/evaluation component for obesity. Nutritional assessment plays a crucial role in identifying nutrition-related problems and their causes and in developing tailored dietary interventions. The observed gaps in AI-derived outputs therefore call for caution in relying solely on ChatGPT to assess patients' nutritional status and develop nutritional care plans [52, 53].
There were also gaps in the AI-derived nutrition diagnosis outputs, with PES statements either absent or lacking diagnostic terminology. Accurate nutrition diagnosis and clearly stated PES statements are a crucial component of the patient care plan. They determine the subsequent steps within the NCP and enable the development of tailored nutrition interventions [54, 55]. More specifically, accurate nutrition diagnosis allows the dietitian to:
- identify and prioritize the problems most likely to be resolved or improved by a nutrition intervention;
- evaluate whether the "root cause" can be addressed by the intervention;
- select the specific signs and symptoms from the assessment data that will be used to monitor whether the problem was resolved or improved [56].

Quality improvement literature shows that incomplete nutrition diagnoses and the lack of standardized approaches to identifying nutrition problems increase the variation and unpredictability of therapeutic outcomes and hence compromise the effectiveness of patient care [56]. The AI-derived output also had significant gaps in monitoring and evaluation. This is a crucial step of the NCP that assesses patients' progress and serves as a basis for adjusting previous diagnoses or nutrition interventions [57].
In this study, ChatGPT was prompted to provide a 1500 kcal one-day menu for each condition. In line with dietary recommendations, and as observed in previous studies [58], the generated menus contained non-starchy vegetables, lean protein foods, and CHO foods. However, nutritional analysis showed that their energy content sometimes diverged from the assigned 1500 kcal (e.g., by +361 kcal for hypertriglyceridemia). Their CHO and fat contents were also often discordant with dietary recommendations. The low CHO content observed in some of the menus is consistent with findings reported by Chatelan et al. (2023) [58], in which a ChatGPT-derived one-day menu for T2DM appeared "to be inspired by ketogenic diets". In addition, dietary fat content in our study was high, reaching 43% of energy intake in the menu for low HDL management and 40% in the menu for T2DM. Such high fat intakes may exacerbate cardiometabolic abnormalities, including abdominal obesity, elevated blood pressure, and dyslipidemia [59]. As for micronutrients, discrepancies were noted between the AI-derived outputs and nutritional recommendations, particularly for calcium and vitamin D. The ChatGPT-generated menus provided very low amounts of vitamin D and inadequate calcium. This is concerning given the high prevalence of micronutrient deficiencies in the general population worldwide and among patients with T2DM and the MetS [60].
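Percentages such as "43% of energy from fat" and deviations from the 1500 kcal target come from simple arithmetic on a menu's macronutrient grams. The following sketch illustrates this arithmetic using the general Atwater factors: 4 kcal/g for CHO and protein, and 9 kcal/g for fat.

Several elements of the sketch are assumptions:
- The function name and the menu values in the example are hypothetical.
- The benchmark ranges are the US Institute of Medicine adult Acceptable Macronutrient Distribution Ranges, used only for illustration. They are not the NCM values against which this study's menus were assessed.
- Energy computed this way ignores alcohol and fiber adjustments, so it may differ slightly from totals reported by nutrient analysis software.

```python
"""Hypothetical sketch: energy distribution of a menu from macronutrient grams."""

# General Atwater factors (kcal per gram).
ATWATER_KCAL_PER_G = {"carbohydrate": 4.0, "protein": 4.0, "fat": 9.0}

# Illustrative benchmark (assumption): IOM adult AMDRs, % of energy.
# These are NOT the NCM recommendations used in the study.
AMDR_PERCENT = {
    "carbohydrate": (45.0, 65.0),
    "protein": (10.0, 35.0),
    "fat": (20.0, 35.0),
}


def energy_distribution(grams: dict[str, float], target_kcal: float = 1500.0) -> dict:
    """Compute total energy, deviation from target, and % energy per macronutrient."""
    kcal = {n: grams[n] * factor for n, factor in ATWATER_KCAL_PER_G.items()}
    total = sum(kcal.values())
    percent = {n: round(100 * kcal[n] / total, 1) for n in kcal}
    # Macronutrients whose share of energy falls outside the illustrative range.
    outside = [
        n for n, p in percent.items()
        if not AMDR_PERCENT[n][0] <= p <= AMDR_PERCENT[n][1]
    ]
    return {
        "total_kcal": total,
        "deviation_kcal": total - target_kcal,
        "percent_energy": percent,
        "outside_benchmark": outside,
    }


if __name__ == "__main__":
    # Hypothetical one-day menu totals (grams); not data from the study.
    print(energy_distribution({"carbohydrate": 150, "protein": 90, "fat": 80}))
```

Consider a hypothetical menu of 150 g CHO, 90 g protein, and 80 g fat. It yields 1680 kcal, or 180 kcal over target. Fat supplies about 42.9% of energy and CHO about 35.7%, so both fall outside the illustrative benchmark.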
With regard to the clarity and quality of the AI-derived outputs, this study found that ChatGPT's outputs were of good to excellent clarity, coherence, and practicality. This is consistent with ChatGPT's design: it was trained on extensive human language datasets, and several studies have confirmed its ability to produce clear and coherent text [61, 62, 63].
The present study suggests a few recommendations to consider when using ChatGPT for the dietary management of T2DM and the MetS. ChatGPT is an increasingly available resource that enhances access to nutrition information online. Even so, evaluating the comprehensiveness and accuracy of its responses remains crucial for AI chatbot users [58]. In our study, although disagreements between the dietitians' scores were minor, the scores were not always identical, highlighting the potential challenges of interpreting AI-generated therapeutic strategies. The lack of references in ChatGPT-derived responses further limits the ability to ascertain the credibility of the information [58]. Investigations of ChatGPT's gaps, such as the current study, are crucial for raising awareness among dietitians about AI-generated information and for cautioning healthcare providers against relying solely on ChatGPT for dietetic advice [58]. Dietitians and healthcare practitioners also have a responsibility to educate their clients and patients about the risks of using this technology for disease treatment or prevention [64]. AI chatbots could propagate misinformation, given how widely patients and the general public use them for self-education and self-management [19]. In an opinion paper, Arslan (2024) noted that although chatbots can be harnessed to provide real-time nutritional information, a cautious and critical approach should be adopted in their use [22]. A real-world example cited by Arslan is an application in which ChatGPT was integrated into online nutritional counseling platforms as an initial advisory tool [25]. This allowed users to obtain initial guidance and foundational knowledge before consulting human experts (i.e., dietitians). There are also other opportunities for ChatGPT to assist nutritionists and dietitians. For instance, AI models can provide a quick, 24/7 second opinion. This holds when they are used judiciously and when the dietitian can correctly define the issue, ask pertinent questions, refine the prompts, and gauge the accuracy of the output. They can also help in:
- brainstorming ideas, such as nutrition education objectives or research hypotheses;
- providing quick summaries of texts;
- drafting texts that patients can easily understand, which can therefore support effective nutritional counseling [58].

In addition, ChatGPT and similar AI systems typically generate responses based on the large datasets on which they were trained. It is therefore recommended to add more evidence-based nutritional knowledge and recommendations to AI training data and to better align AI algorithms with nutrition guidelines [22].
The findings of this study should be interpreted in light of the following limitations.
- First, AI-generated answers to the same prompt may vary over time due to regular optimizations of ChatGPT [12, 58, 65]. This makes it difficult to know exactly which information a health professional or patient would receive, especially if additional data on sex, age, food preferences, and medical history are provided [58]. The continuously changing nature of ChatGPT makes it challenging to assess the reliability of the information, especially in studies that adopt a zero-shot approach [12, 66].
- Second, differences in prompt engineering, such as wording, design, and implementation, could generate different results. In this study, only three versions of the question were designed. Increasing the variety and complexity of the prompts could provide a more comprehensive evaluation of ChatGPT's performance [67].
- Third, including a human comparison group could have provided a more contextual appraisal of ChatGPT. That said, this study was based on the premise that both dietitians (especially registered dietitians) and ChatGPT are expected to adhere strictly to the NCP guidelines in disease management.
- Finally, ChatGPT outputs were evaluated in a controlled setting. Future studies could further assess ChatGPT's performance in real-world clinical scenarios and actual patient cases.
In conclusion, despite their clarity, the ChatGPT-derived outputs for the nutritional management of T2DM and the MetS presented several gaps. These concerned weight loss recommendations, energy deficit, anthropometric assessment, specific nutrients of concern, the adoption of specific dietary interventions, physical activity recommendations, diagnostic documentation statements, and monitoring and evaluation. In the 1500 kcal one-day menus, the amounts of CHO, fat, vitamin D, and calcium were discordant with dietary recommendations. These findings suggest that ChatGPT, and potentially other future AI chatbots, respond to users' prompts in a "human-like" way but cannot replace dietitians' expertise and critical judgment. Healthcare practitioners may consult this increasingly available technology for various purposes. However, they should be cautious about relying solely on AI chatbots in clinical practice and should collectively raise awareness about the associated risks.