Generative Artificial Intelligence: Capabilities of Llama-3.1-70B-Instruct in Biochemistry and Metabolism
Keywords:
ChatGPT, Medical Education, Biochemistry, Metabolism, Llama 3
Abstract
Introduction: Recent studies have shown the potential of large language models in medical education. However, there is limited information regarding their capabilities in the subjects of Biochemistry, Metabolism, and Nutrition, especially for open-source models and models other than ChatGPT.
Objective: To evaluate the capabilities of the open-source large language model Llama 3.1 70B-Instruct in the subjects of Biochemistry, Metabolism, and Nutrition.
Material and Methods: An observational study with an exploratory, mixed-methods design was conducted using two groups of evaluators, one of which was external to the research. The researchers assessed all 264 questions, while the external evaluators examined a random sample of 72 questions, stratified by topic. Responses were rated on a 5-point Likert scale; RStudio was used for statistical analysis and Zotero for reference management.
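The sampling and scoring procedure described above can be sketched in code. This is a minimal illustration only: the study itself used RStudio, and the topic names, per-topic counts, and ratings below are invented for the example, not taken from the study data.

```python
import random
from statistics import mean

def stratified_sample(questions, per_topic, seed=42):
    """Draw `per_topic` questions at random from each topic stratum."""
    rng = random.Random(seed)
    sample = []
    for topic in sorted({q["topic"] for q in questions}):
        pool = [q for q in questions if q["topic"] == topic]
        sample.extend(rng.sample(pool, min(per_topic, len(pool))))
    return sample

# Hypothetical pool: 264 questions spread across three example topics.
questions = [
    {"id": i, "topic": ["Biochemistry", "Metabolism", "Nutrition"][i % 3]}
    for i in range(264)
]

# 3 strata x 24 questions each = a 72-question stratified sample.
subset = stratified_sample(questions, per_topic=24)
print(len(subset))  # 72

# A consensus score is then an average of 5-point Likert ratings
# (illustrative ratings, not the study's actual data).
ratings = [5, 5, 4, 5, 5]
print(round(mean(ratings), 2))  # 4.8
```

Fixing the random seed makes the sample reproducible, which matters when two evaluator groups must score the same subset of questions.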
Results: Both evaluator groups reported similar results, with a consensus rating of 4.75 for the tool. Its highest performance was in Metabolism and Nutrition (4.8), while in Biochemistry it scored 4.7, with shortcomings in some areas. The explanations provided by the tool were clear and useful, showing a strong ability to convey abstract concepts.
Conclusions: The results were favorable. However, further comprehensive studies of large language models in these and other medical-education subjects are essential. Only through continued research can we better guide students on their optimal use and benefits.
License
Copyright (c) 2025 Revista Habanera de Ciencias Médicas

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
All content of this journal is Open Access, distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original source is properly cited.