Artificial intelligence in foot and ankle pathology: Can large language models replace us?

Florencio Pablo Segura; Facundo Manuel Segura; Julieta Porta; Natalia Heredia; Ignacio Masquijo; Federico Anain; Leandro Casola; Agustina Trevisson; Virginia Cafruni; Maria Paz Lucero Zudaire; Ignacio Toledo; Florencio Vicente Segura

doi:10.30795/jfootankle.2024.v18.1757

Authors

Florencio Pablo Segura; Segura, Centro Privado de Ortopedia y Traumatología, Córdoba, Argentina; https://orcid.org/0000-0002-2376-4834
Facundo Manuel Segura; Segura, Centro Privado de Ortopedia y Traumatología, Córdoba, Argentina; https://orcid.org/0009-0000-7101-9145
Julieta Porta; Sanatorio Allende, Córdoba, Argentina; https://orcid.org/0000-0001-9662-0367
Natalia Heredia; Instituto Modelo de Cardiología, Córdoba, Argentina; https://orcid.org/0009-0002-7215-2137
Ignacio Masquijo; Instituto Modelo de Cardiología, Córdoba, Argentina; https://orcid.org/0000-0002-6284-6410
Federico Anain; Unidad de Pierna y Pie, Ciudad Autónoma de Buenos Aires, Argentina
Leandro Casola; Sanatorio Dupuytren, Ciudad Autónoma de Buenos Aires, Argentina; https://orcid.org/0000-0003-1187-0864
Agustina Trevisson; Sanatorio Dupuytren, Ciudad Autónoma de Buenos Aires, Argentina; https://orcid.org/0009-0006-5634-0823
Virginia Cafruni; Hospital Italiano de Buenos Aires, Ciudad Autónoma de Buenos Aires, Argentina; https://orcid.org/0000-0002-8115-6300
Maria Paz Lucero Zudaire; Instituto Modelo de Cardiología (359 Sagrada Familia Av, X5000, Córdoba, Córdoba, Argentina); https://orcid.org/0009-0009-8632-480X
Ignacio Toledo; Sanatorio Allende, Córdoba, Argentina; https://orcid.org/0000-0003-4033-8818
Florencio Vicente Segura; Segura, Centro Privado de Ortopedia y Traumatología, Córdoba, Argentina; https://orcid.org/0009-0004-0424-8334

Keywords:

Diagnosis, Treatment Outcome, Observer Variation

Abstract

Objective: To determine whether large language models (LLMs) provide better or similar information compared with experts trained in foot and ankle pathology across several aspects of daily practice (definition of pathology, treatment of pathology, and general questions).

Methods: Three experts and two artificial intelligence (AI) models, ChatGPT (GPT-4) and Google Bard, answered 15 specialty-related questions, divided equally among definitions, treatments, and general queries. After coding, the responses were redistributed and evaluated by five additional experts, who assessed aspects such as clarity, factual accuracy, and patient usefulness. A Likert scale was used to score each question, allowing the evaluators to indicate their level of agreement with the information provided.

Results: On the Likert scale, each question could score between 5 and 25 points, so each respondent's total across the 15 questions could range from 75 to 375 points. Expert 2 led with 69.86%, followed by Expert 1 at 68.53%, ChatGPT at 64.80%, Expert 3 at 58.40%, and Google Bard at 54.93%. Comparisons among respondents showed significant differences, especially those involving Google Bard. The rankings varied in specific sections such as definitions and treatments, highlighting GPT-4’s variability across sections. The results emphasize the differences in performance among experts and AI models.

Conclusion: Our findings indicate that GPT-4 often performed comparably to or even better than the experts, particularly in the definition and general question sections. However, both LLMs lagged notably in the treatment section. These results underscore the potential of LLMs as valuable tools in orthopedics but also highlight their limitations, emphasizing the irreplaceable role of human expertise in intricate medical contexts.

Evidence Level: III, observational, analytics.
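The scoring arithmetic in the abstract can be illustrated with a minimal sketch. It assumes a 1-to-5 Likert item per evaluator (implied by the 5-to-25 range with five evaluators) and that the reported percentages are taken relative to the maximum total; neither assumption is stated explicitly in the abstract. The function names and the example total of 300 points are hypothetical and are not data from the study.

```python
# Sketch of the Likert scoring ranges described in the abstract.
# Assumptions (not confirmed by the source): each evaluator rates each
# answer from 1 to 5, and percentages are computed against the maximum total.

N_EVALUATORS = 5      # five additional experts scored the answers
N_QUESTIONS = 15      # 15 specialty-related questions
LIKERT_MIN, LIKERT_MAX = 1, 5  # assumed Likert item range


def question_range():
    """Lowest and highest possible score for a single question."""
    return N_EVALUATORS * LIKERT_MIN, N_EVALUATORS * LIKERT_MAX


def total_range():
    """Lowest and highest possible total across all questions."""
    low, high = question_range()
    return N_QUESTIONS * low, N_QUESTIONS * high


def percent_of_max(total_points):
    """Express a respondent's total as a percentage of the maximum total."""
    low, high = total_range()
    if not low <= total_points <= high:
        raise ValueError(f"total must lie between {low} and {high}")
    return round(100 * total_points / high, 2)


if __name__ == "__main__":
    print(question_range())     # per-question range
    print(total_range())        # per-respondent range
    print(percent_of_max(300))  # hypothetical total, not study data
```

Under these assumptions, a respondent's percentage is simply the summed evaluator ratings divided by 375, which is how a figure such as 64.80% could be read.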
