Biomedical Natural Language Inference (BioNLI) is a core task in biomedical NLP that aims to determine whether a biomedical premise entails a given hypothesis. Prompt-based methods are gaining traction as one of the simplest and fastest ways to use large language models (LLMs) for such inference tasks, because they avoid complex, time-consuming fine-tuning. However, biomedical NLI is inherently difficult for prompt engineering, largely because of its heavy use of domain-specific terminology. Zero-shot prompts, and few-shot prompts built from fixed, predefined examples, often lack the contextual information needed for entailment decisions and generalize poorly to the heterogeneity of biomedical texts. In this work, we comprehensively evaluate six prompting methods (zero-shot, static few-shot, dynamic few-shot, Chain-of-Thought, self-consistency, and Tree-of-Thought (ToT)) from a prompt-engineering perspective, using two LLMs: DeepSeek-R1-Distill-Qwen-14B and LLaMA-3.1-8B-Instruct. We applied these methods to the BioNLI dataset and report key evaluation metrics for all of them. Our results show that dynamic in-context prompting combined with structured reasoning yields high-quality inference in this setting. Across all models and configurations, few-shot ToT prompting with the DeepSeek model performed best, reaching a macro-F1 score of 71.05 and even outperforming retrieval-augmented models reported in prior studies. These findings show that prompt engineering alone can handle complex biomedical reasoning effectively, without retrieval or full fine-tuning.
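To make the difference between static and dynamic few-shot prompting concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a labeled pool of premise/hypothesis pairs and picks, for each query, the demonstrations whose premises overlap most with the query premise. Token-level Jaccard overlap stands in for whatever similarity measure a real system would use (for example, sentence embeddings). The helpers `tokens`, `jaccard`, `select_examples`, `build_prompt`, and `self_consistency`, the prompt wording, and the labels `entailment`/`not_entailment` are all hypothetical. The toy pool is illustrative and is not drawn from the BioNLI dataset. `self_consistency` shows the usual majority vote over several sampled answers.

```python
import re
from collections import Counter


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens (crude stand-in for an embedding model)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_examples(query: str, pool: list[dict], k: int = 2) -> list[dict]:
    """Dynamic few-shot: return the k pool items whose premise is most similar
    to the query premise (stable sort keeps pool order on ties)."""
    q = tokens(query)
    ranked = sorted(pool, key=lambda ex: jaccard(q, tokens(ex["premise"])), reverse=True)
    return ranked[:k]


def build_prompt(premise: str, hypothesis: str, examples: list[dict]) -> str:
    """Assemble a hypothetical NLI prompt: instruction, demonstrations, query."""
    parts = ["Decide whether the premise entails the hypothesis. "
             "Answer 'entailment' or 'not_entailment'.\n"]
    for ex in examples:
        parts.append(f"Premise: {ex['premise']}\nHypothesis: {ex['hypothesis']}\n"
                     f"Answer: {ex['label']}\n")
    parts.append(f"Premise: {premise}\nHypothesis: {hypothesis}\nAnswer:")
    return "\n".join(parts)


def self_consistency(answers: list[str]) -> str:
    """Majority vote over sampled answers; ties go to the label seen first."""
    return Counter(answers).most_common(1)[0][0]


# Hypothetical toy pool, for illustration only.
pool = [
    {"premise": "Aspirin reduced platelet aggregation in treated mice.",
     "hypothesis": "Aspirin affects platelets.", "label": "entailment"},
    {"premise": "The gene was not expressed in liver tissue.",
     "hypothesis": "The gene is expressed in the liver.", "label": "not_entailment"},
    {"premise": "Aspirin lowered inflammation markers in patients.",
     "hypothesis": "Aspirin is an anti-inflammatory.", "label": "entailment"},
]

query_premise = "Aspirin inhibited platelet aggregation in rats."
shots = select_examples(query_premise, pool, k=2)
prompt = build_prompt(query_premise, "Aspirin acts on platelets.", shots)
```

Here `shots` holds the first and third pool items, because they share the most tokens with the query premise. A static few-shot setup would reuse the same demonstrations for every query. The assembled `prompt` would then be sent to the LLM several times, and the sampled answers passed to `self_consistency`.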
Alkhawaf, H. Fadhil Qasim and Faili, H. (2025). Prompt Engineering for Biomedical NLI: An Exploratory Study. AUT Journal of Modeling and Simulation. doi: 10.22060/miscj.2025.24382.5421