Musculoskeletal surgeons use mixed reasoning rather than pure Bayesian strategies in clinical practice
PLOS ONE
Abstract
Objectives
To inform efforts to promote regular and normalized Bayesian reasoning, we studied factors associated with the degree to which surgeons use Bayesian reasoning to navigate uncertainty across different clinical scenarios.
Methods
Science of Variation Group members (n = 153; 58% North America, 30% Europe, 69% with over 15 years of experience) completed an online survey in which they read 8 scenarios of test and treatment decisions and chose one of 4 answer options, with higher scores indicating more Bayesian reasoning. Internal consistency of the survey was assessed using Cronbach alpha.
Results
The median Bayesian reasoning score across all scenarios was 3.0 (IQR 2.7–3.2) on a 4-point scale, indicating context-dependent variability. Completely non-Bayesian reasoning was selected least often (8.6%, 90 of 1,044) and fully Bayesian reasoning represented 29% (301 of 1,044) of responses. Most surgeons showed mixed patterns (defined as reasoning in which prior probability is acknowledged but underweighted, without explicit probabilistic updating): 85% (121 of 142) used fully Bayesian reasoning at least once, while 42% (60 of 142) used completely non-Bayesian reasoning at least once. The Cronbach alpha was 0.43, suggesting the scenarios measured different aspects of clinical reasoning rather than a unified construct.
Citation: Parisien R, Drost A, Razi A, Ramtin S, Ring D, Janssen SJ (2026) Musculoskeletal surgeons use mixed reasoning rather than pure Bayesian strategies in clinical practice. PLoS One 21(6): e0351694. https://doi.org/10.1371/journal.pone.0351694
Editor: Ismail Tawfeek Abdelaziz Badr, Menoufia University, EGYPT
Received: October 28, 2025; Accepted: May 30, 2026; Published: June 12, 2026
Copyright: © 2026 Parisien et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All data are within the manuscript and its Supporting information files.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Background
Most medical decisions are made under conditions of uncertainty. Clinicians assess probabilities based on knowledge and evidence while navigating incomplete information and unresolvable uncertainty to advise people about the potential benefits and harms of visits, tests, and treatments. Bayesian reasoning provides a normative framework for this process, involving an initial probability estimate (the prior), systematic revision based on how strongly new evidence supports a given hypothesis (the likelihood), producing an updated probability (the posterior) [1]. This iterative, probabilistic process mirrors the dynamic nature of clinical decision-making, where surgeons integrate new evidence to revise probabilities based on prior knowledge such as clinical experience, research findings, and patient-specific factors to guide decisions.
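As a compact restatement of the prior, likelihood, and posterior described above, Bayes' rule for a binary hypothesis can be written as follows. The symbols are our own notation and are not taken from the study: H is the hypothesis (for example, that a condition is present) and E is the new evidence (for example, a positive test result). The second line is the equivalent odds form, in which the likelihood ratio scales the prior odds.

```latex
P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)}

\frac{P(H \mid E)}{P(\neg H \mid E)} = \frac{P(E \mid H)}{P(E \mid \neg H)} \times \frac{P(H)}{P(\neg H)}
```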
Bayesian reasoning is particularly applicable in surgery where decisions often cannot be reversed, and both over- and under-treatment carry notable potential for harm. Despite growing emphasis on Bayesian principles in medicine and research [2,3], this framework remains under-emphasized in medical education and daily practice [4–6]. Research suggests clinicians adjust their diagnostic judgments as new evidence emerges but whether this reflects deliberate Bayesian reasoning or less systematic processes remains unclear [7,8].
Research in medical decision-making, psychology, and cognitive science consistently shows that while people understand the idea of updating beliefs with new evidence, they frequently struggle to do so accurately in practice [9,10]. This discrepancy arises because human cognition relies on mental shortcuts (heuristics) that can lead to systematic errors — such as base rate neglect, uncertainty intolerance, anchoring, and confirmation bias — which interfere with optimal probability updating [11–17]. Neutralizing these cognitive errors requires deliberate cultivation of critical thinking skills, which can be challenging in time-pressured clinical environments [18]. Despite the theoretical appeal of Bayesian reasoning and its alignment with many clinicians’ reasoning styles, the application of Bayesian principles in clinical practice is inconsistent [11,12,19–22].
Rationale
These considerations raise important questions about how often and how effectively surgeons employ Bayesian reasoning in everyday practice [23,24]. The variability in how surgeons integrate evidence likely reflects multiple influences, including differences in clinical training environments, emphasis on critical thinking as opposed to unchecked assimilation of clinician habits and customs (“hidden curriculum”), and underlying cognitive dispositions that shape decision-making styles [25–28]. Some clinicians may have trained in settings that explicitly emphasize probabilistic reasoning and critical appraisal, while others developed their approach through more pattern-recognition models [29]. Individual comfort with probability, tolerance for ambiguity, and reliance on experiential “rules of thumb” can all affect how new information is incorporated [30–34].
Our goal in this study is to explore how musculoskeletal surgeons reason under common scenarios of uncertainty focusing on the spectrum of strategies, ranging from non-Bayesian approaches (threshold-based or heuristic) to fully Bayesian reasoning with explicit probabilistic updating. Rather than making value judgments, we aim to better understand how surgeons think in uncertain situations and to identify patterns that could inform future efforts to support effective clinical decision-making focused on quality and safety of care. Non-Bayesian reasoning patterns — particularly base rate neglect and over-reliance on objective test results — have been associated with tangible clinical harms, including unnecessary surgery, overdiagnosis, and failure to appropriately reassure patients in low-probability scenarios; understanding the prevalence and context-dependence of these patterns is therefore a prerequisite for designing interventions that reduce avoidable harm.
Questions
As a next step towards these goals, we conducted a survey- and scenario-based experiment of musculoskeletal surgeons to evaluate the prevalence and variability of Bayesian reasoning in surgical practice asking: (1) Do musculoskeletal surgeons employ Bayesian reasoning in their clinical decision-making? (2) Is there variation in how surgeons utilize Bayesian reasoning across different clinical scenarios?
Materials and methods
Study design and setting
In a survey- and vignette-based experiment, participants viewed eight scenarios and selected from among 4 response options testing Bayesian reasoning. The eight scenarios and options were developed by the lead author and edited and affirmed by the research team. The scenarios and options assessed how orthopaedic surgeons navigate clinical decision-making under common situations of uncertainty (Appendix 1). Specifically, the scenarios were designed to test specific aspects of Bayesian reasoning including probability updating, interpretation of test results, and integration of prior probabilities with new evidence. Each scenario had four possible responses reflecting different levels of Bayesian reasoning. Answers were ranked (1–4) according to degree of Bayesian reasoning and were listed in randomized order (Table 1). First, to mitigate potential confounding, we iteratively pilot tested the scenarios with a diverse group of local surgeons to ensure the clinical content was accessible to general musculoskeletal surgeons, regardless of sub-specialization. Second, the design focused on structure over content by prioritizing the reasoning process (e.g., updating priors based on new data) rather than factual clinical knowledge. Thus, a surgeon with less knowledge of the specific pathology would still be able to answer appropriately by applying the required logic. Third, we interpreted the observed variation across scenarios not as an artifact, but as a key finding. The survey was distributed in December 2024 to members of the Science of Variation Group (SOVG). Two weekly reminders were provided and participation closed in January 2025. All participating surgeons treat musculoskeletal pathophysiology; the group includes orthopaedic surgeons, European trauma surgeons (who also treat fractures), and plastic hand and wrist surgeons.
Ethical considerations
Scoring validation
For construct validity, a normative Bayesian rubric was developed to score responses. Each response option was ranked (1–4) based on adherence to Bayesian principles: specifically, the correct application of prior probabilities (base rates), the use of likelihood ratios to update beliefs, and the avoidance of logical fallacies such as base-rate neglect or zero-risk bias. The detailed mathematical justification for each scenario’s scoring including priors, likelihoods, and posterior calculations is provided (Appendix 2).
Participants/study subjects
This study utilized a convenience sample of musculoskeletal surgeons recruited through the SOVG. One hundred and fifty-three surgeons participated, with 124 complete responses on all 8 scenarios, and 142 participants who answered at least one scenario (in other words, 18 partial responses). Partial responses are usable in this study design. However, this approach assumes responses are missing at random; if non-completion is related to reasoning difficulty or discomfort with certain scenarios, this could introduce bias, though the direction of such bias is unclear. The respondents were predominantly from North America (58%) and Europe (30%). Most participants worked in academic practice (75%) and had extensive clinical experience: 69% had over 15 years in practice (Table 2). For our purposes, it was sufficient to measure Bayesian reasoning in a relatively engaged and academic cohort, but this subset of surgeons (and arguably any subset) is not representative of the average surgeon.
Variables, outcome measures, data sources, and bias
To calculate total scores for individual surgeons, their responses on each 4-point scale were averaged by dividing the total score by the number of questions answered, resulting in a final score on the same 1-to-4 scale with a higher score indicating more Bayesian reasoning. We included the results of the 18 participants who answered fewer than 8 scenarios, using the average score per scenario completed. To ensure transparency and reproducibility, the de-identified dataset underlying these findings is available as S1 Dataset.
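The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' analysis code: the function name `surgeon_score`, the use of `None` for an unanswered scenario, and the example ratings are all assumptions made here for illustration.

```python
def surgeon_score(responses):
    """Average the 1-4 ratings over the scenarios a surgeon actually answered.

    `responses` holds one entry per scenario; None marks an unanswered
    scenario (a hypothetical encoding, not the layout of S1 Dataset).
    Returns None when no scenario was answered.
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        return None
    return sum(answered) / len(answered)


# Hypothetical ratings, for illustration only.
complete = [3, 4, 3, 2, 3, 4, 3, 2]                # all 8 scenarios answered
partial = [3, 4, None, None, 2, None, None, None]  # 3 scenarios answered

print(surgeon_score(complete))  # total of 24 over 8 scenarios
print(surgeon_score(partial))   # total of 9 over 3 scenarios
```

Because the divisor is the number of scenarios answered, partial responders stay on the same 1-to-4 scale as complete responders, although their scores rest on fewer observations.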
Statistical analysis
The degree to which the 8 survey items addressed similar aspects of reasoning (internal consistency) was assessed using Cronbach alpha. We chose to treat the outcome as continuous (linear) because we are analyzing a composite score (the average of 8 scenarios). We prioritized this approach for two reasons. The first is interpretability: summary statistics like means and medians are intuitive, whereas the coefficients from complex ordinal models (like cumulative link models) are difficult to interpret practically. The second is robustness: since we are averaging across multiple scenarios, the resulting score distribution approximates a continuous variable, making linear summaries appropriate for describing the variation in reasoning. Overall score distributions were evaluated using the Shapiro-Wilk test for normality. As the data demonstrated a significant deviation from normal distribution (p < 0.01), descriptive statistics are reported as medians with interquartile ranges (IQR). Non-parametric statistics were used for descriptive purposes given the non-normal distribution; linear regression was used separately to assess associations between surgeon characteristics and Bayesian reasoning score. All statistical analyses were performed using Python 3.10 with the statsmodels package. A two-tailed p value < 0.05 was considered significant. SOVG studies have varied participation that we cannot anticipate or alter. The statistical power comes from the number of observations, which is always very high, rather than the number of observers. Adequate power is determined by the ability to detect significant associations. We included any scenario that was rated even if the participant did not complete the entire survey, because each scenario and rating are independent from one another.
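For readers unfamiliar with the internal consistency measure, Cronbach alpha can be computed from the item variances and the variance of the summed score. The sketch below uses only the Python standard library and assumes complete cases (every respondent rated every item). How the authors handled partial responses when computing alpha is not stated, so this is an illustration of the standard formula and not a reproduction of their analysis. The function name and the toy data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach alpha for a list of respondents, each a list of k item scores.

    Standard formula: alpha = k / (k - 1) * (1 - sum(item variances) / variance(total)).
    Assumes complete cases and a non-zero variance of the total score.
    """
    k = len(rows[0])
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Toy data, for illustration only: items that move together give alpha near 1.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
# Items that agree only partially give a lower alpha.
mixed = [[1, 2], [2, 1], [3, 4], [4, 3]]

print(round(cronbach_alpha(consistent), 2))
print(round(cronbach_alpha(mixed), 2))
```

A low alpha, such as the 0.43 reported in this study, means the items share relatively little common variance, which is why the authors read it as the scenarios tapping different aspects of reasoning.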
Results
Do musculoskeletal surgeons employ Bayesian reasoning in their clinical decision-making?
The median Bayesian reasoning score across all scenarios was 3.0 (IQR 2.7–3.2) on a 4-point scale (equivalent to a total score of 24, IQR 22–26). Without an external comparator, we interpret this score to reflect context-dependent variability rather than adherence to a universal Bayesian reasoning framework. While Answer choice 1 (completely non-Bayesian reasoning) was selected least often (approx. 9%), and Answer choice 4 (fully Bayesian reasoning) represented 29% of all responses, most surgeons showed mixed patterns—85% used fully Bayesian reasoning at least once, while 42% used non-Bayesian reasoning at least once.
The distribution of scores revealed a significant deviation from normality (Shapiro-Wilk p < 0.001), with 83% of scores clustering between 2.5 and 3.5 (Fig 1). The distribution showed a negative skew, indicating more extreme low scores (non-Bayesian) than extreme high scores. At the extremes, 6.5% scored in the non-Bayesian or mostly non-Bayesian range (1.8–2.3), while 12% demonstrated strong Bayesian reasoning (3.4–3.7). No participant selected fully Bayesian responses in all 8 scenarios.
Is there variation in how surgeons utilize Bayesian reasoning across different clinical scenarios?
Internal consistency analysis revealed a Cronbach alpha of 0.43, suggesting the scenarios measure somewhat different aspects of clinical reasoning rather than a unified construct. This scenario-specific variation was evident in the response patterns, ranging from predominantly mixed reasoning in some scenarios (e.g., Scenario 6: 60% choosing answer choice 3) to more Bayesian responses in others (e.g., Scenario 2: 50% choosing answer choice 4). The low Cronbach alpha indicates that how a surgeon reasons in one scenario does not strongly predict their approach in another, suggesting context-dependent adaptation of reasoning strategies rather than an overall disposition.
Discussion
Background, rationale, and general results
Clinical decision-making in orthopaedic surgery requires constant probability assessment and updating as new evidence emerges. While Bayesian reasoning provides a theoretical framework for this process, little is known about how surgeons address uncertainty in practice. This study examined how orthopaedic surgeons employ probabilistic reasoning in common clinical scenarios and found that surgeons generally use mixed reasoning strategies rather than consistently Bayesian or non-Bayesian approaches, with notable variation (both among individual surgeons and also collectively but without obvious patterns) across contexts. These findings are broadly consistent with prior investigations. Rottman and colleagues demonstrated context-dependent Bayesian updating in residents [7,8], and our results suggest a similar pattern persists among experienced attending surgeons. Our base rate neglect findings parallel those of Manrai and colleagues who found that physicians frequently miscalculate positive predictive value [9], and Teunis and colleagues who found only 11% of orthopaedic surgeons correctly applied base rate reasoning — a figure strikingly similar to our scaphoid scenario result [17]. Notably, our median score of 3 is higher than what purely non-Bayesian frameworks would predict, consistent with Croskerry’s adaptive expertise model suggesting that experienced clinicians develop context-sensitive rather than uniformly deficient reasoning strategies [27].
Limitations
As with any survey-based experiment, a primary limitation is ecological validity. We can assume that surgeon responses in an online survey are not identical to approaches they may use with patients. The scenarios used in the survey necessarily simplified the complexities and dynamic nature of real-world decision-making. Nevertheless, the trends noticed are likely to reflect elements of variation in daily practice. And with additional influence of stress contagion, framing, and anchoring introduced by the referring clinician, the patient, and other aspects of the context, we anticipate greater variation and perhaps less Bayesian processing. Additionally, contemporary medical education emphasizes test-taking skills, which may lead surgeons to recognize and select a “normative” or preferred answer. Furthermore, some of the more Bayesian-oriented responses involved more complex probabilistic reasoning in the answer choice wording, which could further guide participants toward those choices. All experiments are subject to similar Hawthorne effects (the tendency of participants to modify their behavior when they know they are being studied) [35], but such effects do not seem to hinder reproducible and useful findings considering previous experiments conducted among members of the SOVG. A substantial limitation is that the SOVG sample — predominantly academic, highly experienced, and self-selected for engagement with research — is not representative of the average musculoskeletal surgeon. This non-representativeness likely operates in a specific direction: the SOVG cohort probably overestimates Bayesian reasoning relative to broader surgical populations, suggesting the true prevalence of non-Bayesian reasoning in everyday practice may be higher than our findings indicate. The finding of mixed reasoning in this relatively engaged and analytically inclined group may therefore represent an optimistic upper bound rather than a typical portrait of surgical cognition. 
Additionally, the skewed experience distribution — with only 3.5% of participants having fewer than 5 years in practice — may bias results toward reasoning patterns of senior surgeons and may not reflect the full spectrum of clinical reasoning across career stages.
Do musculoskeletal surgeons employ Bayesian reasoning in their clinical decision-making?
The finding that surgeons of the SOVG employ context-dependent mixed reasoning strategies rather than consistently non-Bayesian or fully Bayesian approaches suggests that clinical reasoning in musculoskeletal surgery combines probabilistic thinking with other decision-making approaches based on clinical context. This integration of Bayesian and non-Bayesian approaches appears to reflect the practical realities of clinical decision-making, where pure probabilistic or pure non-probabilistic reasoning must be balanced against established protocols, clinical experience, and practical constraints. Our initial thought that surgeons might show a preference for non-Bayesian reasoning was only partially supported; while surgeons did not demonstrate strong Bayesian reasoning (which would require scores >3.5), they showed more sophisticated probabilistic thinking than expected. The distribution of responses, with most surgeons (85%) employing fully Bayesian reasoning in at least one scenario but few doing so consistently, suggests that surgeons possess the capability for Bayesian reasoning but apply it selectively rather than as a universal approach. Although linear regression identified slightly lower Bayesian reasoning scores among European participants (β = −0.23, 95% CI −0.41 to −0.055, p = 0.011), we interpret this finding cautiously given the small regional subgroup sizes, the non-representative nature of the SOVG sample, and the absence of data on specific educational or training system differences that might explain this variation.
It is important to distinguish between two possible explanations for mixed reasoning patterns. In some scenarios — most clearly the scaphoid scenario, where only 11% of surgeons correctly weighted a low prior probability against a positive CT — the non-Bayesian responses likely reflect genuine base rate neglect, a well-documented cognitive bias with direct clinical consequences. In other contexts, however, mixed or non-Bayesian responses may reflect adaptive expertise rather than cognitive failure. Croskerry (2018) argues that expert clinical reasoning is characterized not by uniform application of a single strategy, but by the ability to modulate reasoning approaches according to situational demands [27]. On this view, context-dependent reasoning is not synonymous with deficient reasoning. Our data do not allow us to fully distinguish between these two phenomena, and we acknowledge that both are likely present across the scenarios and surgeons studied.
Is there variation in how surgeons utilize Bayesian reasoning across different clinical scenarios?
The finding of variation in how surgeons apply Bayesian reasoning across different clinical scenarios (low Cronbach alpha [0.43] and the distinct response patterns by scenario) suggests that heuristics, habits, norms, and emotions often override probabilistic reasoning in certain clinical contexts. Two interpretations of this low alpha are plausible and not mutually exclusive. First, the scenarios may not uniformly operationalize a single, unified Bayesian reasoning construct: they vary in structure, clinical domain, and the specific cognitive demands they place on the respondent, which could introduce measurement heterogeneity independent of true reasoning variability. Second, and consistent with our primary interpretation, Bayesian reasoning may not function as a stable individual trait but rather as a context-sensitive strategy that surgeons deploy selectively. The low alpha is compatible with both explanations, and we acknowledge that the instrument design does not allow us to fully distinguish between them. This ambiguity itself carries a meaningful implication: Bayesian reasoning may resist simple psychometric capture precisely because it is situationally expressed rather than dispositionally fixed [36].
For instance, in the twin MRI scenario (Scenario 2; Table 3), 50% of surgeons demonstrated full Bayesian reasoning (answer choice 4), with an average score of 3.0. In contrast, the scaphoid scenario (Scenario 1; Table 4) had the lowest average score of 2.6, with only 11% of surgeons selecting the most Bayesian response. The scaphoid scenario may be more influenced by specific habits, norms, and emotions. Similarly, in the bone necrosis scenario (Scenario 4; Table 5), only about a third of participating surgeons maintained appropriate confidence in a high prior probability despite contradictory test results. In the delayed union scenario (Scenario 5; Table 6), also around a third of surgeons recognized that two experienced surgeons could legitimately arrive at different interpretations based on their prior experience with similar cases. These variations support the conclusion that surgeons may adapt their reasoning approach based on the specific clinical context rather than employing a consistent Bayesian or non-Bayesian strategy. This context-dependent reasoning is further supported by response patterns across scenarios. While most scenarios (particularly 1, 3, 4, 5, and 7) showed a predominance of mixed reasoning (answer choice 3), certain contexts appeared to trigger more Bayesian thinking. For instance, the twin MRI scenario’s clear contrast between prior probabilities and identical imaging findings made Bayesian reasoning more intuitive, prompting 50% of respondents to select the fully Bayesian option. Conversely, scenarios involving common clinical protocols or established norms, such as the scaphoid fracture scenario, tended to elicit more habitual or less probabilistic reasoning. These variations in reasoning highlight surgeons’ flexibility in adapting their approaches to the unique demands of different scenarios, which may explain the relatively low Cronbach alpha.
This adaptive variation is a hallmark of expert clinical decision-making, where surgeons must integrate probabilistic reasoning with heuristic and contextual approaches based on the specific challenges of each case.
Other relevant findings
Illustrative scenarios.
Key scenarios demonstrate distinct patterns in Bayesian reasoning application. In the scaphoid scenario (Scenario 1 as shown in Table 4), surgeons were given a positive CT scan in a low-probability situation (5% prior). Only 16/142 (11%) chose the most Bayesian response recognizing that even with positive findings, the posterior probability remains low (15%) due to the low prior probability. Instead, 35% of surgeons chose to trust the positive imaging over clinical assessment, demonstrating base rate neglect.
The twin MRI scenario (Scenario 2 as shown in Table 3) revealed surprisingly sophisticated Bayesian reasoning, with 50% of respondents demonstrating full Bayesian reasoning (answer 4) and only 1.5% selecting non-Bayesian options. However, this high rate of Bayesian responses may reflect scenario design limitations rather than typical reasoning patterns, as the identical MRI findings with contrasting clinical histories made the role of context unusually explicit.
In the bone necrosis scenario (Scenario 4 as shown in Table 5), only 28% of surgeons maintained appropriate confidence in a high prior probability (99%) despite contradictory test results. Most surgeons (72%) allowed a single negative test to override strong clinical suspicion, again demonstrating difficulty maintaining appropriate weighting of prior probabilities.
The delayed union scenario (Scenario 5 as shown in Table 6) revealed how clinicians sometimes struggle with probabilistic reasoning in practice. Only 35% of respondents accepted that different clinicians could legitimately arrive at different interpretations based on prior experience. This finding illuminates a broader challenge: the tension between probabilistic reasoning and the desire for definitive answers. Most clinicians favored seeking additional testing over acknowledging the validity of experience-based probability differences, suggesting discomfort with the inherent uncertainty in probabilistic reasoning.
Base rates and prior probabilities in clinical reasoning.
These results show evidence for base rate neglect in varying degrees in many surgeons’ responses to the scenarios. Base rates and prior probabilities form the foundation of Bayesian reasoning. The primary Bayesian insight is that test results gain meaning only when contextualized by prior probability. Base rate neglect is especially problematic because it directly undermines accurate probabilistic assessment. In clinical practice, the base rate might represent disease prevalence, complication likelihood, or the probability of specific injury patterns in given populations. Neglecting these base rates leads to flawed probability estimates, particularly when interpreting test results. For example, as demonstrated in Scenarios 1 and 4, even highly accurate tests can yield misleading interpretations if the base rate is very low or very high. Without considering base rates, clinicians might overestimate the significance of positive test results, leading to unnecessary interventions or missed alternative diagnoses [37].
The clinical consequences of base rate neglect can be quantified using the scaphoid scenario as a worked example. With a prior probability of 5% and a CT scan that is 85% sensitive and 75% specific, a positive result yields a posterior probability of approximately 15% — meaning the fracture remains unlikely despite the positive test. In a hypothetical cohort of 100 patients with this presentation, approximately 28 would receive a positive CT result, of whom roughly 24 would represent false positives. The 35% of surgeons in our study who chose to defer to the CT over clinical assessment would, in practice, expose the large majority of these patients to unnecessary immobilization, additional imaging, and the attendant costs and anxiety — all of which are avoidable through appropriate application of prior probability. This example illustrates that the gap between abstract reasoning scores and clinical outcomes is not merely theoretical: context-dependent base rate neglect has a direct and calculable impact on overtriage and overtreatment.
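The arithmetic in this worked example can be checked with a short Python sketch. The prior, sensitivity, specificity, and cohort size are the values quoted in the paragraph above; the function and variable names are our own and are purely illustrative.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """Probability of disease after a positive test, by Bayes' rule."""
    true_pos = prior * sensitivity               # diseased and test positive
    false_pos = (1 - prior) * (1 - specificity)  # not diseased but test positive
    return true_pos / (true_pos + false_pos)


# Values quoted in the scaphoid worked example.
prior, sensitivity, specificity = 0.05, 0.85, 0.75
cohort = 100

posterior = posterior_given_positive(prior, sensitivity, specificity)
expected_false_positives = cohort * (1 - prior) * (1 - specificity)
expected_positives = cohort * prior * sensitivity + expected_false_positives

print(f"posterior after positive CT: {posterior:.1%}")
print(f"expected positive CTs per {cohort}: {expected_positives:.1f}")
print(f"expected false positives per {cohort}: {expected_false_positives:.1f}")
```

Under these inputs the posterior is 0.0425 / 0.28, or about 15%, and a cohort of 100 is expected to produce 28 positive scans, of which 23.75 (roughly 24) are false positives, matching the figures in the text.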
While priors can incorporate subjective clinical experience, the base rate provides an objective foundation for initial probability estimates. Previous research suggests surgeons struggle with this concept; Teunis et al. (2016) found only 11% of orthopaedic surgeons correctly answered questions requiring base rate consideration [17]. Our survey results demonstrated that fully considering base rates remains challenging, particularly when test results appear to contradict these fundamental probabilities.
Variation of reasoning in the era of AI.
This study should be understood as a variation‑of‑reasoning paper rather than a traditional variation‑of‑care analysis. Instead of examining differences in treatment patterns, we focus on the underlying cognitive processes that shape how surgeons interpret evidence and update diagnostic beliefs. The finding that surgeons employ context‑dependent, mixed reasoning strategies highlights meaningful heterogeneity in how clinicians navigate uncertainty, even when presented with identical information. This distinction is increasingly important in a moment when AI‑assisted clinical decision tools are poised to rely heavily on Bayesian or Bayesian‑like probabilistic updating, offering consistent, mathematically coherent reasoning. Our results suggest that human cognition does not always align with these normative probabilistic frameworks, raising timely questions about how surgeons will interact with AI systems, how discrepancies between human and machine reasoning may influence decisions, and how training might better prepare clinicians for a future in which Bayesian reasoning is embedded in the tools that support patient care [38].
Conclusions
This study demonstrates that orthopaedic surgeons employ more sophisticated probabilistic reasoning than expected, though their approach is notably context-dependent rather than consistently Bayesian. These findings should be interpreted with the caveat that the SOVG sample represents a ceiling on external validity; results are likely not generalizable to the average practicing surgeon, and the true extent of non-Bayesian reasoning in broader surgical populations may be greater than observed here. While surgeons regularly incorporate elements of Bayesian thinking, they do so selectively based on clinical context, as evidenced by the median score of 3.0 and the low internal consistency across scenarios. The findings reveal specific challenges in clinical reasoning, particularly regarding base rate consideration and the acceptance of differing probability estimates based on prior experience. These insights suggest that future educational initiatives should focus not on rote adoption of Bayesian methods, but rather on helping surgeons recognize specific clinical contexts where probabilistic reasoning is most crucial for patient care. Case-based learning and scenario-specific training — rather than instruction in Bayesian formulas in the abstract — are likely to be most effective, as they mirror the contextual nature of the reasoning variability observed here and allow surgeons to practice probability updating within the kinds of clinical situations where base rate neglect is most consequential. Notably, years in practice was not associated with Bayesian reasoning score in this study (β = −0.017, p = 0.64), suggesting that clinical experience alone does not confer probabilistic reasoning proficiency — a finding with direct relevance for continuing medical education programs, which should not assume that senior clinicians are exempt from reasoning biases that formal training could address. 
Understanding how and when surgeons employ different reasoning strategies may ultimately lead to more nuanced approaches to clinical decision-making and more effective teaching of clinical reasoning in musculoskeletal surgery. Future research should explore how structured decision-making frameworks and targeted educational interventions can optimize the application of Bayesian reasoning in musculoskeletal surgery. The context-dependence of reasoning observed here also has implications for clinical guideline implementation: guidelines that assume normative probabilistic reasoning as a uniform substrate may be applied inconsistently across surgeons and clinical contexts, suggesting that guideline design itself may need to account for the variability in how evidence is interpreted and weighted.
Supporting information
S1 Dataset. Deidentified dataset.
This was the dataset used for the analysis of this study.
https://doi.org/10.1371/journal.pone.0351694.s001
(CSV)
References
- 1. Bours MJ. Bayes’ rule in diagnosis. J Clin Epidemiol. 2021;131:158–60. pmid:33741123
- 2. Weatherall M. Information provided by diagnostic and screening tests: improving probabilities. Postgrad Med J. 2018;94(1110):230–5. pmid:29133377
- 3. Broemeling DL. Bayesian Methods in Epidemiology. 2014.
- 4. Norman G, Pelaccia T, Wyer P, Sherbino J. Dual process models of clinical reasoning: The central role of knowledge in diagnostic expertise. J Eval Clin Pract. 2024;30(5):788–96. pmid:38825755
- 5. Mutlak Z, Saqer N, Chan SCC. The misdiagnosis tracker: enhancing diagnostic reasoning through cognitive bias awareness and error analysis. J Clin Med. 2025;14.
- 6. Pauker SG, Kassirer JP. The threshold approach to clinical decision making. N Engl J Med. 1980;302(20):1109–17. pmid:7366635
- 7. Rottman BM, Prochaska MT, Deaño RC. Bayesian reasoning in residents’ preliminary diagnoses. Cogn Res Princ Implic. 2016;1(1):5. pmid:28180156
- 8. Rottman BM. Physician Bayesian updating from personal beliefs about the base rate and likelihood ratio. Mem Cognit. 2017;45(2):270–80. pmid:27752962
- 9. Manrai AK, Bhatia G, Strymish J, Kohane IS, Jain SH. Medicine’s uncomfortable relationship with math: calculating positive predictive value. JAMA Intern Med. 2014;174(6):991–3. pmid:24756486
- 10. Eddy DM. Probabilistic reasoning in clinical medicine: Problems and opportunities. Judgment under Uncertainty. Cambridge University Press; 1982. pp. 249–67.
- 11. Bruckmaier G, Krauss S, Binder K, et al. Tversky and Kahneman’s cognitive illusions: who can solve them, and why? Front Psychol. 2021;12:584689.
- 12. Patel S. Reason knows nothing: how biases infect medicine. J R Soc Med. 2018;111(6):214–5. pmid:29672205
- 13. Bar-Hillel M. The base-rate fallacy in probability judgments. Acta Psychologica. 1980;44(3):211–33.
- 14. Dan B, Bodoh-Creed A, Rabin M. Base-Rate Neglect: Foundations and Implications. 2019. [cited 2026 Jan 21]. Available from: https://rabin.scholars.harvard.edu/sites/g/files/omnuum7721/files/rabin/files/baserateneglect-2019-07.pdf
- 15. Janssen SJ, Teunis T, Ring D. Cognitive biases in orthopaedic surgery. J Am Acad Orthop Surg. 2021;29:624–33.
- 16. Kahneman D, Tversky A. On the psychology of prediction. Psychol Rev. 1973;80:237–51.
- 17. Teunis T, Janssen S, Guitton TG, et al. Do orthopaedic surgeons acknowledge uncertainty? Clin Orthop Relat Res. 2016;474:1360–9.
- 18. Croskerry P. A universal model of diagnostic reasoning. Acad Med. 2009;84(8):1022–8. pmid:19638766
- 19. Ataç Ö, Küçükali H, Farımaz AZT, Palteki AS, Çavdar S, Aslan MN, et al. Family physicians overestimate diagnosis probabilities regardless of the test results. Front Med (Lausanne). 2024;10:1123689. pmid:38259829
- 20. Hall S, Phang SH, Schaefer JP, Ghali W, Wright B, McLaughlin K. Estimation of post-test probabilities by residents: Bayesian reasoning versus heuristics? Adv Health Sci Educ Theory Pract. 2014;19(3):393–402. pmid:24449125
- 21. McDowell M, Jacobs P. Meta-analysis of the effect of natural frequencies on Bayesian reasoning. Psychol Bull. 2017;143(12):1273–312. pmid:29048176
- 22. Casscells W, Schoenberger A, Graboys TB. Interpretation by physicians of clinical laboratory results. N Engl J Med. 1978;299(18):999–1001. pmid:692627
- 23. Winkler RL. Why Bayesian analysis hasn’t caught on in healthcare decision making. Int J Technol Assess Health Care. 2001;17(1):56–66. pmid:11329845
- 24. Gigerenzer G, Hoffrage U. How to improve Bayesian reasoning without instruction: frequency formats. Psychol Rev. 1995;102:684–704.
- 25. Aronowitz PB, Williams DM, Henderson MC, Winston LG. Mind the base rate: an exercise in clinical reasoning. J Gen Intern Med. 2019;34(9):1941–5. pmid:31270792
- 26. Norman GR, Eva KW. Diagnostic error and clinical reasoning. Med Educ. 2010;44(1):94–100. pmid:20078760
- 27. Croskerry P. Adaptive expertise in medical decision making. Med Teach. 2018;40(8):803–8. pmid:30033794
- 28. Brush JE Jr, Lee M, Sherbino J, Taylor-Fishwick JC, Norman G. Effect of teaching Bayesian methods using learning by concept vs learning by example on medical students’ ability to estimate probability of a diagnosis: a randomized clinical trial. JAMA Netw Open. 2019;2(12):e1918023. pmid:31860107
- 29. Bijak J, Bryant J. Bayesian demography 250 years after Bayes. Popul Stud (NY). 2016;70:1–19.
- 30. Simpkin AL, Schwartzstein RM. Tolerating uncertainty - the next medical revolution? N Engl J Med. 2016;375:1713–5.
- 31. Strout TD, Hillen M, Gutheil C, Anderson E, Hutchinson R, Ward H, et al. Tolerance of uncertainty: a systematic review of health and healthcare-related outcomes. Patient Educ Couns. 2018;101(9):1518–37. pmid:29655876
- 32. Lighthall GK, Vazquez-Guillamet C. Understanding decision making in critical care. Clin Med Res. 2015;13(3–4):156–68. pmid:26387708
- 33. Korenstein D, Scherer LD, Foy A, Pineles L, Lydecker AD, Owczarzak J, et al. Clinician attitudes and beliefs associated with more aggressive diagnostic testing. Am J Med. 2022;135(7):e182–93. pmid:35307357
- 34. Baghdadi JD, Korenstein D, Pineles L, Scherer LD, Lydecker AD, Magder L, et al. Exploration of primary care clinician attitudes and cognitive characteristics associated with prescribing antibiotics for asymptomatic bacteriuria. JAMA Netw Open. 2022;5(5):e2214268. pmid:35622364
- 35. McCambridge J, Witton J, Elbourne DR. Systematic review of the Hawthorne effect: new concepts are needed to study research participation effects. J Clin Epidemiol. 2014;67(3):267–77. pmid:24275499
- 36. Ng IKS, Goh WGW, Lim TK. Beyond thinking fast and slow: a Bayesian intuitionist model of clinical reasoning in real-world practice. Diagnosis (Berl). 2024;12(2):182–8. pmid:39648275
- 37. Gonzaga YBDM, Bacchi AD, De Souza VBP. When math legitimizes knowledge: a step by step approach to Bayes’ rule in diagnostic reasoning. Evidence. 2024;6:e5903.
- 38. Greengrass CJ. Transforming clinical reasoning-the role of AI in supporting human cognitive limitations. Front Digit Health. 2026;7:1715440. pmid:41561162