Randomization protocols in clinical studies of common orthopedic problems and traumatic injuries are challenging to orchestrate. The lack of high-grade clinical evidence from prospective, randomized, double-blinded study designs is often cited as a primary reason for rejecting proposed therapeutic advances in orthopedic surgery.
Materials and Methods
This position paper summarizes the limitations of clinical trials in surgical subspecialties. We present a consensus statement on how practicing orthopedic surgeons can produce high-quality clinical evidence to effect changes in practice protocols.
Results
Our literature review revealed that level-of-evidence classifications vary between surgical subspecialties. Research in orthopedic and musculoskeletal trauma care is primarily directed at diagnosis, preferred treatment, and economic decision analysis, whereas prognosis-based classifications are preferred in other areas, such as plastic surgery. In orthopedics, controlled double-blinded randomization is rare and often impractical, or unethical when a placebo control could harm patients. Crossover between study groups in randomized surgical trials is common. Other obstacles in surgical trials range from a lack of organizational and financial support, institutional review or ethics board approval, and clinical trial registration requirements to insufficient time, outside an already busy clinical schedule, to dedicate to this laborious, uncompensated task.
Conclusion
Orthopedic surgery is an experience- and skill-based subspecialty. Many innovations start with entrepreneurial surgeons reporting opinions or retrospective cohort studies, many of which suffer from bias. Prospective observational cohort studies with consistent results may offer higher-grade clinical evidence than poorly executed randomized trials.
In clinical studies of common orthopedic problems and traumatic injuries, randomization methods are difficult to orchestrate. The lack of high-level clinical evidence drawn from prospective, randomized, double-blind studies is often cited as a primary reason for rejecting proposed therapeutic advances in orthopedic surgery.
Materials and Methods
This opinion paper summarizes the limitations of clinical trials in the surgical subspecialties. It presents a consensus on how the practicing orthopedic surgeon can produce high-quality clinical evidence and thereby bring about changes in clinical practice protocols.
Results
This literature review revealed that level-of-evidence classifications vary among surgical subspecialties. Research in orthopedics and traumatology is primarily directed at diagnosis, preferred treatment, and economic decision analysis, whereas prognosis-based classifications are preferred in other areas, such as plastic surgery. In orthopedics, controlled double-blind studies are rare and often impractical or even unethical. Crossover between study groups in randomized surgical trials is common. Other difficulties in surgical trials range from a lack of organizational and financial support, institutional or ethics committee approval, and registration requirements for clinical trials to insufficient time, outside an already busy clinical schedule, to dedicate to this laborious, uncompensated task.
Conclusion
Orthopedic surgery is a subspecialty based on experience and skill. Many innovations begin with entrepreneurial surgeons reporting opinions or retrospective cohort studies, many of which are biased. Prospective observational cohort studies with consistent results may offer higher-grade clinical evidence than poorly executed randomized trials.
Orthopedic disorders and musculoskeletal trauma have a significant economic impact globally.1,2 Because these conditions are both acute and chronic, they are a major contributor to health care expenditure in Latin America as well as in industrialized countries. Comprehensive care programs built on effective collaboration between developed nations in the Americas are rare. The lack of resources to adequately manage trauma and diseases of the musculoskeletal system in developing countries and countries with emerging economies adds to the orthopedic disease burden. When years lived with disability (YLDs) in Latin America and the Caribbean are measured using the 2019 data of the Institute for Health Metrics and Evaluation, a significant musculoskeletal disease burden becomes apparent, with low back pain (8.19%; 6.3%-10.19%) being the leading cause of disability (Figure 1). Other major contributors are a group of other musculoskeletal diseases (7.57%; 5.54%-10.08%), osteoarthritis (4.12%; 2.41%-7.65%), falls (2.59%; 2.28%-2.97%), neck pain (2.12%; 1.36%-3.22%), road injury (1.51%; 1.37%-1.67%), exposure to mechanical forces (0.88%; 0.69%-1.14%), interpersonal violence (0.49%; 0.42%-0.56%), and other unintentional injuries (0.37%; 0.3%-0.47%) (Source: Institute for Health Metrics and Evaluation. Used with permission. All rights reserved). Breaking down the same YLD index by Latin American country shows a higher orthopedic disease burden in these categories between the ages of 50 and 69 years (Figure 2) for both men (YLDs: 25.49%) and women (YLDs: 29.71%) than above 70 years of age (18.01% for men and 22.95% for women; Figure 3). In the 50 to 69 age group, Colombia has the fourth-highest disease burden measured in YLDs for rheumatoid arthritis and osteoarthritis, trauma, and falls, after Argentina, Chile, and Uruguay (Figure 2).
In the elderly (70 years of age and older), the orthopedic disease burden is lower, and Colombia is preceded by Costa Rica, Mexico, Cuba, Brazil, Chile, Argentina, and Uruguay (Figure 3). Latin America's population is growing, and a higher percentage of it is elderly with improved overall health, owing to a shift from communicable or infectious diseases to manageable chronic diseases, such as ischemic heart disease and stroke, which were the top-ranked causes of YLDs in 2019. Higher patient numbers should therefore be expected in orthopedic clinics across Latin America, many of which are already stretched to maximum capacity, particularly in the public health care sector. Because modern treatment of many orthopedic conditions and injuries is resource-intensive, disproportionately greater health care expenditure and research and development investment than in most other areas of public health are needed to identify new, more effective therapies and surgeries. Orthopedic surgeons are at the heart of this debate on allocating resources most efficiently to achieve the maximum reduction of musculoskeletal disease burden. The anticipated demographic changes and an expanded emphasis on non-communicable diseases and injuries1 require orthopedic surgeons to develop high-grade clinical evidence to avoid the rationing of services pushed by decision-makers trying to contain exploding costs. This worst-case scenario would make high-end modern orthopedic care affordable only for the well-to-do, thereby increasing social disparities, which could become the target of counterproductive politicization strategies in the current environment of shifting political agendas in Latin America and the world over.
Illustrative tree map of causes and disease burden expressed in years lived with disability (YLDs) in Latin America and the Caribbean in 2019. YLDs were 8.19% (6.3%-10.19%) for low back pain, 7.57% (5.54%-10.08%) for other musculoskeletal diseases, 4.12% (2.41%-7.65%) for osteoarthritis, 2.12% (1.36%-3.22%) for neck pain, 2.59% (2.28%-2.97%) for falls, 1.51% (1.37%-1.67%) for road injury, 0.88% (0.69%-1.14%) for exposure to mechanical forces, 0.49% (0.42%-0.56%) for interpersonal violence, and 0.37% (0.3%-0.47%) for other unintentional injuries. Source: Institute for Health Metrics and Evaluation. Used with permission. All rights reserved.
Pyramid plot of causes and disease burden expressed in years lived with disability (YLDs) by country in Latin America and the Caribbean in 2019. In the age group between 50 and 69 years, the combined YLDs for low back pain, other musculoskeletal diseases, osteoarthritis, neck pain, falls, road injury, exposure to mechanical forces, interpersonal violence, and other unintentional injuries were 25.49% for males and 29.71% for females. Source: Institute for Health Metrics and Evaluation. Used with permission. All rights reserved.
Pyramid plot of causes and disease burden expressed in years lived with disability (YLDs) by country in Latin America and the Caribbean in 2019. In the age group of 70 years and older, the combined YLDs for low back pain, other musculoskeletal diseases, osteoarthritis, neck pain, falls, road injury, exposure to mechanical forces, interpersonal violence, and other unintentional injuries were 18.01% for males and 22.95% for females. Source: Institute for Health Metrics and Evaluation. Used with permission. All rights reserved.
The ‘good clinical practice’ (GCP) standards for clinical trials were first formulated by the International Conference on Harmonization (ICH) in 1990 to protect human subjects participating in clinical trials.3 The GCP guideline on human clinical trials is based on the ethical principles published in the Declaration of Helsinki in 1964 4–7 and its 2000 revision.8 As an internationally accepted standard, GCP also promotes good science by formalizing all trial activities, including communication with the ethics committee or institutional review board, the clinical study protocol, the investigational product, randomization (including blinding and unblinding procedures), informed consent, documentation, reporting, and the responsibilities of trial staff for safely executing the trial protocol, all while complying with local law.
The level of evidence
Various classifications for the level of clinical evidence have been adopted by numerous societies and journals. The concept was first described by the Canadian Task Force on the Periodic Health Examination9 in 1979 and expanded by Sackett in 1989 in an article on antithrombotic agents.10 Because specialties are interested in different clinical contexts, different types and levels of clinical evidence are needed.11 Research in orthopedic and musculoskeletal trauma care may be directed at diagnosis, preferred treatment, prognosis, and economic decision analysis.11 The grading system of the American Society of Plastic Surgeons (ASPS), for example, is based on prognosis (Table 1),12–14 whereas the Centre for Evidence-Based Medicine (CEBM) has proposed a treatment-based grading system (Table 2).15 How the quality of the data is assessed when grading the level of clinical evidence therefore depends on the question asked (in this comparison, treatment versus prognosis).11 Studies of the natural history of an orthopedic condition cannot be analyzed with a randomized controlled trial (RCT) because no treatments are involved that can be compared. Hence, for such questions, the highest grade of clinical evidence can only be derived from observational cohort studies or a systematic review of multiple cohort studies.11 Many RCTs never play out as designed and generate lower-quality data.16 They are plagued by crossover problems, which hampered many of the anticipated comparisons in one of the largest orthopedic RCTs, the Spine Patient Outcomes Research Trial (SPORT).17,18 The CEBM recommendations, therefore, assign poorly designed RCTs the same level of evidence as cohort studies.11
Level of clinical evidence for prognostic studies (Adapted from Burns et al. and the American Society of Plastic Surgeons publications) 11–14
Level | Type of evidence |
---|---|
I | High quality prospective cohort study with adequate power or systematic review of these studies |
II | Lesser quality prospective cohort, retrospective cohort study, untreated controls from an RCT, or systematic review of these studies |
III | Case-control study or systematic review of these studies |
IV | Case series |
V | Expert opinion; case report or clinical example; or evidence based on physiology, bench research or “first principles” |
Level of clinical evidence for therapeutic studies (Adapted from Burns et al. and the Centre for Evidence-Based Medicine [CEBM]) 11,15
Level | Type of evidence |
---|---|
1A | Systematic review (with homogeneity) of RCTs |
1B | Individual RCT (with narrow confidence intervals) |
1C | All or none study |
2A | Systematic review (with homogeneity) of cohort studies |
2B | Individual cohort study (including low-quality RCT; e.g., <80% follow-up) |
2C | “Outcomes” research; Ecological studies |
3A | Systematic review (with homogeneity) of case-control studies |
3B | Individual case-control study |
4 | Case series (and poor-quality cohort and case-control studies) |
5 | Expert opinion without explicit critical appraisal, or based on physiology, bench research, or “first principles” |
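To illustrate how Table 2 treats study quality, the minimal Python sketch below encodes the therapeutic levels as a lookup and downgrades an RCT with less than 80% follow-up to Level 2B, the same grade as an individual cohort study. The design labels, the `cebm_level` function, and the `poor_quality` flag are hypothetical names chosen for this sketch, not a CEBM tool; other quality criteria, such as the narrow confidence intervals required for Level 1B, are deliberately ignored.

```python
# Hypothetical encoding of Table 2 (CEBM levels for therapeutic studies).
CEBM_THERAPY_LEVELS = {
    "systematic_review_of_rcts": "1A",
    "rct": "1B",
    "all_or_none": "1C",
    "systematic_review_of_cohorts": "2A",
    "cohort": "2B",
    "outcomes_research": "2C",
    "ecological": "2C",
    "systematic_review_of_case_controls": "3A",
    "case_control": "3B",
    "case_series": "4",
    "expert_opinion": "5",
}


def cebm_level(design: str, follow_up: float = 1.0, poor_quality: bool = False) -> str:
    """Return the CEBM therapeutic evidence level for a study design.

    A low-quality RCT (here: follow-up below 80%, or flagged poor_quality)
    is graded 2B like an individual cohort study, and a poor-quality cohort
    or case-control study drops to Level 4, mirroring Table 2.
    """
    if design not in CEBM_THERAPY_LEVELS:
        raise ValueError(f"unknown design: {design!r}")
    if design == "rct" and (follow_up < 0.80 or poor_quality):
        return "2B"
    if design in {"cohort", "case_control"} and poor_quality:
        return "4"
    return CEBM_THERAPY_LEVELS[design]


print(cebm_level("rct", follow_up=0.95), cebm_level("rct", follow_up=0.70))
```

The point the lookup makes explicit is that, under this scheme, a well-followed RCT and an RCT with poor retention can land two sublevels apart, with the latter graded no higher than a cohort study.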
Modern clinical decision-making is guided by evidence-based medicine standards formulated as practice guidelines by many professional orthopedic subspecialty societies. Typically, a strong recommendation in favor of a diagnostic test or treatment is assigned when there is Level I evidence or consistent evidence from Level II, III, and IV studies in support of it.11 Notably, grading systems such as the one proposed by the American Society of Plastic Surgeons (Table 3)19 and adopted by many societies, such as the American Academy of Orthopaedic Surgeons20,21 or the North American Spine Society,22–29 do not downgrade lower-level evidence when making recommendations if the results are consistent. In experience-based surgical subspecialties, Level II, III, and IV studies in fact make up the majority of the clinical studies that form the basis for diagnosis-specific, evidence-based clinical practice guidelines.
Grade Practice Recommendations (Adapted from Burns et al. and the American Society of Plastic Surgeons publications) 11–14
Grade | Descriptor | Qualifying Evidence | Implications for Practice |
---|---|---|---|
A | Strong recommendation | Level I evidence or consistent findings from multiple studies of levels II, III, or IV | Clinicians should follow a strong recommendation unless a clear and compelling rationale for an alternative approach is present |
B | Recommendation | Levels II, III, or IV evidence and findings are generally consistent | Generally, clinicians should follow a recommendation but should remain alert to new information and sensitive to patient preferences |
C | Option | Levels II, III, or IV evidence, but findings are inconsistent | Clinicians should be flexible in their decision-making regarding appropriate practice, although they may set bounds on alternatives; patient preference should have a substantial influencing role |
D | Option | Level V evidence: little or no systematic empirical evidence | Clinicians should consider all options in their decision making and be alert to new published evidence that clarifies the balance of benefit versus harm; patient preference should have a substantial influencing role |
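The grading logic of Table 3 can be sketched as a small decision function. This is a simplified reading under stated assumptions, not an official ASPS tool: the hypothetical `recommendation_grade` function takes the evidence level of each supporting study plus a three-way consistency label introduced here to approximate the table's wording ("consistent findings from multiple studies", "generally consistent", "inconsistent").

```python
from typing import Iterable

LEVELS = {"I", "II", "III", "IV", "V"}
CONSISTENCY = {"consistent", "generally_consistent", "inconsistent"}


def recommendation_grade(levels: Iterable[str], consistency: str) -> str:
    """Return a practice-recommendation grade (A-D) following Table 3.

    levels: evidence level of each supporting study, e.g. ["II", "III"].
    consistency: hypothetical three-way label approximating the table's
    wording ("consistent", "generally_consistent", "inconsistent").
    """
    levels = list(levels)
    if not levels or not set(levels) <= LEVELS:
        raise ValueError("levels must be a non-empty list drawn from I-V")
    if consistency not in CONSISTENCY:
        raise ValueError(f"unknown consistency label: {consistency!r}")
    mid_levels = [lv for lv in levels if lv in {"II", "III", "IV"}]
    if "I" in levels:
        return "A"  # Level I evidence
    if len(mid_levels) >= 2 and consistency == "consistent":
        return "A"  # consistent findings from multiple Level II-IV studies
    if mid_levels and consistency != "inconsistent":
        return "B"  # Level II-IV evidence, generally consistent
    if mid_levels:
        return "C"  # Level II-IV evidence, inconsistent findings
    return "D"  # Level V evidence only


print(recommendation_grade(["II", "III", "IV"], "consistent"))
```

In this encoding, several consistent Level II to IV studies earn a Grade A recommendation without any Level I study, which reflects the point that consistent lower-level evidence is not downgraded.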
Many innovations in surgery in general, and in orthopedics in particular, have been implemented without clinical trials. One example is the first heart transplant, performed by Dr. Christiaan Barnard in 1967.30 Because surgery is more skill- and experience-based than evidence-based, many innovations over the last 200 years were implemented without any high-grade clinical evidence.31 Many examples of orthopedic innovation, such as arthroscopic knee surgery for degenerative osteoarthrosis or meniscal tears,32,33 vertebroplasty for osteoporotic vertebral fractures,34 and subacromial decompression for shoulder impingement,35 did not withstand the scrutiny of sham-controlled randomized trials.16 Moreover, some of the major RCTs in orthopedic and spinal surgery failed to conclusively demonstrate the benefit of the intervention36 in spite of cumulative and consistent low-grade evidence to the contrary.7,37–40 Clearly, many surgical RCTs fall short of the gold-standard double-blinded prospective RCT and are of lower methodological quality.41 The Jadad score is generally used to identify poor-quality RCTs; it assesses methodological quality by scoring randomization, masking (blinding), and patient retention (withdrawals and dropouts), with a maximum score of 5 (Table 4).42 The Consolidated Standards of Reporting Trials (CONSORT) were introduced to standardize reporting criteria and improve the quality of clinical trials.43 The revised CONSORT recommendations for improving the quality of reports of parallel-group randomized trials are illustrated in the flow diagram shown in Figure 4.44–46 The level of evidence obtainable from a poor-quality RCT (Level 2B; Table 2) may be no better than that of a well-conducted prospective cohort study. As discussed below, the latter can ultimately be more informative, particularly when coupled with a durability or survival analysis of the desired treatment benefit.47–50 In other words, given the problems with randomizing surgical patients described in the following, breaking through the ceiling effect imposed by the limitations of orthopedic trials may prove nearly impossible.
Jadad Scale (Adapted from Empirical Evidence of Associations Between Trial Quality and Effect Size) 42
Dimension | Question | Additional points or deductions |
---|---|---|
Randomization | 1. Was the study described as randomized (this includes the use of words such as randomly, random, and randomization)? = 1 point | Give 1 additional point if, for question 1, the method to generate the sequence of randomization was described and it was appropriate (table of random numbers, computer generated, etc.). Deduct 1 point if, for question 1, the method to generate the sequence of randomization was described and it was inappropriate (patients were allocated alternately, or according to date of birth, hospital number, etc.) |
Blinding | 2. Was the study described as double blind? = 1 point | Give 1 additional point if, for question 2, the method of double blinding was described and it was appropriate (identical placebo, active placebo, dummy, etc.). Deduct 1 point if, for question 2, the study was described as double blind but the method of blinding was inappropriate (e.g., comparison of tablet vs. injection with no double dummy) |
Withdrawals and dropouts | 3. Was there a description of withdrawals and dropouts? = 1 point | |
Total Jadad score | Maximum of 5 points | |
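Because the Jadad score is a simple additive rule, it can be expressed in a few lines. The sketch below follows Table 4; the function and argument names are our own, and it assumes (as the table's "for question 1/2" wording implies) that bonus points and deductions apply only when the parent question scored. `None` stands for "method not described", in which case nothing is added or deducted.

```python
from typing import Optional


def jadad_score(randomized: bool,
                randomization_appropriate: Optional[bool],
                double_blind: bool,
                blinding_appropriate: Optional[bool],
                withdrawals_described: bool) -> int:
    """Jadad score (0-5) following Table 4.

    *_appropriate: None if the method was not described, True if it was
    described and appropriate, False if described but inappropriate.
    Bonus points and deductions apply only when the parent question scored.
    """
    score = 0
    if randomized:
        score += 1
        if randomization_appropriate is True:
            score += 1
        elif randomization_appropriate is False:
            score -= 1
    if double_blind:
        score += 1
        if blinding_appropriate is True:
            score += 1
        elif blinding_appropriate is False:
            score -= 1
    if withdrawals_described:
        score += 1
    return score


# Open-label surgical RCT: computer-generated sequence, dropouts reported.
print(jadad_score(True, True, False, None, True))
```

Under this scoring, an open-label surgical trial that cannot be described as double blind tops out at 3 points, however well its randomization and follow-up are documented.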
Orthopedic RCTs are uncommon because several problems are hard to overcome. Orthopedic techniques and implants evolve and are constantly improved, making assessment over time difficult. The timing of an orthopedic RCT is not trivial. Early evaluation of a new implant or technique may be limited to the few surgeons who have mastered it. Conversely, clinical equipoise may be lost if trials are started too late, by which time the technique may have been surpassed by newer technology. Orthopedic RCTs are also difficult to assess because the assumption that surgeons have equal skill and training rarely holds true. Hence, standardization of interventions may be impossible. In addition, during essential periods of postoperative recovery, orthopedic surgeons may hand over the care of their patients to physical therapists and other specialists involved in postoperative care, who may affect clinical outcomes while being unaware of the rehabilitation protocol prescribed by the study design.
Compared with drug trials, which typically start with Phase II RCTs to determine statistical power, endpoints, and adverse events before multicenter Phase III trials are begun, surgical RCTs rarely progress beyond single-site Phase II studies. There are several reasons for this observation. Carrying out orthopedic RCTs in consecutive stages, as in drug trials, may prove logistically impossible. Because orthopedic practices enroll patients consecutively, choosing endpoints, sample size, and effect size is very difficult, which increases the probability of false-positive and false-negative findings. Patients are also reluctant to sign up for risky, irreversible surgeries if the orthopedic surgeon has no clear recommendation on the best treatment. Because recruitment is considerably lower than in drug trials, surgical trials may also be underpowered, with a high rate of false-negative findings, which may in turn restrict access to funding. Patients may also resist randomization to a nonsurgical arm when they have already failed conservative care and were referred for a surgical consultation. Pre-inclusion and expectation bias may reduce the likelihood of demonstrating a benefit of additional nonsurgical care. Crossover is a particularly difficult problem to manage when comparing surgery with nonoperative care. As mentioned earlier, SPORT, which compared surgical with nonoperative treatment of several common degenerative lumbar spine problems, had several flaws. In the study on lumbar disc herniation, 60% of patients assigned to surgery had undergone discectomy at two years, compared with 45% of patients initially randomized to nonoperative treatment.17 The intention-to-treat analyses were therefore largely obscured, leaving the investigators with an observational rather than an interventional analysis.
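The effect of crossover on statistical power can be made concrete with a back-of-the-envelope calculation. The Python sketch below uses the standard normal-approximation sample size formula for comparing two means with a standardized effect size d, and assumes, as a deliberate simplification, that surgery has the same effect in every patient who receives it and no effect otherwise, so the intention-to-treat effect shrinks to d multiplied by the difference in surgery rates between the arms. The effect size of 0.5 is hypothetical; the 60% and 45% surgery rates are the SPORT disc herniation figures quoted above.

```python
import math
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients per arm for a two-sided, two-sample comparison of means with
    standardized effect size d (normal approximation):
    n = 2 * (z_(1 - alpha/2) + z_power)**2 / d**2
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


def itt_effect(d: float, surgery_rate_surgical_arm: float,
               surgery_rate_nonoperative_arm: float) -> float:
    """Intention-to-treat effect diluted by crossover, assuming surgery has
    the same effect d in everyone who receives it and none otherwise."""
    return d * (surgery_rate_surgical_arm - surgery_rate_nonoperative_arm)


d_true = 0.5  # hypothetical standardized effect of surgery itself
d_itt = itt_effect(d_true, 0.60, 0.45)  # SPORT disc herniation rates at 2 years
n_full = n_per_group(d_true)
n_itt = n_per_group(d_itt)
print(f"per arm without crossover: {n_full}; with crossover: {n_itt}")
```

Because the required sample size scales with 1/d², a 15-percentage-point difference in surgery rates between the arms inflates the required enrollment by a factor of about 1/0.15² ≈ 44 under these assumptions, far beyond what a single orthopedic practice can recruit.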
The most significant drawback of orthopedic RCTs is the inability to blind. Blinding reduces bias, as demonstrated by a recent systematic review of 250 RCTs that found considerable differences in treatment effects between double-blinded and open-label trials,51 yet it is nearly impossible in surgical trials because sham-controlled interventions are rarely feasible. At a minimum, the surgeon always knows which operation was performed. Placebo and nocebo effects may vary between surgical and interventional treatments, which makes the common use of patient-reported outcome measures (PROMs) such as the visual analog scale (VAS),52 the Oswestry Disability Index (ODI),53,54 and the modified Macnab criteria55 particularly problematic because of their subjective nature. The minimal clinically important difference (MCID) of these PROMs is procedure-specific.56 Short follow-up or differing follow-up protocols may compound this problem.57
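To illustrate why a procedure-specific MCID matters, the sketch below classifies the same set of VAS improvements as responders or non-responders under two different thresholds. All values, including both thresholds, are hypothetical and are not published MCIDs; `responder_rate` is a name chosen for this sketch.

```python
from typing import Sequence


def responder_rate(improvements: Sequence[float], mcid: float) -> float:
    """Share of patients whose improvement reaches or exceeds the MCID."""
    if not improvements:
        raise ValueError("no patients")
    return sum(1 for x in improvements if x >= mcid) / len(improvements)


# Hypothetical VAS improvements (points on a 0-10 scale) after one procedure.
vas_change = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]

# Hypothetical MCID thresholds, e.g. derived for two different procedures.
for label, mcid in [("threshold A", 1.5), ("threshold B", 3.0)]:
    print(label, responder_rate(vas_change, mcid))
```

With these made-up numbers, the responder rate halves (from 75% to 37.5%) simply because the threshold changes, without any change in the patients' actual outcomes.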
Reducing bias by propensity scoring
Underestimation or overestimation of the true effects of orthopedic interventions is common when bias in RCTs is not adjusted for. The avoidable risk of bias (RoB) has been assessed in surgical versus non-surgical RCTs,58 and it should be minimized. On the patients' side, RoB could be reduced by concealing the size of the incision under the same size of dressing regardless of treatment, or by using similar postoperative rehabilitation protocols.59 Further, an expertise-based design might be used, in which multiple orthopedic surgeons perform the same procedure for the same group.58 If it is not possible to blind key individuals, computer randomization and the use of a blinded secondary team of surgeons or surgical nurses are helpful for lowering RoB. Unblinded outcome assessment is the most important factor when judging an outcome's susceptibility to bias.
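Computer randomization is often implemented with permuted blocks, which keep the arms balanced throughout enrollment. The following is a minimal sketch using Python's standard `random` module; the block size, arm labels, and fixed seed are illustrative assumptions. In an actual trial, the list would be generated and held by someone outside the surgical team so that allocation stays concealed, and the seed would not be disclosed.

```python
import random
from typing import List, Sequence


def permuted_block_schedule(n_patients: int, block_size: int = 4,
                            arms: Sequence[str] = ("surgery", "control"),
                            seed: int = 2024) -> List[str]:
    """Return an allocation list built from randomly permuted blocks.

    Every block contains each arm equally often, so with two arms the group
    sizes never differ by more than block_size // 2 during enrollment.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # in a real trial the seed must stay concealed
    schedule: List[str] = []
    while len(schedule) < n_patients:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)  # random order of allocations within the block
        schedule.extend(block)
    return schedule[:n_patients]


allocation = permuted_block_schedule(12)
print(allocation)
```

Small, fixed blocks make the next allocation partly predictable near the end of each block, which is one reason concealment from the operating surgeon matters; varying the block size randomly is a common refinement.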
When randomization is impossible, propensity matching may reduce bias, particularly selection bias.60 In observational studies, the treated and untreated control groups may differ considerably in their covariates, leading to biased estimates of the treatment effect even when these are adjusted for by traditional analysis of covariance. A propensity score is the conditional probability of being treated given the covariates. It can be used to reduce bias by balancing the covariates between the treatment groups. Estimating the propensity score requires modeling the distribution of the treatment indicator variable given the observed covariates. Propensity scoring can then reduce bias through matching, stratification, regression adjustment, or combinations of these.
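The propensity score workflow described above (model the treatment indicator on observed covariates, then balance the groups) can be sketched with the Python standard library alone. This is an illustrative sketch under assumptions, not a validated analysis tool: it fits a logistic regression by plain gradient ascent, performs greedy 1:1 nearest-neighbor matching without replacement on the logit of the propensity score within a caliper of 0.2 standard deviations (a commonly cited choice, adopted here as an assumption), and checks balance with the standardized mean difference (SMD). The cohort is simulated, with age as the only confounder, and all function names are hypothetical.

```python
import math
import random
from statistics import mean, pstdev, variance


def propensity_scores(X, treated, lr=0.5, epochs=500):
    """Estimate e(x) = P(treated | x) with a logistic regression fitted by
    batch gradient ascent on the log-likelihood; covariates are z-scored."""
    cols = list(zip(*X))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]
    Z = [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in X]
    n, k = len(Z), len(Z[0])
    b0, b = 0.0, [0.0] * k

    def predict(z):
        linear = b0 + sum(bj * zj for bj, zj in zip(b, z))
        return 1.0 / (1.0 + math.exp(-linear))

    for _ in range(epochs):
        g0, g = 0.0, [0.0] * k
        for z, t in zip(Z, treated):
            r = t - predict(z)  # observed minus predicted treatment
            g0 += r
            g = [gj + r * zj for gj, zj in zip(g, z)]
        b0 += lr * g0 / n
        b = [bj + lr * gj / n for bj, gj in zip(b, g)]
    return [predict(z) for z in Z]


def match_on_logit(ps, treated, caliper_sd=0.2):
    """Greedy 1:1 nearest-neighbor matching without replacement on the
    logit of the propensity score, within caliper_sd * SD(logit)."""
    logit = [math.log(p / (1 - p)) for p in ps]
    caliper = caliper_sd * pstdev(logit)
    controls = {i for i, t in enumerate(treated) if not t}
    pairs = []
    for i, t in enumerate(treated):
        if not t or not controls:
            continue
        j = min(controls, key=lambda c: abs(logit[c] - logit[i]))
        if abs(logit[j] - logit[i]) <= caliper:
            pairs.append((i, j))
            controls.remove(j)  # each control is used at most once
    return pairs


def smd(a, b):
    """Standardized mean difference of one covariate between two groups."""
    return (mean(a) - mean(b)) / math.sqrt((variance(a) + variance(b)) / 2)


# Simulated cohort: older patients are more likely to receive surgery.
rng = random.Random(7)
ages = [rng.gauss(60, 10) for _ in range(300)]
treated = [rng.random() < 1 / (1 + math.exp(-(a - 60) / 5)) for a in ages]
ps = propensity_scores([[a] for a in ages], treated)
pairs = match_on_logit(ps, treated)
smd_before = smd([a for a, t in zip(ages, treated) if t],
                 [a for a, t in zip(ages, treated) if not t])
smd_after = smd([ages[i] for i, _ in pairs], [ages[j] for _, j in pairs])
print(f"SMD before: {smd_before:.2f}, after: {smd_after:.2f}, pairs: {len(pairs)}")
```

In practice, one would fit the model with an established statistics package, include all measured confounders, and report balance for every covariate; stratification on the score or regression adjustment are alternatives to matching. Unmatched treated patients at the extremes of the score are dropped, which changes the population to which the estimate applies.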
Potential solutions
Given the limitations of randomized trials listed above and the apparent ceiling effect on the development of high-grade clinical evidence in orthopedic and trauma surgery, several workarounds seem practical for arriving at clinically meaningful outcome assessments that translate into improved orthopedic patient care protocols. For example, pseudorandomization may be attempted by comparing centers whose preferred treatments differ.61 Statistical power may be enhanced by concurrent prospective data collection in separate registries at different institutions and by orchestrating parallel cohort studies. In addition, as mentioned earlier, separating the surgical team from study administrators, outcome evaluators, and data analysts may mitigate the lack of blinding between patient and surgeon. One potential downside of RCTs is their limited generalizability, because strict inclusion and exclusion criteria distort the incidence of clinical problems seen in routine practice. Cohort studies are better at capturing the clinical effectiveness of an intervention in everyday patients. Others have proposed that observational studies, like RCTs, be registered, with disclosure of funding, before data are retrieved.41 While this approach could limit exploratory data analyses, or at least identify them as such, the added bureaucracy may also represent a significant hurdle for the individual surgeon-investigator who lacks funding and support. If implemented, such a rule would likely favor institutionalized researchers with access to such resources and marginalize the entrepreneurial clinical investigator in private practice or primary care settings. Centralized, institutional clinical studies in orthopedic and trauma surgery may miss the mark on what is playing out in everyday patients because these tertiary care centers often have highly specialized niche practices.
Instead of registration requirements, many journals rely on the STROBE checklist for observational studies,62 whose requirements deal primarily with transparency of reporting (Table 5). Prospective authors are encouraged to give the requested information separately for cases and controls in case-control studies and, if applicable, for exposed and unexposed groups in cohort and cross-sectional studies; separate versions of the checklist for cohort, case-control, and cross-sectional studies are available on the STROBE website. The STROBE statement was published by von Elm et al. in The Lancet in 2007,62 and a companion explanation and elaboration article discusses each checklist item and gives methodological background and published examples of transparent reporting.
The STROBE checklist of items that should be addressed in reports of observational studies (Reproduced from von Elm et al. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies) 62
Section/Topic | Item No. | Recommendation | Reported on page |
---|---|---|---|
Title and abstract | 1 | (a) Indicate the study's design with a commonly used term in the title or the abstract | |
 | | (b) Provide in the abstract an informative and balanced summary of what was done and what was found | |
Introduction | | | |
Background/rationale | 2 | Explain the scientific background and rationale for the investigation being reported | |
Objectives | 3 | State specific objectives, including any prespecified hypotheses | |
Methods | | | |
Study design | 4 | Present key elements of study design early in the paper | |
Setting | 5 | Describe the setting, locations, and relevant dates, including periods of recruitment, exposure, follow-up, and data collection | |
Participants | 6 | (a) Cohort study: give the eligibility criteria, and the sources and methods of selection of participants. Describe methods of follow-up | |
 | | Case-control study: give the eligibility criteria, and the sources and methods of case ascertainment and control selection. Give the rationale for the choice of cases and controls | |
 | | Cross-sectional study: give the eligibility criteria, and the sources and methods of selection of participants | |
 | | (b) Cohort study: for matched studies, give matching criteria and number of exposed and unexposed | |
 | | Case-control study: for matched studies, give matching criteria and the number of controls per case | |
Variables | 7 | Clearly define all outcomes, exposures, predictors, potential confounders, and effect modifiers. Give diagnostic criteria, if applicable | |
Data sources/measurement | 8* | For each variable of interest give sources of data and details of methods of assessment (measurement). Describe comparability of assessment methods if there is more than one group | |
Bias | 9 | Describe any efforts to address potential sources of bias | |
Study size | 10 | Explain how the study size was arrived at | |
Quantitative variables | 11 | Explain how quantitative variables were handled in the analyses. If applicable, describe which groupings were chosen, and why | |
Statistical methods | 12 | (a) Describe all statistical methods, including those used to control for confounding | |
 | | (b) Describe any methods used to examine subgroups and interactions | |
 | | (c) Explain how missing data were addressed | |
 | | (d) Cohort study: if applicable, explain how loss to follow-up was addressed | |
 | | Case-control study: if applicable, explain how matching of cases and controls was addressed | |
 | | Cross-sectional study: if applicable, describe analytical methods taking account of sampling strategy | |
 | | (e) Describe any sensitivity analyses | |
Results | | | |
Participants | 13* | (a) Report the numbers of individuals at each stage of the study, e.g., numbers potentially eligible, examined for eligibility, confirmed eligible, included in the study, completing follow-up, and analyzed | |
 | | (b) Give reasons for non-participation at each stage | |
 | | (c) Consider use of a flow diagram | |
Descriptive data | 14* | (a) Give characteristics of study participants (e.g., demographic, clinical, social) and information on exposures and potential confounders | |
 | | (b) Indicate the number of participants with missing data for each variable of interest | |
 | | (c) Cohort study: summarize follow-up time (e.g., average and total amount) | |
Outcome data | 15* | Cohort study: report numbers of outcome events or summary measures over time | |
 | | Case-control study: report numbers in each exposure category, or summary measures of exposure | |
 | | Cross-sectional study: report numbers of outcome events or summary measures | |
Main results | 16 | (a) Give unadjusted estimates and, if applicable, confounder-adjusted estimates and their precision (e.g., 95% confidence interval). Make clear which confounders were adjusted for and why they were included | |
 | | (b) Report category boundaries when continuous variables were categorized | |
 | | (c) If relevant, consider translating estimates of relative risk into absolute risk for a meaningful time period | |
Other analyses | 17 | Report other analyses done, e.g., analyses of subgroups and interactions, and sensitivity analyses | |
Discussion | | | |
Key results | 18 | Summarize key results with reference to study objectives | |
Limitations | 19 | Discuss limitations of the study, taking into account sources of potential bias or imprecision. Discuss both direction and magnitude of any potential bias | |
Interpretation | 20 | Give a cautious overall interpretation of results considering objectives, limitations, multiplicity of analyses, results from similar studies, and other relevant evidence | |
Generalizability | 21 | Discuss the generalizability (external validity) of the study results | |
Other information | | | |
Funding | 22 | Give the source of funding and the role of the funders for the present study and, if applicable, for the original study on which the present article is based | |
*Give information separately for cases and controls in case-control studies and, if applicable, for exposed and unexposed groups in cohort and cross-sectional studies.
Another way to demonstrate differences in treatment effects between orthopedic study groups is to illustrate the durability of the treatment effect over time. A superior treatment may provide benefit for much longer than an inferior one, with durability defined by the time until additional treatment is needed for the same condition. This interval serves as the serial time in Kaplan-Meier (K-M) curve construction: the survival times for each treatment group are listed in survival tables, sorted in ascending order beginning with the shortest, and used to compute survival probabilities. 63 Patients are censored from the study if they drop out, are lost to follow-up, or lack the required data. The cumulative probability of survival (continued treatment benefit without additional treatment for the same condition), in which censored patients count only until they leave the study and never as events, is plotted on the Y-axis of the K-M plot, allowing patients with follow-up intervals of varying duration to be analyzed together. The difference between the survival curves of the treatment groups can be tested for statistical significance with the log-rank test. At each event time, the test compares the number of events observed in each treatment arm with the number expected if all arms shared the same survival curve; these differences are summed over all event times and combined into a single chi-square (χ2) statistic that compares the full K-M curves between groups. The 95% confidence intervals for the likelihood ratios can be calculated using the "log method" described by Altman et al. 64 Although K-M curves do not represent the prognosis of the outcome, and their estimates become less precise with every patient censored from the analysis, they provide an easy-to-understand graphic depiction of postoperative service utilization at the everyday patient level across different orthopedic treatments and help manage patients' expectations regarding reoperations and functional outcomes in general.
K-M curves are also helpful in illustrating the clinical benefit of various treatments for common degenerative and traumatic orthopedic conditions to decision-makers on all sides of the healthcare equation, with the shared aim of reducing the disease burden in Latin American populations cost-effectively and with good stewardship of limited healthcare resources.
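To make the computation described above concrete, the following is a minimal Python sketch of the product-limit (K-M) estimator and a two-group log-rank test. It assumes each patient is recorded as a hypothetical `(time, event)` pair, where `time` is the serial time in months and `event` is `True` when additional treatment for the same condition was needed and `False` when the patient was censored. The function names and example data are illustrative assumptions, not the authors' analysis pipeline. The sketch uses only the Python standard library and is restricted to two groups so that the p-value can be taken from the chi-square distribution with one degree of freedom.

```python
"""Kaplan-Meier estimator and two-group log-rank test (illustrative sketch).

Assumption: each patient is a (time, event) pair. `time` is the serial time in
months; `event` is True if additional treatment for the same condition was
needed and False if the patient was censored (dropped out, lost to
follow-up, or missing required data).
"""
import math
from typing import List, Tuple

Patient = Tuple[float, bool]  # (serial time, event occurred?)


def kaplan_meier(patients: List[Patient]) -> List[Tuple[float, float]]:
    """Return (event time, cumulative survival probability) steps."""
    # Distinct event times, sorted ascending from the shortest.
    event_times = sorted({t for t, event in patients if event})
    survival = 1.0
    curve = []
    for t in event_times:
        # Patients still under observation at time t; censored patients
        # leave the risk set after their censoring time.
        at_risk = sum(1 for time, _ in patients if time >= t)
        events = sum(1 for time, event in patients if time == t and event)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


def log_rank(group_a: List[Patient], group_b: List[Patient]) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square with 1 df, p-value)."""
    pooled = [(t, e, "a") for t, e in group_a] + [(t, e, "b") for t, e in group_b]
    event_times = sorted({t for t, e, _ in pooled if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n = sum(1 for time, _, _ in pooled if time >= t)
        n_a = sum(1 for time, _, g in pooled if time >= t and g == "a")
        d = sum(1 for time, e, _ in pooled if time == t and e)
        d_a = sum(1 for time, e, g in pooled if time == t and e and g == "a")
        observed_a += d_a
        # Events expected in group A if both groups shared one survival curve.
        expected_a += d * n_a / n
        # Hypergeometric variance of the group A event count at this time.
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no informative event times to compare")
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical serial times in months; not data from any published study.
    group_a = [(3, True), (5, False), (8, True), (12, False)]
    group_b = [(2, True), (4, True), (6, True), (7, False)]
    print(kaplan_meier(group_a))  # -> [(3, 0.75), (8, 0.375)]
    chi2, p = log_rank(group_a, group_b)
    print(f"chi-square = {chi2:.3f}, p = {p:.3f}")
```

The variance-based statistic in this sketch is the form statistical packages commonly report; summing (O − E)²/E over the groups, as in the per-group description in the text, gives a simpler approximation that tends to be slightly conservative. For real cohorts, an established survival-analysis package is preferable to hand-rolled code.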
Example: Spinal Endoscopy
The claim that targeted endoscopic treatment of common spinal pain generators produces higher patient satisfaction than traditional spine surgery long lacked support from statistically validated predictors of favorable outcomes with the endoscopic procedure for most surgical indications. The difficulties of orchestrating meaningful clinical outcome studies were discussed in several previously published articles. 65–70 Some of these difficulties were overcome with straightforward survival analysis of the benefit of endoscopic spinal surgery. The medical necessity and effectiveness of endoscopic spine surgery compared with other forms of spinal surgery may be questioned, and the lack of clinical evidence to warrant endoscopic decompression surgery of the spine is frequently called out. Escalating costs create pushback from payers and patients. Many outcome studies in endoscopic spine surgery are level III retrospective case series and cohort studies published by pioneers of the procedure, including several senior authors of this article. 65–70 Level I and II prospective randomized trials are few and far between. 71–76 In 2013, Birkenmeier et al. performed a meta-analysis of comparative controlled clinical trials of endoscopic and standard microsurgical procedures. 77 Two lumbar randomized controlled trials (RCTs) 78,79 and one controlled study (CS) 80 were ultimately identified as eligible for evaluation. Clinical outcomes were similar between the endoscopic and microsurgical methods in these trials. The endoscopic techniques were associated with fewer complications and a lower rate of revision surgeries requiring arthrodesis at a statistically significant level. In 2018, Kong et al. prospectively randomized 40 patients with lumbar disc herniation and lateral recess stenosis to endoscopic lumbar discectomy or microsurgical laminotomy and found similar outcomes. 81 Another open-label randomized trial compared percutaneous transforaminal endoscopic discectomy with midline microendoscopic discectomy. 82 Outcomes measured with the ODI, the VAS for back and leg pain, the bodily pain and physical function domains of the Medical Outcomes Study 36-Item Short-Form Health Survey (SF-36), and the EuroQol Group's EQ-5D were similar at one-year follow-up.
The question arises whether these traditional outcome tools, including the visual analog scale (VAS) for leg and back pain, 52 the Oswestry Disability Index (ODI), 53,54 the Roland-Morris score, 83 and the Short Form (SF)-12 and SF-36, 84,85 are sensitive enough to detect the factors that influence patient satisfaction and clinical outcomes with endoscopic surgery. These traditional outcome tools do not adequately reflect the more favorable patient perception of endoscopic surgery compared with the translaminar surgeries it is trying to replace. Nor do they measure the benefits of staging treatment by targeting the predominant pain generator rather than treating all sources of pain suggested by current image-based criteria.
Ultimately, whether endoscopic surgical treatment is more beneficial than other types of minimally invasive or open traditional spine surgery is a complicated question, but it can, at least in part, be answered by examining the durability of the treatment benefit, defined as continued patient self-reporting of favorable PROM outcomes without the use of any other service to manage the underlying degenerative disease process. A recent article by some of the authors in the Journal of Personalized Medicine used K-M survival curves to illustrate the clinical outcomes of a protocol that treats the predominant lumbar pain generator, compared with traditional spine surgery indicated by image-based criteria (Figure 5). 67 While this type of analysis disregards many covariates, it provides a graphic representation of postoperative recovery and utilization dynamics, allowing innovative surgeons to perform superiority studies. In the fast-moving field of orthopedic and trauma surgery, such studies may be the best way to keep pace with technological advances before clinical equipoise is lost.
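The durability definition above can be turned into survival-analysis input with a few lines of code. The following minimal Python sketch assumes hypothetical per-patient follow-up fields (`months_followed`, `months_to_other_service`, `months_to_unfavorable_prom`); these names are illustrative and do not come from the cited study. Under this assumption, the treatment benefit ends at the first unfavorable PROM report or the first use of another service for the same degenerative condition, whichever comes first; otherwise, the patient is censored at the last follow-up.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FollowUp:
    """Hypothetical follow-up summary for one patient (months after surgery)."""
    months_followed: float                              # time of last contact
    months_to_other_service: Optional[float] = None     # e.g., reoperation
    months_to_unfavorable_prom: Optional[float] = None  # first unfavorable self-report


def to_survival_record(patient: FollowUp) -> Tuple[float, bool]:
    """Return (serial time, event) for K-M analysis.

    The benefit "fails" at whichever comes first: use of another service or an
    unfavorable PROM report. Without either, the patient is censored at the
    last follow-up (event=False).
    """
    failure_times = [
        t
        for t in (patient.months_to_other_service, patient.months_to_unfavorable_prom)
        if t is not None
    ]
    if failure_times:
        return min(failure_times), True
    return patient.months_followed, False


def to_survival_records(patients: List[FollowUp]) -> List[Tuple[float, bool]]:
    """Convert a whole (hypothetical) cohort into (time, event) pairs."""
    return [to_survival_record(p) for p in patients]


if __name__ == "__main__":
    cohort = [
        FollowUp(24, months_to_other_service=6),
        FollowUp(18),
        FollowUp(30, months_to_other_service=20, months_to_unfavorable_prom=12),
    ]
    print(to_survival_records(cohort))  # -> [(6, True), (18, False), (12, True)]
```

Taking the earliest failure time keeps a patient from contributing follow-up after the benefit has ended, and the resulting pairs can be passed directly to a K-M estimator.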
Kaplan-Meier (K-M) survival times in patients treated endoscopically or with laminectomy decompression or transforaminal lumbar interbody fusion for lumbar lateral canal and foraminal stenosis. The graphic depiction of the survival times illustrates the durability of the surgical treatment in patients with excellent, good, fair, and poor Macnab outcomes. The differences in clinical benefit among the three treatments, compared with a control group, were recently published in a prospective cohort study of 412 patients. Lewandrowski et al., J Personalized Med, 2022; 12 (7).
Outcome research in orthopedic and trauma surgery is hampered by randomization problems similar to those in many other surgical subspecialties. Cross-over is one major problem that limits well-intended RCTs to single-center trials similar to Phase II drug trials. This fast-moving field may quickly lose clinical equipoise. A glass-ceiling effect exists in surgical clinical trials, where well-designed prospective case-control and cohort studies may produce better clinical evidence than poorly executed RCTs. These seemingly lower-grade studies may be of greater value to the everyday patient seeking the highest-quality orthopedic care. This conclusion is particularly relevant to orthopedic surgeons in Latin America, where limited time and resources to support meaningful outcome research are commonplace in cash-strapped public healthcare systems. Durability analysis of competing treatments and technology advances is an easy-to-understand visual way to communicate treatment benefits to patients, surgeons, and payers.
Disclaimer
The views expressed in this article represent those of the authors and no other entity or organization. The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Institutional Review Board Statement
In accordance with the Declaration of Helsinki, this editorial did not require approval by an Institutional Review Board.
Informed Consent Statement
Informed consent was obtained from all subjects involved in the study.
Data Availability Statement
The data presented in this study are available on request from the corresponding author.
Funding
No funding was received to sponsor this work.
Conflict of Interest
None. This manuscript is not intended to endorse any products or to advance any agenda other than reporting the clinical utilization data associated with the presented research. No funders were involved in the design or conduct of this study; in the collection, analysis, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results. The authors declare no conflict of interest, and there was no personal circumstance or interest that may be perceived as inappropriately influencing the representation or interpretation of reported research results. This research was not compiled to enrich anyone. It was merely intended to highlight the common problems encountered by clinical investigators in Orthopedic Surgery and Trauma.