Introduction and aim. Transient elastography is gaining popularity as a non-invasive method for predicting liver fibrosis, but inter-observer agreement and the factors influencing reproducibility have not been adequately assessed.
Material and methods. This cross-sectional study was conducted at Specialized Medical Hospital and the Egyptian Liver Foundation, Mansoura, Egypt. The inclusion criteria were age older than 18 years and chronic hepatitis C infection. The exclusion criteria were the presence of ascites, a pacemaker, or pregnancy. Three hundred and fifty-six patients participated in the study; thus, 356 pairs of examinations were performed by two operators on the same day.
Results. The overall inter-observer agreement ICC was 0.921. The correlation between the two operators was excellent (Spearman’s ρ = 0.808, p < 0.001). Inter-observer reliability for fibrosis staging was κ = 0.557 (p < 0.001). A non-negligible discordance of fibrosis staging between operators was observed (87 cases, 24.4%). Discordance of one stage and of two or more stages of fibrosis occurred in 60 (16.9%) and 27 cases (7.6%), respectively. Obesity (BMI ≥ 30 kg/m2) was the main factor associated with discordance (p = 0.002).
Conclusion. Although liver stiffness measurement showed an excellent correlation between the two operators, TE presented an inter-observer variability that may not be negligible.
The accurate diagnosis of chronic hepatitis C (CHC) related fibrosis is crucial for prognosis and treatment decisions, and is currently best evaluated by histological examination of liver biopsy.1,2 However, liver biopsy has several disadvantages, including poor patient compliance, sampling errors, limited usefulness for dynamic follow-up, and a risk of complications typical of invasive procedures. In addition, the predictive power of histology may be weakened by sampling variability.3–7 Together, these constraints have encouraged the search for non-invasive methods to assess the progression of fibrosis. The ideal non-invasive technique should be valid, painless, reproducible, easy to learn, easy to perform, and cheap. There is interest in developing non-invasive methods, either serological or imaging-based, to determine the presence and degree of fibrosis. Among these new techniques are serum markers and unidimensional transient elastography (FibroScan).
Transient elastography (TE) using FibroScan is a non-invasive and reproducible technique that evaluates tissue stiffness. Liver stiffness measurement (LSM) has been demonstrated to be a reliable tool for assessing hepatic fibrosis and cirrhosis, especially in patients with chronic hepatitis.8–12
Obesity is one of the most significant public health problems in industrialized countries, and its prevalence is rising rapidly in developing countries as well. In 2016, it was estimated that more than 1.9 billion adults were overweight, of whom 650 million were obese.13
In clinical practice, the applicability of FibroScan is particularly influenced by the expertise of the operator14 and the body mass index (BMI) of the patient.15 According to the World Federation for Ultrasound in Medicine and Biology (2015),16 the main limitation to the use of TE in clinical practice is its limited applicability in patients with obesity. The use of the XL probe reduces the failure rate in these patients but results in a high rate of unreliable results (approximately 25%). Subcutaneous adipose tissue attenuates both the transmission of the shear wave into the liver and the ultrasonic signal used to measure its propagation speed. Accordingly, numerous studies have shown that obesity is the strongest predictor of failed or unreliable LSM.17
In Egypt, where the HCV prevalence is the highest in the world,18 the national recommendations required that HCV-infected patients undergo a liver biopsy at the initial evaluation and that treatment be offered to patients with a fibrosis stage ≥ F2. Egypt has since moved to mass treatment of HCV, but liver fibrosis still determines the type and duration of treatment. Having a cheaper and more acceptable alternative to liver biopsy is critical in a country where six million individuals are chronically infected with HCV.19 However, these new non-invasive methods should be evaluated locally, as the circulating virus genotype (genotype 4), the high prevalence of increased body mass index in Egypt, and co-infection with schistosomiasis may interfere with liver fibrosis assessment. The national plan of action for prevention, care and treatment of viral hepatitis in Egypt (2014-2018) suggested the use of non-invasive methods (FibroScan and FIB4) for the evaluation of viral hepatitis patients.20 Uncertainties still exist regarding the reliability and reproducibility of TE.21,22
The present study aimed to evaluate the inter-observer variability of TE in patients with CHC, examined on the same day by two experienced operators, and the factors affecting this variability.
Material and Methods
Study design
This cross-sectional study was conducted at Specialized Medical Hospital and the Egyptian Liver Foundation, Mansoura, Egypt. The inclusion criteria were age older than 18 years and chronic hepatitis C infection, characterized by the presence of HCV-RNA in blood serum. The exclusion criteria were the presence of ascites, a pacemaker, or pregnancy. Three hundred and fifty-six consecutive patients with CHC participated in the study. Thus, 356 pairs of examinations were performed by two operators on the same day and before the start of any anti-HCV treatment. Both operators had previously performed more than 500 examinations and were therefore classified as experienced operators.15
Ethical considerations
This study was conducted in accordance with the Helsinki Declaration and was approved by the Mansoura Faculty of Medicine Ethics Committee. All patients signed an informed consent form upon enrolment in this study.
Transient elastography (TE)
The procedures were performed by two independent investigators on the same day. The right lobe of the liver was accessed through an intercostal space while the patient lay in the dorsal decubitus position with the right arm in maximal abduction. Using the FibroScan (Echosens, Paris, France) guide, a portion of liver at least 60 mm thick and free of large vessels was identified for examination. The success rate was calculated as the ratio of the number of validated measurements to the total number of measurements. The results were expressed as the median of the valid measurements in kilopascals (kPa). TE was considered reliable when the following criteria were met:
- Ten successful measurements.
- An interquartile range (IQR) lower than 30% of the median value.
- A success rate of more than 60%.23
Liver stiffness was considered as the median of all valid measurements.
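The three reliability criteria above can be sketched as a small check on one examination. This is a minimal illustration, not the FibroScan software's own logic: the function name `te_reliability`, the reading of "ten successful measurements" as "at least ten", the interpolation rule used for the quartiles, and the example readings are all assumptions.

```python
from statistics import median, quantiles


def te_reliability(valid_kpa, total_shots):
    """Summarise one TE exam and apply the three reliability criteria."""
    n_valid = len(valid_kpa)
    if n_valid < 2 or total_shots < n_valid:
        raise ValueError("need >= 2 valid readings and total_shots >= valid count")
    med = median(valid_kpa)
    # Quartiles by linear interpolation (an assumption; the device may differ).
    q1, _, q3 = quantiles(valid_kpa, n=4, method="inclusive")
    iqr_ratio = (q3 - q1) / med            # IQR as a fraction of the median
    success_rate = n_valid / total_shots   # validated / total measurements
    reliable = n_valid >= 10 and iqr_ratio < 0.30 and success_rate > 0.60
    return {
        "median_kpa": med,
        "iqr_ratio": iqr_ratio,
        "success_rate": success_rate,
        "reliable": reliable,
    }


# Hypothetical exam: 10 valid readings obtained in 12 attempts.
readings = [6.1, 6.4, 6.8, 7.0, 7.1, 7.3, 7.4, 7.6, 7.9, 8.2]
exam = te_reliability(readings, total_shots=12)
print(exam)
```

With the same readings but 20 attempts, the success rate drops to 50% and the exam would be flagged as unreliable.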
Liver fibrosis staging
Liver stiffness measured by TE was converted to the METAVIR scoring system24 using the cut-offs proposed by Ziol, et al.:8 < 8.8 kPa as F0-F1; 8.8-9.6 kPa as F2; > 9.6-14.6 kPa as F3; and > 14.6 kPa as F4.
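The conversion from stiffness to stage is a simple threshold lookup. The sketch below encodes the cut-offs quoted above; the function name `ziol_stage` and the paired readings in the usage lines are hypothetical.

```python
def ziol_stage(lsm_kpa):
    """Map a liver stiffness value (kPa) to a METAVIR-style stage
    using the cut-offs quoted in the text (Ziol, et al.)."""
    if lsm_kpa < 8.8:
        return "F0-F1"
    if lsm_kpa <= 9.6:
        return "F2"
    if lsm_kpa <= 14.6:
        return "F3"
    return "F4"


# Hypothetical paired readings for one patient: a 0.4 kPa difference
# straddling the 8.8 kPa cut-off yields discordant stages.
print(ziol_stage(8.6), ziol_stage(9.0))
```

Because the F2 band is only 0.8 kPa wide, small differences between operators near these cut-offs can move a patient by one or even two stages.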
Liver biopsy
Liver biopsy specimens were obtained under complete aseptic conditions to retrieve a 15 mm core or at least 15 portal tracts. The specimen was processed and stained with hematoxylin and eosin. Fibrosis was staged on a 0-4 scale according to the METAVIR scoring system:24
- F0: no fibrosis.
- F1: portal fibrosis without septa.
- F2: portal fibrosis and few septa.
- F3: numerous septa without cirrhosis.
- F4: cirrhosis.
Height and weight were measured by trained personnel. Height was rounded to the nearest 0.1 cm. Weight was measured using a digital scale, recorded to the nearest 0.1 kg. All measurements were taken at least twice; a third measurement was taken if the two measurements differed by a pre-determined amount (> 0.5 cm for height, > 0.2 kg for weight). The average of the two closest measurements was used. BMI was computed as the ratio of measured or reported weight [kg] to height [m] squared.
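The duplicate-measurement rule and the BMI formula can be illustrated as follows. This is a sketch under stated assumptions: the helper names `resolve_measurement` and `bmi` are hypothetical, the tolerances (0.5 cm, 0.2 kg) come from the text, and the tie-break used when a third reading is equidistant from the other two is an arbitrary choice.

```python
def resolve_measurement(first, second, tolerance, third=None):
    """Average two readings; if they differ by more than `tolerance`,
    require a third reading and average the two closest values."""
    if abs(first - second) <= tolerance:
        return (first + second) / 2
    if third is None:
        raise ValueError("readings differ by more than tolerance; third reading needed")
    values = sorted([first, second, third])
    # After sorting, the two closest readings are always adjacent.
    low_gap = values[1] - values[0]
    high_gap = values[2] - values[1]
    pair = values[:2] if low_gap <= high_gap else values[1:]
    return sum(pair) / 2


def bmi(weight_kg, height_cm):
    """BMI = weight [kg] / (height [m]) squared."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


# Hypothetical patient: heights agree within 0.5 cm, weights do not agree within 0.2 kg.
height_cm = resolve_measurement(170.0, 170.2, tolerance=0.5)
weight_kg = resolve_measurement(80.0, 80.6, tolerance=0.2, third=80.5)
patient_bmi = bmi(weight_kg, height_cm)
print(round(patient_bmi, 1), "obese" if patient_bmi >= 30 else "not obese")
```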
Statistical analysis
Statistical analyses were performed using SPSS (Statistical Package for Social Sciences) version 24 (IBM Corp., USA). Continuous variables were reported as median (IQR) and categorical variables as frequency (%). Non-parametric tests were used: the Mann-Whitney U test for quantitative comparisons and Fisher’s exact test for qualitative comparisons. Applicable examinations obtained by both operators were compared using the Wilcoxon signed-rank test for paired continuous variables and Spearman’s rank correlation. Inter-observer agreement was assessed using the intraclass correlation coefficient (ICC) and the kappa (κ) index. In addition, a Bland-Altman plot was constructed.25 Statistical significance was set at P ≤ 0.05, assuming two-tailed tests.
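To make the ICC concrete, the sketch below computes one common form from a patients-by-operators table of readings. The text does not state which ICC model was selected in SPSS, so the choice here of a two-way random-effects, absolute-agreement, single-measure ICC, often written ICC(2,1), is an assumption, as are the function name `icc_2_1` and the example readings.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a list of rows, one row per patient and
    one column per operator, via a two-way ANOVA decomposition."""
    n = len(ratings)        # number of patients
    k = len(ratings[0])     # number of operators
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                 # between-patient mean square
    ms_cols = ss_cols / (k - 1)                 # between-operator mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical paired LSM readings (kPa): [operator 1, operator 2] per patient.
pairs = [[5.2, 5.9], [6.8, 7.6], [9.1, 8.7], [12.4, 14.0], [21.3, 19.8], [4.8, 5.1]]
print(round(icc_2_1(pairs), 3))
```

Unlike a rank correlation, this form of the ICC is lowered by a systematic offset between operators, which is why it is reported alongside Spearman's coefficient.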
Results
TE was performed by both operators in 356 patients [70.8% male, median (IQR) age 39 (31–47) years, BMI 27.6 (24.4–32.5) kg/m2]. Table 1 shows the baseline characteristics of the studied patients.
Table 1. Baseline characteristics of studied patients.
Characteristic | Value |
---|---|
All patients | 356 |
Male gender | 252 (70.8%) |
Age (years) | 39.00 (31.00-47.00) |
BMI (kg/m2) | 27.57 (24.44-32.51) |
BMI ≥ 30 kg/m2 | 111 (31.2%) |
ALT (U/L) | 47.50 (32.00-69.75) |
AST (U/L) | 38.00 (28.93-60.00) |
S. albumin (g/dL) | 4.50 (4.20-4.80) |
S. bilirubin (mg/dL) | 0.70 (0.60-0.80) |
Glucose (mg/dL) | 91.00 (84.75-102.00) |
Hgb (g/dL) | 14.00 (12.90-15.10) |
WBCs (×10³/mm³) | 6.3 (5.2-7.5) |
Platelets (×10³/mm³) | 207 (167-242) |
HCV PCR (IU/L) | 643500 (154500-1468312) |
Data are presented as number (%), or median (IQR). BMI: body mass index. ALT: alanine transaminase. AST: aspartate transaminase. Hgb: hemoglobin. WBCs: white blood cells. HCV PCR: hepatitis C virus polymerase chain reaction.
The use of receiver operating characteristic (ROC) curves to derive cut-off points for these data and the relation of these classifications to the liver biopsy METAVIR score were previously reported in a separate paper.26
The overall inter-observer agreement ICC was 0.921 (95% CI 0.903-0.936). The correlation between the first and second operator was excellent (Spearman’s ρ = 0.808, p < 0.001). Figure 1 graphically displays the relationship between the results of both examinations. However, we observed a significant difference in the median LSM values between examinations performed by the first and second operator in the paired patients (6.8 kPa for the first operator vs. 7.6 kPa for the second operator; p = 0.002).
Table 2 shows the number of patients staged for liver fibrosis by FibroScan performed by both operators, based on the scoring system proposed by Ziol, et al. 2005.8 Inter-observer reliability for staging was κ = 0.557 (p < 0.001). A non-negligible discordance of fibrosis staging between operators was observed by this method (87 cases, 24.4%). Discordance of one stage and of two or more stages of fibrosis occurred in 60 cases (16.9%) and 27 cases (7.6%), respectively.
Table 2. Patients (n) staged for liver fibrosis by FibroScan performed by both operators, based on the scoring system proposed by Ziol, et al. 2005.8
First operator | Second operator: F0-1 | Second operator: F2 | Second operator: F3 | Second operator: F4 | Total |
---|---|---|---|---|---|
F0-1 | 198 | 18 | 18 | 0 | 234 |
F2 | 10 | 8 | 9 | 0 | 27 |
F3 | 9 | 8 | 27 | 10 | 54 |
F4 | 0 | 0 | 5 | 36 | 41 |
Total | 217 | 34 | 59 | 46 | 356 |
κ = 0.557, p < 0.001.
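The κ value and the discordance counts can be recomputed directly from the cell counts of Table 2. The sketch below uses unweighted Cohen's kappa, which reproduces the reported figure; the function names are hypothetical, and rows are taken as the first operator and columns as the second.

```python
# Cell counts from Table 2 (rows: first operator, columns: second operator).
TABLE_2 = [
    [198, 18, 18, 0],   # F0-1
    [10, 8, 9, 0],      # F2
    [9, 8, 27, 10],     # F3
    [0, 0, 5, 36],      # F4
]


def cohen_kappa(table):
    """Unweighted Cohen's kappa for a square agreement table."""
    n = sum(sum(row) for row in table)
    size = len(table)
    observed = sum(table[i][i] for i in range(size)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(size)) for j in range(size)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)


def counts_by_stage_gap(table):
    """Number of patients per absolute gap between the two assigned categories."""
    gaps = {}
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            gaps[abs(i - j)] = gaps.get(abs(i - j), 0) + count
    return gaps


kappa = cohen_kappa(TABLE_2)
gaps = counts_by_stage_gap(TABLE_2)
one_stage = gaps[1]
two_or_more = sum(count for gap, count in gaps.items() if gap >= 2)
print(round(kappa, 3), gaps[0], one_stage, two_or_more)  # 0.557 269 60 27
```

The 60 one-category and 27 two-or-more-category disagreements sum to the 87 discordant cases (24.4% of 356).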
Table 3 summarizes the factors associated with discordance on fibrosis stage based on FibroScan. It appears that obesity (BMI ≥ 30 kg/m2) is the main factor associated with discordance (p = 0.002).
Table 3. Risk factors associated with concordance/discordance in fibrosis staging based on TE between the two operators.
Characteristic | Concordance of fibrosis stage | Discordance of at least one fibrosis stage | P value |
---|---|---|---|
Number | 269 (75.6%) | 87 (24.4%) | |
Male Gender | 186 (69.1%) | 66 (75.9%) | 0.278 |
Age, years | 38.00 (29.00-46.00) | 42.00 (33.00-49.00) | 0.048 |
BMI, kg/m2 | 27.46 (24.43-30.40) | 28.80 (24.58-31.13) | 0.170 |
BMI ≥ 30 kg/m2 | 72 (26.8%) | 39 (44.8%) | 0.002 |
ALT, U/L | 47.00 (31.00-73.00) | 48.00 (34.00-66.00) | 0.973 |
AST, U/L | 38.00 (28.45-62.50) | 37.00 (29.00-51.00) | 0.293 |
Albumin | 4.50 (4.20-4.80) | 4.50 (4.30-4.70) | 0.686 |
Bilirubin | 0.70 (0.60-0.87) | 0.80 (0.70-0.90) | 0.225 |
Glucose | 91.00 (85.00-100.00) | 89.50 (84.00-106.25) | 0.788 |
Hgb | 14.00 (12.90-15.10) | 14.00 (13.10-15.10) | 0.610 |
WBCs | 6300 (5105-7500) | 6500 (5410-7600) | 0.245 |
Platelets | 210.00 (169.00-244.50) | 196.00 (165.00-233.00) | 0.465 |
HCV PCR | 590000 (156000-1230000) | 690192 (140000-2860000) | 0.143 |
Data are presented as number (%), or median (IQR). BMI: body mass index. ALT: alanine transaminase. AST: aspartate transaminase. Hgb: hemoglobin. WBCs: white blood cells. HCV PCR: hepatitis C virus polymerase chain reaction.
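The obesity comparison in Table 3 is a 2×2 table (obese vs. non-obese, discordant vs. concordant), which is the setting for Fisher's exact test named in the methods. The sketch below is a plain standard-library implementation, not the SPSS routine used for the study; the function name and the two-sided rule (summing all tables no more probable than the observed one) are assumptions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is no more likely than the observed one.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# From Table 3: discordant group 39 obese of 87; concordant group 72 obese of 269.
p_value = fisher_exact_two_sided(39, 87 - 39, 72, 269 - 72)
print(f"p = {p_value:.4f}")
```

The printed value can be compared with the p = 0.002 reported in Table 3.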
The Bland-Altman plot showed acceptable agreement, with only thirteen patients falling outside the 95% limits of agreement (mean difference ± 2 standard deviations) (Figure 2). However, the plot highlighted a trend in the differences across the mean values, with greater disagreement at higher liver stiffness.
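The quantities behind a Bland-Altman plot are the per-patient difference and mean of the two readings, the mean difference (bias), and the limits of agreement. The sketch below computes them without plotting; the function name and the paired readings are hypothetical, and the limits use ± 2 standard deviations as stated in the text (1.96 is also common).

```python
from statistics import mean, stdev


def bland_altman(op1, op2, width=2.0):
    """Return bias, limits of agreement, and indices of pairs outside the limits."""
    diffs = [a - b for a, b in zip(op1, op2)]
    means = [(a + b) / 2 for a, b in zip(op1, op2)]   # x-axis of the plot
    bias = mean(diffs)
    sd = stdev(diffs)                                  # sample SD of the differences
    lower, upper = bias - width * sd, bias + width * sd
    outside = [i for i, d in enumerate(diffs) if d < lower or d > upper]
    return {"bias": bias, "lower": lower, "upper": upper,
            "means": means, "diffs": diffs, "outside": outside}


# Hypothetical paired LSM readings (kPa) from two operators.
first = [5.0, 6.0, 7.0, 8.0]
second = [5.0, 7.0, 6.0, 9.0]
result = bland_altman(first, second)
print(result["bias"], round(result["lower"], 2), round(result["upper"], 2), result["outside"])
```

Plotting `diffs` against `means` shows whether disagreement grows with stiffness, which is the trend described above.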
Discussion
In recent years, several papers have addressed the variability of TE, most of them presenting conflicting results.11,27–30 Although this study showed an excellent correlation between the two operators, we should highlight a considerable inter-observer variability in TE. The excellent correlation between operators [ICC 0.921 (95% CI 0.903-0.936); p < 0.001 and Spearman’s ρ = 0.808; p < 0.001] described in our study confirmed the results of previous publications.11,27,29 Furthermore, the κ value for the diagnosis of fibrosis stages in our results was very similar to those described by other authors.28,30 Fraquelli, et al.,11 the first authors to evaluate the reproducibility of TE, reported an excellent inter-observer ICC of 0.98 (0.977–0.987) in 200 patients with chronic liver disease, most of them with CHC infection. Boursier, et al.27 showed a 25% discordance of at least one stage of fibrosis. This relatively high discrepancy in fibrosis stage was also reported by Roca, et al.29 in a more recent publication (23%). We found a rate of discordance (24.4%) similar to those described by these authors27,29 and lower than the 35% found by Perazzo, et al.30 TE might be influenced by the etiology of liver disease31 and by the prevalence of fibrosis, known as the spectrum effect.32 These two major factors may explain the differences between these studies. Our study included exclusively mono-infected CHC patients. Similarly, the study of Perazzo, et al.30 included only CHC mono-infected patients. Conversely, Fraquelli, et al.11 and Boursier, et al.27 included patients with chronic liver disease of mixed etiologies, and Roca, et al.29 enrolled HIV patients, most of them (61%) with hepatitis C co-infection and mild liver fibrosis.
These high rates of discrepancy of at least one stage of fibrosis are not negligible and might influence the management of patients with CHC. The distinction between mild fibrosis and fibrosis F ≥ 2 determines the indication for antiviral treatment in many countries and may affect the duration of treatment. In the present study, 55 (15.4%) patients were classified as having mild fibrosis by one operator and F ≥ 2 by the other. Furthermore, the diagnosis of cirrhosis defines the start of hepatocellular carcinoma screening in many countries. In our study, the operators disagreed on the diagnosis of cirrhosis in 15 (4.2%) patients, which would affect hepatocellular carcinoma screening. Thus, the management of about 20% of our patients would have been affected by this discordance if TE had been used exclusively.
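The counts quoted in this paragraph follow from the cell counts of Table 2. As a worked check, the sketch below counts the patients who fall on opposite sides of a given staging threshold for the two operators; the function name `split_disagreements` is hypothetical.

```python
# Cell counts from Table 2 (rows: first operator, columns: second operator),
# with categories indexed 0 = F0-1, 1 = F2, 2 = F3, 3 = F4.
TABLE_2 = [
    [198, 18, 18, 0],
    [10, 8, 9, 0],
    [9, 8, 27, 10],
    [0, 0, 5, 36],
]


def split_disagreements(table, cut):
    """Patients placed below category index `cut` by one operator
    and at or above it by the other."""
    size = len(table)
    return sum(
        table[i][j]
        for i in range(size)
        for j in range(size)
        if (i < cut) != (j < cut)
    )


total = sum(sum(row) for row in TABLE_2)
treatment_threshold = split_disagreements(TABLE_2, cut=1)   # F0-1 vs. F >= 2
cirrhosis_threshold = split_disagreements(TABLE_2, cut=3)   # below F4 vs. F4
affected = treatment_threshold + cirrhosis_threshold
print(treatment_threshold, cirrhosis_threshold, affected, round(100 * affected / total, 1))
# 55 15 70 19.7
```

The two groups do not overlap, because no patient was staged F0-1 by one operator and F4 by the other, so the 70 affected patients correspond to the "about 20%" stated above.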
In a study that evaluated more than 13,000 TE examinations, unreliable results were independently associated with body mass index, lower operator experience, older age, female gender, and metabolic factors.15 However, in our study, only obesity (BMI ≥ 30 kg/m2) was significantly associated with discordance in fibrosis staging based on TE between the two operators. TE should be used cautiously as an alternative to liver biopsy for assessing liver fibrosis in patients with obesity. An association of inter-observer discrepancy with higher body mass index has been found in other studies, such as those of Fraquelli, et al.,11 Boursier, et al.,21 and Wong, et al.34 The interaction of fat with the low-frequency vibrations of TE may affect the signal-to-noise ratio, which is the relevant parameter for assessing liver stiffness.11
The major strength of our study is that the sample was composed exclusively of patients with CHC infection. Previous studies that evaluated TE inter-observer variability analyzed patients with chronic liver disease of mixed etiologies, with the exception of Perazzo, et al.30 LSM accuracy and proposed cut-offs differ according to the liver disorder.33 The inter-observer agreement of this method probably also varies between hepatic diseases.
The inter-observer variability might be caused by the interpretation of artificial TE cut-offs. In our study, LSM values were well correlated between operators. Among the 51 patients diagnosed with cirrhosis (F4) by at least one operator, none was classified as having mild fibrosis (F0-F1) by the other. The major discrepancy was observed in the F0 to F2 fibrosis stages, where published cut-offs are extremely variable.
Non-invasive biomarkers can also accurately stage liver fibrosis in CHC.35 Combining biomarkers with TE could mitigate the limitations of TE and be useful in clinical practice.
Thus, clinicians should be aware that inter-observer variability in TE exists and could affect the management of patients with CHC. TE results should be carefully analyzed by experts, especially in patients with a single evaluation, because of the possibility of inter-observer variability.
Conclusion
Although LSM showed an excellent correlation between the two operators, TE presented an inter-observer variability that may not be negligible. TE is a user-friendly non-invasive technique; however, its performance may be altered by high BMI, which affects both the feasibility and the reproducibility of the test. In patients with high BMI, we recommend two TE examinations with the XL probe, performed by two experienced examiners. This inter-observer variability should be taken into consideration when TE is used in treatment centers, research, and clinical trials.
Abbreviations
- ALT: alanine transaminase.
- AST: aspartate transaminase.
- BMI: body mass index.
- CHC: chronic hepatitis C.
- HCV: hepatitis C virus.
- Hgb: hemoglobin.
- ICC: intraclass correlation coefficient.
- IQR: interquartile range.
- kPa: kilopascal.
- LSM: liver stiffness measurement.
- PCR: polymerase chain reaction.
- ROC: receiver operating characteristic.
- SD: standard deviation.
- SPSS: Statistical Package for Social Sciences.
- TE: transient elastography.
- WBCs: white blood cells.
The authors who have taken part in this study declared that they do not have anything to disclose regarding conflict of interest with respect to this manuscript.
Financial Support
This work was funded by a Science Technology Developmental Fund (STDF) grant, Egyptian Ministry of Higher Education and Scientific Research, under the call TC/2/Health/2010/hep-1.6: diagnostic imaging for accurate diagnosis and staging of the disease, project number 3512.