BACKGROUND: This study aimed to present a simple scoring system incorporating ultrasound (US) examination, clinical, and laboratory data for improving diagnostic accuracy of acute appendicitis (AA), and to evaluate the performance of this scoring system in comparison to other scoring systems. A new score, together with 11 previous ones, was applied to a prospective independent population of subjects with suspected AA, and the respective performances were compared in terms of accuracy.
METHODS: A total of 134 patients (70 males and 64 females; mean age 28.7 ± 11.9 years) with suspected acute appendicitis were included in the study. Demographic, clinical, and laboratory characteristics of the patients were assessed using SPSS, and four independent, statistically significant (p<0.01) predictors of the presence of AA were expressed as an integer-based scoring system.
RESULTS: Among 134 subjects, 72 went on to surgery and 58 had AA at operation. Four independent correlates of AA were identified and used for the derivation of the following integer-based scoring system: number of points = 6 for US demonstrating AA + 4 for tenderness in the right lower quadrant + 3 for rebound tenderness + 2 for leukocyte count >12,000/uL. In the study, the cut-off of ≥ 8 points for AA was used and sensitivity, specificity and accuracy of the proposed score were 95.4%, 97.4% and 96.5%, respectively.
CONCLUSION: The proposed scoring system introduces a quantitative combination of the clinical, laboratory, and imaging data, which may enhance the diagnostic accuracy of AA especially in those geographical regions where ultrasound scanning is readily available.
Among patients presenting to an emergency department with acute lower abdominal pain, acute appendicitis (AA) is often suspected. AA is a common surgical cause of acute abdomen, the prompt diagnosis of which is rewarded by a marked decrease in morbidity and mortality [1]. Quite frequently, the decision to perform surgery is based solely on clinical evaluation supplemented by laboratory data. Therefore diagnostic errors are common, resulting in a median incidence of perforation of 20% and a negative laparotomy rate ranging from 2% to 30% [1]. In order to improve the diagnostic accuracy of AA, ultrasound and computed tomography have been used as clinical aids resulting in reduced unnecessary laparotomy rates [1-5]. While ultrasound in expert hands can achieve a high degree of accuracy [1], its dependence on the operator may result in significant inter-observer variability in the diagnosis of AA. During the past few years, there has been a growing trend toward the use of formal probabilistic reasoning or quantitative data as a guide to clinical decision-making. In this respect, several scoring systems, computer-based models, and algorithms [2-12] have been developed for supporting the diagnosis of AA.
These decision-making tools are based on features of the medical history, certain clinical symptoms and signs, and laboratory markers of acute inflammatory response. In clinical studies, these decision tools are shown to be cost-effective and provide considerable diagnostic aids to physicians [13]. Nevertheless, the aforementioned decision tools have not been routinely applied in general practice and they have failed to achieve adequate accuracy in validation studies [14-17]. Accumulating evidence suggests that US in experienced hands improves diagnostic accuracy in suspected AA cases [18, 19]. Some have suggested that US imaging should be performed in all patients suspected of AA, because it is superior in identifying normal appendices and may provide alternative diagnoses [20]. However, US cannot replace clinical evaluation as false-negative rates of up to 24% have been reported [21]. While combination of clinical and laboratory data with findings on US may improve diagnostic accuracy of AA, only scant data exist on the use of such a combination as an integrated decision tool [22]. The aims of the present study were to develop a simple and reliable scoring system that would incorporate US assessment and essential elements of clinical evaluation and laboratory investigation to provide high diagnostic accuracy in patients with suspected AA and to evaluate the performance of the derived scoring system in comparison with previously proposed scoring systems in an independent dataset of subjects with suspected AA.
The present investigation included 134 subjects (70 males and 64 females) with suspected AA who were enrolled over a 2-year period (January 2005 to December 2006). The study was observational and no intervention was done except for the addition of formalized data collection. Subsequently, the performance of the score in the above database was compared to that of 11 previously proposed diagnostic scores for AA, which were also calculated by using data from the study population. The selection criteria regarding the aforementioned diagnostic scores for AA were:
(1) Development of each score from patients presenting with acute abdominal pain,
(2) Previous validation in at least one prospective study and
(3) Feasibility of each score calculation (namely no missing variables) on the basis of the data prospectively collected in our study by using a structured form that included a standardized questionnaire.
Demographic, clinical, and laboratory characteristics of the patients with suspected appendicitis were assessed using SPSS (version 11.0), and four independent, statistically significant (p<0.01) predictors of the presence of AA identified in the analysis were expressed as an integer-based scoring system, which assigned a weight (point) to each predictor and summed the weights of the predictors that were present for a subject: number of points = 6 for US positive for AA + 4 for tenderness in the right lower quadrant + 3 for rebound tenderness + 2 for leukocyte count > 12,000/uL. Non-operated subjects were assumed not to have AA, because none of them developed appendicitis during follow-up of 6 weeks. The second goal of the present study was to compare the new scoring system with the previous ones in terms of accuracy. During the study the decision to operate or not was left to the judgment of the senior surgeon, who was not aware of the conclusion of each model for every individual subject. All the ultrasound examinations in this study were performed by the senior postgraduate resident. In each patient the abdomen was initially examined with ultrasound by using a 2.5-5 MHz convex array transducer. This evaluation was supplemented with ultrasound assessment of the appendix and the surrounding region by using a 5 MHz linear array transducer and the graded compression technique. The presence of periappendiceal fluid, thickened appendix and/or fecolith was considered as signs of AA.
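For illustration only, the integer-based score described above can be sketched in a few lines of Python. The weights, the leukocyte threshold, and the cut-off of 8 points are taken from the text; the function and variable names are hypothetical and are not part of the study.

```python
# Illustrative sketch of the integer-based score; not the authors' software.
# Weights come from the text; all identifiers are hypothetical.
WEIGHTS = {
    "us_positive": 6,         # ultrasound positive for AA
    "rlq_tenderness": 4,      # tenderness in the right lower quadrant
    "rebound_tenderness": 3,  # rebound tenderness
    "leukocytosis": 2,        # leukocyte count > 12,000/uL
}
WBC_THRESHOLD_PER_UL = 12_000
CUTOFF = 8  # a score >= 8 points was used as the cut-off for AA


def appendicitis_score(us_positive, rlq_tenderness, rebound_tenderness, wbc_per_ul):
    """Sum the weights of the predictors that are present for one subject."""
    findings = {
        "us_positive": us_positive,
        "rlq_tenderness": rlq_tenderness,
        "rebound_tenderness": rebound_tenderness,
        "leukocytosis": wbc_per_ul > WBC_THRESHOLD_PER_UL,
    }
    return sum(WEIGHTS[name] for name, present in findings.items() if present)


def suggests_aa(score):
    """Apply the study's cut-off to a computed score."""
    return score >= CUTOFF
```

Under these assumptions, the score ranges from 0 to 15 points, and a subject with a positive US and leukocytosis alone reaches exactly the cut-off of 8 points.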
STATISTICAL ANALYSIS
Statistical analysis was performed using the Statistical Package for the Social Sciences software (SPSS Inc, release 11.0). Acute appendicitis at operation, confirmed by histopathology, was used as the end point in the study. Univariate associations between the presence of the aforementioned end point and clinical or laboratory features were evaluated with the chi-square test, as appropriate for categorical data, and with Student’s t-test for continuous variables. Ninety-five percent confidence intervals (95% CIs) were calculated for each comparison. 2 × 2 tables were used to calculate the sensitivity, specificity, negative predictive value, positive predictive value, and accuracy. All tests of significance were two-tailed, and a p value less than 0.05 was considered statistically significant.
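The quantities derived from a 2 × 2 table can be written out explicitly. The following Python sketch uses the standard definitions; the function name is hypothetical, and the counts in the usage line are invented for illustration and are not the study data.

```python
# Standard 2 x 2 table metrics; a sketch, not the SPSS procedure used in the study.
def diagnostic_metrics(tp, fp, fn, tn):
    """Return the usual diagnostic indices from the four cells of a 2 x 2 table.

    tp: test positive, disease present    fp: test positive, disease absent
    fn: test negative, disease present    tn: test negative, disease absent
    Assumes every row and column total is non-zero.
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / total,
    }


# Hypothetical counts, chosen only to demonstrate the arithmetic.
example = diagnostic_metrics(tp=40, fp=10, fn=5, tn=45)
```

With these invented counts the positive predictive value is 40/50 and the accuracy is 85/100; the same function could be applied to the cross-tabulation of any score against the operative end point.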
The above diagnostic score was calculated for 134 randomly selected patients (70 [52.2%] males and 64 [47.8%] females; mean age 28.7 ± 11.9 years [range: 15–79 years]) hospitalized for suspected AA. Among the 134 subjects, 73 (54.0%) had surgery, of whom 58 (43.3%) had AA at operation. The application of the new classification tool showed that 96.5% of subjects with 8–15 points had AA (Table 1). The proposed diagnostic scoring system yielded a score of < 8 points for all 61 non-operated patients in the study. The diagnostic accuracy of the present model was found to be better than that of the previous ones (Figure 1). The normal appendectomy rate was 19.4% (14 out of 72 operated patients). None of the 6 patients (4.5% of total) who were in the subgroup with the lowest score (0–4 points) had AA, whereas in 56 (96.5%) of the patients with the highest score (8–15 points; n = 58 [41.8% of total]), AA was the final diagnosis. Nevertheless, the proportion of subjects with AA among patients with moderate scores (5–7 points; n = 70 [52.2% of total]) was very small (3 out of 70, 4.3%). Thus, using the cut-off of ≥ 8 points for the diagnosis of AA in this study, a very high probability of AA would have been assigned to subjects with 8–15 points (96.5%, 56/58) as opposed to the very low probability for patients with 0–7 points (4.3%, 3/70).
The model suggested in the present study combines the diagnostic value of four variables: namely two well-recognized clinical features of AA (tenderness in the right lower quadrant and rebound tenderness) [1], ultrasound imaging, and leukocytosis, the latter reflecting the inflammatory response. The prominence of the aforementioned factors as independent correlates of AA corroborates previous reports, which have shown scores not including the above clinical variables and leukocytosis to provide poorer discrimination [1, 15]. With regard to the varied weighting of the four multivariate predictors, a positive US finding surpassed any other factor by introducing at least a 5.5-fold increase in the probability of AA as suggested by 95% CIs (Table 3). According to the proposed threshold of ≥ 8 points, if the appendix is sonographically shown to be inflamed, the presence of at least one additional factor is required to establish AA, whereas in the absence of US demonstrating AA, all three remaining variables are necessary for the diagnosis. For example, the above model would suggest the diagnosis of AA in a patient with leukocytosis and a positive US finding (total score 8 points), even if rebound or right lower quadrant tenderness were lacking. The application of the new system to the external database yielded an impressive diagnostic accuracy of 96.5%, which noticeably exceeded the performance of previous scores. The superiority of the new score could be attributed to the incorporation of an imaging modality in a formal decision tool for AA, which is the novel diagnostic procedure introduced in the present study. Although sonographic imaging of the abdomen has been established as a useful tool in the diagnosis of AA, being of particular value in patients with atypical presentation [23], its accuracy has been doubted in more recent large studies and meta-analyses [18, 19, 21, 24-26]. In this respect, it has been demonstrated that, when US is used as the determining factor for operative therapy, it cannot be relied on to the exclusion of the surgeon’s careful and repeated evaluation [21].
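The statement about which combinations of findings reach the threshold can be checked by enumerating all 16 possible patterns of the four predictors. The Python sketch below does this under the weights and cut-off given in the text; the short predictor labels and the function name are hypothetical.

```python
from itertools import product

# Weights and cut-off as stated in the text; the labels are illustrative.
WEIGHTS = {"us": 6, "rlq": 4, "rebound": 3, "wbc": 2}
CUTOFF = 8


def positive_combinations():
    """List every combination of present predictors whose total is >= CUTOFF."""
    names = list(WEIGHTS)
    combos = []
    for flags in product([False, True], repeat=len(names)):
        present = tuple(n for n, flag in zip(names, flags) if flag)
        if sum(WEIGHTS[n] for n in present) >= CUTOFF:
            combos.append(present)
    return combos


combos = positive_combinations()
# Combinations that reach the threshold without a positive ultrasound.
without_us = [c for c in combos if "us" not in c]
```

Under these assumptions, the only combination that reaches 8 points without a positive US is the one containing all three remaining predictors (9 points), while a positive US alone (6 points) does not reach the threshold.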
Furthermore, a prospective multicenter observational trial on 2280 patients with acute abdominal pain reported no correlation between the sonographic findings of the appendix and the diagnostic accuracy of the clinician, the rate of negative appendectomy, and the perforation rates, thus suggesting no clear benefit of ultrasound scanning of the appendix in the routine clinical setting [19]. In addition, sonography failed to improve the diagnostic accuracy or the negative appendectomy rate and was even found to delay surgical consultation and appendectomy in a large study that included 766 subjects [24]. Nevertheless, it has been shown that ultrasound is unnecessary when there is a high degree of clinical suspicion as expressed by a positive Alvarado score, whereas the additional information provided by ultrasound improves diagnostic accuracy in the case of a negative or equivocal Alvarado score [25]. Moreover, a meta-analysis published in the mid-1990s suggested that ultrasound is most helpful in patients with an indeterminate probability of the disease after the initial evaluation and should not be used to exclude AA in subjects with classic signs and symptoms because of the underlying relatively high false-negative rate [18]. Finally, a more recent meta-analysis on the value of ultrasound in the diagnosis of AA revealed disappointing results in multi-center trials, suggesting that the adequate performance of sonography in single-center studies may not reflect everyday surgical practice [26]. Ultrasound is rapid, noninvasive, inexpensive, and requires no patient preparation or contrast material administration [23]. Because it involves no ionizing radiation and excels in the depiction of acute gynecologic conditions, it is recommended as the initial imaging study in children [27] and in women [28], especially during pregnancy [29].
Yet, the limitations of ultrasound include its reduced accuracy in obese or muscular subjects, as well as in patients with perforated AA (approximately 50%) compared to that observed in nonperforated AA (80%) [23]. Furthermore, ultrasound is known to be highly operator-dependent, the learning curve required to develop the technique for sonographically scanning the right lower quadrant is considerable, and there are many interpretive pitfalls to be avoided [23]. It has been shown, however, that even if radiology residents or inexperienced surgeons conduct the imaging, the accuracy of ultrasound is not diminished [30, 31]. In any case, although the criteria for the ultrasound-based diagnosis of AA are well-established and reliable, the inexperienced examiner, working with poor equipment and/or technique, will provide suboptimal results, and this possibility should be taken into account when incorporating sonographic criteria in the diagnostic pattern. The use of ultrasound in the setting of suspected AA might be questioned in an era when appendiceal computed tomography (CT) has been demonstrated to provide an accuracy rate as high as 98% in the diagnosis of AA, leading to improved patient care and reduced use of hospital resources [32]. Moreover, CT has repeatedly been shown to exhibit superior discriminatory capacity compared to ultrasound in both adults and adolescents with suspected AA [33–35], suggesting that the proposed classification system may not apply to geographical areas where CT scanning is readily available on a 24-hour basis. In this study, the inability to routinely perform CT scanning may account to a great extent for the relatively high false positive rate of approximately 20%. This number of false positive diagnoses would be unacceptable in most Westernized nations, where the appropriate CT utilization in community hospitals has been shown to reduce the negative appendectomy rate from 14%–20% to 2%–7% [36–38]. Nevertheless, because many portions of the world health community may still not be able to afford CT scanning but can afford ultrasound equipment, the combined systematic implementation of sonographic evaluation and clinical acumen could be valuable as suggested by the present study.
Because the simultaneous application of the preexisting models and the new score to the same database has favored the latter, the respective clinical implications should be further evaluated. A prospective interventional large-scale evaluation in different clinical environments, in an adequate controlled study comparing a baseline phase without scoring to a subsequent phase with scoring, would probably be the optimal approach [15, 16]. To reduce bias with such a design, uniform data collection should be carried out according to constant definitions, with standardized performance criteria used to ensure objective evaluation [16]. Any diagnostic support for AA should be warmly welcomed if it has been proven to be clinically valuable, because unacceptably high negative appendectomy and perforation rates are still reported in many portions of the world health community. However, apart from being familiar with elements not included in a quantitative model, physicians may be able to provide superior imputations of missing data for an individual patient and to integrate the diagnostic estimate as part of their overall patient assessment. Therefore, including the proposed score in the diagnostic procedure is worth trying and may enhance a surgeon’s discriminatory capacity, under the prerequisite that it will be considered as an adjunct in decision making that cannot supplant careful surgical judgment.